Analytics
Flyte is well suited to data cleaning, statistical summarization, and plotting because flytekit lets you use the rich Python ecosystem of data processing and visualization tools.
Cleaning Data
In this example, we’re going to analyze some COVID-19 vaccination data:
import pandas as pd
import plotly
import plotly.graph_objects as go
from flytekit import Deck, task, workflow, Resources


@task(requests=Resources(mem="1Gi"))
def clean_data() -> pd.DataFrame:
    """Clean the dataset."""
    df = pd.read_csv("https://covid.ourworldindata.org/data/owid-covid-data.csv")
    filled_df = (
        df.sort_values(["people_vaccinated"], ascending=False)
        .groupby("location")
        .first()
        .reset_index()
    )[["location", "people_vaccinated", "population", "date"]]
    return filled_df
As you can see, we’re using pandas for data processing, and in the task below we use plotly to create a choropleth map of the percent of a country’s population that has received at least one COVID-19 vaccination.
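To see what the sort-and-group step in clean_data does, here is a minimal sketch on a small made-up frame. The values and the names toy and latest are illustrative assumptions, not part of the real dataset. Sorting by people_vaccinated in descending order puts each location's highest count first (missing values sort last), so taking the first row per group keeps the most advanced record for each location.

```python
import pandas as pd

# Made-up rows for illustration only; the real data comes from the OWID CSV.
toy = pd.DataFrame(
    {
        "location": ["A", "A", "B", "B"],
        "date": ["2021-01-01", "2021-01-02", "2021-01-01", "2021-01-02"],
        "people_vaccinated": [10.0, 25.0, None, 7.0],
        "population": [100.0, 100.0, 50.0, 50.0],
    }
)

# Same chain as in clean_data: sort descending, then keep the first row per location.
latest = (
    toy.sort_values(["people_vaccinated"], ascending=False)
    .groupby("location")
    .first()
    .reset_index()
)[["location", "people_vaccinated", "population", "date"]]

print(latest)
```

Each location ends up with exactly one row, which is the shape the plotting task expects.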
Rendering Plots
We can use Flyte Decks to render a static HTML report of the map. In this case, we normalize people_vaccinated by the population count of each country:
@task(disable_deck=False)
def plot(df: pd.DataFrame):
    """Render a Choropleth map."""
    df["text"] = df["location"] + "<br>" + "Last updated on: " + df["date"]
    fig = go.Figure(
        data=go.Choropleth(
            locations=df["location"],
            z=df["people_vaccinated"].astype(float) / df["population"].astype(float),
            text=df["text"],
            locationmode="country names",
            colorscale="Blues",
            autocolorscale=False,
            reversescale=False,
            marker_line_color="darkgray",
            marker_line_width=0.5,
            zmax=1,
            zmin=0,
        )
    )
    fig.update_layout(
        title_text=(
            "Percent population with at least one dose of COVID-19 vaccine"
        ),
        geo_scope="world",
        geo=dict(
            showframe=False, showcoastlines=False, projection_type="equirectangular"
        ),
    )
    Deck("Choropleth Map", plotly.io.to_html(fig))


@workflow
def analytics_workflow():
    """Prepare a data analytics workflow."""
    plot(df=clean_data())
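The normalization passed as z in the plot task can be checked in isolation. The sketch below uses made-up numbers, and the names toy, share, and percent are hypothetical. It shows that the value handed to the map is a fraction between 0 and 1, which is why zmin and zmax are set to 0 and 1.

```python
import pandas as pd

# Made-up numbers for illustration only.
toy = pd.DataFrame(
    {
        "location": ["A", "B"],
        "people_vaccinated": [25.0, 7.0],
        "population": [100.0, 50.0],
    }
)

# Same expression as the z argument: vaccinated people divided by population.
share = toy["people_vaccinated"].astype(float) / toy["population"].astype(float)

# Multiply by 100 to display the value as a percentage instead of a fraction.
percent = (share * 100).round(1)

print(share.tolist(), percent.tolist())
```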
Running this workflow, we get an interactive plot, courtesy of plotly:
analytics_workflow()