Analytics
Flyte is well suited to data cleaning, statistical summarization, and plotting because flytekit lets you use the rich Python ecosystem of data processing and visualization tools.
Cleaning Data
In this example, we analyze COVID-19 vaccination data published by Our World in Data:
import pandas as pd
import plotly
import plotly.graph_objects as go
from flytekit import Deck, task, workflow, Resources


@task(requests=Resources(mem="1Gi"))
def clean_data() -> pd.DataFrame:
    """Clean the dataset."""
    df = pd.read_csv("https://covid.ourworldindata.org/data/owid-covid-data.csv")
    # Sort so the highest vaccination count comes first within each location
    # (missing values sort last), then keep one row per location.
    filled_df = (
        df.sort_values(["people_vaccinated"], ascending=False)
        .groupby("location")
        .first()
        .reset_index()
    )[["location", "people_vaccinated", "population", "date"]]
    return filled_df
As you can see, we're using pandas for data processing, and in the task below we use plotly to create a choropleth map of the percentage of each country's population that has received at least one COVID-19 vaccine dose.
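The cleaning step relies on a sort followed by a per-group `first()`. The sketch below replays that pattern on a small made-up frame, so the effect is visible without downloading the dataset. The helper name `latest_reported` and all values in `toy` are placeholders for illustration, not part of the original example.

```python
import pandas as pd

# Made-up rows standing in for the real dataset (values are placeholders).
toy = pd.DataFrame(
    {
        "location": ["A", "A", "A", "B", "B"],
        "date": ["2023-01-01", "2023-01-02", "2023-01-03", "2023-02-01", "2023-02-02"],
        "people_vaccinated": [10.0, 25.0, None, 5.0, 7.0],
        "population": [100.0, 100.0, 100.0, 50.0, 50.0],
    }
)


def latest_reported(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: same sort/groupby/first chain as clean_data."""
    return (
        df.sort_values(["people_vaccinated"], ascending=False)  # NaN rows sort last
        .groupby("location")
        .first()  # first non-null value of each column within the group
        .reset_index()
    )[["location", "people_vaccinated", "population", "date"]]


result = latest_reported(toy)
print(result)
```

Because `people_vaccinated` is a cumulative count, the row with the largest value is normally the most recent report that has data, which is why the row for location A dated 2023-01-03 (with a missing count) is skipped.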
Rendering Plots
We can use Flyte Decks to render a static HTML report of the map. In this case, we normalize people_vaccinated by the population count of each country:
@task(disable_deck=False)
def plot(df: pd.DataFrame):
    """Render a Choropleth map."""
    # Hover text: country name plus the date of the reported figure.
    df["text"] = df["location"] + "<br>" + "Last updated on: " + df["date"]
    fig = go.Figure(
        data=go.Choropleth(
            locations=df["location"],
            # Fraction of the population with at least one dose.
            z=df["people_vaccinated"].astype(float) / df["population"].astype(float),
            text=df["text"],
            locationmode="country names",
            colorscale="Blues",
            autocolorscale=False,
            reversescale=False,
            marker_line_color="darkgray",
            marker_line_width=0.5,
            zmax=1,
            zmin=0,
        )
    )
    fig.update_layout(
        title_text=(
            "Percent population with at least one dose of COVID-19 vaccine"
        ),
        geo_scope="world",
        geo=dict(
            showframe=False, showcoastlines=False, projection_type="equirectangular"
        ),
    )
    # Attach the rendered HTML to the task's Flyte Deck.
    Deck("Choropleth Map", plotly.io.to_html(fig))
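The normalization used for the map's `z` values can be checked in isolation. The following is a minimal sketch on made-up numbers. The helper `vaccinated_fraction` is hypothetical, and the clipping to the range 0 to 1 is an assumption added here for illustration; the task above does not clip and instead bounds the color scale with `zmin` and `zmax`.

```python
import pandas as pd


def vaccinated_fraction(df: pd.DataFrame) -> pd.Series:
    """Hypothetical helper: people_vaccinated divided by population."""
    ratio = df["people_vaccinated"].astype(float) / df["population"].astype(float)
    # Assumption: clip so that reporting quirks cannot push a value above 1.
    # Missing values stay missing.
    return ratio.clip(lower=0.0, upper=1.0)


# Placeholder values, not real statistics.
toy = pd.DataFrame(
    {
        "location": ["A", "B", "C"],
        "people_vaccinated": [25, 60, None],
        "population": [100, 50, 200],
    }
)

fractions = vaccinated_fraction(toy)
print(fractions)
```

Location B shows why the bounds matter: a reported count larger than the population gives a raw ratio of 1.2, which is capped at 1.0 here.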
@workflow
def analytics_workflow():
    """Prepare a data analytics workflow."""
    plot(df=clean_data())
Running this workflow, we get an interactive plot, courtesy of plotly:
analytics_workflow()