Analytics
Flyte is well suited to data cleaning, statistical summarization, and plotting, because flytekit lets you leverage the rich Python ecosystem of data processing and visualization tools.
Cleaning Data
In this example, we're going to analyze some COVID-19 vaccination data from Our World in Data:
```python
import pandas as pd
import plotly
import plotly.graph_objects as go
from flytekit import Deck, task, workflow, Resources


@task(requests=Resources(mem="1Gi"))
def clean_data() -> pd.DataFrame:
    """Clean the dataset."""
    df = pd.read_csv("https://covid.ourworldindata.org/data/owid-covid-data.csv")
    # Sort so each country's highest people_vaccinated value comes first,
    # then keep the first non-null value of every column per location.
    filled_df = (
        df.sort_values(["people_vaccinated"], ascending=False)
        .groupby("location")
        .first()
        .reset_index()
    )[["location", "people_vaccinated", "population", "date"]]
    return filled_df
```
As you can see, we're using pandas for data processing. In the task below, we use plotly to create a choropleth map of the share of each country's population that has received at least one dose of a COVID-19 vaccine.
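To make the sort/groupby/first chain in clean_data more concrete, here is a small sketch that applies the same chain to a hypothetical in-memory DataFrame (the locations "A" and "B" and all their values are made up, so no network access is needed). Because `groupby(...).first()` returns the first non-null value of each column within each group, sorting by people_vaccinated in descending order beforehand means each location keeps its highest recorded count, while a location with no vaccination data at all ends up with NaN.

```python
import pandas as pd

# Hypothetical toy data standing in for the Our World in Data CSV.
raw = pd.DataFrame(
    {
        "location": ["A", "A", "A", "B", "B"],
        "people_vaccinated": [10.0, 30.0, None, None, None],
        "population": [100.0, 100.0, 100.0, 50.0, 50.0],
        "date": ["2021-01-01", "2021-06-01", "2021-12-01", "2021-01-01", "2021-02-01"],
    }
)

# Same chain as clean_data: highest count first (NaN sorts last),
# then the first non-null value of every column per location.
latest = (
    raw.sort_values(["people_vaccinated"], ascending=False)
    .groupby("location")
    .first()
    .reset_index()
)[["location", "people_vaccinated", "population", "date"]]

print(latest)
```

Since `first()` works column by column, the date reported for a location whose counts are all missing comes from a row without a count, which is why some rows in the output further down pair NaN with an early date.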
Rendering Plots
We can use Flyte Decks to render a static HTML report of the map. In this case, we normalize people_vaccinated by the population count of each country:
```python
@task(enable_deck=True)
def plot(df: pd.DataFrame):
    """Render a Choropleth map."""
    # Hover text: country name plus the date of its latest record.
    df["text"] = df["location"] + "<br>" + "Last updated on: " + df["date"]
    fig = go.Figure(
        data=go.Choropleth(
            locations=df["location"],
            # Share of the population with at least one dose (0 to 1).
            z=df["people_vaccinated"].astype(float) / df["population"].astype(float),
            text=df["text"],
            locationmode="country names",
            colorscale="Blues",
            autocolorscale=False,
            reversescale=False,
            marker_line_color="darkgray",
            marker_line_width=0.5,
            zmax=1,
            zmin=0,
        )
    )
    fig.update_layout(
        title_text=(
            "Percent population with at least one dose of COVID-19 vaccine"
        ),
        geo_scope="world",
        geo=dict(
            showframe=False, showcoastlines=False, projection_type="equirectangular"
        ),
    )
    # Attach the figure's HTML to this task's Flyte Deck.
    Deck("Choropleth Map", plotly.io.to_html(fig))


@workflow
def analytics_workflow():
    """Prepare a data analytics workflow."""
    plot(df=clean_data())
```
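The z values passed to go.Choropleth are simply people_vaccinated divided by population, so they are fractions between 0 and 1, which is why the color scale is pinned with zmin=0 and zmax=1. The sketch below reproduces that expression on three rows copied from the cleaned data in the workflow output (Afghanistan, American Samoa, and World); it shows that a missing count stays NaN after the division instead of being treated as 0.

```python
import pandas as pd

# Three rows copied from the cleaned data shown in the workflow output.
df = pd.DataFrame(
    {
        "location": ["Afghanistan", "American Samoa", "World"],
        "people_vaccinated": [1.889700e07, None, 5.630409e09],
        "population": [4.112877e07, 4.429500e04, 7.975105e09],
    }
)

# Same expression as the z argument of go.Choropleth in plot.
share = df["people_vaccinated"].astype(float) / df["population"].astype(float)

# Fractions of the population; a missing count stays NaN rather than 0.
print(share.round(3).tolist())
```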
Running this workflow produces an interactive plot, courtesy of plotly:

```python
analytics_workflow()
```
| Name | Wall time (s) | Process time (s) |
|---|---|---|
| Translate literal to python value | 0.000015 | 0.000013 |
| Execute user level code | 3.759147 | 2.420174 |
| Translate the output to literals | 0.349885 | 0.125454 |
| Translate literal to python value | 0.076039 | 0.013439 |
| Execute user level code | 0.145021 | 0.054118 |
| Translate the output to literals | 0.000007 | 0.000007 |
Note:
- If the time duration is too small (< 1 ms), it may be difficult to see on the timeline graph.
- For accurate execution time measurements, refer to the wall time and process time.
| | location | people_vaccinated | population | date | text |
|---|---|---|---|---|---|
| 0 | Afghanistan | 1.889700e+07 | 4.112877e+07 | 2023-11-26 | Afghanistan<br>Last updated on: 2023-11-26 |
| 1 | Africa | 5.549983e+08 | 1.426737e+09 | 2023-11-19 | Africa<br>Last updated on: 2023-11-19 |
| 2 | Albania | 1.349255e+06 | 2.842318e+06 | 2023-09-10 | Albania<br>Last updated on: 2023-09-10 |
| 3 | Algeria | 7.840131e+06 | 4.490323e+07 | 2022-04-24 | Algeria<br>Last updated on: 2022-04-24 |
| 4 | American Samoa | NaN | 4.429500e+04 | 2020-01-05 | American Samoa<br>Last updated on: 2020-01-05 |
| ... | ... | ... | ... | ... | ... |
| 250 | Western Sahara | NaN | 5.760050e+05 | 2022-04-20 | Western Sahara<br>Last updated on: 2022-04-20 |
| 251 | World | 5.630409e+09 | 7.975105e+09 | 2024-03-14 | World<br>Last updated on: 2024-03-14 |
| 252 | Yemen | 1.050112e+06 | 3.369661e+07 | 2023-11-26 | Yemen<br>Last updated on: 2023-11-26 |
| 253 | Zambia | 1.171156e+07 | 2.001767e+07 | 2023-06-25 | Zambia<br>Last updated on: 2023-06-25 |
| 254 | Zimbabwe | 6.437808e+06 | 1.632054e+07 | 2022-10-09 | Zimbabwe<br>Last updated on: 2022-10-09 |
Custom Flyte Deck Renderers
You can also create your own custom Flyte Deck renderers to visualize data with any plotting or visualization library you like, as long as it can produce HTML for the objects of interest.
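A minimal sketch of such a renderer, assuming flytekit's convention that a renderer is any object exposing a `to_html(python_value) -> str` method. The class name `DictTableRenderer` is hypothetical, and it uses only the standard library so it runs without flytekit installed. The string it returns can be passed as the second argument of `Deck`, just as plot does with `plotly.io.to_html(fig)`.

```python
from html import escape
from typing import Any, Mapping


class DictTableRenderer:
    """Hypothetical renderer: turns a flat mapping into an HTML <table>."""

    def __init__(self, title: str = "Summary") -> None:
        self.title = title

    def to_html(self, python_value: Mapping[str, Any]) -> str:
        # Escape keys and values so arbitrary text cannot inject markup.
        rows = "".join(
            f"<tr><th>{escape(str(key))}</th><td>{escape(str(value))}</td></tr>"
            for key, value in python_value.items()
        )
        return f"<h3>{escape(self.title)}</h3><table>{rows}</table>"


# Usage: render one row of the cleaned data as an HTML table.
summary = {"location": "World", "people_vaccinated": 5.630409e09}
html = DictTableRenderer("World").to_html(summary)
print(html)
# Inside a Flyte task with decks enabled, you could then call:
# Deck("World summary", html)
```

Keeping rendering in a plain `to_html` method means the same class can be unit-tested on its own; flytekit's deck documentation also describes attaching renderers to task output types via `typing.Annotated`, so check the docs for your flytekit version before relying on that.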
Important
Prefer other data processing frameworks? Flyte ships with Polars, Dask, Modin, Spark, Vaex, and dbt integrations.
If you need to connect to a database, Flyte provides first-party support for AWS Athena, Google BigQuery, Snowflake, SQLAlchemy, and SQLite3.