Analytics

Flyte works well for data cleaning, statistical summarization, and plotting because flytekit lets you use the rich Python ecosystem of data processing and visualization tools.

Cleaning data

In this example, we analyze COVID-19 vaccination data:

import pandas as pd
import plotly
import plotly.graph_objects as go
from flytekit import Deck, task, workflow, Resources


@task(requests=Resources(mem="1Gi"))
def clean_data() -> pd.DataFrame:
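    """Clean the dataset."""
    df = pd.read_csv("https://covid.ourworldindata.org/data/owid-covid-data.csv")
    # Sort so that each location's highest vaccination count comes first
    # (missing values sort last), then keep one row per location.
    # Note: groupby(...).first() takes the first non-null value per column.
    filled_df = (
        df.sort_values(["people_vaccinated"], ascending=False)
        .groupby("location")
        .first()
        .reset_index()
    )[["location", "people_vaccinated", "population", "date"]]
    return filled_df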

Here we use pandas for data processing. In the task below, we use plotly to create a choropleth map of the share of each country’s population that has received at least one COVID-19 vaccine dose.
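To make the sort-then-first() pattern in clean_data concrete, here is a minimal sketch on a made-up four-row table. The toy values and the names toy, latest, and share are illustrative assumptions, not part of the real dataset. It also shows the normalization step used later for the map.

```python
import pandas as pd

# Made-up data: two locations, two dates each, one missing count.
toy = pd.DataFrame(
    {
        "location": ["A", "A", "B", "B"],
        "date": ["2021-01-01", "2021-01-02", "2021-01-01", "2021-01-02"],
        "people_vaccinated": [10.0, 25.0, None, 5.0],
        "population": [100.0, 100.0, 50.0, 50.0],
    }
)

# Same chain as clean_data: sort descending (NaN goes last), then take
# the first non-null value of each column within every location.
latest = (
    toy.sort_values(["people_vaccinated"], ascending=False)
    .groupby("location")
    .first()
    .reset_index()
)[["location", "people_vaccinated", "population", "date"]]

# Normalize by population to get the vaccinated share (a fraction in [0, 1]).
share = latest["people_vaccinated"] / latest["population"]

print(latest)
print(share.tolist())
```

Each location ends up with a single row holding its highest reported count and the date of that report, which is what the plotting task expects.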

Rendering plots

We can use Flyte Decks to render a static HTML report of the map. In this case, we normalize people_vaccinated by each country’s population:

# disable_deck=False turns on Deck rendering for this task.
@task(disable_deck=False)
def plot(df: pd.DataFrame):
    """Render a Choropleth map."""
    df["text"] = df["location"] + "<br>" + "Last updated on: " + df["date"]
    fig = go.Figure(
        data=go.Choropleth(
            locations=df["location"],
            z=df["people_vaccinated"].astype(float) / df["population"].astype(float),
            text=df["text"],
            locationmode="country names",
            colorscale="Blues",
            autocolorscale=False,
            reversescale=False,
            marker_line_color="darkgray",
            marker_line_width=0.5,
            zmax=1,
            zmin=0,
        )
    )

    fig.update_layout(
        title_text=(
            "Percent population with at least one dose of COVID-19 vaccine"
        ),
        geo_scope="world",
        geo=dict(
            showframe=False, showcoastlines=False, projection_type="equirectangular"
        ),
    )
    Deck("Choropleth Map", plotly.io.to_html(fig))


@workflow
def analytics_workflow():
    """Prepare a data analytics workflow."""
    plot(df=clean_data())

Running this workflow, we get an interactive plot, courtesy of plotly:

analytics_workflow()

Custom Flyte deck renderers

You can also write custom Flyte Deck renderers to visualize data with any plotting or visualization library, as long as you can render the objects of interest as HTML.
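As a rough illustration of the idea, the sketch below defines a small class that turns a list of row dictionaries into an HTML table string. The class name RowTableRenderer and its input shape are hypothetical, and the to_html method is the interface we assume a renderer exposes. It uses only the standard library so it can be tried without Flyte installed.

```python
from html import escape
from typing import Dict, List


class RowTableRenderer:
    """Hypothetical renderer: turns a list of row dicts into an HTML table."""

    def to_html(self, rows: List[Dict[str, object]]) -> str:
        if not rows:
            return "<p>No rows to display.</p>"
        # Use the first row's keys as the column order.
        columns = list(rows[0].keys())
        header = "".join(f"<th>{escape(str(c))}</th>" for c in columns)
        # Escape every cell so data values cannot inject markup.
        body = "".join(
            "<tr>"
            + "".join(f"<td>{escape(str(row.get(c, '')))}</td>" for c in columns)
            + "</tr>"
            for row in rows
        )
        return f"<table><thead><tr>{header}</tr></thead><tbody>{body}</tbody></table>"


html_report = RowTableRenderer().to_html([{"location": "A & B", "share": 0.25}])
print(html_report)
```

Inside a task, the resulting string could be passed to a Deck the same way the plot task passes plotly's HTML, for example Deck("Vaccination Table", html_report).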

Important

Prefer other data processing frameworks? Flyte ships with Polars, Dask, Modin, Spark, Vaex, and dbt integrations.

If you need to connect to a database, Flyte provides first-party support for AWS Athena, Google BigQuery, Snowflake, SQLAlchemy, and SQLite3.