Getting Started#
Introduction to Flyte#
Flyte is a workflow orchestrator that seamlessly unifies data, machine learning, and analytics stacks for building robust and reliable applications.
This introduction provides a quick overview of how to get Flyte up and running on your local machine.
Installation#
Prerequisites
Install Docker and ensure that you have the Docker daemon running.
Flyte supports any OCI-compatible container technology (like Podman, LXD, and Containerd) when running tasks on a Flyte cluster, but for the purpose of this guide, flytectl uses Docker to spin up a local Kubernetes cluster so that you can interact with it on your machine.
First, install flytekit (Flyte's Python SDK) and scikit-learn:
pip install flytekit scikit-learn
Then install flytectl, which is the command-line interface for interacting with a Flyte backend. Use either Homebrew or the install script:
brew install flyteorg/homebrew-tap/flytectl
curl -sL https://ctl.flyte.org/install | sudo bash -s -- -b /usr/local/bin
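To double-check the Python-side installation, you can optionally import flytekit and print its version. This is a minimal sanity check of our own (not part of the official steps), assuming flytekit exposes a __version__ attribute, as recent releases do:
# Optional sanity check: confirm that flytekit is importable.
import flytekit
print(flytekit.__version__)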
Creating a Workflow#
The first workflow we'll create is a simple model training workflow that consists of three steps that will:
🍷 Get the classic wine dataset using sklearn.
📊 Process the data, simplifying the 3-class prediction problem into a binary classification problem by consolidating class labels 1 and 2 into a single class.
🤖 Train a LogisticRegression model to learn a binary classifier.
First, we'll define three tasks for each of these steps. Create a file called example.py and copy the following code into it.
import pandas as pd
from sklearn.datasets import load_wine
from sklearn.linear_model import LogisticRegression

import flytekit.extras.sklearn
from flytekit import task, workflow


@task
def get_data() -> pd.DataFrame:
    """Get the wine dataset."""
    return load_wine(as_frame=True).frame


@task
def process_data(data: pd.DataFrame) -> pd.DataFrame:
    """Simplify the task from a 3-class to a binary classification problem."""
    return data.assign(target=lambda x: x["target"].where(x["target"] == 0, 1))


@task
def train_model(data: pd.DataFrame, hyperparameters: dict) -> LogisticRegression:
    """Train a model on the wine dataset."""
    features = data.drop("target", axis="columns")
    target = data["target"]
    return LogisticRegression(max_iter=3000, **hyperparameters).fit(features, target)
As we can see in the code snippet above, we defined three tasks as Python functions: get_data, process_data, and train_model.
In Flyte, tasks are the most basic unit of compute and serve as the building blocks 🧱 for more complex applications. A task is a function that takes some inputs and produces an output. We can use these tasks to define a simple model training workflow:
@workflow
def training_workflow(hyperparameters: dict) -> LogisticRegression:
    """Put all of the steps together into a single workflow."""
    data = get_data()
    processed_data = process_data(data=data)
    return train_model(
        data=processed_data,
        hyperparameters=hyperparameters,
    )
Note
A task can also be an isolated piece of compute that takes no inputs and produces no outputs, but to do something useful, a task is typically written with both inputs and outputs.
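For instance, a minimal sketch of such a side-effect-only task might look like this (say_hello is a hypothetical example of our own, not part of example.py):
from flytekit import task

@task
def say_hello():
    """A hypothetical task with no inputs and no outputs; it only prints a message."""
    print("Hello, Flyte!")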
A workflow is also defined as a Python function, and it specifies the flow of data between tasks and, more generally, the dependencies between tasks.
Running Flyte Workflows in Python#
You can run the workflow in example.py in a local Python environment using pyflyte, the CLI that ships with flytekit.
pyflyte run example.py training_workflow \
--hyperparameters '{"C": 0.1}'
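Because @workflow functions are ordinary Python callables, you can also invoke the workflow directly from Python, which is handy for quick debugging. Here is a minimal sketch of our own (the __main__ block is not part of the official example); append it to example.py and run python example.py:
# Run the workflow with plain Python, e.g. `python example.py`.
if __name__ == "__main__":
    model = training_workflow(hyperparameters={"C": 0.1})
    print(f"Trained model: {model}")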
Running Workflows in a Flyte Cluster#
You can also use pyflyte run to execute workflows on a Flyte cluster. To do so, first spin up a local demo cluster. flytectl uses Docker to create a local Kubernetes cluster and a minimal Flyte backend that you can use to run the example above:
Important
Before you start the local cluster, make sure that you allocate a minimum of 4 CPUs and 3 GB of memory in your Docker daemon. If you're using Docker Desktop, you can do this easily by going to Settings > Resources > Advanced and setting the CPUs and Memory sliders to the appropriate levels.
flytectl demo start
Expected Output:
👨‍💻 Flyte is ready! Flyte UI is available at http://localhost:30080/console 🚀 🚀 🎉
❇️ Run the following command to export sandbox environment variables for accessing flytectl
export FLYTECTL_CONFIG=~/.flyte/config-sandbox.yaml
🐋 Flyte sandbox ships with a Docker registry. Tag and push custom workflow images to localhost:30000
📂 The Minio API is hosted on localhost:30002. Use http://localhost:30080/minio/login for Minio console
Important
Make sure to export the FLYTECTL_CONFIG=~/.flyte/config-sandbox.yaml environment variable in your shell.
Then, run the workflow on the Flyte cluster with pyflyte run using the --remote flag:
pyflyte run --remote example.py training_workflow \
--hyperparameters '{"C": 0.1}'
Expected Output: A URL to the workflow execution on your demo Flyte cluster:
Go to http://localhost:30080/console/projects/flytesnacks/domains/development/executions/<execution_name> to see execution in the console.
Where <execution_name> is a unique identifier for the workflow execution.
Inspect the Results#
Navigate to the URL produced by pyflyte run. This will take you to FlyteConsole, the web UI used to manage Flyte entities such as tasks, workflows, and executions.
Note
There are a few FlyteConsole features worth pointing out:
The default execution view shows the list of tasks executing in sequential order.
The right-hand panel shows metadata about the task execution, including logs, inputs, outputs, and task metadata.
The Graph view shows the execution graph of the workflow, providing visual information about the topology of the graph and the state of each node as the workflow progresses.
On completion, you can inspect the outputs of each task, and ultimately, the overarching workflow.
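You can also inspect an execution programmatically with flytekit's FlyteRemote client. The following is a rough sketch under the assumption that you are using the demo sandbox configuration; replace <execution_name> with the identifier from your own run:
from flytekit.configuration import Config
from flytekit.remote import FlyteRemote

# Connect to the local demo cluster using the sandbox configuration.
remote = FlyteRemote(
    config=Config.for_sandbox(),
    default_project="flytesnacks",
    default_domain="development",
)

# Fetch the execution by name, wait for it to finish, and print its outputs.
execution = remote.fetch_execution(name="<execution_name>")
execution = remote.wait(execution)
print(execution.outputs)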
Summary#
🎉 Congratulations! In this introductory guide, you:
Created a Flyte script, which trains a binary classification model.
Spun up a demo Flyte cluster on your local system.
Ran a workflow locally and on a demo Flyte cluster.
What's Next?#
Follow the rest of the sections in the documentation to get a better understanding of the key constructs that make Flyte such a powerful orchestration tool 💪.
Recommendation
If you're new to Flyte, we recommend that you go through the Flyte Fundamentals and Core Use Cases sections before diving into the other sections of the documentation.
A brief tour of Flyte's main concepts and development lifecycle.
An overview of core use cases for data, machine learning, and analytics practitioners.
A comprehensive view of Flyte's functionality for data scientists, ML engineers, data engineers, and data analysts.
End-to-end examples of Flyte for data/feature engineering, machine learning, bioinformatics, and more.
Guides for platform engineers to deploy a Flyte cluster on your own infrastructure.