Creating a Flyte Project
So far we’ve been dealing with fairly simple workflows composed of a handful of tasks, all of which can fit into a single Python script. In this guide, you’ll learn how to organize a Flyte project so that it can scale to a larger codebase.
Prerequisites
- Install flytekit and flytectl according to the introduction guide instructions.
- Install git.
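Before you start, you can confirm from Python that the prerequisites are visible in your environment. This is a minimal sketch, not part of flytekit. `check_prerequisites` is a hypothetical helper, and it assumes the CLIs are on your `PATH` under the names `pyflyte`, `flytectl`, and `git`, and that flytekit is importable as `flytekit`. It only checks presence, not versions.

```python
import importlib.util
import shutil


def check_prerequisites() -> dict:
    """Report which prerequisites can be found (hypothetical helper).

    Only checks presence, not which version is installed.
    """
    return {
        # flytekit is a Python package, so look for an importable module.
        "flytekit": importlib.util.find_spec("flytekit") is not None,
        # pyflyte (ships with flytekit), flytectl, and git are CLIs on PATH.
        "pyflyte": shutil.which("pyflyte") is not None,
        "flytectl": shutil.which("flytectl") is not None,
        "git": shutil.which("git") is not None,
    }


if __name__ == "__main__":
    for name, found in check_prerequisites().items():
        print(f"{name:10} {'ok' if found else 'MISSING'}")
```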
A Flyte project is essentially a directory containing workflows, internal Python source code, configuration, and other artifacts needed to package up your code so that it can be run on a Flyte cluster.
pyflyte, the CLI tool that ships with flytekit, comes with an init command that you can use to quickly initialize a Flyte project according to the recommended file structure.
pyflyte init my_project
cd my_project
git init # initialize a git repository
Project Structure
If you examine my_project, you’ll see the following file structure:
my_project
├── Dockerfile          # Docker image
├── LICENSE
├── README.md
├── docker_build.sh     # Docker build helper script
├── requirements.txt    # Python dependencies
└── workflows
    ├── __init__.py
    └── example.py      # Example Flyte workflows
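If you want a quick sanity check that the generated project matches the tree above, you can compare it against a list of expected paths. This is a hypothetical helper, not something `pyflyte init` provides. `EXPECTED_FILES` simply mirrors the template listing shown here, so adjust it if you adopt your own conventions.

```python
from pathlib import Path

# Files from the template listing above (hypothetical constant; edit to taste).
EXPECTED_FILES = (
    "Dockerfile",
    "LICENSE",
    "README.md",
    "docker_build.sh",
    "requirements.txt",
    "workflows/__init__.py",
    "workflows/example.py",
)


def missing_project_files(root: str) -> list:
    """Return the expected files that are absent under ``root``, in listing order."""
    base = Path(root)
    return [rel for rel in EXPECTED_FILES if not (base / rel).is_file()]
```

For example, `missing_project_files("my_project")` returns an empty list when the layout is intact.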
Note
You can create your own conventions and file structure for your Flyte projects.
The pyflyte init command simply provides a good starting point.
In the rest of this guide we’ll orient you to all the important pieces of the minimal Flyte project template.
Create a Virtual Environment
We recommend creating a virtual environment for your Flyte project so that you can isolate its dependencies:
python -m venv ~/venvs/my_project
source ~/venvs/my_project/bin/activate
pip install -r requirements.txt
Note
You can also use other tools like miniconda to create a virtual environment.
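If you'd rather script this step, Python's standard-library `venv` module can create the same kind of environment. This is a minimal sketch. `create_project_venv` and `in_virtualenv` are hypothetical helpers, and the sketch assumes the usual `bin/` (POSIX) or `Scripts/` (Windows) layout that `venv` produces. It uses symlinks on POSIX to match the default behavior of `python -m venv` there.

```python
import os
import sys
import venv
from pathlib import Path


def create_project_venv(env_dir: str, with_pip: bool = True) -> Path:
    """Create a virtual environment and return the path to its interpreter.

    Dependencies still need installing afterwards, e.g.
    ``<interpreter> -m pip install -r requirements.txt``.
    """
    venv.create(env_dir, with_pip=with_pip, symlinks=(os.name != "nt"))
    bin_dir = "Scripts" if os.name == "nt" else "bin"
    exe = "python.exe" if os.name == "nt" else "python"
    return Path(env_dir) / bin_dir / exe


def in_virtualenv() -> bool:
    """Return True when the running interpreter belongs to a venv."""
    # Inside a venv, sys.prefix points at the environment while
    # sys.base_prefix still points at the base installation.
    return sys.prefix != sys.base_prefix
```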
Example Workflows
The workflows/example.py module contains a simple set of tasks and workflows that you can use to make sure that everything’s working as expected:
python workflows/example.py
Contents of workflows/example.py:
"""A simple Flyte example."""

import typing

from flytekit import task, workflow


@task
def say_hello(name: str) -> str:
    """A simple Flyte task to say "hello".

    The @task decorator allows Flyte to use this function as a Flyte task, which
    is executed as an isolated, containerized unit of compute.
    """
    return f"hello {name}!"


@task
def greeting_length(greeting: str) -> int:
    """A task that counts the length of a greeting."""
    return len(greeting)


@workflow
def wf(name: str = "union") -> typing.Tuple[str, int]:
    """Declare workflow called `wf`.

    The @workflow decorator defines an execution graph that is composed of tasks
    and potentially sub-workflows. In this simple example, the workflow is
    composed of just two tasks.

    There are a few important things to note about workflows:
    - Workflows are a domain-specific language (DSL) for creating execution
      graphs and therefore only support a subset of Python's behavior.
    - Tasks must be invoked with keyword arguments.
    - The output variables of tasks are Promises, which are placeholders for
      values that are yet to be materialized, not the actual values.
    """
    greeting = say_hello(name=name)
    greeting_len = greeting_length(greeting=greeting)
    return greeting, greeting_len


if __name__ == "__main__":
    # Execute the workflow, simply by invoking it like a function and passing in
    # the necessary parameters
    print(f"Running wf() {wf(name='passengers')}")
Expected output
Running wf() DefaultNamedTupleOutput(o0='hello passengers!', o1=17)
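The docstring's point about Promises can be hard to picture, so here is a toy model in plain Python. It is an analogy only. `Promise`, `toy_task`, and `toy_wf` are hypothetical names, and flytekit's real machinery compiles a typed execution graph rather than doing anything this simple. The idea it shows is that calling a task inside the workflow body records a node that will be computed later, instead of producing the value immediately.

```python
import functools


class Promise:
    """Toy placeholder for a task output that has not been computed yet."""

    def __init__(self, fn, kwargs):
        self.fn = fn
        self.kwargs = kwargs

    def resolve(self):
        # Materialize any upstream Promises, then run this node's function.
        # Each call re-runs upstream nodes; a real engine runs each node once.
        inputs = {
            key: value.resolve() if isinstance(value, Promise) else value
            for key, value in self.kwargs.items()
        }
        return self.fn(**inputs)


def toy_task(fn):
    """Wrap fn so that calling it records a graph node instead of running it."""

    @functools.wraps(fn)
    def wrapper(**kwargs):  # keyword-only, like task calls inside a Flyte workflow
        return Promise(fn, kwargs)

    return wrapper


@toy_task
def say_hello(name: str) -> str:
    return f"hello {name}!"


@toy_task
def greeting_length(greeting: str) -> int:
    return len(greeting)


def toy_wf(name: str):
    greeting = say_hello(name=name)  # a Promise, not a str
    greeting_len = greeting_length(greeting=greeting)  # depends on greeting
    return greeting, greeting_len


greeting, greeting_len = toy_wf(name="passengers")
print(type(greeting).__name__)  # Promise
print(greeting.resolve(), greeting_len.resolve())  # hello passengers! 17
```

In this toy, calling `say_hello("passengers")` positionally raises `TypeError`, because the wrapper accepts keyword arguments only.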
Python Dependencies
You can specify additional Python dependencies in your project by updating the requirements.txt file. This gives you the flexibility to use any pip-installable package that your project may need.
Note
We recommend using pip-compile to manage the requirements of your project.
Example requirements.txt:
flytekit>=1.5.0
pandas
scikit-learn
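To see which of the listed packages are actually installed in your active environment, you can read the file with the standard-library `importlib.metadata` module. This is a simplified sketch, not a replacement for pip or pip-compile. `installed_versions` is a hypothetical helper whose parser deliberately ignores version specifiers, extras, environment markers, and pip options such as `-r`.

```python
import re
from importlib import metadata

# Leading distribution name, e.g. "flytekit" in "flytekit>=1.5.0".
_NAME = re.compile(r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)")


def installed_versions(requirements_path: str) -> dict:
    """Map each requirement name to its installed version, or None if missing."""
    result = {}
    with open(requirements_path, encoding="utf-8") as fh:
        for raw in fh:
            line = raw.split("#", 1)[0].strip()  # drop comments
            if not line or line.startswith("-"):  # skip blanks and pip options
                continue
            match = _NAME.match(line)
            if not match:
                continue
            name = match.group(1)
            try:
                result[name] = metadata.version(name)
            except metadata.PackageNotFoundError:
                result[name] = None
    return result
```

For example, `installed_versions("requirements.txt")` maps `flytekit`, `pandas`, and `scikit-learn` to their installed versions, with `None` for anything you still need to `pip install`.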
Dockerfile
The minimal Flyte project ships with a Dockerfile that defines the system requirements for running your tasks and workflows. You can customize this image to suit your needs:
FROM python:3.8-slim-buster

WORKDIR /root
ENV VENV=/opt/venv
ENV LANG=C.UTF-8
ENV LC_ALL=C.UTF-8
ENV PYTHONPATH=/root

# Compilers and build tools for Python packages that build from source
RUN apt-get update && apt-get install -y --no-install-recommends build-essential \
    && rm -rf /var/lib/apt/lists/*

# Virtual environment
RUN python3 -m venv ${VENV}
ENV PATH="${VENV}/bin:$PATH"

# Install Python dependencies
COPY requirements.txt /root
RUN pip install --no-cache-dir -r /root/requirements.txt

# Copy the actual code
COPY . /root

# This tag is supplied by the build script and will be used to determine the version
# when registering tasks, workflows, and launch plans
ARG tag
ENV FLYTE_INTERNAL_IMAGE=$tag
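The Dockerfile expects the build script to pass the image tag through the `tag` build argument, which then becomes the `FLYTE_INTERNAL_IMAGE` environment variable. The contents of docker_build.sh aren't shown here, so the following is only a hypothetical Python sketch of such a helper. It assumes the tag is a full image reference (`[registry/]name:version`) and that the version is the current git commit. `docker build --tag/--build-arg` and `git rev-parse` are standard CLI options. `image_reference`, `git_commit_sha`, and `docker_build_command` are made-up names.

```python
import subprocess
from typing import List, Optional


def image_reference(name: str, version: str, registry: Optional[str] = None) -> str:
    """Build an image reference such as 'ghcr.io/example/my_project:abc123'."""
    prefix = f"{registry}/" if registry else ""
    return f"{prefix}{name}:{version}"


def git_commit_sha(short: bool = True) -> str:
    """Return the current commit SHA (must be run inside a git repository)."""
    cmd = ["git", "rev-parse", "--short", "HEAD"] if short else ["git", "rev-parse", "HEAD"]
    return subprocess.run(cmd, check=True, capture_output=True, text=True).stdout.strip()


def docker_build_command(image: str, context: str = ".") -> List[str]:
    """Tag the image and pass the same reference to the Dockerfile's `ARG tag`."""
    return ["docker", "build", "--tag", image, "--build-arg", f"tag={image}", context]
```

From the project root, `subprocess.run(docker_build_command(image_reference("my_project", git_commit_sha())), check=True)` would build the image. Reusing one string for both `--tag` and the build argument keeps the pushed image name and `FLYTE_INTERNAL_IMAGE` in sync.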
What’s Next?
In summary, this guide took you through the recommended way of initializing and structuring a larger Flyte codebase. In the next guide, we’ll walk through how to package and register your tasks and workflows so that they can be executed on a Flyte cluster.