Deploying Workflows - Registration

Locally, Flytekit relies on the Python interpreter to execute both tasks and workflows. To leverage the full power of Flyte, we recommend using a deployed Flyte backend. Flyte can run on any Kubernetes cluster (for example, a local cluster such as kind), in a cloud environment, or on-prem. The process of deploying your workflows to a Flyte cluster is called Registration. It involves the following steps:

  1. Write your code, SQL, etc.

  2. Package the code as a Docker image when needed. Some cases need no packaging: the code itself is portable (for example, SQL), the task references a remote service (for example, SageMaker built-in algorithms), or the code can be safely transferred over.

  3. Alternatively, package with Fast Registration

  4. Register the serialized workflows and tasks

Using remote Flyte gives you the ability to:

  • Use caching to avoid calling the same task with the same inputs (for the same version)

  • Portability: You can reference pre-registered entities under any domain or project within your workflow code

  • Shareable executions: You can easily share links to your executions with your teammates

Please refer to the Getting Started guide for details on setting up a Flyte installation.

Build your Dockerfile

  1. First commit your changes. Some of the steps below default to referencing the git sha.

  2. Run make serialize. This builds the image tagged with just flytecookbook:<sha>; no registry is prefixed. See the image building section below for additional information.

  3. Build a container image that holds your code.

FROM python:3.8-slim-buster
LABEL org.opencontainers.image.source https://github.com/flyteorg/flytesnacks

WORKDIR /root
ENV VENV /opt/venv
ENV LANG C.UTF-8
ENV LC_ALL C.UTF-8
ENV PYTHONPATH /root

# This is necessary for opencv to work
RUN apt-get update && apt-get install -y libsm6 libxext6 libxrender-dev ffmpeg build-essential

# Install the AWS cli separately to prevent issues with boto being written over
RUN pip3 install awscli

ENV VENV /opt/venv
# Virtual environment
RUN python3 -m venv ${VENV}
ENV PATH="${VENV}/bin:$PATH"

# Install Python dependencies
COPY core/requirements.txt /root
RUN pip install -r /root/requirements.txt

# Copy the makefile targets to expose on the container. This makes it easier to register
COPY in_container.mk /root/Makefile
COPY core/sandbox.config /root

# Copy the actual code
COPY core /root/core

# This tag is supplied by the build script and will be used to determine the version
# when registering tasks, workflows, and launch plans
ARG tag
ENV FLYTE_INTERNAL_IMAGE $tag

Note

core is the source directory that the above Dockerfile copies into the image.

Serialize your workflows and tasks

Getting your tasks, workflows, and launch plans to run on a Flyte platform is effectively a two-step process. Serialization is the first step of that process. It translates all your Flyte entities defined in Python into Flyte IDL entities defined in protobuf.

Once you’ve built a Docker container image with your updated code changes, you can use the predefined make target to easily serialize your tasks:

make serialize

This runs the pyflyte serialize command to convert your workflow and task definitions to registerable protos. The make target is a handy wrapper around the following:

pyflyte -c sandbox.config --pkgs core serialize --in-container-config-path /root/sandbox.config --local-source-root ${CURDIR} --image ${FULL_IMAGE_NAME}:${VERSION} workflows -f _pb_output/

  • -c is the path to the config definition on your machine. This config specifies SDK default attributes.

  • --pkgs points to the packages, within --local-source-root, that will be serialized.

  • --local-source-root is the directory containing the code that is copied into your Docker container image and will be serialized (and later, executed).

  • --in-container-config-path is the location within your Docker container image to which the above config file is copied.

  • --image is the required, fully qualified name of the container image housing your code.
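
If you script serialization outside of make, the same flags can be assembled programmatically. The following is a minimal sketch, not part of flytekit: build_serialize_command is a hypothetical helper that only reproduces the argument order of the command shown above, and the image name, version, and source root in the usage are placeholder values.

```python
import shlex
from typing import List


def build_serialize_command(
    image: str,
    version: str,
    source_root: str,
    config: str = "sandbox.config",
    packages: str = "core",
    in_container_config: str = "/root/sandbox.config",
    output_dir: str = "_pb_output/",
) -> List[str]:
    """Return the argv for the pyflyte call wrapped by `make serialize`.

    Hypothetical helper: it mirrors the documented command and adds no flags.
    """
    if not image or not version:
        # --image is required, so fail early instead of emitting a bad command.
        raise ValueError("image and version must both be non-empty")
    return [
        "pyflyte",
        "-c", config,
        "--pkgs", packages,
        "serialize",
        "--in-container-config-path", in_container_config,
        "--local-source-root", source_root,
        "--image", f"{image}:{version}",
        "workflows",
        "-f", output_dir,
    ]


if __name__ == "__main__":
    # Placeholder values; substitute your own image name and git sha.
    argv = build_serialize_command("flytecookbook", "abc123", ".")
    print(shlex.join(argv))
```

Returning a list rather than a single string lets you pass the result directly to subprocess.run without worrying about shell quoting.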

In-container serialization

Notice that the commands above are run locally, _not_ inside the container. Strictly speaking, serialization should be done within the container, for the following reasons:

  1. It ensures that the versions of all libraries used at execution time on the Flyte platform are the same as those used during serialization.

  2. Since serialization runs part of flytekit, it helps ensure that your container is set up correctly.

Take a look at this make target to see how it’s done.

make serialize

Register your Workflows and Tasks

Once you’ve serialized your workflows and tasks to proto, you’ll need to register them with your deployed Flyte installation. Again, you can make use of the included make target like so:

OUTPUT_DATA_PREFIX=s3://my-s3-bucket/raw_data FLYTE_HOST=flyte.example.com make register

making sure to appropriately substitute the correct output data location (to persist workflow execution outputs) along with the URL to your hosted Flyte deployment.

Under the hood this recipe again supplies some defaults you may find yourself wishing to customize. Specifically, this recipe calls:

flyte-cli register-files -p flytetester -d development -v ${VERSION} --kubernetes-service-account demo --output-location-prefix s3://my-s3-bucket/raw_data -h flyte.example.com _pb_output/*

Of interest are the following args:

  • -p specifies the project under which to register your entities. This project must already be registered on your Flyte deployment.

  • -d specifies the domain under which to register your entities. This domain must already be configured in your Flyte deployment.

  • -v is a unique string used to identify this version of entities registered under a project and domain.

  • If required, you can specify a kubernetes-service-account or assumable_iam_role that your tasks will run with.
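
To customize these defaults from a script rather than by editing the make recipe, you could build the command yourself. The sketch below is an illustration only: build_register_command is a hypothetical helper, it uses just the flags that appear in the recipe above, and the project, domain, host, and bucket values are the example values from this page.

```python
import glob
import shlex
from typing import List, Optional


def build_register_command(
    proto_files: List[str],
    project: str,
    domain: str,
    version: str,
    host: str,
    output_prefix: str,
    service_account: Optional[str] = None,
) -> List[str]:
    """Return the argv for the flyte-cli register-files call shown above.

    Hypothetical helper: it only rearranges the documented flags.
    """
    if not proto_files:
        raise ValueError("no serialized files to register; serialize first")
    argv = ["flyte-cli", "register-files", "-p", project, "-d", domain, "-v", version]
    if service_account:
        # Only passed when the tasks need a specific service account.
        argv += ["--kubernetes-service-account", service_account]
    argv += ["--output-location-prefix", output_prefix, "-h", host]
    return argv + list(proto_files)


if __name__ == "__main__":
    # A shell expands _pb_output/* for you; here the expansion is explicit.
    files = sorted(glob.glob("_pb_output/*"))
    if files:
        argv = build_register_command(
            files,
            project="flytetester",
            domain="development",
            version="abc123",  # placeholder; the recipe uses ${VERSION}
            host="flyte.example.com",
            output_prefix="s3://my-s3-bucket/raw_data",
            service_account="demo",
        )
        print(shlex.join(argv))
    else:
        print("nothing to register: _pb_output/ is empty")
```

The serialized files are expanded with glob because no shell is involved when a list of arguments is handed to subprocess.run, so a literal _pb_output/* would not be expanded.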

Fast(er) iteration

Re-building a new Docker container image for every code change you make can become cumbersome and slow. If you’re making purely code changes that do not require updating your container definition, you can make use of fast serialization and registration to speed up your iteration process and reduce the time it takes to upload new entity versions and development code to your hosted Flyte deployment.

First, run the fast serialization target:

make fast_serialize

And then the fast register target:

OUTPUT_DATA_PREFIX=s3://my-s3-bucket/raw_data FLYTE_HOST=flyte.example.com ADDL_DISTRIBUTION_DIR=s3://my-s3-bucket/archives make fast_register

and just like that you can update your code without requiring a rebuild of your container!

Because fast registration serializes code from your local workstation and uploads it to the hosted Flyte deployment, make sure to specify the following arguments correctly so that your changes are picked up when the workflow runs.

  • pyflyte serialize has a --local-source-root option which specifies which code is uploaded during the fast registration step. This ensures that the files you want to modify are serialized. This is optional and should be used when your code lies outside of your current working directory.

  • flyte-cli fast-register-files has a --dest-dir option which specifies which folder (in the container) the fast serialization will dump the code in at execution time. This ensures that the running workflow loads the code changes that were uploaded via fast registration.

Building Images

If you are just iterating locally, there is no need to push your Docker image. With Docker Desktop, at least, locally built images are available for use in its Kubernetes cluster.

If you would like to later push your image to a registry (Dockerhub, ECR, etc.), you can run:

REGISTRY=docker.io/corp make all_docker_push
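
The difference between the local tag (flytecookbook:<sha>, with no registry) and the pushed one can be sketched as follows. This is an assumption about how the name is composed, based on the usual Docker registry/name:tag form rather than on the Makefile itself; full_image_name is a hypothetical helper and abc123 is a placeholder sha.

```python
from typing import Optional


def full_image_name(image: str, version: str, registry: Optional[str] = None) -> str:
    """Compose an image reference, prefixing the registry only when one is given.

    Assumption: the registry is joined to the image name with a single slash.
    """
    name = f"{image}:{version}"
    if registry:
        # Strip a trailing slash so "docker.io/corp/" and "docker.io/corp" agree.
        return f"{registry.rstrip('/')}/{name}"
    return name


if __name__ == "__main__":
    print(full_image_name("flytecookbook", "abc123"))
    print(full_image_name("flytecookbook", "abc123", registry="docker.io/corp"))
```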
