Deploying Workflows - Registration

Locally, Flytekit relies on the Python interpreter to execute tasks and workflows. To leverage the full power of Flyte, we recommend using a deployed backend of Flyte. Flyte can be run on any Kubernetes cluster (for example, a local cluster like kind), in a cloud environment, or on-prem. This process of deploying your workflows to a Flyte cluster is known as registration. It involves the following steps:

  1. Writing code, SQL, etc.;

  2. Packaging the code as Docker images when needed. In some cases packaging isn’t needed because the code itself is portable (for example, SQL), the task references a remote service (for example, SageMaker built-in algorithms), or the code can be safely transferred over;

  3. Alternatively, packaging with fast registration (see Fast Registration);

  4. Registering the serialized workflows and tasks.

Using remote Flyte provides:

  • Caching: To avoid calling the same task with the same inputs (for the same version);

  • Portability: To reference pre-registered entities under any domain or project within your workflow code;

  • Shareable executions: To easily share links of your executions with your teammates.
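Conceptually, the caching benefit works like memoization keyed on the task, its version, and its inputs. The following plain-Python sketch illustrates only that idea and is not how Flyte's backend cache is implemented. `cache_key`, `run_cached`, and the in-memory `_CACHE` dict are hypothetical names. In flytekit you opt a task into caching on its decorator, for example `@task(cache=True, cache_version="1.0")`.

```python
import hashlib
import json
from typing import Any, Callable, Dict

# Hypothetical in-memory store used only to illustrate the idea; Flyte keeps
# its cache in the backend, not in the Python process.
_CACHE: Dict[str, Any] = {}


def cache_key(task_name: str, cache_version: str, inputs: Dict[str, Any]) -> str:
    """Derive a key from the task identity, its cache version, and its inputs."""
    payload = json.dumps(
        {"task": task_name, "version": cache_version, "inputs": inputs},
        sort_keys=True,  # the same inputs in a different order give the same key
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def run_cached(task_name: str, cache_version: str, fn: Callable[..., Any], **inputs: Any) -> Any:
    """Call fn only if this task/version/inputs combination has not run before."""
    key = cache_key(task_name, cache_version, inputs)
    if key not in _CACHE:
        _CACHE[key] = fn(**inputs)
    return _CACHE[key]
```

Bumping the version string changes the key. That is why a new version re-runs the task even when the inputs are identical.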

Refer to the Getting Started guide for details on installing Flyte.

Build Your Dockerfile

  1. Commit your changes. Some of the steps below default to referencing the git sha.

  2. Run pyflyte register. This command compiles all Flyte entities and sends them to the backend specified by your config file.

  3. Build a container image that holds your code, for example with the following Dockerfile:

FROM python:3.8-slim-buster
LABEL org.opencontainers.image.source https://github.com/flyteorg/flytesnacks

WORKDIR /root
ENV VENV /opt/venv
ENV LANG C.UTF-8
ENV LC_ALL C.UTF-8
ENV PYTHONPATH /root

# This is necessary for opencv to work
RUN apt-get update && apt-get install -y libsm6 libxext6 libxrender-dev ffmpeg build-essential

# Install the AWS cli separately to prevent issues with boto being written over
RUN pip3 install awscli

# Virtual environment (VENV is set above)
RUN python3 -m venv ${VENV}
ENV PATH="${VENV}/bin:$PATH"

# Install Python dependencies
COPY core/requirements.txt /root
RUN pip install -r /root/requirements.txt

# Copy the makefile targets to expose on the container. This makes it easier to register
COPY in_container.mk /root/Makefile
COPY core/sandbox.config /root

# Copy the actual code
COPY core /root/core

# This tag is supplied by the build script and will be used to determine the version
# when registering tasks, workflows, and launch plans
ARG tag
ENV FLYTE_INTERNAL_IMAGE $tag

Note

The Dockerfile above assumes your code lives in the core directory.
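The Dockerfile reads the image name from `ARG tag` into `FLYTE_INTERNAL_IMAGE`, so the build should pass that name as a build argument. Here is a minimal Python sketch, assuming you tag images with the abbreviated git SHA of the commit you just made. The helper names and the default `Dockerfile` path are hypothetical.

```python
import subprocess
from typing import List


def git_short_sha(repo_dir: str = ".") -> str:
    """Return the abbreviated SHA of the current commit (requires git)."""
    result = subprocess.run(
        ["git", "rev-parse", "--short", "HEAD"],
        cwd=repo_dir, check=True, capture_output=True, text=True,
    )
    return result.stdout.strip()


def docker_build_command(image: str, tag: str, dockerfile: str = "Dockerfile", context: str = ".") -> List[str]:
    """Build a `docker build` invocation that passes the full image name as the `tag` build arg."""
    full_name = f"{image}:{tag}"
    return [
        "docker", "build",
        "-f", dockerfile,
        "--build-arg", f"tag={full_name}",  # consumed by `ARG tag` in the Dockerfile
        "-t", full_name,                    # tag the image with the same name
        context,
    ]
```

For example, `subprocess.run(docker_build_command("flytesnacks-core", git_short_sha()), check=True)` builds with the current directory as context. Run it from the directory that contains `core/`, because the Dockerfile copies paths such as `core/requirements.txt` relative to that context.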

Package Your Workflows and Tasks

Getting your tasks, workflows, and launch plans to run on a Flyte platform is a two-step process. The first step is serialization, which produces registerable protobuf files for your entities. Each task produces one protobuf file representing a TaskTemplate object, and each workflow produces one protobuf file representing a WorkflowClosure object. The second step is to compress the folder into a zip file. Once you’ve built a Docker container image with your updated code changes, you can use the pyflyte package command to complete both steps:

  1. Serialize your tasks;

  2. Compress the folder to a zip file.

pyflyte --pkgs <parent-package>.<child-package-with-the-workflow> package --image somedocker.com/myimage:someversion123

where

  • --pkgs points to the packages, within the current directory, that contain your workflows and tasks.

  • --image is the required, fully qualified name of the container image housing your code.
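A build script can assemble the package invocation programmatically. The following minimal sketch only builds the argument list from flags documented on this page: `--pkgs` and `--image`, plus the `--fast` and `--force` flags described under Fast Registration. The function name is hypothetical.

```python
from typing import List


def pyflyte_package_command(pkgs: str, image: str, fast: bool = False, force: bool = False) -> List[str]:
    """Assemble a `pyflyte package` invocation from the documented flags."""
    if not image:
        # package needs the fully qualified name of the image housing your code.
        raise ValueError("image must be a fully qualified container image name")
    # --pkgs is a top-level pyflyte option, so it precedes the `package` subcommand.
    cmd = ["pyflyte", "--pkgs", pkgs, "package", "--image", image]
    if fast:
        cmd.append("--fast")   # package code for fast registration, no new image
    if force:
        cmd.append("--force")  # overwrite existing output files
    return cmd
```

To run it, pass the list to `subprocess.run(cmd, check=True)`. Keeping the command as a list avoids shell-quoting problems with image names.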

Register Your Workflows and Tasks

Once you’ve packaged your workflows and tasks into protobuf files, register them with your deployed Flyte installation using the pyflyte register command. This command performs fast registration by default. It compiles all Flyte entities defined in your Python code and sends them to the backend specified by your config file. You can think of it as a combination of the pyflyte package and flytectl register commands.

pyflyte register -p project_name -d domain_name -i xyz.io/docker:latest -o output_directory -D tar_file_directory --service-account account_name --raw-data-prefix offloaded_data_location -v version

where

  • -p specifies the project to register your entities under. This project must already be registered on your Flyte deployment.

  • -d specifies the domain to register your entities under. This domain must already be configured in your Flyte deployment.

  • -i specifies the fully qualified tag of a Docker image. It is optional. If it is not specified, the default image is used.

  • -o specifies the directory where the zip file (containing protobuf definitions) is written.

  • -D specifies the directory inside the image where the tar file (containing the code) is copied.

  • --service-account specifies the account used when creating launch plans. It is optional.

  • --raw-data-prefix specifies where offloaded data (for example, large outputs) is stored.

  • -v is a unique string that identifies the version of your entities to be registered under a project and domain.
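When registration is scripted, for example in CI, the same flags can be assembled in Python. This sketch covers only the options documented above and adds each optional flag only when a value is given. The helper name is hypothetical.

```python
from typing import List, Optional


def pyflyte_register_command(
    project: str,
    domain: str,
    version: Optional[str] = None,
    image: Optional[str] = None,
    service_account: Optional[str] = None,
    raw_data_prefix: Optional[str] = None,
) -> List[str]:
    """Assemble a `pyflyte register` invocation; optional flags appear only when set."""
    cmd = ["pyflyte", "register", "-p", project, "-d", domain]
    optional = [
        ("-i", image),                          # falls back to the default image when omitted
        ("--service-account", service_account),
        ("--raw-data-prefix", raw_data_prefix),
        ("-v", version),
    ]
    for flag, value in optional:
        if value:
            cmd += [flag, value]
    return cmd
```

Building the list conditionally keeps the defaults described above, such as the default image, in effect whenever a value is left out.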

The equivalent two-step flow, using pyflyte package followed by flytectl register, looks like this:

pyflyte --pkgs <parent-package>.<child-package-with-the-workflow> package --image somedocker.com/myimage:someversion123
flytectl register files _pb_output/* -p flytetester -d development --version ${VERSION}  --k8sServiceAccount demo --outputLocationPrefix s3://my-s3-bucket/raw_data --config path/to/config/yaml

where

  • -p specifies the project to register your entities under. This project must already be registered on your Flyte deployment.

  • -d specifies the domain to register your entities under. This domain must already be configured in your Flyte deployment.

  • --version is a unique string used to identify the version of your entities to be registered under a project and domain.

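  • If required, you can specify --k8sServiceAccount and --assumableIamRole for your tasks to run with.

  • --outputLocationPrefix sets where offloaded raw data is written, and --config points to your flytectl configuration file.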
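The flytectl command expects a ${VERSION} that is unique within a project and domain. Because the earlier steps default to referencing the git SHA, one option is to derive the version from it. The following sketch assumes you version by the abbreviated commit SHA and mark uncommitted work with a hypothetical `-dirty` suffix. The helper names are also hypothetical.

```python
import subprocess


def version_from_sha(sha: str, dirty: bool) -> str:
    """Turn a commit SHA into a version string, flagging uncommitted changes."""
    version = sha[:7]
    return f"{version}-dirty" if dirty else version


def current_version(repo_dir: str = ".") -> str:
    """Read the HEAD SHA and working-tree state from git (requires a git checkout)."""
    sha = subprocess.run(
        ["git", "rev-parse", "HEAD"], cwd=repo_dir,
        check=True, capture_output=True, text=True,
    ).stdout.strip()
    # `git status --porcelain` prints nothing when the working tree is clean.
    status = subprocess.run(
        ["git", "status", "--porcelain"], cwd=repo_dir,
        check=True, capture_output=True, text=True,
    ).stdout
    return version_from_sha(sha, dirty=bool(status.strip()))
```

The suffix makes it visible when registered entities do not correspond to any commit. A script could refuse to register such versions, which enforces the "commit your changes first" step.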

Fast Registration

Rebuilding a Docker container image for every code change is cumbersome and slow. If your changes are purely to code and don’t require updating your container definition, you can use fast serialization and registration to speed up iteration and reduce the time it takes to upload new entity versions and development code to your hosted Flyte deployment.

First, run the fast serialization target:

pyflyte --pkgs core package --image core:v1 --fast --force

where

  • --fast enables fast packaging, which deploys Flyte tasks and workflows without building a new container.

  • --force overwrites existing output files. Without it, pyflyte package exits with an error if an output file already exists.

Next, run pyflyte register, which performs fast registration by default:

pyflyte register -p project_name -d domain_name --output s3://my-s3-bucket/raw_data

And just like that, you can update your code without requiring a rebuild of your container!

Because fast registration serializes code from your local workstation and uploads it to the hosted Flyte deployment, make sure the following arguments are specified correctly so that your changes are picked up when the workflow runs.

  • pyflyte has a flag --pkgs that specifies the code to be packaged. The --fast flag picks up the code from the local machine and provides it for execution without building and pushing a container.

  • pyflyte also has a flag --image to specify the Docker image that has already been built.
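Fast registration ships your local code instead of a new image. The following stdlib sketch illustrates one way to do that: archive the code and name the archive after a digest of its contents, so that unchanged code produces the same name. It is an illustration under that assumption, not flytekit's implementation. `source_digest` and `fast_package` are hypothetical names.

```python
import hashlib
import tarfile
from pathlib import Path


def source_digest(source_dir: str) -> str:
    """Hash the relative path and contents of every file under source_dir, in a stable order."""
    digest = hashlib.sha256()
    root = Path(source_dir)
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        digest.update(str(path.relative_to(root)).encode("utf-8"))
        digest.update(path.read_bytes())
    return digest.hexdigest()


def fast_package(source_dir: str, output_dir: str) -> Path:
    """Archive source_dir into a gzipped tarball named after its content digest.

    output_dir should live outside source_dir so the archive doesn't include itself.
    """
    archive = Path(output_dir) / f"fast{source_digest(source_dir)}.tar.gz"
    with tarfile.open(archive, "w:gz") as tar:
        tar.add(source_dir, arcname=".")
    return archive
```

Deriving the name from the contents means that re-running the sketch on unchanged code yields the same archive name, while any edit yields a new one.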

Building Images

If you are iterating locally, you don’t need to push your Docker image. For Docker Desktop, locally built images are available for use in its K8s cluster.

If you wish to push your image to a registry (Dockerhub, ECR, etc.) later, run:

REGISTRY=docker.io/corp make all_docker_push
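If a build script needs to trigger the push, the make invocation above maps directly onto `subprocess`. The following sketch assumes that the repository's `all_docker_push` target reads `REGISTRY` from the environment, exactly as in the shell command. The helper names are hypothetical.

```python
import os
import subprocess
from typing import Dict, List, Tuple


def docker_push_invocation(registry: str) -> Tuple[List[str], Dict[str, str]]:
    """Return the command and environment equivalent to `REGISTRY=<registry> make all_docker_push`."""
    env = dict(os.environ)  # keep PATH, Docker credentials, etc.
    env["REGISTRY"] = registry
    return ["make", "all_docker_push"], env


def push_images(registry: str) -> None:
    """Run the make target; raises CalledProcessError if the push fails."""
    cmd, env = docker_push_invocation(registry)
    subprocess.run(cmd, env=env, check=True)
```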

Pulling Images from a Private Container Image Registry

You can use different private container registries (AWS ECR, Docker Hub, GitLab Container Registry). Ensure that you have the command line tools and login information associated with the registry. An imagePullSecret is required to pull a private image.
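An imagePullSecret is typically a Kubernetes Secret of type `kubernetes.io/dockerconfigjson`, the same kind of object that `kubectl create secret docker-registry` creates. The following minimal Python sketch builds such a manifest as a dict. The secret name, namespace, and credentials are placeholders. In a default Flyte setup, task pods usually run in a namespace per project and domain (for example `flytesnacks-development`), so the secret would need to exist there. You would apply the JSON output with `kubectl apply -f -` and avoid committing it, since it contains credentials.

```python
import base64
import json
from typing import Any, Dict


def docker_registry_secret(name: str, namespace: str, registry: str, username: str, password: str) -> Dict[str, Any]:
    """Build a Secret manifest of type kubernetes.io/dockerconfigjson for image pulls."""
    auth = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    docker_config = {"auths": {registry: {"username": username, "password": password, "auth": auth}}}
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "type": "kubernetes.io/dockerconfigjson",
        "metadata": {"name": name, "namespace": namespace},
        "data": {
            # Values under `data` must be base64-encoded.
            ".dockerconfigjson": base64.b64encode(json.dumps(docker_config).encode("utf-8")).decode("ascii"),
        },
    }
```

For example, `print(json.dumps(docker_registry_secret("regcred", "flytesnacks-development", "docker.io", user, token)))` can be piped into `kubectl apply -f -`.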

The general steps for using these private registries are outlined below.

  1. Using the default service account or a new service account.

    1. Add the authorization token and name to the default/new service account.

    2. Ensure that the service account you are using for authentication has permissions to access the container registry.

    3. Add your imagePullSecrets to this service account.

    4. Use this default/new service account to log in to the private registry and pull the image.
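For the service-account route, Kubernetes automatically adds the imagePullSecrets listed on a ServiceAccount to every pod that uses that account. The following sketch builds such a manifest. The account name, namespace, and secret name are placeholders, and the function name is hypothetical.

```python
import json
from typing import Any, Dict, List


def service_account_with_pull_secrets(name: str, namespace: str, secret_names: List[str]) -> Dict[str, Any]:
    """Build a ServiceAccount manifest that references existing image pull secrets."""
    return {
        "apiVersion": "v1",
        "kind": "ServiceAccount",
        "metadata": {"name": name, "namespace": namespace},
        # Pods running under this service account inherit these pull secrets.
        "imagePullSecrets": [{"name": secret} for secret in secret_names],
    }


def to_json(manifest: Dict[str, Any]) -> str:
    """Serialize a manifest for `kubectl apply -f -`."""
    return json.dumps(manifest, indent=2)
```

To attach the secret to the existing default service account instead, `kubectl patch serviceaccount default -p '{"imagePullSecrets": [{"name": "regcred"}]}'` (with `-n <namespace>` as needed) has the same effect.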

OR

  1. Using a private registry in Docker Hub.

    1. Use a custom pod template to create a pod. This template is automatically added to every pod that Flyte creates.

    2. Add your imagePullSecrets to this custom pod template.

    3. Configure FlytePropeller to use the pod template created in the previous steps.

    4. FlytePropeller adds imagePullSecrets (and other customization for the pod) to the PodSpec which would look similar to this manifest.

    5. The pods with their keys can log in and access the images in the private registry.
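The pod-template route can be sketched the same way: a PodTemplate whose pod spec lists the pull secrets. Once FlytePropeller is configured to use the template, it merges those secrets into the pods it creates. The template name, namespace, and placeholder container below are assumptions. Kubernetes validates the pod spec, so the sketch includes one container. The FlytePropeller configuration itself is not shown.

```python
from typing import Any, Dict, List


def pod_template_with_pull_secrets(name: str, namespace: str, secret_names: List[str]) -> Dict[str, Any]:
    """Build a PodTemplate manifest whose pod spec carries imagePullSecrets."""
    return {
        "apiVersion": "v1",
        "kind": "PodTemplate",
        "metadata": {"name": name, "namespace": namespace},
        "template": {
            "spec": {
                # Placeholder container (name and image are assumptions); a pod
                # spec needs at least one container to pass validation.
                "containers": [{"name": "noop", "image": "docker.io/rwgrim/docker-noop"}],
                "imagePullSecrets": [{"name": secret} for secret in secret_names],
            }
        },
    }
```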

Once you set up the token to authenticate with the private registry, you can pull images from them.
