Multiple Container Images in a Single Workflow

When working locally, it is typically preferable to install all of your project's requirements locally (perhaps in a single virtual environment). Things get more complicated when you want to deploy your code to a remote environment, because most tasks in Flyte (function tasks) are deployed using a Docker container.

A Docker container lets you create a predictable environment for your tasks. While it is entirely possible to build a single container image that contains all of your dependencies, doing so is complicated and not recommended in practice, for the following reasons:

  1. Bundling every dependency into one container increases the size of the container image.

  2. Some tasks, such as Spark jobs, SageMaker-based training, and deep learning on GPUs, need specific runtime configurations. For example,

    • Spark needs a Java Virtual Machine (JVM) to be installed and Spark entrypoints to be set.

    • NVIDIA drivers and other corresponding libraries need to be installed to use GPUs for deep learning, but they are not required for CPU-only execution.

    • SageMaker expects the entrypoint to be specifically designed to accept its parameters.

  3. Building a single all-inclusive image may increase the build time for the image itself.

Note

By default, Flyte (the service) does not require a workflow to be bound to a single container image. Flytekit offers a simple interface to easily alter the image associated with each task, while keeping local execution simple for the user.

For every task of type flytekit.PythonFunctionTask, that is, any task decorated with the @task decorator, users can supply rules for how the container image should be bound. By default, flytekit associates one container image, called the default image, with all tasks. To alter the image, use the container_image parameter available in the flytekit.task() decorator. Any one of the following is an acceptable way to specify the image (see the sketch after this list):

  1. The image reference is specified explicitly, but the version is derived from the default image's version: container_image="docker.io/redis:{{.image.default.version}}"

  2. Both the FQN and the version are derived from the default image: container_image="{{.image.default.fqn}}:spark-{{.image.default.version}}"
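
For instance, here is a minimal sketch of the first form applied in a decorator (the task name and body are placeholders, and docker.io/redis is simply the illustrative reference from above):

    from flytekit import task

    # Form 1: a fixed image reference whose tag is derived from the
    # default image's version at registration time.
    @task(container_image="docker.io/redis:{{.image.default.version}}")
    def cache_lookup(key: str) -> str:
        # Placeholder body; any Python logic can run here.
        return key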

The images themselves are parameterizable in the config in the following format:

{{.image.<name>.<attribute>}}

  • name refers to the name of the image in the image configuration. The name default is a reserved keyword and automatically applies to the default image for this repository.

  • fqn refers to the fully qualified name of the image, i.e., the repository and the domain URL of the image, e.g. docker.io/my_repo/xyz.

  • version refers to the tag of the image, e.g. latest or python-3.8. If container_image is not specified, then the default configured image for the project is used.

Note

The default image (name + version) is always {{.image.default.fqn}}:{{.image.default.version}}
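
These templates are not limited to the default image: the attributes of any named image in your image configuration can be referenced the same way. A minimal sketch, assuming a hypothetical named image called spark exists in the configuration:

    from flytekit import task

    # References the hypothetical named image "spark"; both template
    # attributes resolve from the image configuration at registration time.
    @task(container_image="{{.image.spark.fqn}}:{{.image.spark.version}}")
    def aggregate(n: int) -> int:
        # Placeholder body.
        return n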

Let's declare a task that uses an image derived from the default image's FQN and version.
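
A minimal sketch of such a declaration (the function names and bodies are placeholders):

    from typing import List

    from flytekit import task, workflow

    # The FQN and version are both derived from the default image; per
    # the note above, this template resolves to the default image itself.
    @task(container_image="{{.image.default.fqn}}:{{.image.default.version}}")
    def get_length(numbers: List[int]) -> int:
        return len(numbers)

    @workflow
    def wf(numbers: List[int]) -> int:
        return get_length(numbers=numbers)

When executed locally, flytekit ignores container_image and runs the task in your local Python environment, which is what keeps local execution simple.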

For another example of this technique, please refer to Converting a Spark DataFrame to a Pandas DataFrame.
