Flyte Plugins

Flyte is designed to be highly extensible. It can be extended in several ways:

  1. Flytekit-only plugins: plugins that behave like a Python function executing in a container

  2. Flyte backend global plugins: plugins that are independent of the SDK, enable capabilities in the Flyte backend, and apply globally to the entire deployment

  3. Flyte custom container executions: execute arbitrary containers. Data is loaded into the container as files and read back out of the container. You can write C++ code, Bash scripts, or any other containerized program

  4. Bring your own SDK: the community would love to help you build a new SDK. Ideas include Go and JavaScript/Node.js
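
The file-based contract described for custom container executions (item 3) can be sketched from the program's point of view. This is a minimal illustration, not Flyte's implementation: the directory layout, the one-file-per-variable convention, and the names `run_container_program`, `a`, `b`, and `total` are all assumptions made for the example. Python is used for brevity; the same shape would apply to a C++ binary or a Bash script.

```python
import tempfile
from pathlib import Path


def run_container_program(input_dir: Path, output_dir: Path) -> None:
    """Body of a containerized program: read inputs from files, write outputs to files."""
    # Assumed convention: each input is a file named after the input variable.
    a = float((input_dir / "a").read_text())
    b = float((input_dir / "b").read_text())

    # Assumed convention: each output is a file named after the output variable.
    output_dir.mkdir(parents=True, exist_ok=True)
    (output_dir / "total").write_text(str(a + b))


def demo() -> str:
    """Simulate what the platform would do around the container."""
    with tempfile.TemporaryDirectory() as tmp:
        input_dir = Path(tmp) / "inputs"
        output_dir = Path(tmp) / "outputs"
        input_dir.mkdir()
        # The platform materializes the task inputs as files before the program starts.
        (input_dir / "a").write_text("2.5")
        (input_dir / "b").write_text("4")
        run_container_program(input_dir, output_dir)
        # After the program exits, the platform reads the outputs back from files.
        return (output_dir / "total").read_text()


if __name__ == "__main__":
    print(demo())
```

Because the program only touches the filesystem, it needs no Flyte SDK at all, which is what makes this extension point language-agnostic.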

Available SDKs:

  1. Flytekit is the Python SDK for writing Flyte tasks and workflows, and is optimized for machine learning pipelines and ETL workloads

  2. Flytekit-Java is the Java/Scala SDK optimized for ETL and data processing workloads

What are Flytekit-only (Python) plugins?

Flytekit plugins are simple plugins that can be implemented purely in Python, unit tested locally, and used to extend Flytekit functionality. They can do almost anything; for comparison, think of them as similar to Airflow Operators. Data is automatically marshalled into and unmarshalled out of the plugin, and in most cases users only need to implement the flytekit.core.base_task.PythonTask API defined in Flytekit. This tutorial illustrates how to implement a plugin with the help of an example.

Flytekit plugins are lazily loaded and can be released independently, like libraries. By convention, plugins are named flytekitplugins-*, where * indicates the capability. For example, flytekitplugins-papermill enables users to author Flytekit tasks using Papermill.

Examples of Flytekit-only plugins:

  1. Papermill implementation: flytekitplugins-papermill

  2. SQLite3 implementation: SQLite3 Queries
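
The shape of a Flytekit-only plugin (subclass a task base class, implement `execute`, and let the framework marshal the inputs) can be sketched with the standard library alone. The `SketchTask` base class below is a hypothetical stand-in written for this example; it is not flytekit.core.base_task.PythonTask and does not reproduce its signature. `SQLiteQueryTask`, the `people` table, and the `min_age` input are likewise illustrative assumptions.

```python
import sqlite3
from typing import Any, Dict, List, Tuple


class SketchTask:
    """Hypothetical stand-in for a task base class; NOT the flytekit API."""

    def __init__(self, name: str, inputs: Dict[str, type]):
        self.name = name
        self.inputs = inputs

    def __call__(self, **kwargs: Any) -> Any:
        # Stand-in for SDK marshalling: check declared inputs and coerce them by type.
        missing = set(self.inputs) - set(kwargs)
        if missing:
            raise TypeError(f"{self.name}: missing inputs {sorted(missing)}")
        coerced = {key: typ(kwargs[key]) for key, typ in self.inputs.items()}
        return self.execute(**coerced)

    def execute(self, **kwargs: Any) -> Any:
        raise NotImplementedError


class SQLiteQueryTask(SketchTask):
    """Plugin-style task: runs a parameterized query against an in-memory SQLite database."""

    def __init__(self, name: str, setup_sql: str, query: str, inputs: Dict[str, type]):
        super().__init__(name=name, inputs=inputs)
        self.setup_sql = setup_sql
        self.query = query

    def execute(self, **kwargs: Any) -> List[Tuple]:
        conn = sqlite3.connect(":memory:")
        try:
            conn.executescript(self.setup_sql)
            # Named placeholders such as :min_age are bound from the task inputs.
            return conn.execute(self.query, kwargs).fetchall()
        finally:
            conn.close()


adults = SQLiteQueryTask(
    name="adults",
    setup_sql="""
        CREATE TABLE people (name TEXT, age INTEGER);
        INSERT INTO people VALUES ('ada', 36), ('tim', 12), ('lin', 41);
    """,
    query="SELECT name FROM people WHERE age >= :min_age ORDER BY name",
    inputs={"min_age": int},
)

if __name__ == "__main__":
    print(adults(min_age="18"))
```

Since everything runs in-process, a task written this way can be exercised in an ordinary unit test, which mirrors the "unit tested locally" property described above.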

What are Backend Plugins?

Flyte backend plugins are more involved: implementing one requires writing Go code that is plugged into the Flyte backend engine. These plugins are statically loaded into FlytePropeller. The contract for a plugin can be encoded in any serialization format, e.g. JSON, OpenAPI, or protobuf; the community generally prefers protobuf. Once a backend plugin is implemented, an SDK in any language can provide a specialized interface for the user.

Examples of backend plugins:

  1. SageMaker

  2. K8s Spark
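
The idea of a serialized plugin contract can be sketched from the SDK side: the SDK builds a typed description of the job and serializes it, and the backend plugin decodes the same structure. This is only an illustration under stated assumptions. The `SketchSparkJob` type and its fields are hypothetical and do not match any real Flyte plugin schema, and JSON is used here to stay within the standard library even though the community generally prefers protobuf. A real backend plugin would do the decoding in Go.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Dict


@dataclass
class SketchSparkJob:
    """Hypothetical plugin contract; field names are illustrative only."""

    main_class: str
    executor_count: int
    spark_conf: Dict[str, str] = field(default_factory=dict)


def encode_contract(job: SketchSparkJob) -> str:
    """SDK side: serialize the contract so that a backend plugin can decode it."""
    return json.dumps(asdict(job), sort_keys=True)


def decode_contract(payload: str) -> SketchSparkJob:
    """Stand-in for the backend side: rebuild the typed contract from the payload."""
    return SketchSparkJob(**json.loads(payload))


if __name__ == "__main__":
    job = SketchSparkJob("com.example.Main", 4, {"spark.executor.memory": "2g"})
    print(encode_contract(job))
```

Keeping the contract in a language-neutral format is what allows one backend plugin to serve SDKs written in different languages.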

Native Backend Plugins

Native backend plugins can be executed without any external service dependencies. The compute is orchestrated by Flyte itself, within its provisioned Kubernetes clusters. Some examples of native plugins are:

  1. Python functions

  2. K8s Containerized Spark

  3. Array Tasks

  4. Pod Tasks

  5. K8s native distributed PyTorch training using Kubeflow PyTorch Operator

  6. K8s native distributed TensorFlow training using Kubeflow TF Operator

External Service Backend Plugins

  1. AWS SageMaker Training

  2. AWS Batch

  3. Qubole Hive