Integrations

Flyte is designed to be highly extensible and can be customized in multiple ways.

Flytekit Plugins

Flytekit plugins extend Flytekit's functionality and are implemented purely in Python, so they can be unit tested locally. These plugins can wrap almost anything and, for comparison, can be thought of as analogous to Airflow Operators.
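As a simplified, hypothetical model (this is not the real flytekit API), a plugin packages custom behavior behind a uniform task interface, much like an Airflow Operator wraps an operation behind `execute()`:

```python
from typing import Any, Callable

# Simplified, hypothetical model of the plugin pattern; the real
# flytekit base classes add type checking, serialization, and
# backend metadata on top of this idea.
class BaseTask:
    def __init__(self, name: str, fn: Callable[..., Any]):
        self.name = name
        self._fn = fn

    def execute(self, **inputs: Any) -> Any:
        # a real plugin would validate and serialize inputs here
        return self._fn(**inputs)

class UppercaseTask(BaseTask):
    """A toy 'plugin' task specializing the base behavior."""
    def __init__(self, name: str):
        super().__init__(name, fn=lambda text: text.upper())

shout = UppercaseTask("shout")
result = shout.execute(text="hello flyte")  # "HELLO FLYTE"
```

Because the whole thing is plain Python, a plugin like this can be exercised in an ordinary unit test before it ever runs on a cluster.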

SQL

Execute SQL queries as tasks.

Validate data with great_expectations.

Execute Jupyter Notebooks with papermill.

Validate pandas dataframes with pandera.

Version your SQL database with dolt.
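To make the first item concrete, here is a sketch of what an SQL task does under the hood, using the standard-library sqlite3 module (the function name and parameters are illustrative; flytekit's SQL tasks additionally handle query templating, type mapping, and returning results as structured outputs):

```python
import sqlite3

def run_query(db_path, query, params=()):
    # bind workflow inputs into the query and return the rows,
    # which a real SQL task would hand back as the task's output
    with sqlite3.connect(db_path) as conn:
        return conn.execute(query, params).fetchall()

rows = run_query(":memory:", "SELECT ? + ?", (2, 3))  # [(5,)]
```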

Native Backend Plugins

Native backend plugins run without any external service dependencies because Flyte itself orchestrates the compute within its provisioned Kubernetes clusters.

Execute K8s pods for arbitrary workloads.

Run Spark jobs on a K8s Cluster.

Run distributed PyTorch training jobs using Kubeflow.

Run distributed deep learning training jobs using Horovod and MPI.
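For example, with the Spark plugin (a sketch assuming the flytekitplugins-spark package is installed), a task's task_config is all that tells Flyte to provision an ephemeral Spark cluster for it; the exact spark_conf values below are illustrative:

```python
from flytekit import task
from flytekitplugins.spark import Spark

@task(
    task_config=Spark(
        # passed through to the Spark driver and executors on K8s
        spark_conf={
            "spark.driver.memory": "1g",
            "spark.executor.instances": "2",
        }
    )
)
def my_spark_task():
    # the plugin makes a SparkSession available through the
    # task context at run time
    ...
```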

External Service Backend Plugins

As the name suggests, external service backend plugins rely on external services such as AWS SageMaker, Hive, or Snowflake to handle the workload defined in the Flyte tasks that use the respective plugin.

Train models with built-in algorithms or define your own custom algorithms.

Train PyTorch models using SageMaker, with support for distributed training.

Execute queries using AWS Athena.

Run Hive jobs in your workflows.

Run Snowflake jobs in your workflows.

Custom Container Tasks

Because Flyte uses executable Docker containers as its smallest unit of compute, you can write custom tasks with flytekit.ContainerTask in the flytekit SDK.

Execute arbitrary containers: run C++ code, Bash scripts, or any other containerized program.
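A container run via flytekit.ContainerTask follows Flyte's raw-container I/O contract: each declared input is materialized as a file named after the input variable in an input directory, and each declared output is read back from a similarly named file in an output directory. A minimal entrypoint honoring that contract might look like the sketch below (the variable names and directory handling are illustrative; they would correspond to the inputs, outputs, input_data_dir, and output_data_dir you declare on the ContainerTask):

```python
import math
import pathlib
import sys

def compute_area(a: float, b: float) -> float:
    # area of an ellipse with semi-axes a and b
    return math.pi * a * b

def main(input_dir: str, output_dir: str) -> None:
    in_dir = pathlib.Path(input_dir)
    out_dir = pathlib.Path(output_dir)
    # Flyte writes each declared input to a file named after it
    a = float((in_dir / "a").read_text())
    b = float((in_dir / "b").read_text())
    # and expects each declared output in a file named after it
    (out_dir / "area").write_text(str(compute_area(a, b)))

if __name__ == "__main__":
    main(sys.argv[1], sys.argv[2])
```

Because the contract is purely file-based, the same pattern works for a program in any language, which is what makes ContainerTask language-agnostic.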

SDKs for Writing Tasks and Workflows

The community would love to help you build a new SDK of your own. Currently, the available SDKs are:

The Python SDK for Flyte.

The Java/Scala SDK for Flyte.