Integrations#

Flyte is designed to be highly extensible and can be customized in multiple ways.

Note

Want to contribute an example? Check out the Example Contribution Guide.

Flytekit Plugins#

Flytekit plugins are simple extensions that can be implemented purely in Python and unit tested locally, and that extend Flytekit functionality. They can cover almost any use case; for comparison, think of them as similar to Airflow Operators.
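To make the operator comparison concrete, here is a minimal sketch of the general shape such a plugin takes: a task class with its own configuration and an `execute` method that can be called in-process. It uses only the Python standard library and is not Flytekit's API. The names `SketchTask`, `SQLiteQueryConfig`, and `SQLiteQueryTask` are hypothetical and exist only for illustration.

```python
import sqlite3
from dataclasses import dataclass
from typing import Any, List, Tuple


class SketchTask:
    """Hypothetical stand-in for a task base class (not a Flytekit class)."""

    def __init__(self, name: str) -> None:
        self.name = name

    def execute(self, **kwargs: Any) -> Any:
        raise NotImplementedError

    def __call__(self, **kwargs: Any) -> Any:
        # Local execution: run the task body in the current process.
        return self.execute(**kwargs)


@dataclass
class SQLiteQueryConfig:
    """Hypothetical plugin configuration: statements that prepare the database."""

    setup_statements: Tuple[str, ...] = ()


class SQLiteQueryTask(SketchTask):
    """Hypothetical plugin: runs a parameterized query on an in-memory SQLite database."""

    def __init__(self, name: str, query_template: str, config: SQLiteQueryConfig) -> None:
        super().__init__(name)
        self.query_template = query_template
        self.config = config

    def execute(self, **kwargs: Any) -> List[tuple]:
        conn = sqlite3.connect(":memory:")
        try:
            for statement in self.config.setup_statements:
                conn.execute(statement)
            # Keyword arguments are bound to named placeholders such as :limit.
            return conn.execute(self.query_template, kwargs).fetchall()
        finally:
            conn.close()


top_rows = SQLiteQueryTask(
    name="top_rows",
    query_template="SELECT id, label FROM items ORDER BY id LIMIT :limit",
    config=SQLiteQueryConfig(
        setup_statements=(
            "CREATE TABLE items (id INTEGER, label TEXT)",
            "INSERT INTO items VALUES (1, 'a'), (2, 'b'), (3, 'c')",
        )
    ),
)
```

Because the task is plain Python, a local unit test can call `top_rows(limit=2)` and assert on the returned rows without a cluster.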

SQL

Execute SQL queries as tasks.

Great Expectations

Validate data with great_expectations.

Papermill

Execute Jupyter Notebooks with papermill.

Pandera

Validate pandas dataframes with pandera.

Modin

Scale pandas workflows with modin.

Dolt

Version your SQL database with dolt.

DBT

Run and test your dbt pipelines in Flyte.

WhyLogs

whylogs: the open standard for data logging.

MLflow

mlflow: the open standard for model tracking.

ONNX

Convert ML models to ONNX models seamlessly.

DuckDB

Run analytical queries using DuckDB.

Native Backend Plugins#

Native Backend Plugins run without any external service dependencies because Flyte itself orchestrates the compute within its provisioned Kubernetes clusters.

K8s Pods

Execute K8s pods for arbitrary workloads.

K8s Cluster Dask Jobs

Run Dask jobs on a K8s Cluster.

K8s Cluster Spark Jobs

Run Spark jobs on a K8s Cluster.

Kubeflow PyTorch

Run distributed PyTorch training jobs using Kubeflow.

Kubeflow TensorFlow

Run distributed TensorFlow training jobs using Kubeflow.

MPI Operator

Run distributed deep learning training jobs using Horovod and MPI.

Ray Task

Run Ray jobs on a K8s Cluster.

Flyte agents#

Flyte agents are long-running, stateless services that receive execution requests via gRPC and initiate jobs with appropriate external or internal services. Each agent service is a Kubernetes deployment that receives gRPC requests from FlytePropeller when users trigger a particular type of task. (For example, the BigQuery agent handles BigQuery tasks.) The agent service then initiates a job with the appropriate service. If you don’t see the agent you need below, see “Developing agents” to learn how to develop a new agent.
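The routing described above can be sketched in a few lines of plain Python. This is an illustrative model only, under stated assumptions: it leaves out gRPC and FlytePropeller entirely, and `TaskRequest`, `FakeQueryService`, `QueryAgent`, and `AgentRegistry` are hypothetical names rather than Flyte classes. It shows two ideas from the paragraph: requests are routed to an agent by task type, and the agent stays stateless because job state lives in the service it calls.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class TaskRequest:
    """Hypothetical execution request: a task type plus task-specific payload."""

    task_type: str
    payload: dict


class FakeQueryService:
    """Stand-in for an external service; job state lives here, not in the agent."""

    def __init__(self) -> None:
        self._jobs: Dict[str, str] = {}

    def submit(self, query: str) -> str:
        job_id = f"job-{len(self._jobs) + 1}"
        self._jobs[job_id] = "SUCCEEDED"  # the fake service finishes immediately
        return job_id

    def status(self, job_id: str) -> str:
        return self._jobs[job_id]


class QueryAgent:
    """Hypothetical agent: forwards requests to the service and keeps no job state."""

    task_type = "fake_query"

    def __init__(self, service: FakeQueryService) -> None:
        self._service = service

    def create(self, request: TaskRequest) -> str:
        # Start a job and hand back the identifier needed to look it up later.
        return self._service.submit(request.payload["query"])

    def get(self, job_id: str) -> str:
        return self._service.status(job_id)


class AgentRegistry:
    """Routes each request to the agent registered for its task type."""

    def __init__(self) -> None:
        self._agents: Dict[str, QueryAgent] = {}

    def register(self, agent: QueryAgent) -> None:
        self._agents[agent.task_type] = agent

    def route(self, request: TaskRequest) -> QueryAgent:
        try:
            return self._agents[request.task_type]
        except KeyError:
            raise LookupError(
                f"no agent registered for task type {request.task_type!r}"
            ) from None


registry = AgentRegistry()
registry.register(QueryAgent(FakeQueryService()))

request = TaskRequest(task_type="fake_query", payload={"query": "SELECT 1"})
agent = registry.route(request)
job_id = agent.create(request)
state = agent.get(job_id)
```

Since the agent holds no per-job data, any replica of it could answer the later `get` call as long as it talks to the same service.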

Airflow agent

Run Airflow jobs in your workflows with the Airflow agent.

BigQuery agent

Run BigQuery jobs in your workflows with the BigQuery agent.

ChatGPT agent

Run ChatGPT jobs in your workflows with the ChatGPT agent.

Databricks

Run Databricks jobs in your workflows with the Databricks agent.

Memory Machine Cloud

Execute tasks using the MemVerge Memory Machine Cloud agent.

Sensor

Run sensor jobs in your workflows with the sensor agent.

Snowflake

Run Snowflake jobs in your workflows with the Snowflake agent.

External Service Backend Plugins#

As the name suggests, external service backend plugins rely on external services, such as Hive, to handle the workload defined in the Flyte task that uses the plugin.

AWS SageMaker: Model Training plugin

Train models with built-in algorithms or define your own custom algorithms.

AWS SageMaker: PyTorch Training plugin

Train PyTorch models using SageMaker, with support for distributed training.

AWS Athena plugin

Execute queries using AWS Athena.

AWS Batch plugin

Run tasks and workflows on AWS Batch.

Flyte Interactive

Debug tasks by executing them with Flyte Interactive.

Hive plugin

Run Hive jobs in your workflows.

SDKs for Writing Tasks and Workflows#

If you have an idea for a new SDK, the community would love to help you build it. The currently available SDKs are:

flytekit

The Python SDK for Flyte.

flytekit-java

The Java/Scala SDK for Flyte.

Flyte Operators#

Flyte can be integrated with other orchestrators so that you can use Flyte’s constructs natively within those tools.

Airflow

Trigger Flyte executions from Airflow.