Integrations

Flyte is designed to be highly extensible and can be customized in multiple ways.

Note

Want to contribute an integration example? Check out the Tutorials and integration examples contribution guide.

Flytekit plugins

Flytekit plugins are implemented purely in Python, can be unit tested locally, and extend Flytekit's functionality. For comparison, they play a role similar to Airflow operators.

Comet

comet-ml: Comet’s machine learning platform.

DBT

Run and test your dbt pipelines in Flyte.

Dolt

Version your SQL database with dolt.

DuckDB

Run analytical queries using DuckDB.

Great Expectations

Validate data with great_expectations.

MLFlow

mlflow: the open standard for model tracking.

Modin

Scale pandas workflows with modin.

Neptune

neptune: Neptune is the MLOps stack component for experiment tracking.

NIM

Serve optimized model containers with NIM.

Ollama

Serve fine-tuned LLMs with Ollama in a Flyte workflow.

ONNX

Convert ML models to ONNX models seamlessly.

Pandera

Validate pandas dataframes with pandera.

Papermill

Execute Jupyter Notebooks with papermill.

SQL

Execute SQL queries as tasks.

Weights and Biases

wandb: Machine learning platform to build better models faster.

WhyLogs

whylogs: the open standard for data logging.

Using Flytekit plugins

Data is automatically marshalled and unmarshalled in and out of the plugin. In most cases, users only need to implement the PythonTask API defined in Flytekit.

Flytekit plugins are lazily loaded and can be released independently like libraries. The naming convention is flytekitplugins-*, where * indicates the package to be integrated into Flytekit. For example, flytekitplugins-papermill enables users to author Flytekit tasks using Papermill.
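The naming convention above can be sketched with the standard library alone. The helper names below are hypothetical and not part of Flytekit. The mapping from the package name to an importable `flytekitplugins.<name>` module is an assumption that holds for Papermill but may differ for plugins whose package names contain hyphens.

```python
import importlib.util


def plugin_package_name(integration: str) -> str:
    """Return the pip package name under the flytekitplugins-* convention."""
    return f"flytekitplugins-{integration}"


def plugin_module_name(integration: str) -> str:
    # Assumption: the plugin is importable as flytekitplugins.<integration>.
    # Packages with hyphenated names may use a different module name.
    return f"flytekitplugins.{integration}"


def is_plugin_installed(integration: str) -> bool:
    """Check whether the plugin module can be found, without importing the plugin itself."""
    try:
        return importlib.util.find_spec(plugin_module_name(integration)) is not None
    except ModuleNotFoundError:
        # Raised when the parent flytekitplugins namespace itself is absent.
        return False


if __name__ == "__main__":
    name = "papermill"
    print(plugin_package_name(name), "installed:", is_plugin_installed(name))
```

If the check returns False, installing the package named by `plugin_package_name` with pip would be the usual next step.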

You can find the plugins maintained by the core Flyte team here.

Native backend plugins

Native backend plugins can be executed without any external service dependencies because the compute is orchestrated by Flyte itself, within its provisioned Kubernetes clusters.

Kubeflow PyTorch

Run distributed PyTorch training jobs using Kubeflow.

Kubeflow TensorFlow

Run distributed TensorFlow training jobs using Kubeflow.

Kubernetes pods

Execute Kubernetes pods for arbitrary workloads.

Kubernetes cluster Dask jobs

Run Dask jobs on a Kubernetes cluster.

Kubernetes cluster Spark jobs

Run Spark jobs on a Kubernetes cluster.

MPI Operator

Run distributed deep learning training jobs using Horovod and MPI.

Ray

Run Ray jobs on a Kubernetes cluster.

Flyte agents

Flyte agents are long-running, stateless services that receive execution requests via gRPC and initiate jobs with appropriate external or internal services. Each agent service is a Kubernetes deployment that receives gRPC requests from FlytePropeller when users trigger a particular type of task. (For example, the BigQuery agent handles BigQuery tasks.) The agent service then initiates a job with the appropriate service. If you don’t see the agent you need below, see “Developing agents” to learn how to develop a new agent.
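The routing idea described above (one stateless service per task type, selected when a request arrives) can be illustrated with a purely conceptual sketch. This is not the Flyte agent API and involves no gRPC. `ExecutionRequest`, `AgentRouter`, `submit_bigquery_job`, and the `"bigquery"` task-type label are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class ExecutionRequest:
    """Hypothetical stand-in for a request sent when a task is triggered."""
    task_type: str
    payload: dict


class AgentRouter:
    """Maps a task type to the handler that starts a job with the right service."""

    def __init__(self) -> None:
        self._handlers: Dict[str, Callable[[dict], str]] = {}

    def register(self, task_type: str, handler: Callable[[dict], str]) -> None:
        self._handlers[task_type] = handler

    def dispatch(self, request: ExecutionRequest) -> str:
        handler = self._handlers.get(request.task_type)
        if handler is None:
            raise LookupError(f"no agent registered for task type {request.task_type!r}")
        # The router keeps no per-request state; the handler returns a job reference.
        return handler(request.payload)


def submit_bigquery_job(payload: dict) -> str:
    # Stand-in for a call to an external service; returns a made-up job reference.
    return f"bigquery-job:{payload['query']}"


router = AgentRouter()
router.register("bigquery", submit_bigquery_job)

if __name__ == "__main__":
    print(router.dispatch(ExecutionRequest("bigquery", {"query": "SELECT 1"})))
```

Because the router holds only the handler table, any replica could serve any request, which is the property that makes a long-running, stateless deployment practical.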

AWS SageMaker Inference agent

Deploy models, and create and trigger inference endpoints, on AWS SageMaker.

Airflow agent

Run Airflow jobs in your workflows with the Airflow agent.

BigQuery agent

Run BigQuery jobs in your workflows with the BigQuery agent.

ChatGPT agent

Run ChatGPT jobs in your workflows with the ChatGPT agent.

Databricks agent

Run Databricks jobs in your workflows with the Databricks agent.

Memory Machine Cloud agent

Execute tasks using the MemVerge Memory Machine Cloud agent.

OpenAI Batch

Submit requests for asynchronous batch processing on OpenAI.

PERIAN Job Platform agent

Execute tasks on PERIAN Job Platform.

Sensor agent

Run sensor jobs in your workflows with the sensor agent.

Snowflake agent

Run Snowflake jobs in your workflows with the Snowflake agent.

External service backend plugins

As the term suggests, these plugins rely on external services to handle the workload defined in the Flyte task that uses the plugin.

AWS Athena

Execute queries using AWS Athena.

AWS Batch

Run tasks and workflows on AWS Batch.

Flyte Interactive

Execute tasks using Flyte Interactive to debug.

Hive

Run Hive jobs in your workflows.

Enabling backend plugins

To enable a backend plugin, add its ID to the enabled-plugins list, which is found under the tasks > task-plugins section of FlytePropeller’s configuration. The plugin configuration structure is defined here. An example of the config follows:

tasks:
  task-plugins:
    enabled-plugins:
      - container
      - sidecar
      - k8s-array
    default-for-task-types:
      container: container
      sidecar: sidecar
      container_array: k8s-array
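As a minimal sketch, the same structure can be modelled as a plain Python dictionary to show what enabling one more plugin involves. The helpers `enable_plugin` and `undeclared_defaults` are hypothetical and not part of Flyte. Using `spark` as both the plugin ID and the task type is an illustrative assumption, and the rule that every default should also appear in enabled-plugins is a sanity check assumed here, not a documented requirement.

```python
import copy

# Mirrors the YAML example above as a nested dictionary.
base_config = {
    "tasks": {
        "task-plugins": {
            "enabled-plugins": ["container", "sidecar", "k8s-array"],
            "default-for-task-types": {
                "container": "container",
                "sidecar": "sidecar",
                "container_array": "k8s-array",
            },
        }
    }
}


def enable_plugin(config: dict, plugin_id: str, task_type: str = None) -> dict:
    """Return a copy of config with plugin_id enabled and, optionally, set as a default."""
    updated = copy.deepcopy(config)
    section = updated["tasks"]["task-plugins"]
    if plugin_id not in section["enabled-plugins"]:
        section["enabled-plugins"].append(plugin_id)
    if task_type is not None:
        section["default-for-task-types"][task_type] = plugin_id
    return updated


def undeclared_defaults(config: dict) -> list:
    """List plugin IDs used as defaults that are missing from enabled-plugins."""
    section = config["tasks"]["task-plugins"]
    enabled = set(section["enabled-plugins"])
    return sorted({p for p in section["default-for-task-types"].values() if p not in enabled})


if __name__ == "__main__":
    updated = enable_plugin(base_config, "spark", task_type="spark")
    print(updated["tasks"]["task-plugins"]["enabled-plugins"])
    print("undeclared defaults:", undeclared_defaults(updated))
```

The copy keeps the original configuration untouched, so the change can be reviewed before it is written back to the FlytePropeller configuration.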

Finding the ID of the backend plugin

To find the ID of a backend plugin, look at the source code of the plugin. For example, in the case of Spark, the value of ID is used here, defined as spark.

SDKs for writing tasks and workflows

The community would love to help you build new SDKs. Currently, the available SDKs are:

flytekit

The Python SDK for Flyte.

flytekit-java

The Java/Scala SDK for Flyte.

Flyte operators

Flyte can be integrated with other orchestrators to help you leverage Flyte’s constructs natively within other orchestration tools.

Airflow

Trigger Flyte executions from Airflow.