Backend plugins

Tags: Extensibility, Contribute, Intermediate

This guide will take you through the why and how of writing a backend plugin for Flyte.

To recap, here are a few examples of why you would want to implement a backend plugin:

  1. We want to add a new capability to the Flyte Platform, for example we might want to:

    • Talk to a new service like AWS SageMaker, Snowflake, Redshift, Athena, BigQuery, etc.

    • Orchestrate a set of containers in a new way like Spark, Flink, Distributed training on Kubernetes (usually using a Kubernetes operator).

    • Use a new container orchestration engine like AWS Batch/ECS, Hashicorp’ Nomad

    • Use a completely new runtime like AWS Lambda, KNative, etc.

  2. You want to retain the capability to update the plugin implementation and roll out new changes and fixes without affecting the users code or requiring them to update versions of their plugins.

  3. You want the same plugin to be accessible across multiple language SDK’s.

Note

Talking to a new service can be done using flytekit extensions and usually is the better way to get started. But, once matured, most of these extensions are better to be migrated to the backend. For the rest of the cases, it is possible to extend flytekit to achieve these scenarios, but this is less desirable, because of the associated overhead of first launching a container that launches these jobs downstream.

Basics

In this section we’ll go through the components of a backend plugin using the K8s Spark plugin as a reference. A Flyte backend extension consists of 3 parts: interface specification, flytekit plugin implementation, and flytepropeller plugin implementation.

Interface specification

Usually Flyte extensions need information that is not covered by a Flyte TaskTemplate. The TaskTemplate consists of a the interface, task_type identifier, some metadata and other fields.

Note

An important field to note here is custom. The custom field is essentially an unstructured JSON. This makes it possible to extend a task-template beyond the default supported targets container.

The motivation of the `custom`` field is to marshal a JSON structure that specifies information beyond what a regular TaskTemplate can capture. The actual structure of the JSON is known only to the implemented backend-plugin and the SDK components. The core Flyte platform, does not understand of look into the specifics of this structure.

It is highly recommended to use an interface definition language like Protobuf, OpenAPISpec etc to declare specify the structure of the JSON. From here, on we refer to this as the Plugin Specification.

Note

For Spark we decided to use Protobuf to specify the plugin as can be seen here. Note it isn’t necessary to have the Plugin structure specified in flyteidl, but we do it for simplicity, ease of maintenance alongside the core platform, and convenience leveraging existing tooling to generate code for protobuf.

Flytekit plugin implementation

Now that you have a specification, we have to implement a method to generate this new TaskTemplate, with the special custom field. Also, this is where the UX design comes into play. You want to write the best possible interface in the SDK that users are delighted to use. The end goal is to create the TaskTemplate with the Custom field populated with the actual JSON structure.

We will currently refer to the Python flytekit SDK as an example for extending and implementing the SDK.

The SDK task should be implemented as an extension of flytekit.core.base_task.PythonTask, or more commonly flytekit.PythonFunctionTask. In the case of Spark, we extend the flytekit.PythonFunctionTask, as shown here.

The SparkTask is implemented as a regular flytekit plugin, with one exception: the custom field is now actually the SparkJob protocol buffer. When serializing a task, flytekit base classes will automatically invoke the get_custom method.

FlytePropeller backend plugin

The backend plugin is where the actual logic of the execution is implemented. The backend plugin uses the Flyte PluginMachinery interface to implement a plugin which can be one of the following supported types:

  1. Kubernetes operator Plugin: The demo in the video below shows two examples of K8s backend plugins: flytekit Athena & Spark, and Flyte K8s Pod & Spark.

  2. A Web API plugin: Async or Sync.

  3. Core Plugin: if none of the above fits