Databricks Plugin Setup#

This guide gives an overview of how to set up Databricks in your Flyte deployment.

Add Flyte chart repo to Helm

helm repo add flyteorg https://flyteorg.github.io/flyte

Setup the Cluster#

Start the sandbox cluster

flytectl demo start

Generate flytectl config

flytectl config init

Upload an entrypoint.py to dbfs or s3. Spark driver node run this file to override the default command in the dbx job.

Specify Plugin Configuration#

Create a file named values-override.yaml and add the following config to it:

configmap:
  enabled_plugins:
    # -- Tasks specific configuration [structure](https://pkg.go.dev/github.com/flyteorg/flytepropeller/pkg/controller/nodes/task/config#GetConfig)
    tasks:
      # -- Plugins configuration, [structure](https://pkg.go.dev/github.com/flyteorg/flytepropeller/pkg/controller/nodes/task/config#TaskPluginConfig)
      task-plugins:
        # -- [Enabled Plugins](https://pkg.go.dev/github.com/flyteorg/flyteplugins/go/tasks/config#Config). Enable sagemaker*, athena if you install the backend
        # plugins
        enabled-plugins:
          - container
          - sidecar
          - k8s-array
          - databricks
        default-for-task-types:
          container: container
          sidecar: sidecar
          container_array: k8s-array
          spark: databricks
databricks:
  enabled: True
  plugin_config:
    plugins:
      databricks:
        entrypointFile: dbfs:///FileStore/tables/entrypoint.py
        databricksInstance: dbc-a53b7a3c-614c

Get an API Token#

Create a Databricks account and follow the docs for creating an access token.

Then, create a Instance Profile for the Spark cluster, it allows the spark job to access your data in the s3 bucket.

Add the Databricks access token to FlytePropeller:

kubectl edit secret -n flyte flyte-secret-auth

The configuration should look as follows:

apiVersion: v1
data:
  FLYTE_DATABRICKS_API_TOKEN: <ACCESS_TOKEN>
  client_secret: Zm9vYmFy
kind: Secret
metadata:
  annotations:
    meta.helm.sh/release-name: flyte
    meta.helm.sh/release-namespace: flyte
...

Where you need to replace <ACCESS_TOKEN> with your access token.

Upgrade the Flyte Helm release#

helm upgrade -n flyte -f https://raw.githubusercontent.com/flyteorg/flyte/master/charts/flyte-core/values-sandbox.yaml -f values-override.yaml flyteorg/flyte-core