Databricks Plugin Setup

This guide gives an overview of how to set up Databricks in your Flyte deployment.

  1. Add the Flyte chart repo to Helm

helm repo add flyteorg https://flyteorg.github.io/flyte
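
Optionally, refresh the local Helm repo index so the latest flyte-core chart is available:

helm repo update
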
  2. Set up the cluster

  • Start the sandbox cluster

    flytectl sandbox start
    
  • Generate Flytectl sandbox config

    flytectl config init
    
  • If you are not using the sandbox, make sure you have an up-and-running Flyte cluster in AWS / GCP

  • Make sure you have the correct kubeconfig and have selected the correct Kubernetes context (see the verification commands below)

  • Make sure you have the correct flytectl config at ~/.flyte/config.yaml
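
You can verify the active context and that Flyte is healthy (this assumes Flyte is deployed in the flyte namespace, as in the sandbox):

kubectl config current-context
kubectl get pods -n flyte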

  3. Upload an entrypoint.py to DBFS or S3. The Spark driver node runs this file to override the default command of the Databricks job.
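
For example, assuming the Databricks CLI (or the AWS CLI for S3) is configured, and using placeholder paths that must match the entrypointFile value in the next step:

# Copy the entrypoint to DBFS with the Databricks CLI
databricks fs cp entrypoint.py dbfs:/FileStore/tables/entrypoint.py

# Or copy it to S3 (replace <your-bucket> with your bucket name)
aws s3 cp entrypoint.py s3://<your-bucket>/entrypoint.py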

  4. Create a file named values-override.yaml and add the following config to it:

configmap:
  enabled_plugins:
    # -- Tasks specific configuration [structure](https://pkg.go.dev/github.com/flyteorg/flytepropeller/pkg/controller/nodes/task/config#GetConfig)
    tasks:
      # -- Plugins configuration, [structure](https://pkg.go.dev/github.com/flyteorg/flytepropeller/pkg/controller/nodes/task/config#TaskPluginConfig)
      task-plugins:
        # -- [Enabled Plugins](https://pkg.go.dev/github.com/flyteorg/flyteplugins/go/tasks/config#Config). Enable sagemaker*, athena if you install the backend
        # plugins
        enabled-plugins:
          - container
          - sidecar
          - k8s-array
          - databricks
        default-for-task-types:
          container: container
          sidecar: sidecar
          container_array: k8s-array
          spark: databricks
databricks:
  enabled: True
  plugin_config:
    plugins:
      databricks:
        entrypointFile: dbfs:///FileStore/tables/entrypoint.py
        databricksInstance: dbc-a53b7a3c-614c
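
Before applying the override, you can optionally check that it is valid YAML and renders against the chart (a local dry run only; this assumes the chart repo added in step 1):

helm template flyte flyteorg/flyte-core -n flyte \
  -f https://raw.githubusercontent.com/flyteorg/flyte/master/charts/flyte-core/values-sandbox.yaml \
  -f values-override.yaml > /dev/null
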
  5. Create a Databricks account and follow the Databricks docs to create an access token.

  6. Create an instance profile for the Spark cluster; it allows the Spark job to access your data in the S3 bucket.

  7. Add the Databricks access token to FlytePropeller.

Note

Refer to the Databricks documentation on personal access tokens to understand how to create and manage the Databricks access token.

kubectl edit secret -n flyte flyte-secret-auth

The configuration will look as follows:

apiVersion: v1
data:
  FLYTE_DATABRICKS_API_TOKEN: <ACCESS_TOKEN>
  client_secret: Zm9vYmFy
kind: Secret
metadata:
  annotations:
    meta.helm.sh/release-name: flyte
    meta.helm.sh/release-namespace: flyte
...

Replace <ACCESS_TOKEN> with your access token. Note that values under data: in a Kubernetes Secret must be base64-encoded.
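
For example, to produce the encoded value to paste into the secret:

echo -n '<ACCESS_TOKEN>' | base64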

  8. Upgrade the Flyte Helm release.

helm upgrade -n flyte -f https://raw.githubusercontent.com/flyteorg/flyte/master/charts/flyte-core/values-sandbox.yaml -f values-override.yaml flyte flyteorg/flyte-core
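
After the upgrade, verify that the release applied and the Flyte pods are running (this assumes the release is named flyte, as in the secret annotations above):

helm status flyte -n flyte
kubectl get pods -n flyte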