Databricks#
The Flyte backend can be connected to the Databricks service. Once enabled, you can submit Spark jobs to the Databricks platform. This section describes how to use the Databricks plugin with flytekit Python.
Installation#
The flytekit Databricks plugin is bundled into its Spark plugin, so to use it, simply run the following:
pip install flytekitplugins-spark
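With the plugin installed, a Spark task can be dispatched to Databricks by passing the plugin's Databricks config to the task decorator. The following is a minimal sketch; the cluster spec under databricks_conf, the Spark settings, and the hello_spark task itself are illustrative placeholders to adapt to your workspace:

import random

import flytekit
from flytekit import task
from flytekitplugins.spark import Databricks

# A minimal sketch of a Spark task that runs on Databricks. The values below
# (spark_version, node_type_id, num_workers, memory settings) are placeholders.
@task(
    task_config=Databricks(
        spark_conf={
            "spark.driver.memory": "1000M",
            "spark.executor.instances": "2",
        },
        databricks_conf={
            "run_name": "flytekit databricks plugin example",
            "new_cluster": {
                "spark_version": "11.0.x-scala2.12",
                "node_type_id": "r3.xlarge",
                "num_workers": 4,
            },
            "timeout_seconds": 3600,
            "max_retries": 1,
        },
    )
)
def hello_spark(partitions: int) -> float:
    # Estimate pi with the classic Monte Carlo example; the Spark session is
    # provided by the plugin through the task execution context.
    n = 100000 * partitions
    sess = flytekit.current_context().spark_session
    count = (
        sess.sparkContext.parallelize(range(1, n + 1), partitions)
        .map(lambda _: 1 if random.random() ** 2 + random.random() ** 2 <= 1 else 0)
        .reduce(lambda a, b: a + b)
    )
    return 4.0 * count / n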
How to Build Your Dockerfile for Spark on Databricks#
Using Spark on Databricks is straightforward and provides full versioning through a custom-built Spark container. The built container can also execute regular Spark tasks.
For Spark, the image must use a base image built by Databricks, and the workflow code must be copied to /databricks/driver:
FROM databricksruntime/standard:11.3-LTS
LABEL org.opencontainers.image.source=https://github.com/flyteorg/flytesnacks
# To build this dockerfile, run "make docker_build".

ENV VENV /opt/venv
ENV LANG C.UTF-8
ENV LC_ALL C.UTF-8
ENV PYTHONPATH /databricks/driver
ENV PATH="/databricks/python3/bin:$PATH"
USER 0

RUN sudo apt-get update && sudo apt-get install -y make build-essential libssl-dev git

# Install custom package
RUN /databricks/python3/bin/pip install awscli
WORKDIR /opt
RUN curl https://sdk.cloud.google.com > install.sh
RUN bash /opt/install.sh --install-dir=/opt

# Install Python dependencies
COPY databricks/requirements.txt /databricks/driver/requirements.txt
RUN /databricks/python3/bin/pip install -r /databricks/driver/requirements.txt

WORKDIR /databricks/driver
# Copy the makefile targets to expose on the container. This makes it easier to register.
# Delete this after we update CI
COPY databricks/in_container.mk /databricks/driver/Makefile

# Delete this after we update CI to not serialize inside the container
COPY databricks/sandbox.config /databricks/driver

# Copy the actual code
COPY databricks/ /databricks/driver/databricks/

# This tag is supplied by the build script and will be used to determine the version
# when registering tasks, workflows, and launch plans
ARG tag
ENV FLYTE_INTERNAL_IMAGE $tag
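The Dockerfile's comment points at a make docker_build target. If you build the image by hand, the equivalent invocation is roughly the following sketch; the image name is a placeholder, and the tag build argument supplies the FLYTE_INTERNAL_IMAGE version described above:

# hypothetical manual build; the repository's docker_build target does the equivalent
docker build . -f Dockerfile \
  -t <registry>/<image>:<version> \
  --build-arg tag=<registry>/<image>:<version>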
Configuring the backend to get the Databricks plugin working#
Make sure to add "databricks" to tasks.task-plugins.enabled-plugins in enabled_plugins.yaml, as shown in the sketch below.
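A minimal fragment of enabled_plugins.yaml might look like the following; the neighboring plugin entries and the default-for-task-types mapping are assumptions based on a typical Flyte deployment:

tasks:
  task-plugins:
    enabled-plugins:
      - container
      - sidecar
      - k8s-array
      - databricks
    default-for-task-types:
      container: container
      sidecar: sidecar
      container_array: k8s-array
      spark: databricks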
Next, add a Databricks access token to FlytePropeller. See the Databricks documentation for details on how to create an access token.
kubectl edit secret -n flyte flyte-propeller-auth
The configuration will look like the following:
apiVersion: v1
data:
  FLYTE_DATABRICKS_API_TOKEN: <ACCESS_TOKEN>
kind: Secret
metadata:
  annotations:
    meta.helm.sh/release-name: flyte
    meta.helm.sh/release-namespace: flyte
  ...
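Because values under data in a Kubernetes Secret are base64-encoded, encode the raw token before pasting it in place of <ACCESS_TOKEN>, for example:

echo -n <your-databricks-token> | base64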