Contributing code

🧱 Component reference

To understand how the components below interact with each other, refer to Understand the lifecycle of a workflow.

Note

With the exception of flytekit, the components below are maintained in the flyte monorepo.

Figure: The dependency graph between various flyteorg repos

flyte

Repo

Purpose: Deployment, Documentation, and Issues

Languages: RST

flyteidl

Repo

Purpose: Defines the Flyte workflow specification in protocol buffers, which forms the core of Flyte

Language: Protobuf

Guidelines: Refer to the README

flytepropeller

Repo | Code Reference

Purpose: Kubernetes-native operator

Language: Go

Guidelines:

  • Check the Makefile in the repository root

  • Run the following commands:
    • make generate

    • make test_unit

    • make lint

  • To compile, run make compile
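The three guideline commands can be chained so that a run stops at the first failing target. The sketch below only wraps the Makefile targets named above; the helper name `run_go_checks` is hypothetical and not part of the Flyte tooling. flyteplugins lists the same targets, so the helper applies there too.

```shell
# Hypothetical helper: run the Makefile targets from the guidelines in order
# and stop at the first one that fails. Run it from the component's root directory.
run_go_checks() {
  for target in generate test_unit lint; do
    echo "==> make ${target}"
    if ! make "${target}"; then
      echo "make ${target} failed" >&2
      return 1
    fi
  done
}

# Usage, from the flytepropeller (or flyteplugins) directory:
#   run_go_checks
```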

flyteadmin

Repo | Code Reference

Purpose: Control Plane

Language: Go

Guidelines:

  • Check the Makefile in the repository root

  • If the service code has to be tested, run it locally:
    • make compile

    • make server

  • To seed data locally:
    • make compile

    • make seed_projects

    • make migrate

  • To run integration tests locally:
    • make integration

    • (or to run in containerized dockernetes): make k8s_integration

flytekit

Repo

Purpose: Python SDK & Tools

Language: Python

Guidelines: Refer to the Flytekit Contribution Guide

flyteconsole

Repo

Purpose: Admin Console

Language: TypeScript

Guidelines: Refer to the README

datacatalog

Repo | Code Reference

Purpose: Manage Input & Output Artifacts

Language: Go

flyteplugins

Repo | Code Reference

Purpose: Flyte Plugins

Language: Go

Guidelines:

  • Check the Makefile in the repository root

  • Run the following commands:
    • make generate

    • make test_unit

    • make lint

flytestdlib

Repo

Purpose: Standard Library for Shared Components

Language: Go

flytectl

Repo

Purpose: A standalone Flyte CLI

Language: Go

Guidelines: Refer to the FlyteCTL Contribution Guide

🔮 Development Environment Setup Guide

This guide provides a step-by-step approach to setting up a local development environment for flyteidl, flyteadmin, flyteplugins, flytepropeller, flytekit, flyteconsole, datacatalog, and flytestdlib.

The video below is a tutorial on how to set up a local development environment for Flyte.

Requirements

This guide has been tested and used on AWS EC2 with an Ubuntu 22.04 image. The following tools are required:


How to set up a dev environment for flyteidl, flyteadmin, flyteplugins, flytepropeller, datacatalog and flytestdlib?

1. Install flytectl

Flytectl is a portable and lightweight command-line interface to work with Flyte.

# Step 1: Install the latest version of flytectl
curl -sL https://ctl.flyte.org/install | bash
# flyteorg/flytectl info checking GitHub for latest tag
# flyteorg/flytectl info found version: 0.6.39 for v0.6.39/Linux/x86_64
# flyteorg/flytectl info installed ./bin/flytectl

# Step 2: Export flytectl path based on the previous log "flyteorg/flytectl info installed ./bin/flytectl"
export PATH=$PATH:/home/ubuntu/bin # replace with your path
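The installer places the binary in `./bin` relative to the directory where you ran it, so the exact path to export varies. A small helper that resolves that directory to an absolute path and appends it to `PATH` only once might look like the sketch below; `add_to_path` is a hypothetical name, and `flytectl version` is used only to confirm the binary is found.

```shell
# Hypothetical helper: append a directory to PATH (as an absolute path),
# skipping it if PATH already contains that directory.
add_to_path() {
  dir="$(cd "$1" && pwd)" || return 1   # resolve to an absolute path; fail if missing
  case ":${PATH}:" in
    *":${dir}:"*) ;;                     # already present, nothing to do
    *) PATH="${PATH}:${dir}"; export PATH ;;
  esac
}

# Usage, from the directory where you ran the installer:
#   add_to_path ./bin
#   flytectl version   # confirms the binary is now on PATH
```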

2. Build a k3s cluster that runs minio and postgres Pods.

Minio is an S3-compatible object store that will be used later to store task output, input, etc.
Postgres is an open-source object-relational database that flyteadmin and datacatalog will later use to store all Flyte information.
# Step 1: Start the k3s cluster and create Pods for postgres and minio. Note: the Flyte UI is not accessible yet, but the minio console is.
flytectl demo start --dev
# 👨‍💻 Flyte is ready! Flyte UI is available at http://localhost:30080/console 🚀 🚀 🎉
# ❇️ Run the following command to export demo environment variables for accessing flytectl
#         export FLYTECTL_CONFIG=/home/ubuntu/.flyte/config-sandbox.yaml
# 🐋 Flyte sandbox ships with a Docker registry. Tag and push custom workflow images to localhost:30000
# 📂 The Minio API is hosted on localhost:30002. Use http://localhost:30080/minio/login for Minio console

# Step 2: Export FLYTECTL_CONFIG as the previous log indicated.
export FLYTECTL_CONFIG=/home/ubuntu/.flyte/config-sandbox.yaml # replace with your path

# Step 3: The kubeconfig is automatically copied to the user's main kubeconfig (default is `~/.kube/config`) with "flyte-sandbox" as the context name.
# Check that we can access the K3s cluster. Verify that postgres and minio are running.
kubectl get pod -n flyte
# NAME                                                  READY   STATUS    RESTARTS   AGE
# flyte-sandbox-docker-registry-85745c899d-dns8q        1/1     Running   0          5m
# flyte-sandbox-kubernetes-dashboard-6757db879c-wl4wd   1/1     Running   0          5m
# flyte-sandbox-proxy-d95874857-2wc5n                   1/1     Running   0          5m
# flyte-sandbox-minio-645c8ddf7c-sp6cc                  1/1     Running   0          5m
# flyte-sandbox-postgresql-0                            1/1     Running   0          5m
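Pods can take a while to become Ready after `flytectl demo start --dev`. Instead of re-running `kubectl get pod`, you can block until they are Ready with the standard `kubectl wait` command. The helper name and the default timeout below are illustrative choices, not part of Flyte.

```shell
# Hypothetical helper: block until every Pod in the flyte namespace reports
# Ready, or give up after the timeout (default 300s).
wait_for_sandbox_pods() {
  wait_timeout="${1:-300s}"
  kubectl wait --for=condition=Ready pod --all -n flyte --timeout="${wait_timeout}"
}

# Usage:
#   wait_for_sandbox_pods          # default 300s timeout
#   wait_for_sandbox_pods 600s
```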

3. Run all Flyte components (flyteadmin, flytepropeller, datacatalog, flyteconsole, etc) in a single binary.

The Flyte repository includes Go code that integrates all Flyte components into a single binary.

# Step 1: Clone flyte repo
git clone https://github.com/flyteorg/flyte.git
cd flyte

# Step 2: Build a single binary that bundles all the Flyte components.
# The version of each component/library used to build the single binary is defined in `go.mod`.
sudo apt-get -y install jq # You may need to install jq
make clean # (Optional) Run this only if you want to run the newest version of flyteconsole
make go-tidy
make compile

# Step 3: Prepare a namespace template for the cluster resource controller.
# The configuration file "flyte-single-binary-local.yaml" has an entry named cluster_resources.templatePath.
# This entry needs to direct to a directory containing the templates for the cluster resource controller to use.
# We will now create a simple template that allows the automatic creation of required namespaces for projects.
# For example, with Flyte's default project "flytesnacks", the controller will auto-create the following namespaces:
# flytesnacks-staging, flytesnacks-development, and flytesnacks-production.
mkdir -p $HOME/.flyte/sandbox/cluster-resource-templates/
echo "apiVersion: v1
kind: Namespace
metadata:
  name: '{{ namespace }}'" > $HOME/.flyte/sandbox/cluster-resource-templates/namespace.yaml

# Step 4: Running the single binary.
# The POD_NAMESPACE environment variable is necessary for the webhook to function correctly.
# You may encounter the error `ERROR: duplicate key value violates unique constraint`. Running the command again solves the problem.
POD_NAMESPACE=flyte flyte start --config flyte-single-binary-local.yaml
# All logs from flyteadmin, flyteplugins, flytepropeller, etc. will appear in the terminal.
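To check what the namespace template from Step 3 expands to, you can preview the substitution locally. This is only an approximation: it assumes `{{ namespace }}` becomes `<project>-<domain>` for the three default domains, as described in Step 3, and it uses `sed` rather than the controller's own templating. `preview_namespaces` is a hypothetical helper.

```shell
# Illustration only: approximate the cluster resource controller's output by
# substituting {{ namespace }} with <project>-<domain> for each default domain.
preview_namespaces() {
  template="$1"   # path to namespace.yaml
  project="$2"    # e.g. flytesnacks
  for domain in development staging production; do
    sed "s/{{ namespace }}/${project}-${domain}/" "${template}"
    echo "---"
  done
}

# Usage:
#   preview_namespaces "$HOME/.flyte/sandbox/cluster-resource-templates/namespace.yaml" flytesnacks
```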

4. Build single binary with your own code.

The following instructions show how to build the single binary with your own code changes, using flyteadmin as an example.

  • Note: Although we'll use flyteadmin as an example, these steps apply to other Flyte components and libraries as well. {flyteadmin} below can be substituted with any of: flyteidl, flyteplugins, flytepropeller, datacatalog, or flytestdlib.

  • Note: If you want to learn how Flyte compiles these components and replaces the repositories, study how go mod edit works.

# Step 1: Install Go. Flyte uses Go 1.19, so make sure to switch to Go 1.19.
export PATH=$PATH:$(go env GOPATH)/bin
go install golang.org/dl/go1.19@latest
go1.19 download
export GOROOT=$(go1.19 env GOROOT)
export PATH="$GOROOT/bin:$PATH"

# You may need to install goimports to fix lint errors.
# Refer to https://pkg.go.dev/golang.org/x/tools/cmd/goimports
go install golang.org/x/tools/cmd/goimports@latest
export PATH=$(go env GOPATH)/bin:$PATH

# Step 2: Go to the {flyteadmin} directory and modify the source code.
cd flyte/flyteadmin

# Step 3: Go back to the flyte directory and rebuild the single binary.
cd ..
make go-tidy
make compile
POD_NAMESPACE=flyte flyte start --config flyte-single-binary-local.yaml
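To confirm that `make compile` builds your local checkout rather than a published module version, you can inspect the `replace` directives in the flyte repository's `go.mod`. This sketch assumes the components are wired in through directives of the form `module => ./dir`; if they are not, it prints nothing. `show_local_replaces` is a hypothetical helper.

```shell
# Hypothetical helper: list go.mod replace directives that point at local
# directories, e.g. "github.com/flyteorg/flyte/flyteadmin => ./flyteadmin".
# (`go mod edit -json` prints the same directives as JSON.)
show_local_replaces() {
  gomod="${1:-go.mod}"
  grep -E '=>[[:space:]]*\.\.?/' "${gomod}"
}

# Usage, from the root of the flyte repository:
#   show_local_replaces
```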

5. Test by running a hello world workflow.

# Step 1: Install flytekit
pip install flytekit && export PATH=$PATH:/home/ubuntu/.local/bin

# Step 2: Run a hello world example
pyflyte run --remote https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/basics/basics/hello_world.py hello_world_wf
# Go to http://localhost:30080/console/projects/flytesnacks/domains/development/executions/fd63f88a55fed4bba846 to see the execution in the console.
# You can go to the [flytesnacks repository](https://github.com/flyteorg/flytesnacks) to see more useful examples.
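If you prefer the terminal to the console, flytectl can fetch the same execution. The sketch below assumes the `flytesnacks` project and `development` domain seen in the console URL above, and takes the execution name from the last segment of that URL; both helper names are hypothetical.

```shell
# Hypothetical helper: extract the execution name (last path segment) from a console URL.
execution_name_from_url() {
  url="${1%/}"                 # drop a trailing slash, if any
  printf '%s\n' "${url##*/}"   # keep everything after the last "/"
}

# Hypothetical helper: show an execution from the flytesnacks/development project-domain.
show_execution() {
  flytectl get execution -p flytesnacks -d development "$(execution_name_from_url "$1")"
}

# Usage:
#   show_execution http://localhost:30080/console/projects/flytesnacks/domains/development/executions/<execution name>
```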

6. Tear down the k3s cluster when you finish developing.

flytectl demo teardown
# context removed for "flyte-sandbox".
# 🧹 🧹 Sandbox cluster is removed successfully.
# ❇️ Run the following command to unset sandbox environment variables for accessing flytectl
#        unset FLYTECTL_CONFIG

How to set up a dev environment for flytekit?

1. Set up local Flyte Cluster.

If you are also modifying the code for flyteidl, flyteadmin, flyteplugins, flytepropeller, datacatalog, or flytestdlib, refer to the instructions in the previous section to set up a local Flyte cluster.

Otherwise, you can start the backend with a single command.

# Step 1: Install the latest version of flytectl, a portable and lightweight command-line interface to work with Flyte.
curl -sL https://ctl.flyte.org/install | bash
# flyteorg/flytectl info checking GitHub for latest tag
# flyteorg/flytectl info found version: 0.6.39 for v0.6.39/Linux/x86_64
# flyteorg/flytectl info installed ./bin/flytectl

# Step 2: Export flytectl path based on the previous log "flyteorg/flytectl info installed ./bin/flytectl"
export PATH=$PATH:/home/ubuntu/bin # replace with your path

# Step 3: Start the Flyte demo cluster. This sets up a k3s cluster running minio and postgres Pods, plus all Flyte components: flyteadmin, flyteplugins, flytepropeller, etc.
# See https://docs.flyte.org/en/latest/flytectl/gen/flytectl_demo_start.html for more details.
flytectl demo start
# 👨‍💻 Flyte is ready! Flyte UI is available at http://localhost:30080/console 🚀 🚀 🎉
# ❇️ Run the following command to export demo environment variables for accessing flytectl
#         export FLYTECTL_CONFIG=/home/ubuntu/.flyte/config-sandbox.yaml
# 🐋 Flyte sandbox ships with a Docker registry. Tag and push custom workflow images to localhost:30000
# 📂 The Minio API is hosted on localhost:30002. Use http://localhost:30080/minio/login for Minio console

2. Run a workflow locally.

# Step 1: Build a virtual environment for developing Flytekit. This will allow your local changes to take effect when the same Python interpreter runs `import flytekit`.
git clone https://github.com/flyteorg/flytekit.git # replace with your own repo
cd flytekit
virtualenv ~/.virtualenvs/flytekit
source ~/.virtualenvs/flytekit/bin/activate
make setup
pip install -e .

# If you are also developing the plugins, consider the following:

# Installing specific plugins:
# If you only need a few plugins, install them individually.
# Take the [Flytekit BigQuery Plugin](https://github.com/flyteorg/flytekit/tree/master/plugins/flytekit-bigquery#flytekit-bigquery-plugin) as an example:
# go to the bigquery plugin folder and install it.
cd plugins/flytekit-bigquery/
pip install -e .
# Now you can use the bigquery plugin, and performance stays fast.
cd ../..  # back to the flytekit root

# (Optional) Installing all plugins:
# To install every available plugin, run the commands below from the flytekit root.
# This is not typically recommended: the current plugins do not support
# lazy loading, which can slow down your Python interpreter.
cd plugins
pip install -e .
# Now you can use all plugins, but performance is slower.
cd ..  # back to the flytekit root

# Step 2: Modify the source code for flytekit, then run unit tests and lint.
make lint
make test

# Step 3: Run a hello world sample to test locally
pyflyte run https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/basics/basics/hello_world.py hello_world_wf
# Running hello_world_wf() hello world
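When working on several plugins at once, the per-plugin install from Step 1 can be looped. This sketch assumes it runs from the flytekit root and that each name is a directory under `plugins/`; `install_plugins` is a hypothetical helper, and the plugin names in the usage line are only examples.

```shell
# Hypothetical helper: install the named plugins from plugins/<name> in editable
# mode, stopping at the first failed install. Run from the flytekit root.
install_plugins() {
  for plugin in "$@"; do
    pip install -e "plugins/${plugin}" || return 1
  done
}

# Usage:
#   install_plugins flytekit-bigquery flytekit-deck-standard
```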

3. Run a workflow in the sandbox.

Before running your workflow in the sandbox, make sure you're able to run it locally. To deploy the workflow in the sandbox, you'll need to build a Flytekit image. Create a Dockerfile in your Flytekit directory with the minimum configuration required to run a task, as shown below. If your task requires additional components, such as plugins, refer to how the official flytekit image is built.

FROM python:3.9-slim-buster
USER root
WORKDIR /root
ENV PYTHONPATH /root
RUN apt-get update && apt-get install build-essential -y
RUN apt-get install git -y
# The following line is an example of how to install your modified plugins. In this case, it demonstrates how to install the 'deck' plugin.
# RUN pip install -U git+https://github.com/Yicheng-Lu-llll/flytekit.git@"demo#egg=flytekitplugins-deck-standard&subdirectory=plugins/flytekit-deck-standard" # replace with your own repo and branch
RUN pip install -U git+https://github.com/Yicheng-Lu-llll/flytekit.git@demo # replace with your own repo and branch
# Replace with your own image name and tag. (A trailing "#" on an ENV line would become part of the value.)
ENV FLYTE_INTERNAL_IMAGE="localhost:30000/flytekit:demo"

The instructions below explain how to build the image, push the image to the Flyte cluster, and finally submit the workflow.

# Step 1: Ensure you have pushed your changes to the remote repo
# In the flytekit folder
git add . && git commit -s -m "develop" && git push

# Step 2: Build the image
# In the flytekit folder
export FLYTE_INTERNAL_IMAGE="localhost:30000/flytekit:demo" # replace with your own image name and tag
docker build --no-cache -t "${FLYTE_INTERNAL_IMAGE}" -f ./Dockerfile .

# Step 3: Push the image to the Flyte cluster
docker push "${FLYTE_INTERNAL_IMAGE}"

# Step 4: Submit a hello world workflow to the Flyte cluster
cd flytesnacks # assumes a local clone of https://github.com/flyteorg/flytesnacks
pyflyte run --image ${FLYTE_INTERNAL_IMAGE} --remote https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/basics/basics/hello_world.py hello_world_wf
# Go to http://localhost:30080/console/projects/flytesnacks/domains/development/executions/f5c17e1b5640c4336bf8 to see the execution in the console.
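Steps 2 and 3 are usually repeated after every change, so it can help to wrap them. This sketch assumes you run it from the flytekit folder after pushing your branch (Step 1), because the Dockerfile installs flytekit from the remote repository. `build_and_push_image` is a hypothetical name, and its default tag mirrors the example above.

```shell
# Hypothetical helper wrapping Steps 2 and 3: build the image from ./Dockerfile
# and push it to the sandbox registry, skipping the push if the build fails.
build_and_push_image() {
  image="${1:-localhost:30000/flytekit:demo}"   # replace with your own image name and tag
  docker build --no-cache -t "${image}" -f ./Dockerfile . || return 1
  docker push "${image}"
}

# Usage, from the flytekit folder:
#   build_and_push_image localhost:30000/flytekit:demo
# then submit the workflow with Step 4.
```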

How to set up a dev environment for flyteconsole?

1. Set up local Flyte cluster.

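Depending on your needs, set up the Flyte cluster by following one of the earlier guides: the backend section (flyteidl, flyteadmin, flyteplugins, flytepropeller, datacatalog, and flytestdlib) if you are also changing backend code, or step 1 of the flytekit section otherwise.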

2. Start flyteconsole.

# Step 1: Clone the repo and navigate to the Flyteconsole folder
git clone https://github.com/flyteorg/flyteconsole.git
cd flyteconsole

# Step 2: Install Node.js 18. Refer to https://github.com/nodesource/distributions/blob/master/README.md#using-ubuntu-2.
curl -fsSL https://deb.nodesource.com/setup_18.x | sudo -E bash - &&\
sudo apt-get install -y nodejs

# Step 3: Install yarn. Refer to https://classic.yarnpkg.com/lang/en/docs/install/#debian-stable.
curl -sS https://dl.yarnpkg.com/debian/pubkey.gpg | sudo apt-key add -
echo "deb https://dl.yarnpkg.com/debian/ stable main" | sudo tee /etc/apt/sources.list.d/yarn.list
sudo apt update && sudo apt install yarn

# Step 4: Add environment variables
export BASE_URL=/console
export ADMIN_API_URL=http://localhost:30080
export DISABLE_AUTH=1
export ADMIN_API_USE_SSL="http"

# Step 5: Generate SSL certificate
# Note, since we will use HTTP, SSL is not required. However, missing an SSL certificate will cause an error when starting Flyteconsole.
make generate_ssl

# Step 6: Install node packages
yarn install
yarn build:types # It is fine if you see the error `Property 'at' does not exist on type 'string[]'`
yarn run build:prod

# Step 7: Start flyteconsole
yarn start
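Before `yarn start`, it can help to confirm that the Step 4 variables are actually exported in the current shell, since a new terminal will not have them. The sketch below checks only the four variables listed in Step 4; `check_console_env` is a hypothetical helper.

```shell
# Hypothetical pre-flight check: report any Step 4 variable that is not
# exported (or is empty), and return non-zero if any are missing.
check_console_env() {
  missing=0
  for var in BASE_URL ADMIN_API_URL DISABLE_AUTH ADMIN_API_USE_SSL; do
    if [ -z "$(printenv "${var}")" ]; then   # printenv only sees exported variables
      echo "missing: ${var}" >&2
      missing=1
    fi
  done
  return "${missing}"
}

# Usage:
#   check_console_env && yarn start
```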

3. Install the Chrome plugin: Moesif Origin & CORS Changer.

We need to disable CORS to load resources. In the plugin:

1. Activate the plugin (toggle it to "on").
2. Open 'Advanced Settings'.
3. Set Access-Control-Allow-Credentials: true.

4. Go to http://localhost:3000/console/.

How to access Flyte UI, minio, postgres, k3s, and endpoints?

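This section assumes a local Flyte cluster is already set up. If it isn't, set one up by following either the backend section (flyteidl, flyteadmin, flyteplugins, flytepropeller, datacatalog, and flytestdlib) or step 1 of the flytekit section.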

1. Access the Flyte UI.

Flyte UI is a web-based user interface for Flyte that lets you interact with Flyte objects and build directed acyclic graphs (DAGs) for your workflows.

You can access it via http://localhost:30080/console.

2. Access the minio console.

Core Flyte components, such as admin, propeller, and datacatalog, as well as user runtime containers, rely on an object store (in this case, minio) to hold files. During development, you might need to examine files stored in minio, such as input.pb/output.pb or deck.html.

Access the minio console at: http://localhost:30080/minio/login. The default credentials are:

  • Username: minio

  • Password: miniostorage
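If you prefer the command line to the console, the MinIO client (`mc`) can browse the same storage. This sketch assumes `mc` is installed and that the API port 30002 from the `flytectl demo start` output is reachable; the alias name `sandbox` and the helper name are arbitrary choices.

```shell
# Hypothetical helper: register the sandbox MinIO API under the alias "sandbox"
# using the default credentials listed above.
setup_minio_alias() {
  mc alias set sandbox http://localhost:30002 minio miniostorage
}

# Usage:
#   setup_minio_alias
#   mc ls sandbox                          # list buckets
#   mc ls --recursive sandbox/<bucket>     # list objects in one bucket
```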

3. Access postgres.

FlyteAdmin and datacatalog use postgres to store persistent records, and you can interact with postgres on port 30001. Here is an example of using psql to connect:

# Step 1: Install the PostgreSQL client.
sudo apt-get update
sudo apt-get install postgresql-client

# Step 2: Connect to the PostgreSQL server. The password is "postgres".
psql -h localhost -p 30001 -U postgres -d flyte
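For one-off queries you can skip the interactive prompt. The sketch below passes the password through `PGPASSWORD`, a standard libpq environment variable, defaulting to the sandbox password from Step 2, and forwards any extra arguments to `psql`. `flyte_psql` is a hypothetical helper.

```shell
# Hypothetical helper: connect to the sandbox database non-interactively.
# Extra arguments (e.g. -c '<query>') are passed through to psql.
flyte_psql() {
  PGPASSWORD="${PGPASSWORD:-postgres}" psql -h localhost -p 30001 -U postgres -d flyte "$@"
}

# Usage:
#   flyte_psql -c '\dt'    # list tables
```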

4. Access the k3s dashboard.

Access the k3s dashboard at: http://localhost:30080/kubernetes-dashboard.

5. Access the endpoints.

Service endpoints are defined in the flyteidl repository under the service directory. You can browse them here.

For example, the endpoint for the ListTaskExecutions API is:

/api/v1/task_executions/{node_execution_id.execution_id.project}/{node_execution_id.execution_id.domain}/{node_execution_id.execution_id.name}/{node_execution_id.node_id}

You can access this endpoint at:

# replace with your specific task execution parameters
http://localhost:30080/api/v1/task_executions/flytesnacks/development/fe92c0a8cbf684ad19a8/n0?limit=10000
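The path template above maps directly onto four positional values: project, domain, execution name, and node ID, with `limit` passed as a query parameter. A small sketch that fills in the template and fetches it with `curl` might look like this; `task_executions_url` is a hypothetical helper, and its default `limit` simply reuses the value from the example URL.

```shell
# Hypothetical helper: build the ListTaskExecutions URL for the local sandbox.
# Arguments: project, domain, execution name, node ID, and an optional limit.
task_executions_url() {
  project="$1" domain="$2" execution="$3" node_id="$4" limit="${5:-10000}"
  printf 'http://localhost:30080/api/v1/task_executions/%s/%s/%s/%s?limit=%s\n' \
    "${project}" "${domain}" "${execution}" "${node_id}" "${limit}"
}

# Usage:
#   curl -s "$(task_executions_url flytesnacks development fe92c0a8cbf684ad19a8 n0)"
```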