Using Multiple Kubernetes Clusters
Scaling Beyond Kubernetes
Tip: As described in the Architecture Overview, the Flyte Control Plane sends workflows off to the Data Plane for execution. The data plane fulfills these workflows by launching pods in Kubernetes.
Often, the total compute needs can exceed the limits of a single Kubernetes cluster. To address this, you can deploy the data plane to several isolated Kubernetes clusters. The control plane (FlyteAdmin) can be configured to load-balance workflows across these isolated data planes, which protects you from the failure of any single Kubernetes cluster and increases scalability.
To achieve this, first, you have to create additional Kubernetes clusters. For now, let's assume you have three Kubernetes clusters and that you can access them all with kubectl. Let's call these clusters cluster1, cluster2, and cluster3.
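If each cluster lives in its own kubeconfig context, you can switch between them as you work through the steps below. A minimal sketch, assuming the context names match the cluster names (adjust to whatever your kubeconfig actually calls them):
# List the contexts available in your kubeconfig
kubectl config get-contexts
# Point kubectl at a specific cluster (context name assumed to be "cluster1")
kubectl config use-context cluster1
# Or target a cluster per-command without switching the active context
kubectl --context cluster2 get nodes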
Next, deploy just the data planes to these clusters. To do this, remove the data plane components from the flyte overlay, and create a new overlay containing only the data plane resources.
Data Plane Deployment
Add Flyteorg Helm repo
helm repo add flyteorg https://flyteorg.github.io/flyte
helm repo update
# Get flyte-core helm chart
helm fetch --untar --untardir . flyteorg/flyte-core
cd flyte-core
Install Flyte data plane Helm chart
# AWS / EKS
helm upgrade flyte -n flyte flyteorg/flyte-core -f values.yaml -f values-eks.yaml -f values-dataplane.yaml --create-namespace --install
# GCP
helm upgrade flyte -n flyte flyteorg/flyte-core -f values.yaml -f values-gcp.yaml -f values-dataplane.yaml --create-namespace --install
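Before moving on, you can verify that the data plane components came up in each cluster. A quick sanity check, assuming the kubeconfig context names from above:
# The data plane components (e.g. flytepropeller) should be Running in each cluster
kubectl --context cluster1 -n flyte get pods
kubectl --context cluster2 -n flyte get pods
kubectl --context cluster3 -n flyte get pods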
User and Control Plane Deployment
Some Flyte deployments may choose to run the control plane separate from the data plane. FlyteAdmin is designed to create Kubernetes resources in one or more Flyte data plane clusters. For the admin to access remote clusters, it needs credentials to each cluster.
In Kubernetes, scoped service credentials are created by configuring a “Role” resource in a Kubernetes cluster. When you attach the role to a “ServiceAccount”, Kubernetes generates a bearer token that permits access. Hence, create a FlyteAdmin ServiceAccount in each data plane cluster to generate these tokens.
When you first create the FlyteAdmin ServiceAccount in a new cluster, a bearer token is generated and will continue to allow access unless the ServiceAccount is deleted. Hence, you should never delete this ServiceAccount ⚠️.
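You can confirm that the ServiceAccount exists in each data plane cluster before fetching its credentials. A quick check, assuming the chart's default ServiceAccount name of flyteadmin:
# Verify the FlyteAdmin ServiceAccount in a data plane cluster (name assumed)
kubectl --context cluster1 get serviceaccount flyteadmin -n flyte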
To feed the credentials to FlyteAdmin, you must retrieve them from your new data plane cluster and upload them to admin (for example, within Lyft, Confidant is used).
The credentials have two parts ("ca cert" and "bearer token"). Find the generated secret via:
kubectl get secrets -n flyte | grep flyteadmin-token
Once you have the name of the secret, you can copy the ca cert to your clipboard using the following command:
kubectl get secret -n flyte {secret-name} -o jsonpath='{.data.ca\.crt}' | base64 -D | pbcopy
You can copy the bearer token to your clipboard using the following command:
kubectl get secret -n flyte {secret-name} -o jsonpath='{.data.token}' | base64 -D | pbcopy
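The two commands above assume macOS (BSD base64 -D and pbcopy). On Linux, a rough equivalent, assuming xclip is installed, would be:
# Decode the CA cert and copy it to the clipboard (GNU coreutils / xclip)
kubectl get secret -n flyte {secret-name} -o jsonpath='{.data.ca\.crt}' | base64 -d | xclip -selection clipboard
# Same for the bearer token
kubectl get secret -n flyte {secret-name} -o jsonpath='{.data.token}' | base64 -d | xclip -selection clipboard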
Now these credentials need to be included in the control plane.
Create a new file named secrets.yaml that looks like:
apiVersion: v1
kind: Secret
metadata:
  name: cluster-credentials
  namespace: flyte
type: Opaque
data:
  cluster_1_token: {{ cluster 1 token here }}
  cluster_1_cacert: {{ cluster 1 cacert here }}
  cluster_2_token: {{ cluster 2 token here }}
  cluster_2_cacert: {{ cluster 2 cacert here }}
  cluster_3_token: {{ cluster 3 token here }}
  cluster_3_cacert: {{ cluster 3 cacert here }}
Create cluster credentials secret in the control plane cluster.
kubectl apply -f secrets.yaml
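Note that values placed under data: in a Secret manifest must be base64-encoded. If you would rather let Kubernetes handle the encoding for you, an equivalent sketch, assuming you saved each cluster's decoded token and CA cert to local files with the names shown, is:
# Create the same secret from local files; kubectl base64-encodes the contents
kubectl create secret generic cluster-credentials -n flyte \
  --from-file=cluster_1_token=./cluster_1_token \
  --from-file=cluster_1_cacert=./cluster_1_cacert \
  --from-file=cluster_2_token=./cluster_2_token \
  --from-file=cluster_2_cacert=./cluster_2_cacert \
  --from-file=cluster_3_token=./cluster_3_token \
  --from-file=cluster_3_cacert=./cluster_3_cacert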
Create a file named values-override.yaml and add the following config to it:
flyteadmin:
  additionalVolumes:
  - name: cluster-credentials
    secret:
      secretName: cluster-credentials
  additionalVolumeMounts:
  - name: cluster-credentials
    mountPath: /var/run/credentials
configmap:
  clusters:
    labelClusterMap:
      team1:
      - id: cluster_1
        weight: 1
      team2:
      - id: cluster_2
        weight: 0.5
      - id: cluster_3
        weight: 0.5
    clusterConfigs:
    - name: "cluster_1"
      endpoint: {{ your-cluster-1-kubeapi-endpoint.com }}
      enabled: true
      auth:
        type: "file_path"
        tokenPath: "/var/run/credentials/cluster_1_token"
        certPath: "/var/run/credentials/cluster_1_cacert"
    - name: "cluster_2"
      endpoint: {{ your-cluster-2-kubeapi-endpoint.com }}
      enabled: true
      auth:
        type: "file_path"
        tokenPath: "/var/run/credentials/cluster_2_token"
        certPath: "/var/run/credentials/cluster_2_cacert"
    - name: "cluster_3"
      endpoint: {{ your-cluster-3-kubeapi-endpoint.com }}
      enabled: true
      auth:
        type: "file_path"
        tokenPath: "/var/run/credentials/cluster_3_token"
        certPath: "/var/run/credentials/cluster_3_cacert"
The configmap is used to schedule pods in different Kubernetes clusters, and hence acts like a "load balancer". team1 and team2 are the labels; each label can schedule a pod on multiple clusters depending on the weight.
configmap:
  labelClusterMap:
    team1:
    - id: cluster_1
      weight: 1
    team2:
    - id: cluster_2
      weight: 0.5
    - id: cluster_3
      weight: 0.5
Lastly, install the Flyte control plane Helm chart.
# AWS / EKS
helm upgrade flyte -n flyte flyteorg/flyte-core -f values.yaml -f values-eks.yaml -f values-controlplane.yaml -f values-override.yaml --create-namespace --install
# GCP
helm upgrade flyte -n flyte flyteorg/flyte-core -f values.yaml -f values-gcp.yaml -f values-controlplane.yaml -f values-override.yaml --create-namespace --install
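You can check that the control plane components rolled out before moving on. For example, in the control plane cluster (deployment name assumed to match the chart's default):
# Wait for FlyteAdmin to become available, then inspect the rest of the pods
kubectl -n flyte rollout status deployment/flyteadmin
kubectl -n flyte get pods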
Configure Execution Cluster Labels
The next step is to configure a project-domain or a specific workflow to schedule on a particular Kubernetes cluster by attaching the correct execution cluster label.
Get execution cluster label of the project and domain
flytectl get execution-cluster-label -p flytesnacks -d development --attrFile ecl.yaml
Update the label in ecl.yaml
domain: development
project: flytesnacks
value: team1
Get execution cluster label of a specific workflow within the project and domain
flytectl get execution-cluster-label -p flytesnacks -d development core.control_flow.run_merge_sort.merge_sort --attrFile ecl.yaml
Update the label in ecl.yaml
domain: development
project: flytesnacks
workflow: core.control_flow.run_merge_sort.merge_sort
value: team1
Lastly, update the execution cluster label.
flytectl update execution-cluster-label --attrFile ecl.yaml
With this, executions of workflows belonging to the specified project-domain, or of the single specified workflow, will be scheduled on a cluster matching the target label.
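To verify the routing, launch a workflow registered under that project and domain and check where its pods land. By default, Flyte creates task pods in a namespace named <project>-<domain> in the selected data plane cluster; assuming the context names from earlier:
# Pods for flytesnacks/development executions should appear in the cluster
# mapped to the "team1" label (namespace convention: <project>-<domain>)
kubectl --context cluster1 get pods -n flytesnacks-development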