Using Multiple Kubernetes Clusters

Scaling Beyond Kubernetes

Tip: As described in the Architecture Overview, the Flyte control plane sends workflows off to the data plane for execution. The data plane fulfills these workflows by launching pods in Kubernetes.

Often, the total compute needs can exceed the limits of a single Kubernetes cluster. To address this, you can deploy the data plane to several isolated Kubernetes clusters. The control plane (FlyteAdmin) can be configured to load-balance workflows across these isolated data planes, which protects you from the failure of any single Kubernetes cluster and increases scalability.

To achieve this, first, you have to create additional Kubernetes clusters. For now, let’s assume you have three Kubernetes clusters and that you can access them all with kubectl. Let’s call these clusters cluster1, cluster2, and cluster3.
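If each data plane cluster is registered as a kubectl context of the same name (an assumption for this example), you can confirm access to all three before proceeding:

# Assumes kubeconfig contexts named cluster1, cluster2, and cluster3
for ctx in cluster1 cluster2 cluster3; do
  kubectl --context "$ctx" get nodes
done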

Next, deploy just the data plane to these clusters. To do this, install the flyte-core Helm chart with a values overlay (values-dataplane.yaml, used below) that enables only the data plane components and leaves out the control plane.

Data Plane Deployment

  • Add Flyteorg Helm repo

helm repo add flyteorg https://flyteorg.github.io/flyte
helm repo update
# Get flyte-core helm chart
helm fetch --untar --untardir . flyteorg/flyte-core
cd flyte-core
  • Install Flyte data plane Helm chart

# GCP (run once per data plane cluster, e.g. with --kube-context cluster1)
helm upgrade flyte flyteorg/flyte-core -n flyte -f values.yaml -f values-gcp.yaml -f values-dataplane.yaml --create-namespace --install
# AWS
helm upgrade flyte flyteorg/flyte-core -n flyte -f values.yaml -f values-aws.yaml -f values-dataplane.yaml --create-namespace --install
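To sanity-check each data plane deployment, list the pods in the flyte namespace on that cluster (context names follow the assumption above):

kubectl --context cluster1 get pods -n flyte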

User and Control Plane Deployment

Some Flyte deployments may choose to run the control plane separately from the data plane. FlyteAdmin is designed to create Kubernetes resources in one or more Flyte data plane clusters. For the admin to access remote clusters, it needs credentials to each cluster.

In Kubernetes, scoped access is granted by configuring a Role (or ClusterRole) and binding it to a ServiceAccount; Kubernetes then issues a bearer token for that ServiceAccount which permits access. Hence, create a FlyteAdmin ServiceAccount in each data plane cluster to generate these tokens.

When you first create the FlyteAdmin ServiceAccount in a new cluster, a bearer token is generated, and it will continue to allow access unless the ServiceAccount is deleted. Hence, never delete the ServiceAccount ⚠️.
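The data plane Helm chart normally creates this ServiceAccount for you. If you ever need to create one by hand, a minimal sketch looks like the following; the names flyteadmin and flyteadmin-token are illustrative assumptions, the explicit token Secret is required on Kubernetes 1.24+ (where token Secrets are no longer auto-created), and binding the ServiceAccount to an appropriate Role is omitted here:

    apiVersion: v1
    kind: ServiceAccount
    metadata:
      name: flyteadmin
      namespace: flyte
    ---
    # Kubernetes 1.24+ no longer auto-creates token Secrets,
    # so request one explicitly for the ServiceAccount:
    apiVersion: v1
    kind: Secret
    metadata:
      name: flyteadmin-token
      namespace: flyte
      annotations:
        kubernetes.io/service-account.name: flyteadmin
    type: kubernetes.io/service-account-token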

To feed the credentials to FlyteAdmin, you must retrieve them from your new data plane cluster and upload them to admin (for example, within Lyft, Confidant is used).

The credentials have two parts ("ca cert" and "bearer token"). Find the generated secret via:

kubectl get secrets -n flyte | grep flyteadmin-token

Once you have the name of the secret, you can copy the ca cert to your clipboard using the following command (base64 -D and pbcopy are macOS-specific; on Linux, use base64 -d and a clipboard utility such as xclip):

kubectl get secret -n flyte {secret-name} -o jsonpath='{.data.ca\.crt}' | base64 -D | pbcopy

You can copy the bearer token to your clipboard using the following command:

kubectl get secret -n flyte {secret-name} -o jsonpath='{.data.token}' | base64 -D | pbcopy

Now these credentials need to be included in the control plane.
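Keep in mind that values under data: in a Kubernetes Secret must be base64-encoded. The values under .data in the source secret are already base64-encoded, so for the secrets.yaml below you can capture them without the decode step; a sketch, reusing the {secret-name} placeholder and the assumed context names:

CLUSTER_1_TOKEN=$(kubectl --context cluster1 get secret -n flyte {secret-name} -o jsonpath='{.data.token}')
CLUSTER_1_CACERT=$(kubectl --context cluster1 get secret -n flyte {secret-name} -o jsonpath='{.data.ca\.crt}')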

  • Create a new file named secrets.yaml that looks like:

    apiVersion: v1
    kind: Secret
    metadata:
      name: cluster-credentials
      namespace: flyte
    type: Opaque
    data:
      cluster_1_token: {{ cluster 1 token here }}
      cluster_1_cacert: {{ cluster 1 cacert here }}
      cluster_2_token: {{ cluster 2 token here }}
      cluster_2_cacert: {{ cluster 2 cacert here }}
      cluster_3_token: {{ cluster 3 token here }}
      cluster_3_cacert: {{ cluster 3 cacert here }}
    
  • Create the cluster credentials secret in the control plane cluster.

kubectl apply -f secrets.yaml
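You can confirm that the secret was created with:

kubectl get secret cluster-credentials -n flyte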
  • Create a file named values-override.yaml and add the following config to it:

    flyteadmin:
      additionalVolumes:
      - name: cluster-credentials
        secret:
          secretName: cluster-credentials
      additionalVolumeMounts:
      - name: cluster-credentials
        mountPath: /var/run/credentials
    configmap:
      clusters:
        labelClusterMap:
          team1:
          - id: cluster_1
            weight: 1
          team2:
          - id: cluster_2
            weight: 0.5
          - id: cluster_3
            weight: 0.5
        clusterConfigs:
        - name: "cluster_1"
          endpoint: {{ your-cluster-1-kubeapi-endpoint.com }}
          enabled: true
          auth:
            type: "file_path"
            tokenPath: "/var/run/credentials/cluster_1_token"
            certPath: "/var/run/credentials/cluster_1_cacert"
        - name: "cluster_2"
          endpoint: {{ your-cluster-2-kubeapi-endpoint.com }}
          enabled: true
          auth:
            type: "file_path"
            tokenPath: "/var/run/credentials/cluster_2_token"
            certPath: "/var/run/credentials/cluster_2_cacert"
        - name: "cluster_3"
          endpoint: {{ your-cluster-3-kubeapi-endpoint.com }}
          enabled: true
          auth:
            type: "file_path"
            tokenPath: "/var/run/credentials/cluster_3_token"
            certPath: "/var/run/credentials/cluster_3_cacert"
    

    The configmap acts like a "load balancer" for scheduling pods across the Kubernetes clusters. team1 and team2 are labels; each label maps to one or more clusters, and each cluster's weight is its relative share of the executions submitted under that label. With the mapping below, all of team1's executions run on cluster_1, while team2's executions are split evenly between cluster_2 and cluster_3.

    configmap:
      clusters:
        labelClusterMap:
          team1:
          - id: cluster_1
            weight: 1
          team2:
          - id: cluster_2
            weight: 0.5
          - id: cluster_3
            weight: 0.5
    
  • Lastly, install the Flyte control plane Helm chart.

# AWS
helm upgrade flyte flyteorg/flyte-core -n flyte -f values.yaml -f values-aws.yaml -f values-controlplane.yaml -f values-override.yaml --create-namespace --install
# GCP
helm upgrade flyte flyteorg/flyte-core -n flyte -f values.yaml -f values-gcp.yaml -f values-controlplane.yaml -f values-override.yaml --create-namespace --install
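Once the control plane is up, you can confirm that FlyteAdmin started cleanly with the new cluster configuration (assuming the chart's default deployment name, flyteadmin):

kubectl get pods -n flyte
kubectl logs -n flyte deployment/flyteadmin --tail=50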

Configure Execution Cluster Labels

The next step is to configure a project-domain combination, or an individual workflow, to schedule on a specific Kubernetes cluster by assigning it the corresponding execution cluster label.

  • Get the execution cluster label of the project and domain

flytectl get execution-cluster-label -p flytesnacks -d development --attrFile ecl.yaml
  • Update the label in ecl.yaml

    domain: development
    project: flytesnacks
    value: team1
    
  • Alternatively, get the execution cluster label of an individual workflow

flytectl get execution-cluster-label -p flytesnacks -d development core.control_flow.run_merge_sort.merge_sort --attrFile ecl.yaml
  • Update the label in ecl.yaml

    domain: development
    project: flytesnacks
    workflow: core.control_flow.run_merge_sort.merge_sort
    value: team1
    
  • Lastly, update the execution cluster label.

flytectl update execution-cluster-label --attrFile ecl.yaml
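To verify the change, fetch the attributes again; the output should show the new label (shown here for the project-domain case):

flytectl get execution-cluster-label -p flytesnacks -d development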

With this, executions of workflows belonging to that project-domain combination (or the single workflow) will be scheduled on the clusters mapped to the target label.