GCP (GKE) Setup#
Flyte Deployment - Manual GCP/GKE Deployment#
This guide helps you set up Flyte from scratch on GCP (GKE), without using an automated approach. It provides step-by-step instructions to go from a bare GCP account to a fully functioning Flyte deployment that members of your company can use.
Prerequisites#
Access to the GCP console
A domain name for the Flyte installation like flyte.example.org that allows you to set a DNS A record.
Before you begin, please ensure that you have the following tools installed: gcloud, kubectl, helm, docker, and flytectl (all of them are used in the commands below).
Initialize Gcloud#
Authorize the gcloud SDK to access GCP with your credentials, set up the config for an existing project, and optionally set the default compute zone:
gcloud init
Create an Organization (Optional)#
This step is optional if you already have an organization linked to a billing account. Use the following docs to understand the organization creation process in Google Cloud: Organization Management.
Get the organization ID to be used for creating the project. Billing should be linked with the organization so that all projects under the org use the same billing account.
gcloud organizations list
Sample output:
DISPLAY_NAME ID DIRECTORY_CUSTOMER_ID
example-org 123456789999 C02ewszsz
Set ORG_ID to the value of the ID column for your organization:
export ORG_ID=<id>
Create a GCP Project#
export PROJECT_ID=<my-project>
gcloud projects create $PROJECT_ID --organization $ORG_ID
Of course you can also use an existing project if your account has appropriate permissions to create the required resources.
Set project <my-project> as the default in gcloud or use gcloud init to set this default:
gcloud config set project ${PROJECT_ID}
We assume that <my-project> has been set as the default for all gcloud commands below.
Permissions#
Configure Workload Identity for the Flyte namespace service accounts. This creates the Google service accounts (GSAs) that are mapped to Kubernetes service accounts (KSAs) through annotations and authorize pod access to Google Cloud services.
Create a GSA for flyteadmin
gcloud iam service-accounts create gsa-flyteadmin
Create a GSA for flytescheduler
gcloud iam service-accounts create gsa-flytescheduler
Create a GSA for datacatalog
gcloud iam service-accounts create gsa-datacatalog
Create a GSA for flytepropeller
gcloud iam service-accounts create gsa-flytepropeller
Create a GSA for cluster resource manager
Production
gcloud iam service-accounts create gsa-production
Staging
gcloud iam service-accounts create gsa-staging
Development
gcloud iam service-accounts create gsa-development
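As a quick sanity check (not part of the original steps), you can confirm that all of the GSAs were created by listing the service accounts in the project:
gcloud iam service-accounts list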
Create a new role DataCatalogRole with the following permissions:
storage.buckets.get
storage.objects.create
storage.objects.delete
storage.objects.update
storage.objects.get
Create a new role FlyteAdminRole with the following permissions:
storage.buckets.get
storage.objects.create
storage.objects.delete
storage.objects.get
storage.objects.getIamPolicy
storage.objects.update
iam.serviceAccounts.signBlob
Create a new role FlyteSchedulerRole with the following permissions:
storage.buckets.get
storage.objects.create
storage.objects.delete
storage.objects.get
storage.objects.getIamPolicy
storage.objects.update
Create a new role FlytePropellerRole with the following permissions:
storage.buckets.get
storage.objects.create
storage.objects.delete
storage.objects.get
storage.objects.getIamPolicy
storage.objects.update
Create a new role FlyteWorkflowRole with the following permissions:
storage.buckets.get
storage.objects.create
storage.objects.delete
storage.objects.get
storage.objects.list
storage.objects.update
Refer to the GCP documentation on IAM custom roles for more details.
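These custom roles can also be created from the CLI. A minimal sketch for FlyteAdminRole, using the permissions listed above and the PROJECT_ID variable set earlier (repeat with the appropriate permission list for each of the other roles):
gcloud iam roles create FlyteAdminRole --project=${PROJECT_ID} \
    --title="FlyteAdminRole" \
    --permissions=storage.buckets.get,storage.objects.create,storage.objects.delete,storage.objects.get,storage.objects.getIamPolicy,storage.objects.update,iam.serviceAccounts.signBlob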
Add IAM policy binding for flyteadmin GSA using FlyteAdminRole.
gcloud projects add-iam-policy-binding ${PROJECT_ID} --member "serviceAccount:gsa-flyteadmin@${PROJECT_ID}.iam.gserviceaccount.com" --role "projects/${PROJECT_ID}/roles/FlyteAdminRole"
Add IAM policy binding for flytescheduler GSA using FlyteSchedulerRole.
gcloud projects add-iam-policy-binding ${PROJECT_ID} --member "serviceAccount:gsa-flytescheduler@${PROJECT_ID}.iam.gserviceaccount.com" --role "projects/${PROJECT_ID}/roles/FlyteSchedulerRole"
Add IAM policy binding for datacatalog GSA using DataCatalogRole.
gcloud projects add-iam-policy-binding ${PROJECT_ID} --member "serviceAccount:gsa-datacatalog@${PROJECT_ID}.iam.gserviceaccount.com" --role "projects/${PROJECT_ID}/roles/DataCatalogRole"
Add IAM policy binding for flytepropeller GSA using FlytePropellerRole.
gcloud projects add-iam-policy-binding ${PROJECT_ID} --member "serviceAccount:gsa-flytepropeller@${PROJECT_ID}.iam.gserviceaccount.com" --role "projects/${PROJECT_ID}/roles/FlytePropellerRole"
Add IAM policy binding for cluster resource manager GSA using FlyteWorkflowRole.
Production
gcloud projects add-iam-policy-binding ${PROJECT_ID} --member "serviceAccount:gsa-production@${PROJECT_ID}.iam.gserviceaccount.com" --role "projects/${PROJECT_ID}/roles/FlyteWorkflowRole"
Staging
gcloud projects add-iam-policy-binding ${PROJECT_ID} --member "serviceAccount:gsa-staging@${PROJECT_ID}.iam.gserviceaccount.com" --role "projects/${PROJECT_ID}/roles/FlyteWorkflowRole"
Development
gcloud projects add-iam-policy-binding ${PROJECT_ID} --member "serviceAccount:gsa-development@${PROJECT_ID}.iam.gserviceaccount.com" --role "projects/${PROJECT_ID}/roles/FlyteWorkflowRole"
Allow each Kubernetes service account (KSA) to impersonate its Google service account (GSA) by creating an IAM policy binding between the two. This binding allows the KSA to act as the GSA.
flyteadmin
gcloud iam service-accounts add-iam-policy-binding --role "roles/iam.workloadIdentityUser" --member "serviceAccount:${PROJECT_ID}.svc.id.goog[flyte/flyteadmin]" gsa-flyteadmin@${PROJECT_ID}.iam.gserviceaccount.com
flytepropeller
gcloud iam service-accounts add-iam-policy-binding --role "roles/iam.workloadIdentityUser" --member "serviceAccount:${PROJECT_ID}.svc.id.goog[flyte/flytepropeller]" gsa-flytepropeller@${PROJECT_ID}.iam.gserviceaccount.com
datacatalog
gcloud iam service-accounts add-iam-policy-binding --role "roles/iam.workloadIdentityUser" --member "serviceAccount:${PROJECT_ID}.svc.id.goog[flyte/datacatalog]" gsa-datacatalog@${PROJECT_ID}.iam.gserviceaccount.com
Cluster Resource Manager
We create bindings for the production, staging, and development domains for Flyte workflows to use.
Production
gcloud iam service-accounts add-iam-policy-binding --role "roles/iam.workloadIdentityUser" --member "serviceAccount:${PROJECT_ID}.svc.id.goog[production/default]" gsa-production@${PROJECT_ID}.iam.gserviceaccount.com
Staging
gcloud iam service-accounts add-iam-policy-binding --role "roles/iam.workloadIdentityUser" --member "serviceAccount:${PROJECT_ID}.svc.id.goog[staging/default]" gsa-staging@${PROJECT_ID}.iam.gserviceaccount.com
Development
gcloud iam service-accounts add-iam-policy-binding --role "roles/iam.workloadIdentityUser" --member "serviceAccount:${PROJECT_ID}.svc.id.goog[development/default]" gsa-development@${PROJECT_ID}.iam.gserviceaccount.com
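The KSA-to-GSA mapping mentioned earlier is expressed as an annotation on the Kubernetes service account. If you deploy with the flyte-core chart and its GCP values, these annotations are set for you through the Helm values; the command below is only an illustration of the equivalent manual annotation for flyteadmin:
kubectl annotate serviceaccount flyteadmin --namespace flyte \
    iam.gke.io/gcp-service-account=gsa-flyteadmin@${PROJECT_ID}.iam.gserviceaccount.com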
Create a GKE Cluster#
Create a GKE cluster with VPC-native networking and Workload Identity enabled. For standalone gsutil to work with Workload Identity inside task pods, add the lines below to your task Dockerfile:
FROM ghcr.io/flyteorg/flytekit:py3.8-1.0.3
# Required for gsutil to work with workload-identity
RUN echo '[GoogleCompute]\nservice_account = default' > /etc/boto.cfg
Adding this boto.cfg configuration to the Dockerfile authenticates standalone gsutil, so the pod starts without any hiccups.
Navigate to the Google Cloud console and open the Kubernetes Engine tab to start creating the Kubernetes cluster.
Ensure that VPC-native traffic routing is enabled. Under Security, enable Workload Identity and use the project default pool, which is ${PROJECT_ID}.svc.id.goog.
Creating the cluster from the console is recommended, because it ensures that VPC-native networking and Workload Identity are configured correctly; doing the same from the CLI takes multiple commands. If you prefer the CLI anyway, the basic equivalent is:
gcloud container clusters create <my-flyte-cluster> \
--workload-pool=${PROJECT_ID}.svc.id.goog \
--region us-west1 \
--num-nodes 6
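To double-check that Workload Identity ended up enabled on the cluster, a quick verification sketch (assuming the cluster name and region above):
gcloud container clusters describe <my-flyte-cluster> --region us-west1 \
    --format="value(workloadIdentityConfig.workloadPool)"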
Create the GKE Context#
Initialize your kubecontext to point to the GKE cluster using the following command:
gcloud container clusters get-credentials <my-flyte-cluster>
Verify by creating a test namespace:
kubectl create ns test
Create a Cloud SQL Database#
Next, create a relational Cloud SQL for PostgreSQL database. This database will be used by both the primary control plane service (FlyteAdmin) and the Flyte memoization service (Data Catalog). Follow the Cloud SQL documentation to create the instance:
Select PostgreSQL
Provide an Instance ID
Provide a password for the instance <DB_INSTANCE_PASSWD>
Use PostgreSQL 13 or higher
Select the Zone based on your availability requirements.
Select "Customize your instance" and enable Private IP in the Connections tab. This is required for private communication between the GKE apps and the Cloud SQL instance. Follow the steps to create the private connection (default).
Create the SQL instance.
After the instance is created, get the private IP of the database <CLOUD-SQL-IP>.
Create a flyteadmin database and a flyteadmin user account on that instance with the password <DBPASSWORD>.
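These can also be created from the CLI. A minimal sketch, assuming the instance is named <my-flyte-db> as in the create command further below:
gcloud sql databases create flyteadmin --instance=<my-flyte-db>
gcloud sql users create flyteadmin --instance=<my-flyte-db> --password=<DBPASSWORD>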
Verify connectivity to the database from the GKE cluster.
Create a testdb namespace:
kubectl create ns testdb
Verify the connectivity using a postgres client:
kubectl run pgsql-postgresql-client --rm --tty -i --restart='Never' --namespace testdb --image docker.io/bitnami/postgresql:11.7.0-debian-10-r9 --env="PGPASSWORD=<DBPASSWORD>" --command -- psql --host <CLOUD-SQL-IP> -U flyteadmin -d flyteadmin -p 5432
As with the GKE cluster, creating the Cloud SQL instance from the console is recommended, because it ensures that private IP connectivity is configured correctly; the CLI needs multiple commands to achieve the same. The basic CLI equivalent is:
gcloud sql instances create <my-flyte-db> \
--database-version=POSTGRES_13 \
--cpu=1 \
--memory=3840MB \
--region=us-west1
SSL Certificate#
In order to use SSL (required for gRPC clients), we need to create an SSL certificate. We use Google-managed SSL certificates here.
Save the following certificate resource definition as flyte-certificate.yaml:
apiVersion: networking.gke.io/v1
kind: ManagedCertificate
metadata:
  name: flyte-certificate
spec:
  domains:
    - flyte.example.org
Then apply it to your cluster:
kubectl apply -f flyte-certificate.yaml
Note
ManagedCertificate only works with the GKE ingress. For other ingress controllers, use cert-manager.
For the nginx ingress, use cert-manager:
Install the cert manager
helm install cert-manager --namespace flyte --version v0.12.0 jetstack/cert-manager
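The jetstack/cert-manager chart comes from the jetstack Helm repository; if that repo is not already configured locally, add it first:
helm repo add jetstack https://charts.jetstack.io
helm repo update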
Create cert issuer
apiVersion: cert-manager.io/v1alpha2
kind: Issuer
metadata:
  name: letsencrypt-production
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: issue-email-id
    privateKeySecretRef:
      name: letsencrypt-production
    solvers:
      - selector: {}
        http01:
          ingress:
            class: nginx
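Save the issuer manifest to a file (the file name here is just an example) and apply it to the namespace where the ingress lives:
kubectl apply -n flyte -f letsencrypt-issuer.yaml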
Ingress#
Add the ingress repo
helm repo add ingress-nginx https://kubernetes.github.io/ingress-nginx
Install the nginx-ingress
helm install nginx-ingress ingress-nginx/ingress-nginx
Create the GCS Bucket#
Create <BUCKETNAME> with uniform bucket-level access:
gsutil mb -b on -l us-west1 gs://<BUCKETNAME>/
Add access permissions on the bucket for the GSA principals created earlier (gsa-flyteadmin, gsa-flytescheduler, gsa-datacatalog, gsa-flytepropeller, gsa-production, gsa-staging, and gsa-development).
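One way to grant this from the CLI is gsutil iam ch, which binds a storage role on the bucket per GSA. A sketch for flyteadmin (roles/storage.objectAdmin is an assumption here; pick the role that matches your organization's requirements, and repeat for the other GSAs):
gsutil iam ch serviceAccount:gsa-flyteadmin@${PROJECT_ID}.iam.gserviceaccount.com:roles/storage.objectAdmin gs://<BUCKETNAME>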
Time for Helm#
Installing Flyte#
Add the Flyte helm repo
helm repo add flyteorg https://flyteorg.github.io/flyte
Download the GCP values file
curl https://raw.githubusercontent.com/flyteorg/flyte/master/charts/flyte-core/values-gcp.yaml >values-gcp.yaml
Update the following values in values-gcp.yaml (see the sketch after this list):
<RELEASE-NAME> to be used as a prefix for the SSL certificate secretName
<PROJECT_ID> of your GCP project
<CLOUD-SQL-IP> private IP of cloud sql instance
<DBPASSWORD> of the flyteadmin user created for the cloud sql instance
<BUCKETNAME> of the GCS bucket created
<HOSTNAME> to the flyte FQDN (e.g. flyte.example.org)
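For orientation, these placeholders end up in the userSettings block of values-gcp.yaml. The sketch below is illustrative only; of these keys, only googleProjectId is confirmed by the template references later in this guide, so check the downloaded file for the exact key names:
userSettings:
  googleProjectId: <PROJECT_ID>
  dbHost: <CLOUD-SQL-IP>
  dbPassword: <DBPASSWORD>
  bucketName: <BUCKETNAME>
  hostName: <HOSTNAME>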
(Optional) Configure Flyte project and domain
To restrict projects, update the Helm values. By default, Flyte creates three projects: flytesnacks, flytetester, and flyteexamples.
# you can define the number of projects as per your need
flyteadmin:
  initialProjects:
    - flytesnacks
    - flytetester
    - flyteexamples
To restrict domains, update the Helm values again. By default, Flyte creates three domains per project: development, staging, and production.
# -- Domain configuration for Flyte projects. This enables the specified domains across all projects in Flyte.
configmap:
  domain:
    domains:
      - id: development
        name: development
      - id: staging
        name: staging
      - id: production
        name: production
# Update the cluster resource manager only if you are using the Flyte resource manager. It will create the required resources in the project-domain namespaces.
cluster_resource_manager:
  enabled: true
  config:
    cluster_resources:
      customData:
        - development:
            - projectQuotaCpu:
                value: "5"
            - projectQuotaMemory:
                value: "4000Mi"
            - defaultIamRole:
                value: "gsa-development@{{ .Values.userSettings.googleProjectId }}.iam.gserviceaccount.com"
        - staging:
            - projectQuotaCpu:
                value: "2"
            - projectQuotaMemory:
                value: "3000Mi"
            - defaultIamRole:
                value: "gsa-staging@{{ .Values.userSettings.googleProjectId }}.iam.gserviceaccount.com"
        - production:
            - projectQuotaCpu:
                value: "2"
            - projectQuotaMemory:
                value: "3000Mi"
            - defaultIamRole:
                value: "gsa-production@{{ .Values.userSettings.googleProjectId }}.iam.gserviceaccount.com"
Update helm dependencies
helm dep update
Install Flyte
helm install -n flyte -f values-gcp.yaml --create-namespace flyte flyteorg/flyte-core
Verify that all pods have come up correctly
kubectl get pods -n flyte
Get the ingress IP so you can update the DNS zone and fetch the name server records for your domain:
kubectl get ingress -n flyte
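If the domain is managed with Cloud DNS, the A record can be created from the CLI. A sketch, assuming a managed zone named <my-zone> (a hypothetical name) and the ingress address reported by the command above:
gcloud dns record-sets create flyte.example.org. --zone=<my-zone> \
    --type=A --ttl=300 --rrdatas=<INGRESS-IP>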
Uninstalling Flyte#
helm uninstall -n flyte flyte
Upgrading Flyte#
helm upgrade -n flyte -f values-gcp.yaml --create-namespace flyte flyteorg/flyte-core
Connecting to Flyte#
Flyte can be accessed using the UI console or your terminal.
First, find the Flyte endpoint created by the GKE ingress controller.
$ kubectl -n flyte get ingress
Sample output:
NAME CLASS HOSTS ADDRESS PORTS AGE
flyte <none> <FLYTE-ENDPOINT> 34.136.165.92 80, 443 18m
flyte-grpc <none> <FLYTE-ENDPOINT> 34.136.165.92 80, 443 18m
Connecting with the flytectl CLI
Add the <FLYTE-ENDPOINT> to ~/.flyte/config.yaml, e.g.:
admin:
  # For GRPC endpoints you might want to use dns:///flyte.myexample.com
  endpoint: dns:///<FLYTE-ENDPOINT>
  insecure: false
logger:
  show-source: true
  level: 0
storage:
  type: stow
  stow:
    kind: google
    config:
      json: ""
      project_id: myproject # GCP Project ID
      scopes: https://www.googleapis.com/auth/devstorage.read_write
  container: mybucket # GCS Bucket Flyte is configured to use
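To verify that flytectl can reach the new deployment, a simple read-only call works (assuming the config above is in place):
flytectl get project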
Accessing Flyte Console (web UI)#
Navigate to https://<FLYTE-ENDPOINT>/console to access the Flyte console (web UI).
Ignore the certificate error if you are using a self-signed certificate.
Running Workflows#
Dockerfile changes
Make sure the Dockerfile installs the gcloud SDK, which Flyte needs to upload results:
# Install gcloud for GCP
RUN apt-get install curl --assume-yes
RUN curl -sSL https://sdk.cloud.google.com | bash
ENV PATH $PATH:/root/google-cloud-sdk/bin
Serializing workflows
To run flytecookbook examples on GCP, make sure you use the right registry during serialization. The following example assumes a GCP Artifact Registry repository in the us-central1 region, with project name flyte-gcp and repository name flyterepo:
REGISTRY=us-central1-docker.pkg.dev/flyte-gcp/flyterepo make serialize
Uploading the image to registry
The following example uploads the cookbook core examples image to the GCP registry. This step must be completed before registering the workflows in Flyte:
docker push us-central1-docker.pkg.dev/flyte-gcp/flyterepo/flytecookbook:core-2bd81805629e41faeaa25039a6e6abe847446356
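If the push fails with an authentication error, Docker usually needs to be configured to use gcloud credentials for the registry host first (a one-time setup):
gcloud auth configure-docker us-central1-docker.pkg.dev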
Registering workflows
Register the workflows by pointing flytectl at the serialization output folder and providing a version for the workflows:
flytectl register file /Users/<user-name>/flytesnacks/cookbook/core/_pb_output/* -d development -p flytesnacks --version v1
Generating exec spec file for workflow
The following example generates the exec spec file for the latest version of the core.flyte_basics.lp.go_greet launch plan, which is part of the flytecookbook examples:
flytectl get launchplan -p flytesnacks -d development core.flyte_basics.lp.go_greet --latest --execFile lp.yaml
Modifying the exec spec file for workflow inputs
Edit the exec spec file lp.yaml and set the inputs for the workflow:
iamRoleARN: ""
inputs:
  am: true
  day_of_week: "Sunday"
  number: 5
kubeServiceAcct: ""
targetDomain: ""
targetProject: ""
version: v1
workflow: core.flyte_basics.lp.go_greet
Creating execution using the exec spec file
flytectl create execution -p flytesnacks -d development --execFile lp.yaml
Sample output:
execution identifier project:"flytesnacks" domain:"development" name:"f12c787de18304f4cbe7"
Getting the execution details
flytectl get executions -p flytesnacks -d development f12c787de18304f4cbe7
Troubleshooting#
If any pod is not coming up, describe the pod and check which container or init container had an error:
kubectl describe pod/<pod-instance> -n flyte
Then check the logs of the container that failed. For example, to check the <init-container> init container:
kubectl logs -f <pod-instance> <init-container> -n flyte
Increasing log level for flytectl
Change your logger config to this:
logger:
  show-source: true
  level: 6
If you have a new ingress IP for your Flyte deployment, you may need to flush your local DNS cache.
If you need access logs for your buckets, refer to the GCP documentation on bucket usage and storage logs.
If you get the following error:
ERROR: Policy modification failed. For a binding with condition, run "gcloud alpha iam policies lint-condition" to identify issues in condition.
ERROR: (gcloud.iam.service-accounts.add-iam-policy-binding) INVALID_ARGUMENT: Identity Pool does not exist
this means that Workload Identity has not been enabled on the cluster. Refer to the GKE Workload Identity documentation to enable it.