Configuring access to GPUs

Tags: Deployment, Infrastructure, GPU, Intermediate

Along with simpler resources like CPU and memory, you may want to configure and access GPU resources. Flyte allows you to configure the GPU access policy for your cluster. GPUs are expensive, so it is usually undesirable to treat machines with GPUs the same as CPU-only machines; you may want to reserve GPU machines for tasks that explicitly request GPUs. To achieve this, Flyte uses the Kubernetes concept of taints and tolerations.

Kubernetes can automatically apply tolerations for extended resources like GPUs using the ExtendedResourceToleration plugin, which is enabled by default in some cloud environments. Make sure your GPU nodes are tainted with a key matching the resource name, i.e., key: nvidia.com/gpu.
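
For example, if your environment does not taint GPU nodes automatically, the taint on a GPU node might look like the following snippet of a node spec (the node name gpu-node-1 and the taint value are placeholders for illustration only):

apiVersion: v1
kind: Node
metadata:
  name: gpu-node-1   # placeholder node name
spec:
  taints:
    # The key matches the extended resource name, so the
    # ExtendedResourceToleration plugin can tolerate it automatically.
    - key: nvidia.com/gpu
      value: present   # arbitrary value; the effect is what keeps CPU-only pods off
      effect: NoSchedule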

You can also configure the Flyte backend to apply specific tolerations. This behavior is controlled under the generic k8s plugin configuration of the Flyte backend.

The idea behind this configuration is that whenever a task executing on Kubernetes requests GPUs, Flyte automatically adds the matching toleration for that resource (in this case, gpu) to the generated PodSpec. Because tolerations can be configured for any resource supported by Kubernetes, you can use the same mechanism to control access to other specific resources as well.

Here’s an example configuration:

plugins:
  k8s:
    resource-tolerations:
      - nvidia.com/gpu:
        # This toleration is added to pods that request nvidia.com/gpu;
        # key, value, and effect should match the taint on your GPU nodes.
        - key: "key1"
          operator: "Equal"
          value: "value1"
          effect: "NoSchedule"
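
With a configuration like the one above, the pod generated for a GPU-requesting task carries the corresponding toleration. Roughly, it would look like the following sketch (not the exact PodSpec Flyte produces; the container name is a placeholder):

apiVersion: v1
kind: Pod
spec:
  containers:
    - name: task   # placeholder container name
      resources:
        limits:
          nvidia.com/gpu: 1   # the GPU request that triggers the toleration
  tolerations:
    # Injected by the Flyte backend because the task requested nvidia.com/gpu.
    - key: "key1"
      operator: "Equal"
      value: "value1"
      effect: "NoSchedule"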

Getting this configuration into your deployment will depend on how Flyte is deployed on your cluster. If you use the default Opta/Helm route, you'll need to amend your Helm chart values so that the configuration ends up in the deployed Flyte backend configuration.
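
For example, with the flyte-core Helm chart, the same block would typically be nested under the chart's configmap values. Treat the exact path as an assumption, since it can differ between chart versions:

configmap:
  k8s:
    plugins:
      k8s:
        resource-tolerations:
          - nvidia.com/gpu:
            - key: "key1"
              operator: "Equal"
              value: "value1"
              effect: "NoSchedule"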