AWS Batch Setup#
This setup document applies to both map tasks and single (non-map) tasks running on AWS Batch.
For single (non-map) task use, please take note of the additional line required when updating the Propeller config.
AWS Batch enables developers, scientists, and engineers to easily and efficiently run hundreds of thousands of batch computing jobs on AWS.
Flyte abstracts away the complexity of integrating AWS Batch into users’ workflows. It takes care of packaging inputs, reading outputs, scheduling map tasks, and leveraging AWS Batch Job Queues to distribute the load and coordinate priorities.
Set-up AWS Batch#
Follow the guide Running batch jobs at scale for less.
At the end of this step, the AWS Account should have a configured compute environment and one or more AWS Batch Job Queues.
Modify Users’ AWS IAM Role Trust Policy Document#
Follow the guide AWS Batch Execution IAM role.
When running workflows in Flyte, users have the option to specify a Kubernetes service account and/or an IAM Role to run as. For AWS Batch, an IAM Role must be specified. For each of these IAM Roles, modify the trust policy to allow ECS to assume the role.
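As a sketch, the trust policy on each such role would include a statement like the one below. The exact policy in your account may carry additional statements (for example, the OIDC trust for a Kubernetes service account), so treat this as an illustrative fragment rather than a complete document:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "ecs-tasks.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
```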
Modify System’s AWS IAM Role policies#
Follow the guide: Granting a user permissions to pass a role to an AWS service.
The recommended way of assigning permissions to Flyte components is OIDC, which involves assigning an IAM Role to every Kubernetes service account used. Find the IAM Role assigned to flytepropeller’s Kubernetes service account, then modify its policy document to allow that role to pass the users’ roles to AWS Batch.
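A minimal sketch of such a policy statement is shown below; the account ID and role name are placeholders, and you may prefer to list each user role explicitly rather than using a wildcard:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "iam:PassRole",
      "Resource": "arn:aws:iam::123456789012:role/my-batch-execution-role"
    }
  ]
}
```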
Update FlyteAdmin Config#
FlyteAdmin needs to be made aware of all the AWS Batch Job Queues and how the system should distribute the load onto them. The simplest setup looks something like this:
```yaml
flyteadmin:
  roleNameKey: "eks.amazonaws.com/role-arn"
queues:
  # A list of items, one per AWS Batch Job Queue.
  executionQueues:
    # The name of the job queue from AWS Batch
    - dynamic: "tutorial"
      # A list of tags/attributes that can be used to match workflows to this queue.
      attributes:
        - default
  # A list of configs to match project and/or domain and/or workflows to job queues using tags.
  workflowConfigs:
    # An empty rule to match any workflow to the queue tagged as "default"
    - tags:
        - default
```
If you are using Helm, this block can be added under the configMaps.adminServer section of the values file.
The more complex matching config below defines three queues with separate attributes and matching logic based on project, domain, and workflow name.
```yaml
queues:
  executionQueues:
    - dynamic: "gpu_dynamic"
      attributes:
        - gpu
    - dynamic: "critical"
      attributes:
        - critical
    - dynamic: "default"
      attributes:
        - default
  workflowConfigs:
    - project: "my_queue_1"
      domain: "production"
      workflowName: "my_workflow_1"
      tags:
        - critical
    - project: "production"
      workflowName: "my_workflow_2"
      tags:
        - gpu
    - project: "my_queue_3"
      domain: "production"
      workflowName: "my_workflow_3"
      tags:
        - critical
    - tags:
        - default
```
These settings can also be altered dynamically through flytectl (or the FlyteAdmin API). Read about the core concept here, then visit the flytectl docs for a guide on how to update these configs dynamically.
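For example, flytectl’s update execution-queue-attribute command accepts an attribute file. A minimal sketch of such a file is below; the project and domain names are placeholders:

```yaml
project: flytesnacks
domain: development
tags:
  - critical
```

This would then be applied with flytectl update execution-queue-attribute --attrFile era.yaml.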
Update Flyte Propeller’s Config#
The AWS Array plugin requires some configuration to correctly communicate with the AWS Batch service.
This configuration lives within flytepropeller’s configMap. Modify the config to set the following keys:
```yaml
plugins:
  aws:
    batch:
      # Must match that set in flyteAdmin's configMap flyteadmin.roleNameKey
      roleAnnotationKey: eks.amazonaws.com/role-arn
      # Must match the desired region to launch these tasks.
      region: us-east-2
tasks:
  task-plugins:
    enabled-plugins:
      # Enable aws_array task plugin.
      - aws_array
    default-for-task-types:
      # Set it as the default handler for array/map tasks.
      container_array: aws_array
      # Make sure to add this line to enable single (non-map) AWS Batch tasks
      aws-batch: aws_array
```
Launch an Execution on AWS Batch#
Follow this guide to write a workflow with a Map Task.
Serialize and register the workflow/task to a Flyte backend, then launch an execution either on Flyte Console or with flytectl.
On Flyte Console:

1. Navigate to Flyte Console’s UI (e.g. sandbox) and find the workflow.
2. Click on Launch to open up the launch form.
3. Select IAM Role and enter the full AWS ARN of an IAM Role configured according to the above guide.
4. Submit the form.
With flytectl, retrieve an execution spec in the form of a YAML file:
```shell
flytectl --config ~/.flyte/flytectl.yaml get launchplan \
    -p <project> -d <domain> <workflow full name> \
    --version <version> --execFile ./map_wf.yaml
```
Fill in the iamRole field (and optionally kubeServiceAcct if required in the deployment), then launch an execution:
```shell
flytectl --config ~/.flyte/flytectl.yaml create execution \
    -p <project> -d <domain> \
    --execFile ./map_wf.yaml
```
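The security-context portion of the filled-in execFile might look like the sketch below. The field names and role ARN are assumptions based on flytectl’s generated execution spec; check the generated map_wf.yaml for the exact keys:

```yaml
# Only the security-context fields are sketched here; the generated file
# also contains the workflow name, version, target project/domain, and inputs.
iamRoleARN: arn:aws:iam::123456789012:role/my-batch-execution-role  # placeholder ARN
kubeServiceAcct: ""
```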
As soon as the task starts executing, a link to the AWS Array Job appears in the log links section of Flyte Console. As individual jobs are scheduled, links to their individual CloudWatch log streams also appear in the UI.
A screenshot of Flyte Console displaying log links for a successful array job.
A screenshot of Flyte Console displaying log links for a failed array job.