Map Tasks#

A map task lets you run a pod task or a regular task over a list of inputs within a single workflow node. This means you can run thousands of instances of the task without creating a node for each instance, providing significant performance gains!

Some use cases of map tasks include:

  • Several inputs must run through the same code logic

  • Multiple data batches need to be processed in parallel

  • Hyperparameter optimization

Let’s look at an example now!

First, we import the libraries.

import typing

from flytekit import Resources, map_task, task, workflow

Next, we define a task that we will use in our map task.

Note

A map task can only accept one input and produce one output.

@task
def a_mappable_task(a: int) -> str:
    inc = a + 2
    stringified = str(inc)
    return stringified
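Because a mapped task takes exactly one input, any extra parameters must be bound to fixed values before mapping. The pattern is commonly expressed with functools.partial (some flytekit releases accept partially-bound tasks in map_task directly; check your version). Here is a plain-Python sketch of the idea, with add_and_stringify and offset as hypothetical names:

```python
import functools


def add_and_stringify(a: int, offset: int) -> str:
    # Two parameters: only `a` varies per element, `offset` is fixed.
    return str(a + offset)


# Bind the fixed parameter so the callable takes a single input,
# matching the one-input constraint of a map task.
bound = functools.partial(add_and_stringify, offset=2)

results = [bound(x) for x in [1, 2, 3]]
# results == ["3", "4", "5"]
```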

We also define a task to reduce the mapped output to a string.

@task
def coalesce(b: typing.List[str]) -> str:
    coalesced = "".join(b)
    return coalesced

We pass a_mappable_task to the map_task() function, which runs it across a collection of inputs. In our example, the input is a of type typing.List[int], and a_mappable_task is run once for each element in the list.

with_overrides is useful for setting resources and retries on the individual mapped task instances.

@workflow
def my_map_workflow(a: typing.List[int]) -> str:
    mapped_out = map_task(a_mappable_task)(a=a).with_overrides(
        requests=Resources(mem="300Mi"),
        limits=Resources(mem="500Mi"),
        retries=1,
    )
    coalesced = coalesce(b=mapped_out)
    return coalesced

Lastly, we can run the workflow locally!

if __name__ == "__main__":
    result = my_map_workflow(a=[1, 2, 3, 4, 5])
    print(f"{result}")
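Conceptually, the map-then-coalesce workflow above is equivalent to the following plain-Python sketch (no Flyte machinery, just the same logic applied element-wise and then joined):

```python
def a_mappable_task(a: int) -> str:
    # Same logic as the Flyte task: increment by 2, then stringify.
    return str(a + 2)


def coalesce(b: list) -> str:
    # Join the mapped strings into one result.
    return "".join(b)


# map_task(a_mappable_task)(a=[1, 2, 3, 4, 5]) behaves like a list
# comprehension over the inputs, producing one output per element.
mapped_out = [a_mappable_task(x) for x in [1, 2, 3, 4, 5]]
result = coalesce(mapped_out)
# result == "34567"
```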

Map tasks can run on alternate execution backends, such as AWS Batch, a managed service that scales to very large workloads.
