flytekitplugins.kfmpi.MPIJob

class flytekitplugins.kfmpi.MPIJob(launcher=<factory>, worker=<factory>, run_policy=<factory>, slots=1, num_launcher_replicas=None, num_workers=None)
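The `<factory>` placeholders in the signature are how Python's `inspect` module renders dataclass fields declared with `field(default_factory=...)`. Such a default is built fresh for every instance instead of being shared between instances. The sketch below reproduces that rendering with hypothetical stand-in classes (`LauncherStub`, `WorkerStub`, `MPIJobStub`). They mirror only the documented field names and defaults, omit `run_policy`, and are not the plugin's own classes.

```python
import inspect
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class LauncherStub:
    # Hypothetical stand-in for the plugin's Launcher; the field is illustrative only.
    replicas: Optional[int] = None


@dataclass
class WorkerStub:
    # Hypothetical stand-in for the plugin's Worker; the field is illustrative only.
    replicas: Optional[int] = None


@dataclass
class MPIJobStub:
    # default_factory builds a new LauncherStub/WorkerStub for every instance,
    # which inspect renders as "<factory>" in the signature.
    launcher: LauncherStub = field(default_factory=LauncherStub)
    worker: WorkerStub = field(default_factory=WorkerStub)
    slots: int = 1
    num_launcher_replicas: Optional[int] = None
    num_workers: Optional[int] = None


signature = str(inspect.signature(MPIJobStub))
print(signature)

a, b = MPIJobStub(), MPIJobStub()
a.worker.replicas = 4     # mutate one job's worker config...
print(b.worker.replicas)  # ...the other job keeps its own default: prints None
```

Because each job gets its own worker and launcher objects, tuning one job's replica groups cannot leak into another job's configuration.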

Configuration for an executable MPI job. Use it to run distributed training on Kubernetes (k8s) with MPI.

Parameters:

launcher (Launcher) – Configuration for the launcher replica group.
worker (Worker) – Configuration for the worker replica group.
run_policy (RunPolicy | None) – Configuration for the run policy.
slots (int) – The number of slots per worker used in the hostfile.
num_launcher_replicas (int | None) – [DEPRECATED] The number of launcher replicas to use. Set launcher.replicas instead.
num_workers (int | None) – [DEPRECATED] The number of worker replicas to spawn in the cluster for this job. Set worker.replicas instead.
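The `slots` value ends up in the hostfile that `mpirun` reads on the launcher. Each worker host is listed with that many slots, and when `-np` is omitted, Open MPI's `mpirun` starts one process per available slot, so the process count becomes worker replicas × slots. The sketch below renders an Open MPI-style hostfile (`<host> slots=<n>`) for a given replica count. The `render_hostfile` and `total_slots` helpers and the `<job>-worker-<i>` host naming are hypothetical illustrations. They do not necessarily match the hostnames the MPI Operator actually generates.

```python
def render_hostfile(job_name: str, worker_replicas: int, slots: int = 1) -> str:
    """Render an Open MPI-style hostfile: one '<host> slots=<n>' line per worker.

    The '<job_name>-worker-<i>' host naming is a hypothetical placeholder.
    """
    if worker_replicas < 1 or slots < 1:
        raise ValueError("worker_replicas and slots must both be >= 1")
    lines = [f"{job_name}-worker-{i} slots={slots}" for i in range(worker_replicas)]
    return "\n".join(lines) + "\n"


def total_slots(worker_replicas: int, slots: int = 1) -> int:
    # With -np omitted, Open MPI's mpirun starts one process per available slot.
    return worker_replicas * slots


hostfile = render_hostfile("train", worker_replicas=2, slots=4)
print(hostfile, end="")
# train-worker-0 slots=4
# train-worker-1 slots=4
print(total_slots(2, 4))  # 8
```

Raising `slots` therefore packs more ranks onto each worker pod without adding pods. Match it to the GPUs or cores each worker actually requests.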


Attributes

num_launcher_replicas: int | None = None

num_workers: int | None = None

slots: int = 1

launcher: Launcher

worker: Worker

run_policy: RunPolicy | None
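`num_launcher_replicas` and `num_workers` remain only for backward compatibility, so whatever consumes an `MPIJob` has to reconcile them with the per-group settings. The sketch below shows one way to do that. It assumes that, as in recent plugin versions, the launcher and worker groups each carry their own `replicas` count. The `resolve_replicas` helper and its rules are hypothetical and may differ from the plugin's actual validation. It rejects a config with both values set or neither set, and otherwise uses whichever one is given.

```python
import warnings
from typing import Optional


def resolve_replicas(group: str, new_value: Optional[int], deprecated_value: Optional[int]) -> int:
    """Hypothetical helper: pick the replica count for one replica group.

    `new_value` stands for the group's own replicas setting and
    `deprecated_value` for num_workers / num_launcher_replicas.
    """
    if new_value is not None and deprecated_value is not None:
        raise ValueError(f"set either {group}.replicas or the deprecated field, not both")
    if new_value is not None:
        return new_value
    if deprecated_value is not None:
        # Still honor the old field, but nudge callers toward the new one.
        warnings.warn(
            f"deprecated replica field used for {group}; set {group}.replicas instead",
            DeprecationWarning,
            stacklevel=2,
        )
        return deprecated_value
    raise ValueError(f"{group} replica count is not set")


print(resolve_replicas("worker", new_value=4, deprecated_value=None))  # 4
```

Failing loudly when both values are set avoids silently picking one of two conflicting replica counts.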