flytekitplugins.kfmpi.MPIJob

class flytekitplugins.kfmpi.MPIJob(launcher=<factory>, worker=<factory>, run_policy=<factory>, slots=1, num_launcher_replicas=None, num_workers=None)

Configuration for an executable MPI Job. Use this to run distributed training on Kubernetes with MPI.

Parameters
  • launcher (flytekitplugins.kfmpi.task.Launcher) – Configuration for the launcher replica group.

  • worker (flytekitplugins.kfmpi.task.Worker) – Configuration for the worker replica group.

  • run_policy (Optional[flytekitplugins.kfmpi.task.RunPolicy]) – Configuration for the run policy.

  • slots (int) – The number of slots (MPI processes) per worker, as written to the hostfile.

  • num_launcher_replicas (Optional[int]) – [DEPRECATED] The number of launcher replicas to use.

  • num_workers (Optional[int]) – [DEPRECATED] The number of worker replicas to spawn in the cluster for this job.
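To make the slots parameter concrete, here is a minimal sketch of how a slots value typically appears in an Open MPI-style hostfile, where each line has the form `<host> slots=<n>`. The helper names (`hostfile_lines`, `max_ranks`) and the worker host names are hypothetical and are not part of the plugin. The exact hostfile the plugin or the MPI operator produces may differ.

```python
from typing import List


def hostfile_lines(worker_hosts: List[str], slots: int = 1) -> List[str]:
    """Return one Open MPI-style hostfile entry per worker host.

    Illustrative helper only; not part of flytekitplugins.kfmpi.
    """
    if slots < 1:
        raise ValueError("slots must be a positive integer")
    return [f"{host} slots={slots}" for host in worker_hosts]


def max_ranks(num_workers: int, slots: int = 1) -> int:
    """Upper bound on MPI processes the hostfile advertises: workers x slots."""
    return num_workers * slots


# Hypothetical worker host names, chosen only for this example.
hosts = ["train-worker-0", "train-worker-1"]
lines = hostfile_lines(hosts, slots=2)
print("\n".join(lines))
print(max_ranks(len(hosts), slots=2))
```

Under these assumptions, two workers with slots=2 advertise room for four MPI processes in total.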

Return type

None

Methods

Attributes

num_launcher_replicas: Optional[int] = None
num_workers: Optional[int] = None
slots: int = 1
launcher: flytekitplugins.kfmpi.task.Launcher
worker: flytekitplugins.kfmpi.task.Worker
run_policy: Optional[flytekitplugins.kfmpi.task.RunPolicy]
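The `<factory>` defaults in the signature are how a dataclass field with a default factory is usually rendered. The sketch below uses stand-in classes (`LauncherSketch`, `WorkerSketch`, `MPIJobSketch`, all hypothetical, with a made-up `image` field) rather than the plugin's own classes. It shows what such defaults imply: each configuration object gets its own launcher and worker instances, and the remaining attributes fall back to the documented defaults. The run_policy field is left out for brevity.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class LauncherSketch:
    # Hypothetical field, present only so the example has something to mutate.
    image: Optional[str] = None


@dataclass
class WorkerSketch:
    # Hypothetical field, mirroring LauncherSketch.
    image: Optional[str] = None


@dataclass
class MPIJobSketch:
    """Stand-in that mirrors the documented signature; not the plugin source."""

    # default_factory builds a fresh object for every MPIJobSketch instance.
    launcher: LauncherSketch = field(default_factory=LauncherSketch)
    worker: WorkerSketch = field(default_factory=WorkerSketch)
    slots: int = 1
    # Deprecated arguments default to None, as in the documented attributes.
    num_launcher_replicas: Optional[int] = None
    num_workers: Optional[int] = None


first = MPIJobSketch()
second = MPIJobSketch(slots=4)

# Mutating one configuration's launcher leaves the other untouched.
first.launcher.image = "example.invalid/trainer:latest"
print(second.launcher.image)
print(first.slots, second.slots)
```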