class flytekitplugins.kfpytorch.PyTorch(master=<factory>, worker=<factory>, run_policy=<factory>, num_workers=None)[source]

Configuration for an executable PyTorch job, used to run distributed PyTorch training on Kubernetes. In most cases you do not need to adjust the master and worker group configuration; the defaults work, and the only field you typically need to change is the number of worker replicas. Both replica groups use the same image and inherit the same resources from the task function decorator.

  • master (flytekitplugins.kfpytorch.task.Master) – Configuration for the master replica group.

  • worker (flytekitplugins.kfpytorch.task.Worker) – Configuration for the worker replica group.

  • run_policy (Optional[flytekitplugins.kfpytorch.task.RunPolicy]) – Configuration for the run policy.

  • num_workers (Optional[int]) – [DEPRECATED] This argument is deprecated. Use worker.replicas instead.
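A minimal usage sketch, assuming the plugin's standard import path and that `Worker` accepts a `replicas` field (replica count and resource values here are illustrative):

```python
from flytekit import task
from flytekitplugins.kfpytorch import PyTorch, Worker


# Configure a distributed PyTorch job with 4 worker replicas.
# Master and run_policy are left at their defaults, which is
# sufficient for most cases.
@task(
    task_config=PyTorch(worker=Worker(replicas=4)),
)
def train() -> None:
    # Distributed training logic goes here; each replica runs
    # this function with the PyTorchJob environment set up by
    # the Kubeflow training operator.
    ...
```

Note that `num_workers=4` would express the same intent, but that argument is deprecated in favor of `worker.replicas`.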

num_workers: Optional[int] = None
master: flytekitplugins.kfpytorch.task.Master
worker: flytekitplugins.kfpytorch.task.Worker
run_policy: Optional[flytekitplugins.kfpytorch.task.RunPolicy]