PyTorch Distributed#

Tags: Integration, DistributedComputing, MachineLearning, KubernetesOperator, Advanced

The Kubeflow PyTorch plugin builds on the Kubeflow training operator to provide a streamlined interface for running distributed training with different PyTorch backends.
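As a rough illustration, a distributed training task might be declared by passing the plugin's task config to a Flyte task. This is a minimal sketch only: the config field shown (`num_workers`) and the function names are assumptions that may differ across plugin versions, so consult the plugin API reference for your release.

```python
# Sketch of a distributed PyTorch task using the Kubeflow PyTorch plugin.
# Assumptions: flytekitplugins-kfpytorch is installed and its PyTorch task
# config accepts num_workers; exact fields may vary by plugin version.
from flytekit import task, workflow
from flytekitplugins.kfpytorch import PyTorch


@task(task_config=PyTorch(num_workers=2))  # request 2 worker replicas
def pytorch_training_task() -> float:
    # This body runs on every replica; the training operator injects the
    # torch.distributed environment (MASTER_ADDR, RANK, WORLD_SIZE, ...).
    ...
    return 0.0


@workflow
def pytorch_training_wf() -> float:
    return pytorch_training_task()
```

When this task runs on a Flyte cluster with the plugin enabled, the backend submits a PyTorchJob to the Kubeflow training operator rather than a plain pod.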

Install the plugin#

To use the PyTorch plugin, run the following command:

pip install flytekitplugins-kfpytorch

To enable the plugin in the backend, follow the instructions in the Configure Kubernetes Plugins guide.
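For orientation, enabling the plugin typically amounts to registering it in the propeller task-plugin configuration. The fragment below is a sketch only; the exact keys and where they live depend on your deployment method, so defer to the Configure Kubernetes Plugins guide.

```yaml
# Sketch of propeller task-plugin configuration (keys may vary by deployment).
tasks:
  task-plugins:
    enabled-plugins:
      - container
      - pytorch
    default-for-task-types:
      - container: container
      - pytorch: pytorch
```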

Run the example on the Flyte cluster#

To run the provided example on the Flyte cluster, use the following command:

pyflyte run --remote pytorch_mnist.py \
  pytorch_training_wf