PyTorch Distributed

Tags: Integration, DistributedComputing, MachineLearning, KubernetesOperator, Advanced

The Kubeflow PyTorch plugin uses the Kubeflow training operator to provide a streamlined interface for running distributed training jobs with different PyTorch backends.

Install the plugin

To use the PyTorch plugin, run the following command:

pip install flytekitplugins-kfpytorch

To enable the plugin in the backend, follow the instructions in the Configure Kubernetes Plugins guide.
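Enabling the plugin typically means adding `pytorch` to FlytePropeller's enabled task plugins. A sketch of the relevant configuration is shown below; the exact keys and surrounding structure depend on your deployment method (e.g. Helm values), so treat this as an assumption to verify against the guide above.

```yaml
tasks:
  task-plugins:
    enabled-plugins:
      - container
      - sidecar
      - k8s-array
      - pytorch
    default-for-task-types:
      - container: container
      - pytorch: pytorch
```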

Run the example on the Flyte cluster

To run the provided example on the Flyte cluster, use the following command:

pyflyte run --remote pytorch_mnist.py \
  pytorch_training_wf