Executing Distributed PyTorch Training Jobs on K8s

This plugin uses the Kubeflow PyTorch operator and provides a greatly simplified interface for executing distributed training using the various PyTorch distributed backends.
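As a quick illustration of that interface, here is a minimal sketch of a Flyte task configured for distributed PyTorch; the task name, body, and num_workers value are illustrative assumptions rather than part of this page.

from flytekit import task
from flytekitplugins.kfpytorch import PyTorch

# task_config=PyTorch(...) asks Flyte to run this task through the Kubeflow
# PyTorch operator; num_workers=2 is an illustrative value requesting two
# worker replicas alongside the master.
@task(task_config=PyTorch(num_workers=2))
def mnist_trainer(epochs: int) -> str:
    # The actual training loop goes here; torch.distributed can be
    # initialized from the environment variables (MASTER_ADDR, MASTER_PORT,
    # RANK, WORLD_SIZE) that the operator injects into each replica.
    return f"trained for {epochs} epochs"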

Installation

To use the flytekit distributed PyTorch plugin, simply run the following:

pip install flytekitplugins-kfpytorch==0.16.0

How to build your Dockerfile for PyTorch on K8s

Note

If you are training on CPUs, no special Dockerfile is required. If GPUs or TPUs are needed, the Dockerfile differs only in the driver setup. The following Dockerfile is set up for GPU-accelerated training using CUDA.
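The Dockerfile itself did not survive here, so what follows is a minimal sketch assuming a CUDA-enabled PyTorch base image; the base image tag and copy path are illustrative assumptions, not the page's original content.

FROM pytorch/pytorch:1.9.0-cuda10.2-cudnn7-runtime

WORKDIR /root
ENV LANG C.UTF-8
ENV PYTHONPATH /root

# Install flytekit together with the distributed PyTorch plugin.
RUN pip install flytekit flytekitplugins-kfpytorch==0.16.0

# Copy the training code into the image (path is illustrative).
COPY . /root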
