Kubeflow TensorFlow

The TensorFlow operator lets you run distributed TensorFlow training jobs natively on Flyte. It is a wrapper built around Kubeflow’s TensorFlow operator.

Installation

To install the Kubeflow TensorFlow plugin, run the following command:

pip install flytekitplugins-kftensorflow
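To confirm that the package landed in the active environment, a quick check along these lines may help. It is a minimal sketch that relies only on the standard library; the helper name `installed_version` is hypothetical and not part of Flyte.

```python
from importlib import metadata
from typing import Optional


def installed_version(distribution: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # The distribution name matches the one passed to pip above.
    name = "flytekitplugins-kftensorflow"
    version = installed_version(name)
    print(f"{name} {version}" if version else f"{name} is not installed")
```

If the script reports that the package is not installed, check that pip ran inside the same virtual environment as the interpreter.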

To enable the plugin in the backend, follow the instructions outlined in the K8s Operator guide.

Code

We will write an example that does distributed training using the Kubeflow TensorFlow operator. Before that, let’s look at the compute setup and Dockerfile.

GPU to CPU

GPU support is enabled in the code by default. To test your code on a CPU, make the following changes:

  • Replace FROM tensorflow/tensorflow:latest-gpu with FROM tensorflow/tensorflow:latest in the Dockerfile

  • Remove the gpu parameter from the Resources definition in the example
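The two changes above can be sketched as small helpers. This is an illustration only: the function names are hypothetical, the plain dictionary stands in for the keyword arguments of the Resources definition in the example, and the cpu, mem, and gpu values are placeholders rather than the example's real settings.

```python
import re

GPU_IMAGE = "tensorflow/tensorflow:latest-gpu"
CPU_IMAGE = "tensorflow/tensorflow:latest"


def use_cpu_base_image(dockerfile_text: str) -> str:
    """Swap the GPU base image for the CPU one on active FROM lines only."""
    pattern = re.compile(
        rf"^FROM[ \t]+{re.escape(GPU_IMAGE)}[ \t]*$", flags=re.MULTILINE
    )
    # Commented-out lines start with '#', so they are left untouched.
    return pattern.sub(f"FROM {CPU_IMAGE}", dockerfile_text)


def drop_gpu(resource_kwargs: dict) -> dict:
    """Return a copy of the resource settings without the gpu entry."""
    return {k: v for k, v in resource_kwargs.items() if k != "gpu"}


if __name__ == "__main__":
    dockerfile = (
        "FROM tensorflow/tensorflow:latest-gpu\n"
        "# FROM tensorflow/tensorflow:latest\n"
        "WORKDIR /root\n"
    )
    print(use_cpu_base_image(dockerfile))
    # Placeholder values, assumed for illustration.
    print(drop_gpu({"cpu": "1", "mem": "2Gi", "gpu": "1"}))
```

In practice both edits are usually made by hand; the helpers only make explicit which line and which parameter change.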

Dockerfile

The example uses the TensorFlow GPU image.

FROM tensorflow/tensorflow:latest-gpu
# You can disable GPU support by replacing the above line with:
# FROM tensorflow/tensorflow:latest

LABEL org.opencontainers.image.source="https://github.com/flyteorg/flytesnacks"

WORKDIR /root
ENV LANG=C.UTF-8
ENV LC_ALL=C.UTF-8
ENV PYTHONPATH=/root
ENV DEBIAN_FRONTEND=noninteractive
ENV TERM=linux

# Install basics
RUN apt-key adv --fetch-keys http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64/3bf863cc.pub
RUN apt-get update && apt-get install -y make build-essential libssl-dev curl python3-venv

# Install the AWS cli separately to prevent issues with boto being written over
RUN pip install awscli

WORKDIR /opt
RUN curl https://sdk.cloud.google.com > install.sh
RUN bash /opt/install.sh --install-dir=/opt
ENV PATH="$PATH:/opt/google-cloud-sdk/bin"
WORKDIR /root

ENV VENV=/opt/venv
# Virtual environment
RUN python3 -m venv ${VENV}
ENV PATH="${VENV}/bin:$PATH"

# Install wheel after venv is activated
RUN pip3 install wheel

# Install Python dependencies
COPY kftensorflow/requirements.txt /root
RUN pip install -r /root/requirements.txt

# Copy the makefile targets to expose on the container. This makes it easier to register.
COPY in_container.mk /root/Makefile
COPY kftensorflow/sandbox.config /root

# Copy the actual code
COPY kftensorflow/ /root/kftensorflow/

# This tag is supplied by the build script and will be used to determine the version
# when registering tasks, workflows, and launch plans
ARG tag
ENV FLYTE_INTERNAL_IMAGE=$tag
