Kubeflow TensorFlow

The TensorFlow plugin lets you run distributed TensorFlow training jobs natively on Flyte. It is built as a wrapper around Kubeflow's TensorFlow operator.
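
For orientation: Kubeflow's operator consumes a `TFJob` custom resource whose `tfReplicaSpecs` map replica types (`Chief`, `Worker`, `PS`) to pod templates. The sketch below builds a minimal manifest of that kind as a plain Python dict, only to show the shape of what the plugin manages on your behalf. It is not the exact resource Flyte generates. The function `tfjob_manifest` and the job and image names are hypothetical.

```python
import json


def tfjob_manifest(name, image, workers=2, ps=1, chief=1):
    """Build a minimal Kubeflow TFJob custom resource as a plain dict (illustration only)."""

    def replica_spec(count):
        return {
            "replicas": count,
            "restartPolicy": "OnFailure",
            "template": {
                "spec": {
                    # The Kubeflow operator expects the training container to be named "tensorflow".
                    "containers": [{"name": "tensorflow", "image": image}]
                }
            },
        }

    specs = {"Worker": replica_spec(workers)}
    if ps:  # parameter servers are optional
        specs["PS"] = replica_spec(ps)
    if chief:  # a dedicated chief replica is optional
        specs["Chief"] = replica_spec(chief)
    return {
        "apiVersion": "kubeflow.org/v1",
        "kind": "TFJob",
        "metadata": {"name": name},
        "spec": {"tfReplicaSpecs": specs},
    }


# Hypothetical job and image names.
print(json.dumps(tfjob_manifest("mnist-train", "my-registry/kftensorflow:v1"), indent=2))
```

With the plugin you describe replica counts in your task configuration and Flyte creates the corresponding resource, so you never write this manifest by hand.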

Installation

To install the Kubeflow TensorFlow plugin, run the following command:

pip install flytekitplugins-kftensorflow
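
To confirm that the plugin is importable from the environment where your Flyte code runs, a quick check like the following can help. It assumes the package exposes the import path `flytekitplugins.kftensorflow`. The helper `is_importable` is hypothetical.

```python
import importlib.util


def is_importable(module_name):
    """Return True if module_name can be located in the current Python environment."""
    try:
        return importlib.util.find_spec(module_name) is not None
    except ModuleNotFoundError:
        # For a dotted name, find_spec imports the parent packages first
        # and raises if one of them (e.g. "flytekitplugins") is missing.
        return False


# Assumed import path of the flytekitplugins-kftensorflow package.
if is_importable("flytekitplugins.kftensorflow"):
    print("Kubeflow TensorFlow plugin is installed")
else:
    print("Plugin not found; run: pip install flytekitplugins-kftensorflow")
```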

To enable the plugin in the backend, follow the instructions in the K8s Operator guide.

Code

We will write an example that performs distributed training with the Kubeflow TensorFlow operator. First, let's look at the compute setup and the Dockerfile.
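
Each replica that the operator starts learns its place in the job from the `TF_CONFIG` environment variable. This is a JSON document whose `cluster` entry lists `host:port` addresses per replica type and whose `task` entry gives this replica's `type` and `index`. TensorFlow's distribution strategies read it automatically. The stdlib sketch below parses it by hand only to make that structure visible. The helper `describe_replica` and the sample host names are hypothetical.

```python
import json
import os


def describe_replica(env=None):
    """Summarize this replica's role from a TF_CONFIG-style environment mapping.

    Returns None when TF_CONFIG is unset, e.g. in a local, non-distributed run.
    """
    env = os.environ if env is None else env
    raw = env.get("TF_CONFIG")
    if not raw:
        return None
    config = json.loads(raw)
    cluster = config.get("cluster", {})
    task = config.get("task", {})
    task_type = task.get("type")
    task_index = task.get("index", 0)
    # TensorFlow's usual convention: the "chief" replica, or worker 0 when the
    # cluster has no chief, takes on extra duties such as writing checkpoints.
    is_chief = task_type == "chief" or (
        task_type == "worker" and task_index == 0 and "chief" not in cluster
    )
    return {
        "type": task_type,
        "index": task_index,
        "cluster_size": sum(len(addresses) for addresses in cluster.values()),
        "is_chief": is_chief,
    }


# Hypothetical TF_CONFIG for one chief, two workers and one parameter server.
sample = {
    "cluster": {
        "chief": ["train-chief-0:2222"],
        "worker": ["train-worker-0:2222", "train-worker-1:2222"],
        "ps": ["train-ps-0:2222"],
    },
    "task": {"type": "worker", "index": 1},
}
print(describe_replica({"TF_CONFIG": json.dumps(sample)}))
# {'type': 'worker', 'index': 1, 'cluster_size': 4, 'is_chief': False}
```

Because the helper returns `None` when `TF_CONFIG` is absent, the same training code can branch cleanly between a local single-process run and a distributed run.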

GPU to CPU

GPU support is enabled in the code by default. To test your code on a CPU instead, make the following changes:

  • Replace FROM tensorflow/tensorflow:latest-gpu with FROM tensorflow/tensorflow:latest in the Dockerfile

  • Remove the gpu parameter from the Resources definition in the example
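
If you switch between the two images often, the Dockerfile edit can be scripted. This is a minimal sketch that assumes the base image appears on a plain `FROM tensorflow/tensorflow:latest-gpu` line, as in this example's Dockerfile. The function name `to_cpu_dockerfile` is hypothetical, and the `Resources` change still has to be made by hand.

```python
import re

GPU_BASE = "tensorflow/tensorflow:latest-gpu"
CPU_BASE = "tensorflow/tensorflow:latest"

# Match only an uncommented FROM line that uses the GPU image; [ \t] (not \s)
# keeps the pattern from swallowing line breaks.
_GPU_FROM = re.compile(r"^(FROM[ \t]+)" + re.escape(GPU_BASE) + r"[ \t]*$", re.MULTILINE)


def to_cpu_dockerfile(text):
    """Return the Dockerfile text with the GPU base image swapped for the CPU image."""
    return _GPU_FROM.sub(lambda m: m.group(1) + CPU_BASE, text)


dockerfile = "FROM tensorflow/tensorflow:latest-gpu\n# FROM tensorflow/tensorflow:latest\nWORKDIR /root\n"
print(to_cpu_dockerfile(dockerfile))
```

To apply it to a file, read the Dockerfile with `pathlib.Path(...).read_text()`, pass the text through `to_cpu_dockerfile`, and write the result back with `write_text()`.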

Dockerfile

The example uses the TensorFlow GPU image.

FROM tensorflow/tensorflow:latest-gpu
# You can disable GPU support by replacing the above line with:
# FROM tensorflow/tensorflow:latest

LABEL org.opencontainers.image.source=https://github.com/flyteorg/flytesnacks

WORKDIR /root
ENV LANG=C.UTF-8
ENV LC_ALL=C.UTF-8
ENV PYTHONPATH=/root
ENV DEBIAN_FRONTEND=noninteractive
ENV TERM=linux

# Install basics
RUN apt-key adv --fetch-keys http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64/3bf863cc.pub
RUN apt-get update && apt-get install -y make build-essential libssl-dev curl python3-venv

# Install the AWS cli separately to prevent issues with boto being written over
RUN pip install awscli

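WORKDIR /opt
RUN curl -sSL https://sdk.cloud.google.com > install.sh
# --disable-prompts keeps the Google Cloud SDK installer non-interactive during the build
RUN bash /opt/install.sh --disable-prompts --install-dir=/opt
ENV PATH=$PATH:/opt/google-cloud-sdk/bin
WORKDIR /root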

ENV VENV /opt/venv
# Virtual environment
RUN python3 -m venv ${VENV}
ENV PATH="${VENV}/bin:$PATH"

# Install wheel after venv is activated
RUN pip3 install wheel

# Install Python dependencies
COPY kftensorflow/requirements.txt /root
RUN pip install -r /root/requirements.txt

# Copy the makefile targets to expose on the container. This makes it easier to register.
COPY in_container.mk /root/Makefile
COPY kftensorflow/sandbox.config /root

# Copy the actual code
COPY kftensorflow/ /root/kftensorflow/

# This tag is supplied by the build script and will be used to determine the version
# when registering tasks, workflows, and launch plans
ARG tag
ENV FLYTE_INTERNAL_IMAGE=$tag
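
The `tag` build argument is expected to come from a build script that is not shown here. The sketch below assembles an equivalent `docker build` invocation. It assumes that the build context is the directory containing `kftensorflow/` and `in_container.mk`, since the `COPY` paths are relative to it, and that the Dockerfile lives at `kftensorflow/Dockerfile`. The helper `docker_build_command` and the image name are hypothetical.

```python
import shlex


def docker_build_command(image, version, dockerfile="kftensorflow/Dockerfile", context="."):
    """Build the argument list for `docker build`, passing the image tag as the `tag` build arg.

    The Dockerfile stores this value in FLYTE_INTERNAL_IMAGE, which determines the
    version used when registering tasks, workflows, and launch plans.
    """
    full_tag = f"{image}:{version}"
    return [
        "docker", "build",
        "--file", dockerfile,
        "--build-arg", f"tag={full_tag}",
        "--tag", full_tag,
        context,
    ]


cmd = docker_build_command("ghcr.io/example/kftensorflow", "v1")  # hypothetical image name
print(shlex.join(cmd))
# docker build --file kftensorflow/Dockerfile --build-arg tag=ghcr.io/example/kftensorflow:v1 --tag ghcr.io/example/kftensorflow:v1 .
```

Pass the list to `subprocess.run(cmd, check=True)` to run the build. Using the same string for both `--tag` and `--build-arg tag=...` keeps `FLYTE_INTERNAL_IMAGE` in sync with the image that is actually pushed.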
