Kubeflow TensorFlow
The TensorFlow operator lets you run distributed TensorFlow training jobs natively on Flyte. It is a wrapper built around Kubeflow’s TensorFlow operator.
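To make "distributed" concrete: Kubeflow's TensorFlow operator describes the cluster to each replica through the `TF_CONFIG` environment variable, a JSON document with a `cluster` map and a `task` entry. The sketch below only inspects that variable. The helper name `describe_replica` and the sample host names are hypothetical and are not part of the plugin.

```python
import json
import os


def describe_replica(environ=None):
    """Summarize this replica's role from the TF_CONFIG environment variable."""
    environ = os.environ if environ is None else environ
    raw = environ.get("TF_CONFIG")
    if not raw:
        # Without TF_CONFIG the process is not part of a distributed job.
        return {"task_type": None, "task_index": None, "replicas": {}}
    config = json.loads(raw)
    cluster = config.get("cluster", {})
    task = config.get("task", {})
    return {
        "task_type": task.get("type"),
        "task_index": task.get("index"),
        # Number of replicas per role, for example {"worker": 2, "ps": 1}.
        "replicas": {role: len(hosts) for role, hosts in cluster.items()},
    }


if __name__ == "__main__":
    # Hypothetical cluster layout; the host names are made up for illustration.
    sample = {
        "cluster": {"worker": ["w0:2222", "w1:2222"], "ps": ["p0:2222"]},
        "task": {"type": "worker", "index": 1},
    }
    print(describe_replica({"TF_CONFIG": json.dumps(sample)}))
```

Calling `describe_replica()` with no arguments inside a running replica reads the real environment, which can help when debugging why a worker did not join the cluster.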
Installation
To install the Kubeflow TensorFlow plugin, run the following command:
pip install flytekitplugins-kftensorflow
To enable the plugin in the backend, follow the instructions outlined in the K8s Operator guide.
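After running the install command, one quick way to confirm that the package landed in the active environment is to query its distribution metadata. This is a minimal sketch using only the standard library. The helper name `installed_version` is hypothetical, and the check says nothing about whether the backend plugin is enabled.

```python
from importlib import metadata


def installed_version(distribution="flytekitplugins-kftensorflow"):
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version()
    if version is None:
        print("flytekitplugins-kftensorflow is not installed in this environment")
    else:
        print(f"flytekitplugins-kftensorflow {version} is installed")
```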
Code
We will write an example that does distributed training using the Kubeflow TensorFlow operator. Before that, let’s look at the compute setup and Dockerfile.
GPU to CPU
GPU support is enabled in the code by default. To test your code on a CPU, make the following changes:
Replace FROM tensorflow/tensorflow:latest-gpu with FROM tensorflow/tensorflow:latest in the Dockerfile.
Remove the gpu parameter from the Resources definition in the example.
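The Dockerfile change above is a one-line edit, but if you switch between GPU and CPU builds often it can be scripted. The following is a rough sketch, not part of the example repository. The function name `use_cpu_base_image` is hypothetical, and it assumes the base image is declared exactly as shown in this guide.

```python
GPU_IMAGE = "tensorflow/tensorflow:latest-gpu"
CPU_IMAGE = "tensorflow/tensorflow:latest"


def use_cpu_base_image(dockerfile_text):
    """Return the Dockerfile text with the GPU base image swapped for the CPU one."""
    out = []
    for line in dockerfile_text.splitlines():
        parts = line.split()
        # Only touch real FROM instructions, not comments that mention the image.
        if len(parts) >= 2 and parts[0].upper() == "FROM" and parts[1] == GPU_IMAGE:
            line = line.replace(GPU_IMAGE, CPU_IMAGE, 1)
        out.append(line)
    result = "\n".join(out)
    # Preserve a trailing newline if the input had one.
    if dockerfile_text.endswith("\n"):
        result += "\n"
    return result


if __name__ == "__main__":
    sample = "FROM tensorflow/tensorflow:latest-gpu\nWORKDIR /root\n"
    print(use_cpu_base_image(sample), end="")
```

The second change, removing the gpu parameter from the Resources definition, still has to be made by hand in the example code.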
Dockerfile
The example uses the TensorFlow GPU image.
FROM tensorflow/tensorflow:latest-gpu
# You can disable GPU support by replacing the above line with:
# FROM tensorflow/tensorflow:latest
LABEL org.opencontainers.image.source=https://github.com/flyteorg/flytesnacks
WORKDIR /root
ENV LANG=C.UTF-8
ENV LC_ALL=C.UTF-8
ENV PYTHONPATH=/root
ENV DEBIAN_FRONTEND=noninteractive
ENV TERM=linux
# Install basics
RUN apt-key adv --fetch-keys http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64/3bf863cc.pub
RUN apt-get update && apt-get install -y make build-essential libssl-dev curl python3-venv
# Install the AWS cli separately to prevent issues with boto being written over
RUN pip install awscli
WORKDIR /opt
RUN curl https://sdk.cloud.google.com > install.sh
RUN bash /opt/install.sh --install-dir=/opt
ENV PATH="$PATH:/opt/google-cloud-sdk/bin"
WORKDIR /root
ENV VENV=/opt/venv
# Virtual environment
RUN python3 -m venv ${VENV}
ENV PATH="${VENV}/bin:$PATH"
# Install wheel after venv is activated
RUN pip3 install wheel
# Install Python dependencies
COPY kftensorflow/requirements.txt /root
RUN pip install -r /root/requirements.txt
# Copy the makefile targets to expose on the container. This makes it easier to register.
COPY in_container.mk /root/Makefile
COPY kftensorflow/sandbox.config /root
# Copy the actual code
COPY kftensorflow/ /root/kftensorflow/
# This tag is supplied by the build script and will be used to determine the version
# when registering tasks, workflows, and launch plans
ARG tag
ENV FLYTE_INTERNAL_IMAGE=$tag
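The last two Dockerfile lines expose the image tag inside the container as FLYTE_INTERNAL_IMAGE. As an illustration of how a version string could be derived from such a value, here is a small sketch. It is not flytekit's actual implementation; the helper name `version_from_image` and the sample image reference are hypothetical.

```python
import os


def version_from_image(image):
    """Return the tag portion of an image reference, or None if it has no tag."""
    if not image or "@" in image:
        # Empty values and digest references (name@sha256:...) carry no tag.
        return None
    _name, sep, tag = image.rpartition(":")
    # A "/" after the last colon means the colon belonged to a registry port.
    if not sep or not tag or "/" in tag:
        return None
    return tag


if __name__ == "__main__":
    # Falls back to a made-up reference when the variable is not set.
    image = os.environ.get("FLYTE_INTERNAL_IMAGE", "example.registry/flytesnacks:abc123")
    print(version_from_image(image))
```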