AWS SageMaker Training

Tags: Integration, MachineLearning, AWS, Advanced

This section provides examples of Flyte plugins designed to work with AWS hosted services such as SageMaker, EMR, Athena, and Redshift.

Installation

To use the Flytekit AWS SageMaker plugin, run the following:

pip install flytekitplugins-awssagemaker

Built-in Algorithms

Amazon SageMaker provides several built-in machine learning algorithms that you can use for a variety of problem types. Built-in algorithms are the fastest way to get started, because they are pre-built and optimized for SageMaker. To understand how they work and which options are available, refer to the official Amazon SageMaker documentation.

The Flyte SageMaker plugin aims to simplify using SageMaker for training. It distills the API into a meaningful subset that makes SageMaker easier to adopt and run. Because Flyte automatically uses a pre-built image for the selected algorithm, SageMaker built-in algorithms can be run entirely from a local notebook using Flyte.

The algorithm images are set in the plugin configuration in FlytePlugins. The default configuration includes XGBoost.

Note

SageMaker built-in algorithms do not require an explicit Docker image to be specified.

Training a custom model

Training code that runs on SageMaker looks almost identical to any other Flyte task. Once a custom job is defined, hyperparameter optimization works the same way for pre-built algorithms and custom jobs: users wrap their training task in an HPO task and launch it.

Note

SageMaker custom algorithms run from Docker images that you build yourself. These images must be pushed to ECR so that SageMaker can access them. To actually execute on SageMaker, this example therefore needs to be built and pushed to your own AWS ECR Docker registry.
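To make the push target concrete, the sketch below composes the image reference that a build script would tag and push. It assumes the standard ECR naming scheme for commercial AWS regions; the function name `ecr_image_uri` and the example account ID, repository, and tag are hypothetical and not part of the plugin.

```python
def ecr_image_uri(account_id: str, region: str, repository: str, tag: str) -> str:
    """Compose an ECR image reference (hypothetical helper, not a plugin API).

    Assumes the standard naming scheme for commercial AWS regions:
    <account>.dkr.ecr.<region>.amazonaws.com/<repository>:<tag>
    """
    if not (account_id.isdigit() and len(account_id) == 12):
        raise ValueError("AWS account IDs are 12 digits")
    return f"{account_id}.dkr.ecr.{region}.amazonaws.com/{repository}:{tag}"


# Placeholder values for illustration only.
image = ecr_image_uri("123456789012", "us-east-1", "flytesnacks-sagemaker", "v1")
print(image)
```

The resulting string is what you would pass to `docker tag` and `docker push` after authenticating against your registry.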

When a remote execution is triggered, the SageMaker API is invoked to launch a job; the user's @task function is then invoked with all parameters passed to it. Return values are captured automatically. For distributed training, users can provide a special predicate that marks when to capture outputs from the rank-0 task.
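The rank-0 predicate can be pictured with a small standalone sketch. It assumes the job runs in a container that exposes the SageMaker training toolkit variables `SM_HOSTS` (a JSON list of host names) and `SM_CURRENT_HOST`; the function name `is_rank_zero` is hypothetical, and how the plugin expects such a predicate to be registered is not shown here.

```python
import json
import os
from typing import Mapping, Optional


def is_rank_zero(env: Optional[Mapping[str, str]] = None) -> bool:
    """Return True when this process runs on the first host of the job.

    Assumption: SM_HOSTS holds a JSON list of host names and SM_CURRENT_HOST
    names the current host. When either is missing (for example in a local
    run), the process is treated as rank 0.
    """
    env = os.environ if env is None else env
    hosts = sorted(json.loads(env.get("SM_HOSTS", "[]")))
    current = env.get("SM_CURRENT_HOST")
    if not hosts or current is None:
        return True  # local or single-host execution
    return current == hosts[0]


# Simulated two-host job: only the first host should persist outputs.
simulated = {"SM_HOSTS": '["algo-1", "algo-2"]', "SM_CURRENT_HOST": "algo-2"}
print(is_rank_zero(simulated))
```

Accepting the environment as an argument keeps the predicate easy to exercise locally without a SageMaker job.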

Prerequisites

Before following this example, make sure that

Creating a Dockerfile for SageMaker Custom Training [Required]

FROM python:3.8-buster
LABEL org.opencontainers.image.source=https://github.com/flyteorg/flytesnacks

WORKDIR /root
ENV LANG=C.UTF-8
ENV LC_ALL=C.UTF-8
ENV PYTHONPATH=/root

# Install the AWS cli separately to prevent issues with boto being written over
RUN pip install awscli

# Set up a virtual environment and put it first on the PATH
ENV VENV=/opt/venv
RUN python3 -m venv ${VENV}
ENV PATH="${VENV}/bin:$PATH"

# Install Python dependencies
COPY aws/sagemaker_training/requirements.txt /root
RUN pip install -r /root/requirements.txt

# Set up the SageMaker entrypoint
ENV SAGEMAKER_PROGRAM=/opt/venv/bin/flytekit_sagemaker_runner.py

# Copy the makefile targets to expose on the container. This makes it easier to register.
COPY in_container.mk /root/Makefile
COPY aws/sagemaker_training/sandbox.config /root

# Copy the actual code
COPY aws/sagemaker_training/ /root/sagemaker_training

# This tag is supplied by the build script and will be used to determine the version
# when registering tasks, workflows, and launch plans
ARG tag
ENV FLYTE_INTERNAL_IMAGE=$tag