| 1 | +{ |
| 2 | + "cells": [ |
| 3 | + { |
| 4 | + "cell_type": "markdown", |
| 5 | + "metadata": {}, |
| 6 | + "source": [ |
| 7 | + "# Training with SageMaker Pipe Mode and TensorFlow using the SageMaker Python SDK\n", |
| 8 | + "\n", |
| 9 | + "SageMaker Pipe Mode is an input mechanism for SageMaker training containers based on Linux named pipes. SageMaker makes the data available to the training container using named pipes, which allows data to be downloaded from S3 to the container while training is running. For larger datasets, this dramatically improves the time to start training, as the data does not need to be first downloaded to the container. To learn more about pipe mode, please consult the AWS documentation at: https://docs.aws.amazon.com/sagemaker/latest/dg/your-algorithms-training-algo.html#your-algorithms-training-algo-running-container-trainingdata.\n", |
| 10 | + "\n", |
| 11 | + "In this tutorial, we show you how to train a tf.estimator using data read with SageMaker Pipe Mode. We'll use the SageMaker `PipeModeDataset` class - a special TensorFlow `Dataset` built specifically to read from SageMaker Pipe Mode data. This `Dataset` is available in our TensorFlow containers for TensorFlow versions 1.7.0 and up. It's also open-sourced at https://github.com/aws/sagemaker-tensorflow-extensions and can be built into custom TensorFlow images for use in SageMaker. \n", |
| 12 | + "\n", |
| 13 | + "Although you can also build the PipeModeDataset into your own containers, in this tutorial we'll show how you can use the PipeModeDataset by launching training from the SageMaker Python SDK. The SageMaker Python SDK helps you deploy your models for training and hosting in optimized, production ready containers in SageMaker. The SageMaker Python SDK is easy to use, modular, extensible and compatible with TensorFlow and MXNet. \n", |
| 14 | + "\n", |
| 15 | + "Different collections of S3 files can be made available to the training container while it's running. These are referred to as \"channels\" in SageMaker. In this example, we use two channels - one for training data and one for evaluation data. Each channel is mapped to S3 files from different directories. The SageMaker PipeModeDataset knows how to read from the named pipes for each channel given just the channel name. When we launch SageMaker training we tell SageMaker what channels we have and where in S3 to read the data for each channel.\n" |
| 16 | + ] |
| 17 | + }, |
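| 18 | + { |
| 19 | + "cell_type": "markdown", |
| 20 | + "metadata": {}, |
| 21 | + "source": [ |
| 22 | + "Inside the training container, SageMaker exposes each channel as a Linux named pipe (FIFO) under ``/opt/ml/input/data``. The PipeModeDataset reads from these pipes for you, but as a point of reference, the sketch below shows how they could be listed from inside a training container. The ``<channel_name>_<epoch>`` pipe naming follows the SageMaker documentation linked above; run outside a training container, the cell simply finds nothing." |
| 23 | + ] |
| 24 | + }, |
| 25 | + { |
| 26 | + "cell_type": "code", |
| 27 | + "execution_count": null, |
| 28 | + "metadata": {}, |
| 29 | + "outputs": [], |
| 30 | + "source": [ |
| 31 | + "import glob\n", |
| 32 | + "import os\n", |
| 33 | + "import stat\n", |
| 34 | + "\n", |
| 35 | + "# Inside a SageMaker training container, each Pipe Mode channel appears as a\n", |
| 36 | + "# named pipe at /opt/ml/input/data/<channel_name>_<epoch>, e.g. train_0,\n", |
| 37 | + "# train_1, and so on. Outside a training container, glob finds nothing.\n", |
| 38 | + "for path in sorted(glob.glob('/opt/ml/input/data/*')):\n", |
| 39 | + " mode = os.stat(path).st_mode\n", |
| 40 | + " print(path, 'FIFO' if stat.S_ISFIFO(mode) else 'regular file')" |
| 41 | + ] |
| 42 | + }, |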
| 18 | + { |
| 19 | + "cell_type": "markdown", |
| 20 | + "metadata": {}, |
| 21 | + "source": [ |
| 22 | + "## Setup\n", |
| 23 | + "The following code snippet sets up some variables we'll need later on. Please provide an S3 bucket that a TensorFlow training script and training output can be stored in.\n" |
| 24 | + ] |
| 25 | + }, |
| 26 | + { |
| 27 | + "cell_type": "code", |
| 28 | + "execution_count": 1, |
| 29 | + "metadata": { |
| 30 | + "isConfigCell": true |
| 31 | + }, |
| 32 | + "outputs": [], |
| 33 | + "source": [ |
| 34 | + "from sagemaker import get_execution_role\n", |
| 35 | + "\n", |
| 36 | + "#Bucket location to save your custom code in tar.gz format.\n", |
| 37 | + "custom_code_upload_location = 's3://<bucket-name>/customcode/tensorflow_pipemode'\n", |
| 38 | + "\n", |
| 39 | + "#Bucket location where results of model training are saved.\n", |
| 40 | + "model_artifacts_location = 's3://<bucket-name>/artifacts'\n", |
| 41 | + "\n", |
| 42 | + "#IAM execution role that gives SageMaker access to resources in your AWS account.\n", |
| 43 | + "role = get_execution_role()\n" |
| 44 | + ] |
| 45 | + }, |
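| 46 | + { |
| 47 | + "cell_type": "markdown", |
| 48 | + "metadata": {}, |
| 49 | + "source": [ |
| 50 | + "If you'd rather not hard-code a bucket name, the SageMaker Python SDK can supply a default bucket for your account and region. The following optional sketch derives the same two locations from it; ``sagemaker.Session().default_bucket()`` creates the bucket if it doesn't already exist." |
| 51 | + ] |
| 52 | + }, |
| 53 | + { |
| 54 | + "cell_type": "code", |
| 55 | + "execution_count": null, |
| 56 | + "metadata": {}, |
| 57 | + "outputs": [], |
| 58 | + "source": [ |
| 59 | + "import sagemaker\n", |
| 60 | + "\n", |
| 61 | + "# Optional: use the SDK's default bucket instead of a hand-picked one.\n", |
| 62 | + "bucket = sagemaker.Session().default_bucket()\n", |
| 63 | + "custom_code_upload_location = 's3://{}/customcode/tensorflow_pipemode'.format(bucket)\n", |
| 64 | + "model_artifacts_location = 's3://{}/artifacts'.format(bucket)" |
| 65 | + ] |
| 66 | + }, |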
| 46 | + { |
| 47 | + "cell_type": "markdown", |
| 48 | + "metadata": {}, |
| 49 | + "source": [ |
| 50 | + "## Complete training source code \n", |
| 51 | + "\n", |
| 52 | + "In this tutorial we train a TensorFlow LinearClassifier using pipe mode data. The TensorFlow training script is contained in following file:" |
| 53 | + ] |
| 54 | + }, |
| 55 | + { |
| 56 | + "cell_type": "code", |
| 57 | + "execution_count": null, |
| 58 | + "metadata": {}, |
| 59 | + "outputs": [], |
| 60 | + "source": [ |
| 61 | + "!cat \"pipemode.py\"" |
| 62 | + ] |
| 63 | + }, |
| 64 | + { |
| 65 | + "cell_type": "markdown", |
| 66 | + "metadata": {}, |
| 67 | + "source": [ |
| 68 | + "### Using a PipeModeDataset in an input_fn\n", |
| 69 | + "To train an estimator using a Pipe Mode channel, we must construct an input_fn that reads from the channel. To do this, we use the SageMaker PipeModeDataset. This is a TensorFlow Dataset specifically created to read from a SageMaker Pipe Mode channel. A PipeModeDataset is a fully-featured TensorFlow Dataset and can be used in exactly the same ways as a regular TensorFlow Dataset can be used.\n", |
| 70 | + "\n", |
| 71 | + "The training and evaluation data used in this tutorial is synthetic. It contains a series of records stored in a TensorFlow Example protobuf object. Each record contains a numeric class label and an array of 1024 floating point numbers. Each array is sampled from a multi-dimensional Gaussian distribution with a class-specific mean. This means it is possible to learn a model using a TensorFlow Linear classifier which can classify examples well. Each record is separated using RecordIO encoding (though the PipeModeDataset class also supports the TFRecord format as well). \n", |
| 72 | + "\n", |
| 73 | + "The training and evaluation data were produced using the benchmarking source code in the sagemaker-tensorflow-extensions benchmarking sub-package. If you want to investigate this further, please visit the GitHub repository for sagemaker-tensorflow-extensions at https://github.com/aws/sagemaker-tensorflow-extensions. \n", |
| 74 | + "\n", |
| 75 | + "The following example code shows how to use a PipeModeDataset in an input_fn." |
| 76 | + ] |
| 77 | + }, |
| 78 | + { |
| 79 | + "cell_type": "code", |
| 80 | + "execution_count": null, |
| 81 | + "metadata": {}, |
| 82 | + "outputs": [], |
| 83 | + "source": [ |
| 84 | + "from sagemaker_tensorflow import PipeModeDataset\n", |
| 85 | + "\n", |
| 86 | + "def input_fn():\n", |
| 87 | + " # Simple example data - a labeled vector.\n", |
| 88 | + " features = {\n", |
| 89 | + " 'data': tf.FixedLenFeature([], tf.string),\n", |
| 90 | + " 'labels': tf.FixedLenFeature([], tf.int64),\n", |
| 91 | + " }\n", |
| 92 | + " \n", |
| 93 | + " # A function to parse record bytes to a labeled vector record\n", |
| 94 | + " def parse(record):\n", |
| 95 | + " parsed = tf.parse_single_example(record, features)\n", |
| 96 | + " return ({\n", |
| 97 | + " 'data': tf.decode_raw(parsed['data'], tf.float64)\n", |
| 98 | + " }, parsed['labels'])\n", |
| 99 | + "\n", |
| 100 | + " # Construct a PipeModeDataset reading from a 'training' channel, using\n", |
| 101 | + " # the TF Record encoding.\n", |
| 102 | + " ds = PipeModeDataset(channel='training', record_format='TFRecord')\n", |
| 103 | + "\n", |
| 104 | + " # The PipeModeDataset is a TensorFlow Dataset and provides standard Dataset methods\n", |
| 105 | + " ds = ds.repeat(20)\n", |
| 106 | + " ds = ds.prefetch(10)\n", |
| 107 | + " ds = ds.map(parse, num_parallel_calls=10)\n", |
| 108 | + " ds = ds.batch(64)\n", |
| 109 | + " \n", |
| 110 | + " return ds" |
| 111 | + ] |
| 112 | + }, |
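| 113 | + { |
| 114 | + "cell_type": "markdown", |
| 115 | + "metadata": {}, |
| 116 | + "source": [ |
| 117 | + "To show where an input_fn like this fits, here is a minimal sketch of how a training script might wire it into a TensorFlow LinearClassifier using the tf.estimator API. The feature name, dtype, and 1024-element vector size come from the example above; the step count is illustrative, and this sketch is not a copy of the actual pipemode.py, which you can inspect with the ``!cat`` cell earlier." |
| 118 | + ] |
| 119 | + }, |
| 120 | + { |
| 121 | + "cell_type": "code", |
| 122 | + "execution_count": null, |
| 123 | + "metadata": {}, |
| 124 | + "outputs": [], |
| 125 | + "source": [ |
| 126 | + "import tensorflow as tf\n", |
| 127 | + "\n", |
| 128 | + "# A numeric feature column matching the 1024-element float64 'data' vectors\n", |
| 129 | + "# produced by the input_fn above.\n", |
| 130 | + "column = tf.feature_column.numeric_column('data', shape=(1024,), dtype=tf.float64)\n", |
| 131 | + "\n", |
| 132 | + "estimator = tf.estimator.LinearClassifier(feature_columns=[column])\n", |
| 133 | + "\n", |
| 134 | + "# Inside the training container this would stream records from the 'train'\n", |
| 135 | + "# channel pipe; it cannot run in this notebook, so the call is commented out.\n", |
| 136 | + "# estimator.train(input_fn=input_fn, steps=1000)" |
| 137 | + ] |
| 138 | + }, |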
| 113 | + { |
| 114 | + "cell_type": "markdown", |
| 115 | + "metadata": {}, |
| 116 | + "source": [ |
| 117 | + "# Running training using the Python SDK\n", |
| 118 | + "\n", |
| 119 | + "We can use the SDK to run our local training script on SageMaker infrastructure.\n", |
| 120 | + "\n", |
| 121 | + "1. Pass the path to the pipemode.py file, which contains the functions for defining your estimator, to the sagemaker.TensorFlow init method.\n", |
| 122 | + "2. Pass the S3 location that we uploaded our data to previously to the fit() method." |
| 123 | + ] |
| 124 | + }, |
| 125 | + { |
| 126 | + "cell_type": "code", |
| 127 | + "execution_count": null, |
| 128 | + "metadata": {}, |
| 129 | + "outputs": [], |
| 130 | + "source": [ |
| 131 | + "from sagemaker.tensorflow import TensorFlow\n", |
| 132 | + "\n", |
| 133 | + "tensorflow = TensorFlow(entry_point='pipemode.py',\n", |
| 134 | + " role=role,\n", |
| 135 | + " input_mode='Pipe',\n", |
| 136 | + " output_path=model_artifacts_location,\n", |
| 137 | + " code_location=custom_code_upload_location,\n", |
| 138 | + " train_instance_count=1,\n", |
| 139 | + " training_steps=1000,\n", |
| 140 | + " evaluation_steps=100,\n", |
| 141 | + " train_instance_type='ml.c4.xlarge')" |
| 142 | + ] |
| 143 | + }, |
| 144 | + { |
| 145 | + "cell_type": "markdown", |
| 146 | + "metadata": {}, |
| 147 | + "source": [ |
| 148 | + "After we've created the SageMaker Python SDK TensorFlow object, we can call fit to launch TensorFlow training:" |
| 149 | + ] |
| 150 | + }, |
| 151 | + { |
| 152 | + "cell_type": "code", |
| 153 | + "execution_count": null, |
| 154 | + "metadata": {}, |
| 155 | + "outputs": [], |
| 156 | + "source": [ |
| 157 | + "%%time\n", |
| 158 | + "import boto3\n", |
| 159 | + "\n", |
| 160 | + "# use the region-specific sample data bucket\n", |
| 161 | + "region = boto3.Session().region_name\n", |
| 162 | + "\n", |
| 163 | + "train_data = 's3://sagemaker-sample-data-{}/tensorflow/pipe-mode/train'.format(region)\n", |
| 164 | + "eval_data = 's3://sagemaker-sample-data-{}/tensorflow/pipe-mode/eval'.format(region)\n", |
| 165 | + "\n", |
| 166 | + "tensorflow.fit({'train':train_data, 'eval':eval_data})\n" |
| 167 | + ] |
| 168 | + }, |
| 169 | + { |
| 170 | + "cell_type": "markdown", |
| 171 | + "metadata": {}, |
| 172 | + "source": [ |
| 173 | + "After ``fit`` returns, you've successfully trained a TensorFlow LinearClassifier using SageMaker pipe mode! The TensorFlow model data will be stored in '``s3://<bucket-name>/artifacts``' - where '``<bucket-name>``' is the name of the bucket you supplied earlier." |
| 174 | + ] |
| 175 | + }, |
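| 176 | + { |
| 177 | + "cell_type": "markdown", |
| 178 | + "metadata": {}, |
| 179 | + "source": [ |
| 180 | + "If you want to locate the artifact programmatically rather than browsing S3, the estimator keeps a reference to it after training; ``model_data`` is the SageMaker Python SDK attribute holding the S3 URI of the trained model archive." |
| 181 | + ] |
| 182 | + }, |
| 183 | + { |
| 184 | + "cell_type": "code", |
| 185 | + "execution_count": null, |
| 186 | + "metadata": {}, |
| 187 | + "outputs": [], |
| 188 | + "source": [ |
| 189 | + "# S3 URI of the model.tar.gz archive produced by the training job.\n", |
| 190 | + "print(tensorflow.model_data)" |
| 191 | + ] |
| 192 | + } |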
| 176 | + ], |
| 177 | + "metadata": { |
| 178 | + "kernelspec": { |
| 179 | + "display_name": "conda_tensorflow_p27", |
| 180 | + "language": "python", |
| 181 | + "name": "conda_tensorflow_p27" |
| 182 | + }, |
| 183 | + "language_info": { |
| 184 | + "codemirror_mode": { |
| 185 | + "name": "ipython", |
| 186 | + "version": 2 |
| 187 | + }, |
| 188 | + "file_extension": ".py", |
| 189 | + "mimetype": "text/x-python", |
| 190 | + "name": "python", |
| 191 | + "nbconvert_exporter": "python", |
| 192 | + "pygments_lexer": "ipython2", |
| 193 | + "version": "2.7.14" |
| 194 | + }, |
| 195 | + "notice": "Copyright 2017 Amazon.com, Inc. or its affiliates. All Rights Reserved. Licensed under the Apache License, Version 2.0 (the \"License\"). You may not use this file except in compliance with the License. A copy of the License is located at http://aws.amazon.com/apache2.0/ or in the \"license\" file accompanying this file. This file is distributed on an \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License." |
| 196 | + }, |
| 197 | + "nbformat": 4, |
| 198 | + "nbformat_minor": 1 |
| 199 | +} |