
Commit b0636f8

mnist
1 parent e06c514 commit b0636f8

2 files changed, 73 additions, 27 deletions

sagemaker-python-sdk/tensorflow_abalone_age_predictor_using_keras/abalone.py

Lines changed: 3 additions & 4 deletions
@@ -1,7 +1,6 @@
 import numpy as np
 import os
 import tensorflow as tf
-from tensorflow.contrib.keras.python.keras.layers import Dense
 from tensorflow.python.estimator.export.export import build_raw_serving_input_receiver_fn
 from tensorflow.python.estimator.export.export_output import PredictOutput
 
@@ -22,9 +21,9 @@ def model_fn(features, labels, mode, params):
 
     # 1. Configure the model via Keras functional api
 
-    first_hidden_layer = Dense(10, activation='relu', name='first-layer')(features[INPUT_TENSOR_NAME])
-    second_hidden_layer = Dense(10, activation='relu')(first_hidden_layer)
-    output_layer = Dense(1, activation='linear')(second_hidden_layer)
+    first_hidden_layer = tf.keras.layers.Dense(10, activation='relu', name='first-layer')(features[INPUT_TENSOR_NAME])
+    second_hidden_layer = tf.keras.layers.Dense(10, activation='relu')(first_hidden_layer)
+    output_layer = tf.keras.layers.Dense(1, activation='linear')(second_hidden_layer)
 
     predictions = tf.reshape(output_layer, [-1])
 
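The change above only swaps the `contrib` import for `tf.keras.layers.Dense`; the network is unchanged: two 10-unit ReLU layers, a 1-unit linear output, and `tf.reshape(output_layer, [-1])` to flatten the result. Each `Dense` layer computes `activation(inputs · kernel + bias)`. As a minimal sketch of what the three layers compute, here is a pure-Python forward pass. It is not TensorFlow code: the helper names (`dense`, `init_layer`, `forward`), the random initialization and `NUM_FEATURES = 7` are hypothetical and only for illustration.

```python
import random


def dense(inputs, weights, biases, activation):
    """One fully connected layer: activation(inputs . weights + biases).

    inputs: batch x in_dim, weights: in_dim x out_dim, biases: out_dim.
    """
    outputs = []
    for row in inputs:
        out_row = []
        for j in range(len(biases)):
            z = sum(x * weights[i][j] for i, x in enumerate(row)) + biases[j]
            out_row.append(activation(z))
        outputs.append(out_row)
    return outputs


def relu(z):
    return max(0.0, z)


def linear(z):
    return z


def init_layer(in_dim, out_dim, rng):
    # Hypothetical initialization; tf.keras.layers.Dense itself defaults to
    # glorot_uniform kernels and zero biases.
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(out_dim)] for _ in range(in_dim)]
    biases = [0.0] * out_dim
    return weights, biases


def forward(features, params):
    (w1, b1), (w2, b2), (w3, b3) = params
    first_hidden = dense(features, w1, b1, relu)       # Dense(10, activation='relu')
    second_hidden = dense(first_hidden, w2, b2, relu)  # Dense(10, activation='relu')
    output = dense(second_hidden, w3, b3, linear)      # Dense(1, activation='linear')
    # Like tf.reshape(output_layer, [-1]): flatten batch x 1 into a flat list.
    return [value for row in output for value in row]


rng = random.Random(0)
NUM_FEATURES = 7  # assumption: number of abalone input features
params = [init_layer(NUM_FEATURES, 10, rng), init_layer(10, 10, rng), init_layer(10, 1, rng)]
batch = [[rng.random() for _ in range(NUM_FEATURES)] for _ in range(4)]
predictions = forward(batch, params)
print(len(predictions))  # 4: one prediction per example
```

The final reshape is why `predictions` is a flat vector with one value per example rather than a batch of one-element rows.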

sagemaker-python-sdk/tensorflow_distributed_mnist/tensorflow_distributed_mnist.ipynb

Lines changed: 70 additions & 23 deletions
@@ -4,7 +4,17 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"## Let's start by setting up the environment."
+"# MNIST distributed training \n",
+"\n",
+"The **SageMaker Python SDK** helps you deploy your models for training and hosting in optimized, production-ready containers in SageMaker. The SageMaker Python SDK is easy to use, modular, extensible and compatible with TensorFlow and MXNet. This tutorial focuses on how to create a convolutional neural network model and train it on the [MNIST dataset](http://yann.lecun.com/exdb/mnist/) using **TensorFlow distributed training**.\n",
+"\n"
+]
+},
+{
+"cell_type": "markdown",
+"metadata": {},
+"source": [
+"### Set up the environment"
 ]
 },
 {
@@ -20,15 +30,14 @@
 "\n",
 "sagemaker_session = sagemaker.Session()\n",
 "\n",
-"# Replace with a role (either name or full arn) that gives SageMaker access to S3 and cloudwatch\n",
-"role='SageMakerRole'"
+"role = get_execution_role()"
 ]
 },
 {
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"## Downloading test and training data"
+"### Download the MNIST dataset"
 ]
 },
 {
@@ -54,7 +63,8 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"## Uploading the data"
+"### Upload the data\n",
+"We use the ```sagemaker.Session.upload_data``` function to upload our datasets to an S3 location. The return value, ```inputs```, identifies that location; we will use it later when we start the training job."
 ]
 },
 {
@@ -72,7 +82,8 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"# Complete source code"
+"# Construct a script for distributed training \n",
+"Here is the full code for the network model:"
 ]
 },
 {
@@ -90,12 +101,36 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"# Running TensorFlow training on SageMaker\n",
+"The script here is an adaptation of the [TensorFlow MNIST example](https://github.com/tensorflow/models/tree/master/official/mnist). It provides a ```model_fn(features, labels, mode)```, which is used for training, evaluation and inference. \n",
+"\n",
+"## A regular ```model_fn```\n",
 "\n",
-"We can use the SDK to run our local training script on SageMaker infrastructure.\n",
+"A regular **```model_fn```** follows the pattern:\n",
+"1. [defines a neural network](https://github.com/tensorflow/models/blob/master/official/mnist/mnist.py#L96)\n",
+"2. [applies the ```features``` to the neural network](https://github.com/tensorflow/models/blob/master/official/mnist/mnist.py#L178)\n",
+"3. [if the ```mode``` is ```PREDICT```, returns the output from the neural network](https://github.com/tensorflow/models/blob/master/official/mnist/mnist.py#L186)\n",
+"4. [calculates the loss function comparing the output with the ```labels```](https://github.com/tensorflow/models/blob/master/official/mnist/mnist.py#L188)\n",
+"5. [creates an optimizer and minimizes the loss function to improve the neural network](https://github.com/tensorflow/models/blob/master/official/mnist/mnist.py#L193)\n",
+"6. [returns the output, optimizer and loss function](https://github.com/tensorflow/models/blob/master/official/mnist/mnist.py#L205)\n",
 "\n",
-"1. Pass the path to the abalone.py file, which contains the functions for defining your estimator, to the sagemaker.TensorFlow init method.\n",
-"2. Pass the S3 location that we uploaded our data to previously to the fit() method."
+"## Writing a ```model_fn``` for distributed training\n",
+"During distributed training, the same neural network is sent to each of the training instances. Each instance runs a batch of the dataset through the network, calculates the loss and runs the optimizer to minimize it. One complete iteration of this process is called a **training step**.\n",
+"\n",
+"### Synchronizing training steps\n",
+"A [global step](https://www.tensorflow.org/api_docs/python/tf/train/global_step) is a global variable shared between the instances. It is necessary for distributed training so that the optimizer keeps track of the number of **training steps** between runs: \n",
+"\n",
+"```python\n",
+"train_op = optimizer.minimize(loss, tf.train.get_or_create_global_step())\n",
+"```\n",
+"\n",
+"That is the only required change for distributed training!"
+]
+},
+{
+"cell_type": "markdown",
+"metadata": {},
+"source": [
+"## Create a training job using the sagemaker.TensorFlow estimator"
 ]
 },
 {
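The six-step `model_fn` pattern and the role of the shared global step can be illustrated with a toy simulation. This is a minimal pure-Python sketch, not TensorFlow or SageMaker code: a one-weight linear model (`y = w * x`) stands in for the CNN, a hypothetical shared `state` dict plays the role of the variables every instance shares (the weight and the global step), and the two "instances" take turns instead of running in parallel. Assuming training stops once the shared global step reaches the limit, it shows why a step budget such as `training_steps=1000` counts steps across all workers rather than per worker.

```python
# Toy simulation of the model_fn pattern with a shared global step.
# Hypothetical throughout: model_fn's extra `state` argument, train(), and the
# data are illustrative only; nothing here is a TensorFlow or SageMaker API.

PREDICT, TRAIN = "PREDICT", "TRAIN"


def model_fn(features, labels, mode, params, state):
    """Follows the regular model_fn pattern on a one-weight linear model."""
    # 1-2. Define the network (y = w * x) and apply it to the features.
    predictions = [state["w"] * x for x in features]
    # 3. In PREDICT mode, return only the network output.
    if mode == PREDICT:
        return {"predictions": predictions}
    # 4. Loss: mean squared error between the output and the labels.
    n = len(labels)
    loss = sum((p - y) ** 2 for p, y in zip(predictions, labels)) / n
    # 5. "Optimizer": one gradient-descent step on w. Mirroring
    #    optimizer.minimize(loss, global_step), each update also increments
    #    the shared step counter.
    grad = sum(2 * (p - y) * x for p, y, x in zip(predictions, labels, features)) / n
    state["w"] -= params["learning_rate"] * grad
    state["global_step"] += 1
    # 6. Return the output and the loss.
    return {"predictions": predictions, "loss": loss}


def train(worker_batches, training_steps, params):
    """Workers take turns running training steps until the shared global step
    reaches training_steps; returns the shared state and per-worker counts."""
    state = {"w": 0.0, "global_step": 0}  # shared by all workers
    steps_per_worker = [0] * len(worker_batches)
    while state["global_step"] < training_steps:
        for worker, (features, labels) in enumerate(worker_batches):
            if state["global_step"] >= training_steps:
                break
            model_fn(features, labels, TRAIN, params, state)
            steps_per_worker[worker] += 1
    return state, steps_per_worker


# Two hypothetical workers, each holding its own shard of data drawn from y = 3x.
batches = [([1.0, 2.0], [3.0, 6.0]), ([3.0, 4.0], [9.0, 12.0])]
state, per_worker = train(batches, training_steps=1000, params={"learning_rate": 0.01})
print(state["global_step"], per_worker)  # 1000 [500, 500]
print(round(state["w"], 3))              # 3.0
```

In this simulation each worker runs 500 of the 1,000 steps; if every worker kept its own counter instead, each would run 1,000.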
@@ -109,14 +144,24 @@
 "from sagemaker.tensorflow import TensorFlow\n",
 "\n",
 "mnist_estimator = TensorFlow(entry_point='mnist.py',\n",
-" role=role,\n",
-" hyperparameters={'training_steps' : 1000, 'evaluation_steps' : 100},\n",
-" train_instance_count=2,\n",
-" train_instance_type='ml.p2.xlarge')\n",
+" role=role,\n",
+" training_steps=1000, \n",
+" evaluation_steps=100,\n",
+" train_instance_count=2,\n",
+" train_instance_type='ml.c4.xlarge')\n",
 "\n",
 "mnist_estimator.fit(inputs)"
 ]
 },
+{
+"cell_type": "markdown",
+"metadata": {},
+"source": [
+"The **```fit```** method creates a training job on two **ml.c4.xlarge** instances. The logs above show the instances running training and evaluation and incrementing the number of **training steps**. \n",
+"\n",
+"At the end of training, the training job generates a saved model for TensorFlow Serving."
+]
+},
 {
 "cell_type": "markdown",
 "metadata": {
@@ -131,7 +176,9 @@
 {
 "cell_type": "code",
 "execution_count": null,
-"metadata": {},
+"metadata": {
+"collapsed": true
+},
 "outputs": [],
 "source": [
 "mnist_predictor = mnist_estimator.deploy(initial_instance_count=1,\n",
@@ -190,25 +237,25 @@
 }
 ],
 "metadata": {
-"notice": "Copyright 2017 Amazon.com, Inc. or its affiliates. All Rights Reserved. Licensed under the Apache License, Version 2.0 (the \"License\"). You may not use this file except in compliance with the License. A copy of the License is located at http://aws.amazon.com/apache2.0/ or in the \"license\" file accompanying this file. This file is distributed on an \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.",
 "kernelspec": {
-"display_name": "Environment (conda_tensorflow_p27)",
+"display_name": "Python 2",
 "language": "python",
-"name": "conda_tensorflow_p27"
+"name": "python2"
 },
 "language_info": {
 "codemirror_mode": {
 "name": "ipython",
-"version": 3
+"version": 2
 },
 "file_extension": ".py",
 "mimetype": "text/x-python",
 "name": "python",
 "nbconvert_exporter": "python",
-"pygments_lexer": "ipython3",
-"version": "2.7.13"
-}
+"pygments_lexer": "ipython2",
+"version": "2.7.10"
+},
+"notice": "Copyright 2017 Amazon.com, Inc. or its affiliates. All Rights Reserved. Licensed under the Apache License, Version 2.0 (the \"License\"). You may not use this file except in compliance with the License. A copy of the License is located at http://aws.amazon.com/apache2.0/ or in the \"license\" file accompanying this file. This file is distributed on an \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License."
 },
 "nbformat": 4,
 "nbformat_minor": 2
-}
+}
