Commit 86c024e

Merge pull request aws#26 from awslabs/mvs-tensorboard
tensorboard example notebook
2 parents ef9e6fc + 6e363b3 commit 86c024e

File tree

8 files changed: +206 additions, −183 deletions


sagemaker-python-sdk/tensorflow_resnet_cifar10/tensorflow_resnet_cifar10.ipynb

Lines changed: 0 additions & 183 deletions
This file was deleted.
@@ -0,0 +1,206 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# ResNet CIFAR-10 with TensorBoard\n",
"\n",
"This notebook shows how to use TensorBoard, and how the training job writes checkpoints to an external bucket.\n",
"The model used for this notebook is a ResNet model, trained with the CIFAR-10 dataset.\n",
"See the following papers for more background:\n",
"\n",
"[Deep Residual Learning for Image Recognition](https://arxiv.org/pdf/1512.03385.pdf) by Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun, Dec 2015.\n",
"\n",
"[Identity Mappings in Deep Residual Networks](https://arxiv.org/pdf/1603.05027.pdf) by Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun, Jul 2016."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Set up the environment"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"import os\n",
"import sagemaker\n",
"import tensorflow\n",
"from sagemaker import get_execution_role\n",
"\n",
"sagemaker_session = sagemaker.Session()\n",
"\n",
"role = get_execution_role()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Download the CIFAR-10 dataset\n",
"Downloading the test and training data will take around 5 minutes."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"scrolled": false
},
"outputs": [],
"source": [
"import utils\n",
"\n",
"utils.cifar10_download()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Upload the data to an S3 bucket"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"inputs = sagemaker_session.upload_data(path='/tmp/cifar10_data', key_prefix='data/cifar10')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**sagemaker_session.upload_data** will upload the CIFAR-10 dataset from your machine to a bucket named **sagemaker-{*your aws account number*}**. If you don't have this bucket yet, sagemaker_session will create it for you."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Complete source code\n",
"- [source_dir/resnet_model.py](source_dir/resnet_model.py): ResNet model\n",
"- [source_dir/resnet_cifar_10.py](source_dir/resnet_cifar_10.py): main script used for training and hosting"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Create a training job using the sagemaker.TensorFlow estimator"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"scrolled": false
},
"outputs": [],
"source": [
"from sagemaker.tensorflow import TensorFlow\n",
"\n",
"\n",
"source_dir = os.path.join(os.getcwd(), 'source_dir')\n",
"estimator = TensorFlow(entry_point='resnet_cifar_10.py',\n",
"                       source_dir=source_dir,\n",
"                       role=role,\n",
"                       training_steps=3000, evaluation_steps=100,\n",
"                       train_instance_count=2, train_instance_type='ml.c4.xlarge',\n",
"                       base_job_name='tensorboard-example')\n",
"\n",
"estimator.fit(inputs, run_tensorboard_locally=True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The **```fit```** method will create a training job named **```tensorboard-example-{unique identifier}```** on two **ml.c4.xlarge** instances. These instances will write checkpoints to the S3 bucket **```sagemaker-{your aws account number}```**.\n",
"\n",
"If you don't have this bucket yet, **```sagemaker_session```** will create it for you. These checkpoints can be used to restore the training job, and to analyze training job metrics using **TensorBoard**. \n",
"\n",
"The parameter **```run_tensorboard_locally=True```** will run **TensorBoard** on the machine where this notebook is running. Every time the training job writes a new checkpoint to the S3 bucket, **```fit```** will download the checkpoint to the temp folder that **TensorBoard** is pointing to.\n",
"\n",
"When the **```fit```** method starts the training, it will log the port that **TensorBoard** is using to display the metrics. The default port is **6006**, but another port may be chosen depending on availability: the port number is incremented until an available port is found, and the chosen port is then printed to stdout.\n",
"\n",
"It takes a few minutes to provision containers and start the training job. **TensorBoard** will start to display metrics shortly after that.\n",
"\n",
"You can access **TensorBoard** locally at [http://localhost:6006](http://localhost:6006) or through your SageMaker workspace at [proxy/6006](/proxy/6006). If TensorBoard started on a different port, adjust these URLs to match."
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": true
},
"source": [
"# Deploy the trained model to prepare for predictions\n",
"\n",
"The deploy() method creates an endpoint that serves prediction requests in real time."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"predictor = estimator.deploy(initial_instance_count=1, instance_type='ml.c4.xlarge')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Cleaning up\n",
"To avoid incurring charges to your AWS account for the resources used in this tutorial, you need to delete the **SageMaker Endpoint**:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"estimator.delete_endpoint()"
]
}
],
"metadata": {
"notice": "Copyright 2017 Amazon.com, Inc. or its affiliates. All Rights Reserved. Licensed under the Apache License, Version 2.0 (the \"License\"). You may not use this file except in compliance with the License. A copy of the License is located at http://aws.amazon.com/apache2.0/ or in the \"license\" file accompanying this file. This file is distributed on an \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.",
"kernelspec": {
"display_name": "Environment (conda_tensorflow_p27)",
"language": "python",
"name": "conda_tensorflow_p27"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "2.7.13"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
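The notebook above notes that TensorBoard starts on port 6006 by default and, if that port is taken, increments the port number until a free one is found. A minimal sketch of that probing behavior is below; `find_available_port` is a hypothetical helper for illustration, not a function from the SageMaker SDK:

```python
import socket

def find_available_port(start_port=6006, max_tries=100):
    """Return the first free TCP port at or above start_port,
    mirroring how a local TensorBoard launcher might probe ports."""
    for port in range(start_port, start_port + max_tries):
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
            try:
                # If bind succeeds, nothing is listening here: port is free.
                s.bind(("127.0.0.1", port))
                return port
            except OSError:
                continue  # port in use, try the next one
    raise RuntimeError("no free port found")
```

Because each probe socket is closed immediately after the bind check, the returned port is free at the moment of the check; a real launcher would bind and keep the socket open to avoid a race.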
Binary file not shown.

0 commit comments

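The notebook also explains that `fit` downloads each new checkpoint from S3 into the temp folder TensorBoard watches. That one-way "copy only what's new" sync can be simulated with local directories; `sync_new_files` is a hypothetical sketch, not the SDK's actual implementation:

```python
import os
import shutil

def sync_new_files(src_dir, dst_dir):
    """Copy files that exist in src_dir but not yet in dst_dir,
    mimicking a one-way checkpoint sync into a watched folder."""
    os.makedirs(dst_dir, exist_ok=True)
    copied = []
    for name in sorted(os.listdir(src_dir)):
        src = os.path.join(src_dir, name)
        dst = os.path.join(dst_dir, name)
        if os.path.isfile(src) and not os.path.exists(dst):
            shutil.copy2(src, dst)  # preserve timestamps like a sync would
            copied.append(name)
    return copied
```

Called in a polling loop, this only transfers checkpoints that have not been seen before, so TensorBoard picks up new training metrics without re-downloading old files.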