1 | 1 | {
2 | 2 | "cells": [
3 |   | - {
4 |   | - "cell_type": "markdown",
5 |   | - "metadata": {},
6 |   | - "source": [
7 |   | - "**HOW TO INSTALL THE PROXY PLUGIN ON MEAD**\n",
8 |   | - "**ATTENTION** THIS SECTION WILL REMOVED AFTER THE MEAD PROXY PLUGIN IS INSTALLED IN MEAD USER DATA. THESE INSTRUCTIONS WILL NOT BE PART OF THE NOTEBOOK FOR GA.\n",
9 |   | - "\n",
10 |   | - "OPEN A TERMINAL IN JUPYTER:\n",
11 |   | - "File->Open->New->Terminal\n",
12 |   | - "\n",
13 |   | - "```\n",
14 |   | - "sudo su\n",
15 |   | - "source /home/ec2-user/anaconda3/bin/activate JupyterSystemEnv\n",
16 |   | - "pip install git+https://github.com/jupyterhub/[email protected]\n",
17 |   | - "jupyter serverextension enable --py nbserverproxy --sys-prefix\n",
18 |   | - "source deactivate\n",
19 |   | - "restart part-003\n",
20 |   | - "```\n",
21 |   | - "\n",
22 |   | - "```restart part-003``` will restart the jupyter notebook and install the required plugin to run tensorboard."
23 |   | - ]
24 |   | - },
25 | 3 | {
26 | 4 | "cell_type": "markdown",
27 | 5 | "metadata": {},
28 | 6 | "source": [
29 | 7 | "# ResNet CIFAR-10 with tensorboard\n",
30 | 8 | "\n",
31 |   | - "This notebook details how to use TensorBoard, and how the training job writes checkpoints to a external bucket.\n",
32 |   | - "The model used for this notebook is a RestNet model, against the CIFAR-10 dataset.\n",
   | 9 | + "This notebook shows how to use TensorBoard, and how the training job writes checkpoints to an external bucket.\n",
   | 10 | + "The model used for this notebook is a ResNet model, trained on the CIFAR-10 dataset.\n",
33 | 11 | "See the following papers for more background:\n",
34 | 12 | "\n",
35 | 13 | "[Deep Residual Learning for Image Recognition](https://arxiv.org/pdf/1512.03385.pdf) by Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun, Dec 2015.\n",
41 | 19 | "cell_type": "markdown",
42 | 20 | "metadata": {},
43 | 21 | "source": [
44 |   | - "### Let's start by setting up the environment."
   | 22 | + "### Set up the environment"
45 | 23 | ]
46 | 24 | },
47 | 25 | {
66 | 44 | "cell_type": "markdown",
67 | 45 | "metadata": {},
68 | 46 | "source": [
69 |   | - "### Downloading CIFAR-10 dataset\n",
   | 47 | + "### Download the CIFAR-10 dataset\n",
70 | 48 | "Downloading the test and training data will take around 5 minutes."
71 | 49 | ]
72 | 50 | },
95 | 73 | "cell_type": "markdown",
96 | 74 | "metadata": {},
97 | 75 | "source": [
98 |   | - "### Uploading the data to a S3 bucket"
   | 76 | + "### Upload the data to an S3 bucket"
99 | 77 | ]
100 | 78 | },
101 | 79 | {
120 | 98 | "cell_type": "markdown",
121 | 99 | "metadata": {},
122 | 100 | "source": [
123 |   | - "### Complete source code"
124 |   | - ]
125 |   | - },
126 |   | - {
127 |   | - "cell_type": "code",
128 |   | - "execution_count": 4,
129 |   | - "metadata": {
130 |   | - "scrolled": false
131 |   | - },
132 |   | - "outputs": [
133 |   | - {
134 |   | - "name": "stdout",
135 |   | - "output_type": "stream",
136 |   | - "text": [
137 |   | - ".\r\n",
138 |   | - "├── __init__.py\r\n",
139 |   | - "├── __pycache__\r\n",
140 |   | - "│   └── utils.cpython-36.pyc\r\n",
141 |   | - "├── source_dir\r\n",
142 |   | - "│   ├── __init__.py\r\n",
143 |   | - "│   ├── resnet_cifar_10.py\r\n",
144 |   | - "│   └── resnet_model.py\r\n",
145 |   | - "├── tensorflow_resnet_cifar10_with_tensorboard.ipynb\r\n",
146 |   | - "└── utils.py\r\n",
147 |   | - "\r\n",
148 |   | - "2 directories, 7 files\r\n"
149 |   | - ]
150 |   | - }
151 |   | - ],
152 |   | - "source": [
153 |   | - "!tree"
   | 101 | + "### Complete source code\n",
   | 102 | + "- [source_dir/resnet_model.py](source_dir/resnet_model.py): ResNet model\n",
   | 103 | + "- [source_dir/resnet_cifar_10.py](source_dir/resnet_cifar_10.py): main script used for training and hosting"
154 | 104 | ]
155 | 105 | },
156 | 106 | {
157 | 107 | "cell_type": "markdown",
158 | 108 | "metadata": {},
159 | 109 | "source": [
160 |   | - "## Running TensorFlow training on SageMaker"
   | 110 | + "## Create a training job using the sagemaker.TensorFlow estimator"
161 | 111 | ]
162 | 112 | },
163 | 113 | {
164 | 114 | "cell_type": "code",
165 | 115 | "execution_count": null,
166 | 116 | "metadata": {
   | 117 | + "collapsed": true,
167 | 118 | "scrolled": false
168 | 119 | },
169 | 120 | "outputs": [],
170 | 121 | "source": [
171 | 122 | "from sagemaker.tensorflow import TensorFlow\n",
172 | 123 | "\n",
173 | 124 | "\n",
174 |   | - "sorce_dir = os.path.join(os.getcwd(), 'source_dir')\n",
   | 125 | + "source_dir = os.path.join(os.getcwd(), 'source_dir')\n",
175 | 126 | "estimator = TensorFlow(entry_point='resnet_cifar_10.py',\n",
176 |   | - " source_dir=sorce_dir,\n",
   | 127 | + " source_dir=source_dir,\n",
177 | 128 | " role=role,\n",
178 |   | - " hyperparameters={'training_steps': 1000, 'evaluation_steps': 100},\n",
179 |   | - " train_instance_count=2, train_instance_type='ml.p2.xlarge', \n",
   | 129 | + " training_steps=1000, evaluation_steps=100,\n",
   | 130 | + " train_instance_count=1, train_instance_type='ml.p2.xlarge', \n",
180 | 131 | " base_job_name='tensorboard-example')\n",
181 | 132 | "\n",
182 | 133 | "estimator.fit(inputs, run_tensorboard_locally=True)"
186 | 137 | "cell_type": "markdown",
187 | 138 | "metadata": {},
188 | 139 | "source": [
189 |   | - "The **```fit```** method will create a training job named **```tensorboard-example-{unique identifier}```** with 2 p2 instances. These instances will be writing checkpoints to the s3 bucket **```sagemaker-{your aws account number}```**, if you don't have this bucket yet, sagemaker_session will create it for you. These checkpoints can be used for restoring the training job, and to analyze training job metrics using **TensorBoard**. \n",
   | 140 | + "The **```fit```** method will create a training job named **```tensorboard-example-{unique identifier}```** on a p2 instance. That instance will write checkpoints to the S3 bucket **```sagemaker-{your aws account number}```**.\n",
   | 141 | + "\n",
   | 142 | + "If you don't have this bucket yet, **```sagemaker_session```** will create it for you. These checkpoints can be used to restore the training job, and to analyze training job metrics using **TensorBoard**.\n",
190 | 143 | "\n",
191 |   | - "The parameter **```run_tensorboard_locally=True```** will run **TensorBoard** in the machine that this notebook is running. Everytime a new checkpoint is created by the training job in the S3 bucket, **fit** will download the checkpoint to the temp folder that **TensorBoard** is pointing to.\n",
   | 144 | + "The parameter **```run_tensorboard_locally=True```** will run **TensorBoard** on the machine where this notebook is running. Every time the training job creates a new checkpoint in the S3 bucket, **```fit```** will download the checkpoint to the temp folder that **TensorBoard** is pointing to.\n",
192 | 145 | "\n",
193 |   | - "When the **```fit```** method starts the training, it will log the port that **TensorBoard** is using to display the metrics. The default port is **6006**, but another port can be choosen depending on its availability.\n",
   | 146 | + "When the **```fit```** method starts the training, it will log the port that **TensorBoard** is using to display the metrics. The default port is **6006**; if that port is taken, the port number is incremented until an available port is found, and the chosen port is printed to stdout.\n",
194 | 147 | "\n",
195 |   | - "**TensorBoard** will take some minutes to start displaying metrics, depending on how long the training job container take to start their jobs.\n",
   | 148 | + "It takes a few minutes to provision containers and start the training job. **TensorBoard** will start to display metrics shortly after that.\n",
196 | 149 | "\n",
197 |   | - "You can access **Tensorboard** locally [http://localhost:6006](http://localhost:6006) or using your SakeMaker workspace [proxy/6006](/proxy/6006)"
   | 150 | + "You can access **TensorBoard** locally at [http://localhost:6006](http://localhost:6006) or through your SageMaker workspace at [proxy/6006](/proxy/6006). If TensorBoard started on a different port, adjust these URLs to match."
198 | 151 | ]
199 | 152 | },
200 | 153 | {
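The port-selection behavior described in the added text above (start at 6006 and increment until a free port is found) can be sketched with the standard library alone. This is an illustrative approximation, not the SageMaker SDK's actual implementation; the function name `find_available_port` is hypothetical:

```python
import socket

def find_available_port(start=6006, max_tries=100):
    """Probe ports start, start+1, ... and return the first one that can be bound."""
    for port in range(start, start + max_tries):
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            try:
                sock.bind(("127.0.0.1", port))
            except OSError:
                continue  # port already in use; try the next one
            return port  # bind succeeded, so this port is free
    raise RuntimeError("no available port in range")

# The chosen port is what fit() would report for TensorBoard in its log output.
print("TensorBoard port:", find_available_port())
```

If 6006 is free, this prints 6006; if another process (for example an earlier TensorBoard) holds it, the next free port is reported, which is why the URLs above may need adjusting.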
223 | 176 | "cell_type": "markdown",
224 | 177 | "metadata": {},
225 | 178 | "source": [
226 |   | - "# Deleting the endpoint"
   | 179 | + "# Deleting the endpoint\n",
   | 180 | + "**Important** "
227 | 181 | ]
228 | 182 | },
229 | 183 | {