Commit 2697b39

Make pytorch notebooks compatible with pytorch preview. (aws#429)
* Make pytorch notebooks compatible with pytorch preview. Add documentation on how to use it.
* Use full framework version.
Parent: a2011f1

5 files changed: +16 additions, -7 deletions


sagemaker-python-sdk/pytorch_cnn_cifar10/pytorch_local_mode_cifar10.ipynb

Lines changed: 5 additions & 2 deletions
@@ -183,7 +183,9 @@
     "\n",
     "The `PyTorch` class allows us to run our training function on SageMaker. We need to configure it with our training script, an IAM role, the number of training instances, and the training instance type. For local training with GPU, we could set this to \"local_gpu\". In this case, `instance_type` was set above based on whether you're running a GPU instance.\n",
     "\n",
-    "After we've constructed our `PyTorch` object, we fit it using the data we uploaded to S3. Even though we're in local mode, using S3 as our data source makes sense because it maintains consistency with how SageMaker's distributed, managed training ingests data."
+    "After we've constructed our `PyTorch` object, we fit it using the data we uploaded to S3. Even though we're in local mode, using S3 as our data source makes sense because it maintains consistency with how SageMaker's distributed, managed training ingests data.\n",
+    "\n",
+    "You can try the \"Preview\" version of PyTorch by specifying ``'1.0.0.dev'`` for ``framework_version`` when creating your PyTorch estimator."
    ]
   },
   {
@@ -194,8 +196,9 @@
    "source": [
     "from sagemaker.pytorch import PyTorch\n",
     "\n",
-    "cifar10_estimator = PyTorch(entry_point=\"source/cifar10.py\",\n",
+    "cifar10_estimator = PyTorch(entry_point='source/cifar10.py',\n",
     "                            role=role,\n",
+    "                            framework_version='0.4.0',\n",
     "                            train_instance_count=1,\n",
     "                            train_instance_type=instance_type)\n",
     "\n",

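For context, the net effect of this notebook change is the estimator construction below. A minimal sketch, assuming the role, instance_type, and inputs variables defined in earlier notebook cells; swap '0.4.0' for '1.0.0.dev' to opt into the preview:

from sagemaker.pytorch import PyTorch

# framework_version pins the PyTorch container image; '0.4.0' is the
# stable release, '1.0.0.dev' the preview build named in the markdown cell.
cifar10_estimator = PyTorch(entry_point='source/cifar10.py',
                            role=role,                         # IAM role from an earlier cell
                            framework_version='0.4.0',         # or '1.0.0.dev' for the preview
                            train_instance_count=1,
                            train_instance_type=instance_type)

cifar10_estimator.fit(inputs)  # inputs: the S3 prefix uploaded earlier in the notebook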
sagemaker-python-sdk/pytorch_cnn_cifar10/source/cifar10.py

Lines changed: 1 addition & 0 deletions
@@ -52,6 +52,7 @@ def _train(args):
     world_size = len(args.hosts)
     os.environ['WORLD_SIZE'] = str(world_size)
     host_rank = args.hosts.index(args.current_host)
+    os.environ['RANK'] = str(host_rank)
     dist.init_process_group(backend=args.dist_backend, rank=host_rank, world_size=world_size)
     logger.info(
         'Initialized the distributed environment: \'{}\' backend on {} nodes. '.format(

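For context on the one-line addition: the PyTorch 1.0 preview's default env:// rendezvous for torch.distributed reads RANK (alongside WORLD_SIZE) from the environment, which is presumably why the script now exports it before init_process_group. A condensed, self-contained sketch of the setup this hunk produces; the function name and arguments are illustrative stand-ins for the SageMaker-provided args:

import os
import torch.distributed as dist

def init_distributed(hosts, current_host, backend='gloo'):
    # Mirrors the environment setup in _train() above.
    world_size = len(hosts)
    host_rank = hosts.index(current_host)
    os.environ['WORLD_SIZE'] = str(world_size)
    os.environ['RANK'] = str(host_rank)  # the line this commit adds
    dist.init_process_group(backend=backend, rank=host_rank, world_size=world_size)
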
sagemaker-python-sdk/pytorch_lstm_word_language_model/pytorch_rnn.ipynb

Lines changed: 4 additions & 2 deletions
@@ -171,7 +171,9 @@
    "metadata": {},
    "source": [
     "### Run training in SageMaker\n",
-    "The PyTorch class allows us to run our training function as a training job on SageMaker infrastructure. We need to configure it with our training script and source directory, an IAM role, the number of training instances, and the training instance type. In this case we will run our training job on an ```ml.p2.xlarge``` instance. As you can see in this example, you can also specify hyperparameters."
+    "The PyTorch class allows us to run our training function as a training job on SageMaker infrastructure. We need to configure it with our training script and source directory, an IAM role, the number of training instances, and the training instance type. In this case we will run our training job on an ```ml.p2.xlarge``` instance. As you can see in this example, you can also specify hyperparameters.\n",
+    "\n",
+    "You can try the \"Preview\" version of PyTorch by specifying ``'1.0.0.dev'`` for ``framework_version`` when creating your PyTorch estimator."
    ]
   },
   {
@@ -182,7 +184,7 @@
    "source": [
     "from sagemaker.pytorch import PyTorch\n",
     "\n",
-    "estimator = PyTorch(entry_point=\"train.py\",\n",
+    "estimator = PyTorch(entry_point='train.py',\n",
     "                    role=role,\n",
     "                    framework_version='0.4.0',\n",
     "                    train_instance_count=1,\n",

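The markdown cell mentions hyperparameters, but the hunk truncates before them. A hedged sketch of how a hyperparameters dict is passed to the estimator; the keys (epochs, lr) are illustrative, not taken from this diff:

from sagemaker.pytorch import PyTorch

estimator = PyTorch(entry_point='train.py',
                    role=role,                         # IAM role from an earlier cell
                    framework_version='0.4.0',         # or '1.0.0.dev' for the preview
                    train_instance_count=1,
                    train_instance_type='ml.p2.xlarge',
                    hyperparameters={'epochs': 6,      # illustrative keys, delivered to
                                     'lr': 0.05})      # train.py as command-line arguments
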
sagemaker-python-sdk/pytorch_mnist/mnist.py

Lines changed: 2 additions & 1 deletion
@@ -63,7 +63,7 @@ def _average_gradients(model):
     # Gradient averaging.
     size = float(dist.get_world_size())
     for param in model.parameters():
-        dist.all_reduce(param.grad.data, op=dist.reduce_op.SUM, group=0)
+        dist.all_reduce(param.grad.data, op=dist.reduce_op.SUM)
         param.grad.data /= size


@@ -80,6 +80,7 @@ def train(args):
     world_size = len(args.hosts)
     os.environ['WORLD_SIZE'] = str(world_size)
     host_rank = args.hosts.index(args.current_host)
+    os.environ['RANK'] = str(host_rank)
     dist.init_process_group(backend=args.backend, rank=host_rank, world_size=world_size)
     logger.info('Initialized the distributed environment: \'{}\' backend on {} nodes. '.format(
         args.backend, dist.get_world_size()) + 'Current host rank is {}. Number of gpus: {}'.format(

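After the first hunk, _average_gradients reads as the sketch below. Dropping group=0 is presumably necessary because the preview's torch.distributed expects a process-group object rather than an integer id for group; omitting the argument falls back to the default group, which is what 0 denoted before:

import torch.distributed as dist

def _average_gradients(model):
    # Sum each parameter's gradient across all workers with an all-reduce
    # on the default process group, then divide by world size to average.
    size = float(dist.get_world_size())
    for param in model.parameters():
        dist.all_reduce(param.grad.data, op=dist.reduce_op.SUM)
        param.grad.data /= size
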
sagemaker-python-sdk/pytorch_mnist/pytorch_mnist.ipynb

Lines changed: 4 additions & 2 deletions
@@ -143,7 +143,9 @@
    "source": [
     "### Run training in SageMaker\n",
     "\n",
-    "The `PyTorch` class allows us to run our training function as a training job on SageMaker infrastructure. We need to configure it with our training script, an IAM role, the number of training instances, the training instance type, and hyperparameters. In this case we are going to run our training job on 2 ```ml.c4.xlarge``` instances, but this example can be run on one or multiple CPU or GPU instances ([full list of available instances](https://aws.amazon.com/sagemaker/pricing/instance-types/)). The hyperparameters parameter is a dict of values that will be passed to your training script -- you can see how to access these values in the `mnist.py` script above."
+    "The `PyTorch` class allows us to run our training function as a training job on SageMaker infrastructure. We need to configure it with our training script, an IAM role, the number of training instances, the training instance type, and hyperparameters. In this case we are going to run our training job on 2 ```ml.c4.xlarge``` instances, but this example can be run on one or multiple CPU or GPU instances ([full list of available instances](https://aws.amazon.com/sagemaker/pricing/instance-types/)). The hyperparameters parameter is a dict of values that will be passed to your training script -- you can see how to access these values in the `mnist.py` script above.\n",
+    "\n",
+    "You can try the \"Preview\" version of PyTorch by specifying ``'1.0.0.dev'`` for ``framework_version`` when creating your PyTorch estimator."
    ]
   },
   {
@@ -154,7 +156,7 @@
    "source": [
     "from sagemaker.pytorch import PyTorch\n",
     "\n",
-    "estimator = PyTorch(entry_point=\"mnist.py\",\n",
+    "estimator = PyTorch(entry_point='mnist.py',\n",
     "                    role=role,\n",
     "                    framework_version='0.4.0',\n",
     "                    train_instance_count=2,\n",

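To round out the notebook change, a sketch of launching the two-instance distributed job this cell configures; the hyperparameters keys and the inputs variable are illustrative, not taken from the diff:

from sagemaker.pytorch import PyTorch

estimator = PyTorch(entry_point='mnist.py',
                    role=role,                          # IAM role from an earlier cell
                    framework_version='0.4.0',          # or '1.0.0.dev' for the preview
                    train_instance_count=2,             # two hosts -> distributed training
                    train_instance_type='ml.c4.xlarge',
                    hyperparameters={'epochs': 6,          # illustrative; mnist.py reads
                                     'backend': 'gloo'})   # 'backend' per the hunk above

estimator.fit({'training': inputs})  # inputs: S3 location from an earlier cell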