Commit 2697b39

Make pytorch notebooks compatible with pytorch preview. (aws#429)
* Make pytorch notebooks compatible with pytorch preview. Add documentation on how to use it.
* Use full framework version.
Parent: a2011f1

5 files changed: +16 additions, -7 deletions


sagemaker-python-sdk/pytorch_cnn_cifar10/pytorch_local_mode_cifar10.ipynb

Lines changed: 5 additions & 2 deletions
@@ -183,7 +183,9 @@
     "\n",
     "The `PyTorch` class allows us to run our training function on SageMaker. We need to configure it with our training script, an IAM role, the number of training instances, and the training instance type. For local training with GPU, we could set this to \"local_gpu\". In this case, `instance_type` was set above based on whether you're running a GPU instance.\n",
     "\n",
-    "After we've constructed our `PyTorch` object, we fit it using the data we uploaded to S3. Even though we're in local mode, using S3 as our data source makes sense because it maintains consistency with how SageMaker's distributed, managed training ingests data."
+    "After we've constructed our `PyTorch` object, we fit it using the data we uploaded to S3. Even though we're in local mode, using S3 as our data source makes sense because it maintains consistency with how SageMaker's distributed, managed training ingests data.\n",
+    "\n",
+    "You can try the \"Preview\" version of PyTorch by specifying ``'1.0.0.dev'`` for ``framework_version`` when creating your PyTorch estimator."
    ]
   },
   {
@@ -194,8 +196,9 @@
    "source": [
     "from sagemaker.pytorch import PyTorch\n",
     "\n",
-    "cifar10_estimator = PyTorch(entry_point=\"source/cifar10.py\",\n",
+    "cifar10_estimator = PyTorch(entry_point='source/cifar10.py',\n",
     "                            role=role,\n",
+    "                            framework_version='0.4.0',\n",
     "                            train_instance_count=1,\n",
     "                            train_instance_type=instance_type)\n",
     "\n",

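For context, the net effect of this notebook change is the estimator construction below. A minimal sketch, assuming the role, instance_type, and inputs variables defined in earlier notebook cells; swap '0.4.0' for '1.0.0.dev' to opt into the preview:

from sagemaker.pytorch import PyTorch

# framework_version pins the PyTorch container image; '0.4.0' is the
# stable release, '1.0.0.dev' the preview build named in the markdown cell.
cifar10_estimator = PyTorch(entry_point='source/cifar10.py',
                            role=role,                         # IAM role from an earlier cell
                            framework_version='0.4.0',         # or '1.0.0.dev' for the preview
                            train_instance_count=1,
                            train_instance_type=instance_type)

cifar10_estimator.fit(inputs)  # inputs: the S3 prefix uploaded earlier in the notebook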
sagemaker-python-sdk/pytorch_cnn_cifar10/source/cifar10.py

Lines changed: 1 addition & 0 deletions
@@ -52,6 +52,7 @@ def _train(args):
     world_size = len(args.hosts)
     os.environ['WORLD_SIZE'] = str(world_size)
     host_rank = args.hosts.index(args.current_host)
+    os.environ['RANK'] = str(host_rank)
     dist.init_process_group(backend=args.dist_backend, rank=host_rank, world_size=world_size)
     logger.info(
         'Initialized the distributed environment: \'{}\' backend on {} nodes. '.format(

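For context on the one-line addition: the PyTorch 1.0 preview's default env:// rendezvous for torch.distributed reads RANK (alongside WORLD_SIZE) from the environment, which is presumably why the script now exports it before init_process_group. A condensed, self-contained sketch of the setup this hunk produces; the function name and arguments are illustrative stand-ins for the SageMaker-provided args:

import os
import torch.distributed as dist

def init_distributed(hosts, current_host, backend='gloo'):
    # Mirrors the environment setup in _train() above.
    world_size = len(hosts)
    host_rank = hosts.index(current_host)
    os.environ['WORLD_SIZE'] = str(world_size)
    os.environ['RANK'] = str(host_rank)  # the line this commit adds
    dist.init_process_group(backend=backend, rank=host_rank, world_size=world_size)
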
sagemaker-python-sdk/pytorch_lstm_word_language_model/pytorch_rnn.ipynb

Lines changed: 4 additions & 2 deletions
@@ -171,7 +171,9 @@
    "metadata": {},
    "source": [
     "### Run training in SageMaker\n",
-    "The PyTorch class allows us to run our training function as a training job on SageMaker infrastructure. We need to configure it with our training script and source directory, an IAM role, the number of training instances, and the training instance type. In this case we will run our training job on an ```ml.p2.xlarge``` instance. As you can see in this example, you can also specify hyperparameters."
+    "The PyTorch class allows us to run our training function as a training job on SageMaker infrastructure. We need to configure it with our training script and source directory, an IAM role, the number of training instances, and the training instance type. In this case we will run our training job on an ```ml.p2.xlarge``` instance. As you can see in this example, you can also specify hyperparameters.\n",
+    "\n",
+    "You can try the \"Preview\" version of PyTorch by specifying ``'1.0.0.dev'`` for ``framework_version`` when creating your PyTorch estimator."
    ]
   },
   {
@@ -182,7 +184,7 @@
    "source": [
     "from sagemaker.pytorch import PyTorch\n",
     "\n",
-    "estimator = PyTorch(entry_point=\"train.py\",\n",
+    "estimator = PyTorch(entry_point='train.py',\n",
     "                    role=role,\n",
     "                    framework_version='0.4.0',\n",
     "                    train_instance_count=1,\n",

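The markdown cell mentions hyperparameters, but the hunk truncates before them. A hedged sketch of how a hyperparameters dict is passed to the estimator; the keys (epochs, lr) are illustrative, not taken from this diff:

from sagemaker.pytorch import PyTorch

estimator = PyTorch(entry_point='train.py',
                    role=role,                         # IAM role from an earlier cell
                    framework_version='0.4.0',         # or '1.0.0.dev' for the preview
                    train_instance_count=1,
                    train_instance_type='ml.p2.xlarge',
                    hyperparameters={'epochs': 6,      # illustrative keys, delivered to
                                     'lr': 0.05})      # train.py as command-line arguments
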
sagemaker-python-sdk/pytorch_mnist/mnist.py

Lines changed: 2 additions & 1 deletion
@@ -63,7 +63,7 @@ def _average_gradients(model):
     # Gradient averaging.
     size = float(dist.get_world_size())
     for param in model.parameters():
-        dist.all_reduce(param.grad.data, op=dist.reduce_op.SUM, group=0)
+        dist.all_reduce(param.grad.data, op=dist.reduce_op.SUM)
         param.grad.data /= size


@@ -80,6 +80,7 @@ def train(args):
     world_size = len(args.hosts)
     os.environ['WORLD_SIZE'] = str(world_size)
     host_rank = args.hosts.index(args.current_host)
+    os.environ['RANK'] = str(host_rank)
     dist.init_process_group(backend=args.backend, rank=host_rank, world_size=world_size)
     logger.info('Initialized the distributed environment: \'{}\' backend on {} nodes. '.format(
         args.backend, dist.get_world_size()) + 'Current host rank is {}. Number of gpus: {}'.format(

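After the first hunk, _average_gradients reads as the sketch below. Dropping group=0 is presumably necessary because the preview's torch.distributed expects a process-group object rather than an integer id for group; omitting the argument falls back to the default group, which is what 0 denoted before:

import torch.distributed as dist

def _average_gradients(model):
    # Sum each parameter's gradient across all workers with an all-reduce
    # on the default process group, then divide by world size to average.
    size = float(dist.get_world_size())
    for param in model.parameters():
        dist.all_reduce(param.grad.data, op=dist.reduce_op.SUM)
        param.grad.data /= size
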
sagemaker-python-sdk/pytorch_mnist/pytorch_mnist.ipynb

Lines changed: 4 additions & 2 deletions
@@ -143,7 +143,9 @@
    "source": [
     "### Run training in SageMaker\n",
     "\n",
-    "The `PyTorch` class allows us to run our training function as a training job on SageMaker infrastructure. We need to configure it with our training script, an IAM role, the number of training instances, the training instance type, and hyperparameters. In this case we are going to run our training job on 2 ```ml.c4.xlarge``` instances, but this example can be run on one or multiple CPU or GPU instances ([full list of available instances](https://aws.amazon.com/sagemaker/pricing/instance-types/)). The hyperparameters parameter is a dict of values that will be passed to your training script -- you can see how to access these values in the `mnist.py` script above."
+    "The `PyTorch` class allows us to run our training function as a training job on SageMaker infrastructure. We need to configure it with our training script, an IAM role, the number of training instances, the training instance type, and hyperparameters. In this case we are going to run our training job on 2 ```ml.c4.xlarge``` instances, but this example can be run on one or multiple CPU or GPU instances ([full list of available instances](https://aws.amazon.com/sagemaker/pricing/instance-types/)). The hyperparameters parameter is a dict of values that will be passed to your training script -- you can see how to access these values in the `mnist.py` script above.\n",
+    "\n",
+    "You can try the \"Preview\" version of PyTorch by specifying ``'1.0.0.dev'`` for ``framework_version`` when creating your PyTorch estimator."
    ]
   },
   {
@@ -154,7 +156,7 @@
    "source": [
     "from sagemaker.pytorch import PyTorch\n",
     "\n",
-    "estimator = PyTorch(entry_point=\"mnist.py\",\n",
+    "estimator = PyTorch(entry_point='mnist.py',\n",
     "                    role=role,\n",
     "                    framework_version='0.4.0',\n",
     "                    train_instance_count=2,\n",

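To round out the notebook change, a sketch of launching the two-instance distributed job this cell configures; the hyperparameters keys and the inputs variable are illustrative, not taken from the diff:

from sagemaker.pytorch import PyTorch

estimator = PyTorch(entry_point='mnist.py',
                    role=role,                          # IAM role from an earlier cell
                    framework_version='0.4.0',          # or '1.0.0.dev' for the preview
                    train_instance_count=2,             # two hosts -> distributed training
                    train_instance_type='ml.c4.xlarge',
                    hyperparameters={'epochs': 6,          # illustrative; mnist.py reads
                                     'backend': 'gloo'})   # 'backend' per the hunk above

estimator.fit({'training': inputs})  # inputs: S3 location from an earlier cell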