Commit 130e8f5

add note about instance limit (aws#102)
1 parent 4e495bc commit 130e8f5

2 files changed: 7 additions & 5 deletions

sagemaker-python-sdk/mxnet_gluon_cifar10/cifar10.ipynb

Lines changed: 4 additions & 1 deletion
@@ -95,7 +95,10 @@
    "source": [
     "## Run the training script on SageMaker\n",
     "\n",
-    "The ```MXNet``` class allows us to run our training function as a distributed training job on SageMaker infrastructure. We need to configure it with our training script, an IAM role, the number of training instances, and the training instance type. In this case we will run our training job on two `ml.p2.xlarge` instances."
+    "The ```MXNet``` class allows us to run our training function as a distributed training job on SageMaker infrastructure. We need to configure it with our training script, an IAM role, the number of training instances, and the training instance type. In this case we will run our training job on two `ml.p2.xlarge` instances.\n",
+    "\n",
+    "**Note:** you may need to request a limit increase in order to use two ``ml.p2.xlarge`` instances. If you \n",
+    "want to try the example without requesting an increase, just change the ``train_instance_count`` value to ``1``."
    ]
   },
   {
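
The markdown cell above refers to an `MXNet` estimator cell that is not part of this diff. For orientation, a minimal sketch of what such a cell typically looks like with the SageMaker Python SDK is shown below; the role lookup, S3 input path, and hyperparameters are placeholders rather than values from this notebook, and the parameter names assume the SDK version that uses `train_instance_count`/`train_instance_type`, as referenced in the note.

```python
# Sketch only -- not taken from this commit. The S3 path, role lookup, and
# hyperparameters are placeholders; parameter names follow the SDK version
# that the note's ``train_instance_count`` refers to.
from sagemaker import get_execution_role
from sagemaker.mxnet import MXNet

m = MXNet('cifar10.py',
          role=get_execution_role(),
          train_instance_count=2,              # change to 1 if you have not requested a limit increase
          train_instance_type='ml.p2.xlarge',
          hyperparameters={'batch_size': 128, 'epochs': 50})

m.fit('s3://<your-bucket>/cifar10-data')       # placeholder input location
```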

sagemaker-python-sdk/mxnet_gluon_cifar10/cifar10.py

Lines changed: 3 additions & 4 deletions
@@ -1,12 +1,11 @@
 from __future__ import print_function
 
+import json
 import logging
+import os
 import time
 
-import json
 import mxnet as mx
-import numpy as np
-import os
 from mxnet import autograd as ag
 from mxnet import gluon
 from mxnet.gluon.model_zoo import vision as models
@@ -28,7 +27,7 @@ def train(current_host, hosts, num_cpus, num_gpus, channel_input_dirs, model_dir
     if len(hosts) == 1:
         kvstore = 'device' if num_gpus > 0 else 'local'
     else:
-        kvstore = 'dist_sync' # TODO retest 'dist_sync_device'
+        kvstore = 'dist_device_sync'
 
     ctx = [mx.gpu(i) for i in range(num_gpus)] if num_gpus > 0 else [mx.cpu()]
     net = models.get_model('resnet34_v2', ctx=ctx, pretrained=False, classes=10)
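
For context on the `kvstore` change, here is a minimal sketch of how that string is typically consumed in Gluon training code. The `make_trainer` helper and the learning-rate value are illustrative and not part of cifar10.py.

```python
# Illustrative sketch, not part of this commit: the kvstore string selected in
# train() is passed to gluon.Trainer, which creates the corresponding key-value
# store for gradient aggregation. 'dist_device_sync' performs synchronous
# multi-machine updates with aggregation on the GPUs.
from mxnet import gluon

def make_trainer(net, hosts, num_gpus, learning_rate=0.1):    # hypothetical helper
    if len(hosts) == 1:
        kvstore = 'device' if num_gpus > 0 else 'local'       # single machine
    else:
        kvstore = 'dist_device_sync'                          # value set by this commit
    return gluon.Trainer(net.collect_params(), 'sgd',
                         {'learning_rate': learning_rate},
                         kvstore=kvstore)
```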
