Commit a2bf7cf

[SageMaker Data Parallel] Upgrade the TF2 version to 2.4.1 and also update the py_version to 3.7 (#2069)
Parent: e31197e

5 files changed, 12 insertions(+), 12 deletions(-)

training/distributed_training/tensorflow/data_parallel/bert/Dockerfile

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
 ARG region
 
-FROM 763104351884.dkr.ecr.us-west-2.amazonaws.com/tensorflow-training:2.3.1-gpu-py37-cu110-ubuntu18.04
+FROM 763104351884.dkr.ecr.us-west-2.amazonaws.com/tensorflow-training:2.4.1-gpu-py37-cu110-ubuntu18.04
 
 RUN pip --no-cache-dir --no-cache install \
     scikit-learn==0.23.1 \
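Rather than hard-coding the Deep Learning Container URI in the FROM line, the upgraded 2.4.1 training image can also be resolved programmatically. A minimal sketch, assuming the SageMaker Python SDK v2 is installed; the region and instance type here are illustrative:

# Sketch: look up the TF 2.4.1 training DLC URI via the SageMaker SDK (v2)
# instead of hard-coding it. Region and instance_type are illustrative.
import sagemaker

image_uri = sagemaker.image_uris.retrieve(
    framework="tensorflow",
    region="us-west-2",
    version="2.4.1",
    py_version="py37",
    instance_type="ml.p3dn.24xlarge",  # a GPU instance selects the GPU image variant
    image_scope="training",
)
print(image_uri)
# expected form: 763104351884.dkr.ecr.us-west-2.amazonaws.com/tensorflow-training:2.4.1-gpu-py37-...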

training/distributed_training/tensorflow/data_parallel/bert/tensorflow2_smdataparallel_bert_demo.ipynb

Lines changed: 4 additions & 4 deletions
@@ -8,7 +8,7 @@
 "\n",
 "SMDataParallel is a new capability in Amazon SageMaker to train deep learning models faster and cheaper. SMDataParallel is a distributed data parallel training framework for TensorFlow, PyTorch, and MXNet.\n",
 "\n",
-"This notebook example shows how to use SMDataParallel with TensorFlow(version 2.3.1) on [Amazon SageMaker](https://aws.amazon.com/sagemaker/) to train a BERT model using [Amazon FSx for Lustre file-system](https://aws.amazon.com/fsx/lustre/) as data source.\n",
+"This notebook example shows how to use SMDataParallel with TensorFlow(version 2.4.1) on [Amazon SageMaker](https://aws.amazon.com/sagemaker/) to train a BERT model using [Amazon FSx for Lustre file-system](https://aws.amazon.com/fsx/lustre/) as data source.\n",
 "\n",
 "The outline of steps is as follows:\n",
 "\n",
@@ -244,8 +244,8 @@
 " role=role,\n",
 " image_uri=docker_image,\n",
 " source_dir='deep-learning-models/models/nlp',\n",
-" framework_version='2.3.1',\n",
-" py_version='py3',\n",
+" framework_version='2.4.1',\n",
+" py_version='py37',\n",
 " instance_count=instance_count,\n",
 " instance_type=instance_type,\n",
 " sagemaker_session=sagemaker_session,\n",
@@ -315,4 +315,4 @@
 },
 "nbformat": 4,
 "nbformat_minor": 4
-}
+}
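For context, the two changed arguments belong to the notebook's TensorFlow estimator. A minimal sketch of the post-upgrade configuration, assuming the SageMaker Python SDK v2; the entry_point, the ECR image name, and the instance settings are illustrative, and SMDataParallel is switched on through the distribution argument as in a typical SMDataParallel setup:

# Minimal sketch of the BERT estimator after this commit (SageMaker SDK v2).
# entry_point and docker_image are illustrative; only framework_version and
# py_version are changed by the hunk above.
import sagemaker
from sagemaker.tensorflow import TensorFlow

docker_image = "<account-id>.dkr.ecr.<region>.amazonaws.com/smdp-bert:latest"  # your ECR build of the Dockerfile above

estimator = TensorFlow(
    entry_point="run_pretraining.py",  # hypothetical; the notebook sets its own script
    source_dir="deep-learning-models/models/nlp",
    role=sagemaker.get_execution_role(),
    image_uri=docker_image,
    framework_version="2.4.1",  # was '2.3.1'
    py_version="py37",          # was 'py3'
    instance_count=2,
    instance_type="ml.p3dn.24xlarge",
    distribution={"smdistributed": {"dataparallel": {"enabled": True}}},
)
# estimator.fit(...)  # inputs (e.g. the FSx for Lustre channel) omitted here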

training/distributed_training/tensorflow/data_parallel/maskrcnn/Dockerfile

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
 ARG region
 
-FROM 763104351884.dkr.ecr.us-west-2.amazonaws.com/tensorflow-training:2.3.1-gpu-py37-cu110-ubuntu18.04
+FROM 763104351884.dkr.ecr.us-west-2.amazonaws.com/tensorflow-training:2.4.1-gpu-py37-cu110-ubuntu18.04
 
 RUN pip --no-cache-dir --no-cache install \
     Cython \

training/distributed_training/tensorflow/data_parallel/maskrcnn/tensorflow2_smdataparallel_maskrcnn_demo.ipynb

Lines changed: 4 additions & 4 deletions
@@ -8,7 +8,7 @@
 "\n",
 "SMDataParallel is a new capability in Amazon SageMaker to train deep learning models faster and cheaper. SMDataParallel is a distributed data parallel training framework for TensorFlow, PyTorch, and MXNet.\n",
 "\n",
-"This notebook example shows how to use SMDataParallel with TensorFlow(version 2.3.1) on [Amazon SageMaker](https://aws.amazon.com/sagemaker/) to train a MaskRCNN model on [COCO 2017 dataset](https://cocodataset.org/#home) using [Amazon FSx for Lustre file-system](https://aws.amazon.com/fsx/lustre/) as data source.\n",
+"This notebook example shows how to use SMDataParallel with TensorFlow(version 2.4.1) on [Amazon SageMaker](https://aws.amazon.com/sagemaker/) to train a MaskRCNN model on [COCO 2017 dataset](https://cocodataset.org/#home) using [Amazon FSx for Lustre file-system](https://aws.amazon.com/fsx/lustre/) as data source.\n",
 "\n",
 "The outline of steps is as follows:\n",
 "\n",
@@ -238,8 +238,8 @@
 " role=role,\n",
 " image_uri=docker_image,\n",
 " source_dir='.',\n",
-" framework_version='2.3.1',\n",
-" py_version='py3',\n",
+" framework_version='2.4.1',\n",
+" py_version='py37',\n",
 " instance_count=instance_count,\n",
 " instance_type=instance_type,\n",
 " sagemaker_session=sagemaker_session,\n",
@@ -323,4 +323,4 @@
 },
 "nbformat": 4,
 "nbformat_minor": 4
-}
+}
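The MaskRCNN notebook receives the identical two-argument change; apart from source_dir='.' and its own entry script, the estimator sketch shown after the BERT diff above carries over unchanged.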

training/distributed_training/tensorflow/data_parallel/mnist/tensorflow2_smdataparallel_mnist_demo.ipynb

Lines changed: 2 additions & 2 deletions
@@ -104,7 +104,7 @@
 " entry_point='train_tensorflow_smdataparallel_mnist.py',\n",
 " role=role,\n",
 " py_version='py37',\n",
-" framework_version='2.3.1',\n",
+" framework_version='2.4.1',\n",
 " # For training with multinode distributed training, set this count. Example: 2\n",
 " instance_count=2,\n",
 " # For training with p3dn instance use - ml.p3dn.24xlarge, with p4dn instance use - ml.p4d.24xlarge\n",
@@ -170,4 +170,4 @@
 },
 "nbformat": 4,
 "nbformat_minor": 4
-}
+}
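Unlike the BERT and MaskRCNN examples, the MNIST notebook relies on the stock TensorFlow DLC rather than a custom image, so only the framework version moves here and py_version was already 'py37'. A minimal sketch of the resulting estimator, assuming the SageMaker Python SDK v2; arguments outside the hunk above are illustrative:

# Sketch of the MNIST estimator after this commit (SageMaker SDK v2 assumed).
# No custom image_uri: the stock TF 2.4.1 DLC is selected from
# framework_version + py_version. Arguments outside the hunk are illustrative.
import sagemaker
from sagemaker.tensorflow import TensorFlow

estimator = TensorFlow(
    entry_point="train_tensorflow_smdataparallel_mnist.py",
    role=sagemaker.get_execution_role(),
    py_version="py37",
    framework_version="2.4.1",  # was '2.3.1'
    instance_count=2,           # multinode distributed training
    instance_type="ml.p3dn.24xlarge",  # or ml.p4d.24xlarge
    distribution={"smdistributed": {"dataparallel": {"enabled": True}}},
)
# estimator.fit()  # would launch the training job; data wiring omitted here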
