Commit 27c2c39

Merge commit with 2 parents: 3b474c2 + a739945

30 files changed: +1727 additions, −129 deletions

CHANGELOG.md

Lines changed: 31 additions & 0 deletions

@@ -1,5 +1,36 @@
 # Changelog

+## v2.24.1 (2021-01-28)
+
+### Bug Fixes and Other Changes
+
+* fix collect-tests tox env
+* create profiler specific unsupported regions
+* Update smd_model_parallel_pytorch.rst
+
+## v2.24.0 (2021-01-22)
+
+### Features
+
+* add support for Std:Join for pipelines
+* Map image name to image uri
+* friendly names for short URIs
+
+### Bug Fixes and Other Changes
+
+* increase allowed time for search to get updated
+* refactor distribution config construction
+
+### Documentation Changes
+
+* Add SMP 1.2.0 API docs
+
+## v2.23.6 (2021-01-20)
+
+### Bug Fixes and Other Changes
+
+* add artifact, action, context to visualizer
+
 ## v2.23.5 (2021-01-18)

 ### Bug Fixes and Other Changes
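
The `Std:Join` pipelines feature listed under v2.24.0 corresponds to the `Join` function in the SDK's workflow module. A minimal sketch, assuming the `sagemaker.workflow.functions.Join` API as documented in the SDK; the parameter name and bucket value are placeholders, not taken from this commit:

```python
# Illustrative sketch of the pipelines Join function added in v2.24.0;
# the parameter name and key layout are placeholders.
from sagemaker.workflow.functions import Join
from sagemaker.workflow.parameters import ParameterString

bucket = ParameterString(name="OutputBucket", default_value="my-bucket")

# Join concatenates its values with the delimiter at pipeline execution
# time, so pipeline parameters can be embedded in S3 URIs.
output_uri = Join(on="/", values=["s3:/", bucket, "training", "output"])
```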

VERSION

Lines changed: 1 addition & 1 deletion

@@ -1 +1 @@
-2.23.6.dev0
+2.24.2.dev0

doc/api/training/smd_data_parallel.rst

Lines changed: 4 additions & 4 deletions

@@ -1,6 +1,6 @@
-###################################
+##########################
 Distributed data parallel
-###################################
+##########################

 SageMaker's distributed data parallel library extends SageMaker's training
 capabilities on deep learning models with near-linear scaling efficiency,
@@ -68,8 +68,8 @@ model.
 .. toctree::
    :maxdepth: 2

-   smd_data_parallel_pytorch
-   smd_data_parallel_tensorflow
+   sdp_versions/smd_data_parallel_pytorch
+   sdp_versions/smd_data_parallel_tensorflow

 Latest Updates
 ==============
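
For orientation, the data parallel library this page documents is enabled through the `distribution` argument of a framework `Estimator`. A minimal sketch following the documented pattern; the script name, role ARN, and instance settings are placeholders:

```python
# Minimal sketch: enabling the SageMaker data parallel library on a
# PyTorch Estimator. entry_point and role are placeholders.
from sagemaker.pytorch import PyTorch

estimator = PyTorch(
    entry_point="train.py",  # hypothetical training script
    role="arn:aws:iam::123456789012:role/SageMakerRole",  # placeholder
    framework_version="1.6.0",
    py_version="py36",
    instance_count=2,
    instance_type="ml.p3.16xlarge",
    distribution={"smdistributed": {"dataparallel": {"enabled": True}}},
)
estimator.fit()
```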

doc/api/training/smd_model_parallel.rst

Lines changed: 21 additions & 10 deletions

@@ -20,20 +20,31 @@ Use the following sections to learn more about the model parallelism and the lib
 <https://integ-docs-aws.amazon.com/sagemaker/latest/dg/model-parallel-use-api.html#model-parallel-customize-container>`__
 for more information.

-How to Use this Guide
-=====================
+Use with the SageMaker Python SDK
+=================================
+
+Use the following page to learn how to configure and enable distributed model parallel
+when you configure an Amazon SageMaker Python SDK `Estimator`.
+
+.. toctree::
+   :maxdepth: 1
+
+   smd_model_parallel_general
+
+API Documentation
+=================

 The library contains a Common API that is shared across frameworks, as well as APIs
-that are specific to supported frameworks, TensorFlow and PyTorch. To use the library, reference the
+that are specific to supported frameworks, TensorFlow and PyTorch.
+
+Select a version to see the API documentation for that version. To use the library, reference the
 **Common API** documentation alongside the framework specific API documentation.

 .. toctree::
    :maxdepth: 1

-   smd_model_parallel_general
-   smd_model_parallel_common_api
-   smd_model_parallel_pytorch
-   smd_model_parallel_tensorflow
+   smp_versions/v1_2_0.rst
+   smp_versions/v1_1_0.rst

 It is recommended to use this documentation alongside `SageMaker Distributed Model Parallel
 <http://docs.aws.amazon.com/sagemaker/latest/dg/model-parallel.html>`__ in the Amazon SageMaker
@@ -49,11 +60,11 @@ developer guide. This developer guide documentation includes:
 - `Configuration tips and pitfalls
   <https://docs.aws.amazon.com/sagemaker/latest/dg/model-parallel-customize-tips-pitfalls.html>`__

-Latest Updates
-==============

-New features, bug fixes, and improvements are regularly made to the SageMaker distributed model parallel library.
+Release Notes
+=============

+New features, bug fixes, and improvements are regularly made to the SageMaker distributed model parallel library.
 To see the latest changes made to the library, refer to the library
 `Release Notes
 <https://github.com/aws/sagemaker-python-sdk/blob/master/doc/api/training/smd_model_parallel_release_notes/>`_.
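
As a companion to the new "Use with the SageMaker Python SDK" section above, a hedged sketch of enabling the model parallel library on an `Estimator`; the script, role, and parameter values are placeholders, not taken from this commit:

```python
# Minimal sketch: enabling the model parallel library via the Estimator's
# distribution parameter. All names and values here are placeholders.
from sagemaker.pytorch import PyTorch

estimator = PyTorch(
    entry_point="train.py",  # hypothetical training script
    role="arn:aws:iam::123456789012:role/SageMakerRole",  # placeholder
    framework_version="1.6.0",
    py_version="py36",
    instance_count=1,
    instance_type="ml.p3.16xlarge",
    distribution={
        "smdistributed": {
            "modelparallel": {
                "enabled": True,
                # Example tuning values; see the modelparallel parameters page.
                "parameters": {"partitions": 2, "microbatches": 4},
            }
        },
        "mpi": {"enabled": True, "processes_per_host": 8},
    },
)
```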

doc/api/training/smd_model_parallel_general.rst

Lines changed: 2 additions & 2 deletions

@@ -6,7 +6,7 @@
 .. _sm-sdk-modelparallel-params:

 SageMaker Python SDK ``modelparallel`` parameters
--------------------------------------------------
+=================================================

 The TensorFlow and PyTorch ``Estimator`` objects contain a ``distribution`` parameter,
 which is used to enable and specify parameters for the
@@ -306,7 +306,7 @@ table are optional.
 .. _ranking-basics:

 Ranking Basics
---------------
+==============

 The library maintains a one-to-one mapping between processes and available GPUs:
 for each GPU, there is a corresponding CPU process. Each CPU process
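
The Ranking Basics section describes one CPU process per GPU; inside a training script those ranks are exposed through the `smp` module. A sketch, assuming the ranking APIs from the library's Common API documentation; the `smdistributed` package is only available inside a SageMaker training container:

```python
# Sketch of the ranking APIs described under "Ranking Basics". Runs only
# inside a SageMaker training job where smdistributed is installed.
import smdistributed.modelparallel.torch as smp

smp.init()
print("global rank:", smp.rank())       # one process per GPU
print("local rank:", smp.local_rank())  # rank within this host
print("mp rank:", smp.mp_rank())        # rank within the model-parallel group
print("dp rank:", smp.dp_rank())        # rank within the data-parallel group
```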

doc/api/training/smd_model_parallel_release_notes/smd_model_parallel_change_log.md

Lines changed: 41 additions & 0 deletions

@@ -1,3 +1,44 @@
+# SageMaker Distributed Model Parallel 1.2.0 Release Notes
+
+- New Features
+- Bug Fixes
+- Known Issues
+
+## New Features
+
+### PyTorch
+
+#### Add support for PyTorch 1.7
+
+- Adds support for the `gradient_as_bucket_view` (PyTorch 1.7 only), `find_unused_parameters` (PyTorch 1.7 only), and `broadcast_buffers` options to `smp.DistributedModel`. These options behave the same as the corresponding options (with the same names) in the `torch.DistributedDataParallel` API. Please refer to the [SageMaker distributed model parallel API documentation](https://sagemaker.readthedocs.io/en/stable/api/training/smd_model_parallel_pytorch.html#smp.DistributedModel) for more information.
+
+- Adds support for the `join` (PyTorch 1.7 only) context manager, which is used with an instance of `smp.DistributedModel` to train with uneven inputs across participating processes.
+
+- Adds support for `_register_comm_hook` (PyTorch 1.7 only), which registers the callable as a communication hook for DDP. NOTE: As in DDP, this is an experimental API and subject to change.
+
+### TensorFlow
+
+- Adds support for TensorFlow 2.4.
+
+## Bug Fixes
+
+### PyTorch
+
+- `Serialization`: Fix a bug with serialization/flattening where instances of subclasses of dict/OrderedDict were serialized/deserialized or internally flattened/unflattened as regular dicts.
+
+### TensorFlow
+
+- Fix a bug that may cause a hang during evaluation when there is no model input for one partition.
+
+## Known Issues
+
+### PyTorch
+
+- A performance regression was observed when training on SMP with PyTorch 1.7.1 compared to 1.6. The root cause was found to be a slowdown in `.grad` method calls in PyTorch 1.7.1 compared to 1.6. See the related discussion: https://github.com/pytorch/pytorch/issues/50636.
+
 # SageMaker Distributed Model Parallel 1.1.0 Release Notes

 - New Features
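
To make the PyTorch 1.7 additions in the 1.2.0 notes concrete, a hedged sketch of the new `smp.DistributedModel` options and the `join` context manager; the toy model and empty loop body are placeholders, and the example assumes `join` is exposed as a method on the wrapped model, mirroring `torch.nn.parallel.DistributedDataParallel.join`:

```python
# Sketch of the PyTorch 1.7-only options from the 1.2.0 release notes.
# Runs only inside a SageMaker training job with smdistributed installed.
import torch.nn as nn
import smdistributed.modelparallel.torch as smp

smp.init()

model = smp.DistributedModel(
    nn.Linear(16, 16),             # toy model for illustration
    gradient_as_bucket_view=True,  # PyTorch 1.7 only
    find_unused_parameters=False,  # PyTorch 1.7 only
    broadcast_buffers=True,
)

# `join` (PyTorch 1.7 only) lets processes with uneven numbers of input
# batches finish a training loop without hanging on collectives.
with model.join():
    pass  # training steps would go here
```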
