Commit 3b45257

documentation: the SageMaker model parallel library 1.11.0 release (#3321)
* archive v1.10.0 doc and add v1.11.0 release note
* bump version numbers and add noindex
* fix index errors and conflicts
* revert setup js model def
* fix dates
* add new params for estimator distribution
* minor fix
* add description for the new sdp degree param
* fix conf file

Co-authored-by: Aaron Markham <[email protected]>
1 parent 853e4d8 commit 3b45257

9 files changed: +2643 -6 lines changed

doc/api/training/smd_model_parallel_general.rst

Lines changed: 57 additions & 0 deletions
@@ -19,6 +19,35 @@ The SageMaker model parallel library internally uses MPI.
 To use model parallelism, both ``smdistributed`` and MPI must be enabled
 through the ``distribution`` parameter.
 
+The following code example is a template for setting up model parallelism with a PyTorch estimator.
+
+.. code:: python
+
+   import sagemaker
+   from sagemaker.pytorch import PyTorch
+
+   smp_options = {
+       "enabled": True,
+       "parameters": {
+           ...
+       }
+   }
+
+   mpi_options = {
+       "enabled": True,
+       ...
+   }
+
+   smdmp_estimator = PyTorch(
+       ...
+       distribution={
+           "smdistributed": {"modelparallel": smp_options},
+           "mpi": mpi_options
+       }
+   )
+
+   smdmp_estimator.fit()
+
 .. tip::
 
    This page provides you a complete list of parameters you can use
@@ -214,6 +243,34 @@ PyTorch-specific Parameters
      - False
      - Skips the initial tracing step. This can be useful in very large models
        where even model tracing at the CPU is not possible due to memory constraints.
+   * - ``sharded_data_parallel_degree`` (**smdistributed-modelparallel**>=v1.11)
+     - int
+     - 1
+     - To run a training job using sharded data parallelism, add this parameter and specify a number greater than 1.
+       Sharded data parallelism is a memory-saving distributed training technique that splits the training state of a model (model parameters, gradients, and optimizer states) across the GPUs in a data parallel group.
+       For more information, see `Sharded Data Parallelism
+       <https://docs.aws.amazon.com/sagemaker/latest/dg/model-parallel-extended-features-pytorch-sharded-data-parallelism.html>`_.
+   * - ``sdp_reduce_bucket_size`` (**smdistributed-modelparallel**>=v1.11)
+     - int
+     - 5e8
+     - Configuration parameter for sharded data parallelism (for ``sharded_data_parallel_degree > 2``).
+       Specifies the size of PyTorch DDP gradient buckets, in number of elements of the default dtype.
+   * - ``sdp_param_persistence_threshold`` (**smdistributed-modelparallel**>=v1.11)
+     - int
+     - 1e6
+     - Specifies the size of a parameter tensor, in number of elements, that can persist at each GPU. Sharded data parallelism splits each parameter tensor across the GPUs of a data parallel group. If the number of elements in a parameter tensor is smaller than this threshold, the tensor is not split; instead it is replicated across the data-parallel GPUs, which helps reduce communication overhead.
+   * - ``sdp_max_live_parameters`` (**smdistributed-modelparallel**>=v1.11)
+     - int
+     - 1e9
+     - Specifies the maximum number of parameters that can simultaneously be in a recombined training state during the forward and backward pass. Parameter fetching with the AllGather operation pauses when the number of active parameters reaches the given threshold. Note that increasing this parameter increases the memory footprint.
+   * - ``sdp_hierarchical_allgather`` (**smdistributed-modelparallel**>=v1.11)
+     - bool
+     - True
+     - If set to True, the AllGather operation runs hierarchically: it runs within each node first, and then runs across nodes. For multi-node distributed training jobs, the hierarchical AllGather operation is activated automatically.
+   * - ``sdp_gradient_clipping`` (**smdistributed-modelparallel**>=v1.11)
+     - float
+     - 1.0
+     - Specifies a threshold for clipping the L2 norm of the gradients before propagating them backward through the model parameters. When sharded data parallelism is activated, gradient clipping is activated as well. The default threshold is 1.0. Adjust this parameter if you encounter the exploding gradients problem.
 
 
 Parameters for ``mpi``
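
For orientation, the new ``sdp_*`` parameters documented above go into the ``"parameters"`` dictionary of the estimator template added earlier in this file. Below is a minimal sketch, not part of the commit: the training script name, execution-role lookup, instance type and count, MPI process count, and the chosen sharding degree are illustrative assumptions, and the ``sdp_*`` values simply restate the documented defaults.

.. code:: python

   import sagemaker
   from sagemaker.pytorch import PyTorch

   smp_options = {
       "enabled": True,
       "parameters": {
           # Shard the training state across 16 GPUs per data parallel group
           # (illustrative value; any number greater than 1 enables sharding).
           "sharded_data_parallel_degree": 16,
           # Optional tuning knobs from the table above, shown at their defaults.
           "sdp_reduce_bucket_size": int(5e8),
           "sdp_param_persistence_threshold": int(1e6),
           "sdp_max_live_parameters": int(1e9),
           "sdp_hierarchical_allgather": True,
           "sdp_gradient_clipping": 1.0,
       },
   }

   mpi_options = {
       "enabled": True,
       "processes_per_host": 8,  # assumption: one process per GPU on ml.p4d.24xlarge
   }

   smdmp_estimator = PyTorch(
       entry_point="train.py",               # hypothetical training script
       role=sagemaker.get_execution_role(),  # assumes a SageMaker notebook/Studio environment
       instance_type="ml.p4d.24xlarge",      # placeholder instance type
       instance_count=2,
       framework_version="1.12.0",
       py_version="py38",
       distribution={
           "smdistributed": {"modelparallel": smp_options},
           "mpi": mpi_options,
       },
   )

   smdmp_estimator.fit()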

doc/api/training/smd_model_parallel_release_notes/smd_model_parallel_change_log.rst

Lines changed: 76 additions & 5 deletions
@@ -5,9 +5,84 @@ Release Notes
 New features, bug fixes, and improvements are regularly made to the SageMaker
 distributed model parallel library.
 
-SageMaker Distributed Model Parallel 1.10.0 Release Notes
+
+SageMaker Distributed Model Parallel 1.11.0 Release Notes
 =========================================================
 
+*Date: August 17, 2022*
+
+**New Features**
+
+The following new features are added for PyTorch.
+
+* The library implements sharded data parallelism, which is a memory-saving
+  distributed training technique that splits the training state of a model
+  (model parameters, gradients, and optimizer states) across data parallel groups.
+  With sharded data parallelism, you can reduce the per-GPU memory footprint of
+  a model by sharding the training state over multiple GPUs. To learn more,
+  see `Sharded Data Parallelism
+  <https://docs.aws.amazon.com/sagemaker/latest/dg/model-parallel-extended-features-pytorch-sharded-data-parallelism.html>`_
+  in the *Amazon SageMaker Developer Guide*.
+
+**Migration to AWS Deep Learning Containers**
+
+This version passed benchmark testing and is migrated to the following AWS Deep Learning Containers (DLC):
+
+- DLC for PyTorch 1.12.0
+
+  .. code::
+
+    763104351884.dkr.ecr.<region>.amazonaws.com/pytorch-training:1.12.0-gpu-py38-cu113-ubuntu20.04-sagemaker
+
+Binary file of this version of the library for `custom container
+<https://docs.aws.amazon.com/sagemaker/latest/dg/model-parallel-sm-sdk.html#model-parallel-bring-your-own-container>`_ users:
+
+- For PyTorch 1.12.0
+
+  .. code::
+
+    https://sagemaker-distributed-model-parallel.s3.us-west-2.amazonaws.com/pytorch-1.12.0/build-artifacts/2022-08-12-16-58/smdistributed_modelparallel-1.11.0-cp38-cp38-linux_x86_64.whl
+
+----
+
+Release History
+===============
+
+SageMaker Distributed Model Parallel 1.10.1 Release Notes
+---------------------------------------------------------
+
+*Date: August 8, 2022*
+
+**Currency Updates**
+
+* Added support for Transformers v4.21.
+
+**Migration to AWS Deep Learning Containers**
+
+This version passed benchmark testing and is migrated to the following AWS Deep Learning Containers (DLC):
+
+- DLC for PyTorch 1.11.0
+
+  .. code::
+
+    763104351884.dkr.ecr.<region>.amazonaws.com/pytorch-training:1.11.0-gpu-py38-cu113-ubuntu20.04-sagemaker
+
+Binary file of this version of the library for `custom container
+<https://docs.aws.amazon.com/sagemaker/latest/dg/model-parallel-sm-sdk.html#model-parallel-bring-your-own-container>`_ users:
+
+- For PyTorch 1.11.0
+
+  .. code::
+
+    https://sagemaker-distributed-model-parallel.s3.us-west-2.amazonaws.com/pytorch-1.11.0/build-artifacts/2022-07-28-23-07/smdistributed_modelparallel-1.10.1-cp38-cp38-linux_x86_64.whl
+
+
+SageMaker Distributed Model Parallel 1.10.0 Release Notes
+---------------------------------------------------------
+
 *Date: July. 19. 2022*
 
 **New Features**

@@ -62,10 +137,6 @@ Binary file of this version of the library for `custom container
 
     https://sagemaker-distributed-model-parallel.s3.us-west-2.amazonaws.com/pytorch-1.12.0/build-artifacts/2022-07-11-19-23/smdistributed_modelparallel-1.10.0-cp38-cp38-linux_x86_64.whl
 
-----
-
-Release History
-===============
 
 SageMaker Distributed Model Parallel 1.9.0 Release Notes
 --------------------------------------------------------
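
The release notes above pin down the exact DLC image and binary wheel that ship library version 1.11.0. As a hedged illustration (not part of this commit), one way to hold a training job to that container is to pass the image URI to the estimator directly; the region, IAM role ARN, script name, instance settings, and distribution values below are placeholder assumptions.

.. code:: python

   from sagemaker.pytorch import PyTorch

   # Assumption: us-west-2; substitute the DLC registry for your own region.
   smp_dlc_image = (
       "763104351884.dkr.ecr.us-west-2.amazonaws.com/"
       "pytorch-training:1.12.0-gpu-py38-cu113-ubuntu20.04-sagemaker"
   )

   # Illustrative distribution settings; see the parameter table in
   # smd_model_parallel_general.rst for the full list of options.
   smp_options = {"enabled": True, "parameters": {"sharded_data_parallel_degree": 8}}
   mpi_options = {"enabled": True, "processes_per_host": 8}

   estimator = PyTorch(
       entry_point="train.py",                               # hypothetical training script
       role="arn:aws:iam::111122223333:role/SageMakerRole",  # placeholder role ARN
       instance_type="ml.p4d.24xlarge",                      # placeholder instance type
       instance_count=1,
       image_uri=smp_dlc_image,  # pin the job to the DLC listed in the 1.11.0 notes
       distribution={
           "smdistributed": {"modelparallel": smp_options},
           "mpi": mpi_options,
       },
   )

   estimator.fit()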

doc/api/training/smp_versions/archives.rst

Lines changed: 1 addition & 0 deletions
@@ -3,6 +3,7 @@
 .. toctree::
    :maxdepth: 1
 
+   v1_10_0.rst
    v1_9_0.rst
    v1_6_0.rst
    v1_5_0.rst

doc/api/training/smp_versions/latest.rst

Lines changed: 1 addition & 1 deletion
@@ -10,7 +10,7 @@ depending on which version of the library you need to use.
 To use the library, reference the
 **Common API** documentation alongside the framework specific API documentation.
 
-Version 1.10.0 (Latest)
+Version 1.11.0 (Latest)
 ===========================================
 
 To use the library, reference the Common API documentation alongside the framework specific API documentation.
