aws · BasilBeirouti · Sep 2, 2022 · Aug 19, 2022
@@ -19,6 +19,35 @@ The SageMaker model parallel library internally uses MPI.
 To use model parallelism, both ``smdistributed`` and MPI must be enabled
 through the ``distribution`` parameter.
 
+The following code example is a template for setting up model parallelism with a PyTorch estimator.
+
+.. code:: python
+
+  import sagemaker
+  from sagemaker.pytorch import PyTorch
+
+  smp_options = {
+      "enabled": True,
+      "parameters": {
+          ...
+      }
+  }
+
+  mpi_options = {
+      "enabled": True,
+      ...
+  }
+
+  smdmp_estimator = PyTorch(
+      ...
+      distribution={
+          "smdistributed": {"modelparallel": smp_options},
+          "mpi": mpi_options
+      }
+  )
+
+  smdmp_estimator.fit()
+
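Because both ``smdistributed`` and MPI must be enabled, it can help to check the ``distribution`` dictionary before launching a job. The following is a minimal sketch, not part of the SageMaker Python SDK: ``model_parallel_enabled`` is a hypothetical helper, and the dictionary contents are illustrative only.

```python
def model_parallel_enabled(distribution):
    """Return True only if both the model parallel library and MPI are enabled.

    Hypothetical helper for illustration; it only inspects the dictionary
    and does not contact SageMaker.
    """
    smp = distribution.get("smdistributed", {}).get("modelparallel", {})
    mpi = distribution.get("mpi", {})
    return bool(smp.get("enabled")) and bool(mpi.get("enabled"))


# Illustrative distribution dictionary with both components enabled.
distribution = {
    "smdistributed": {"modelparallel": {"enabled": True, "parameters": {}}},
    "mpi": {"enabled": True},
}

print(model_parallel_enabled(distribution))
```

A dictionary that enables only one of the two components makes the helper return ``False``.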
 .. tip::
 
   This page provides you a complete list of parameters you can use
@@ -214,6 +243,34 @@ PyTorch-specific Parameters
     - False
     - Skips the initial tracing step. This can be useful in very large models
       where even model tracing at the CPU is not possible due to memory constraints.
+  * - ``sharded_data_parallel_degree`` (**smdistributed-modelparallel**>=v1.11)
+    - int
+    - 1
+    - To run a training job using sharded data parallelism, add this parameter and specify a number greater than 1.
+      Sharded data parallelism is a memory-saving distributed training technique that splits the training state of a model (model parameters, gradients, and optimizer states) across GPUs in a data parallel group.
+      For more information, see `Sharded Data Parallelism
+      <https://docs.aws.amazon.com/sagemaker/latest/dg/model-parallel-extended-features-pytorch-sharded-data-parallelism.html>`_.
+  * - ``sdp_reduce_bucket_size`` (**smdistributed-modelparallel**>=v1.11)
+    - int
+    - 5e8
+    - Configuration parameter for sharded data parallelism (for ``sharded_data_parallel_degree > 2``).
+      Specifies the size of PyTorch DDP gradient buckets in number of elements of the default dtype.
+  * - ``sdp_param_persistence_threshold`` (**smdistributed-modelparallel**>=v1.11)
+    - int
+    - 1e6
+    - Specifies the size of a parameter tensor in number of elements that can persist at each GPU. Sharded data parallelism splits each parameter tensor across GPUs of a data parallel group. If the number of elements in the parameter tensor is smaller than this threshold, the parameter tensor is not split; this helps reduce communication overhead because the parameter tensor is replicated across data-parallel GPUs.
+  * - ``sdp_max_live_parameters`` (**smdistributed-modelparallel**>=v1.11)
+    - int
+    - 1e9
+    - Specifies the maximum number of parameters that can simultaneously be in a recombined training state during the forward and backward pass. Parameter fetching with the AllGather operation pauses when the number of active parameters reaches the given threshold. Note that increasing this parameter increases the memory footprint.
+  * - ``sdp_hierarchical_allgather`` (**smdistributed-modelparallel**>=v1.11)
+    - bool
+    - True
+    - If set to ``True``, the AllGather operation runs hierarchically: it runs within each node first, and then across nodes. For multi-node distributed training jobs, the hierarchical AllGather operation is automatically activated.
+  * - ``sdp_gradient_clipping`` (**smdistributed-modelparallel**>=v1.11)
+    - float
+    - 1.0
+    - Specifies a threshold for clipping the L2 norm of the gradients before propagating them backward through the model parameters. When sharded data parallelism is activated, gradient clipping is also activated. The default threshold is 1.0. Adjust this parameter if you encounter exploding gradients.
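
The sharded data parallelism parameters above could be passed through ``smp_options`` roughly as follows. This is a sketch under assumptions: the degree of 4 is illustrative, the other values simply restate the defaults from the table, and any other parameters a real training job needs are omitted. The ``is_split`` function is a hypothetical illustration of the documented ``sdp_param_persistence_threshold`` rule, not the library's implementation.

```python
# Illustrative values only; all except the degree restate the table defaults.
smp_options = {
    "enabled": True,
    "parameters": {
        "sharded_data_parallel_degree": 4,            # assumption: must be > 1 to activate
        "sdp_reduce_bucket_size": int(5e8),           # elements of the default dtype
        "sdp_param_persistence_threshold": int(1e6),  # elements per parameter tensor
        "sdp_max_live_parameters": int(1e9),
        "sdp_hierarchical_allgather": True,
        "sdp_gradient_clipping": 1.0,
    },
}


def is_split(num_elements, threshold):
    """Hypothetical illustration of the persistence rule.

    A tensor with fewer elements than the threshold is not split
    and stays replicated on each GPU of the data parallel group.
    """
    return num_elements >= threshold


threshold = smp_options["parameters"]["sdp_param_persistence_threshold"]
print(is_split(1024, threshold))        # small tensor such as a bias vector
print(is_split(50_000_000, threshold))  # large tensor such as an embedding table
```

Raising the threshold keeps more small tensors replicated, which trades memory for less communication.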
 
 
 Parameters for ``mpi``

@@ -5,9 +5,84 @@ Release Notes
 New features, bug fixes, and improvements are regularly made to the SageMaker
 distributed model parallel library.
 
-SageMaker Distributed Model Parallel 1.10.0 Release Notes
+
+SageMaker Distributed Model Parallel 1.11.0 Release Notes
 =========================================================
 
+*Date: August. 17. 2022*
+
+**New Features**
+
+The following new features are added for PyTorch.
+
+* The library implements sharded data parallelism, which is a memory-saving
+  distributed training technique that splits the training state of a model
+  (model parameters, gradients, and optimizer states) across GPUs in a data parallel group.
+  With sharded data parallelism, you can reduce the per-GPU memory footprint of
+  a model by sharding the training state over multiple GPUs. To learn more,
+  see `Sharded Data Parallelism
+  <https://docs.aws.amazon.com/sagemaker/latest/dg/model-parallel-extended-features-pytorch-sharded-data-parallelism.html>`_
+  in the *Amazon SageMaker Developer Guide*.
+
+**Migration to AWS Deep Learning Containers**
+
+This version passed benchmark testing and has been migrated to the following AWS Deep Learning Containers (DLC):
+
+- DLC for PyTorch 1.12.0
+
+  .. code::
+
+    763104351884.dkr.ecr.<region>.amazonaws.com/pytorch-training:1.12.0-gpu-py38-cu113-ubuntu20.04-sagemaker
+
+Binary file of this version of the library for `custom container
+<https://docs.aws.amazon.com/sagemaker/latest/dg/model-parallel-sm-sdk.html#model-parallel-bring-your-own-container>`_ users:
+
+- For PyTorch 1.12.0
+
+  .. code::
+
+    https://sagemaker-distributed-model-parallel.s3.us-west-2.amazonaws.com/pytorch-1.12.0/build-artifacts/2022-08-12-16-58/smdistributed_modelparallel-1.11.0-cp38-cp38-linux_x86_64.whl
+
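The ``<region>`` placeholder in the DLC image URI above has to be replaced with an AWS Region before the URI can be used. A minimal sketch follows; ``dlc_image_uri`` is a hypothetical helper, ``us-west-2`` is only an example, and the sketch assumes the target Region uses this registry account and the ``amazonaws.com`` domain.

```python
# Image URI template copied from the release notes above.
DLC_IMAGE_TEMPLATE = (
    "763104351884.dkr.ecr.<region>.amazonaws.com/"
    "pytorch-training:1.12.0-gpu-py38-cu113-ubuntu20.04-sagemaker"
)


def dlc_image_uri(region):
    """Hypothetical helper: substitute an AWS Region into the URI template."""
    return DLC_IMAGE_TEMPLATE.replace("<region>", region)


print(dlc_image_uri("us-west-2"))
```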
+----
+
+Release History
+===============
+
+SageMaker Distributed Model Parallel 1.10.1 Release Notes
+---------------------------------------------------------
+
+*Date: August. 8. 2022*
+
+**Currency Updates**
+
+* Added support for Transformers v4.21.
+
+
+**Migration to AWS Deep Learning Containers**
+
+This version passed benchmark testing and has been migrated to the following AWS Deep Learning Containers (DLC):
+
+- DLC for PyTorch 1.11.0
+
+  .. code::
+
+    763104351884.dkr.ecr.<region>.amazonaws.com/pytorch-training:1.11.0-gpu-py38-cu113-ubuntu20.04-sagemaker
+
+
+Binary file of this version of the library for `custom container
+<https://docs.aws.amazon.com/sagemaker/latest/dg/model-parallel-sm-sdk.html#model-parallel-bring-your-own-container>`_ users:
+
+- For PyTorch 1.11.0
+
+  .. code::
+
+    https://sagemaker-distributed-model-parallel.s3.us-west-2.amazonaws.com/pytorch-1.11.0/build-artifacts/2022-07-28-23-07/smdistributed_modelparallel-1.10.1-cp38-cp38-linux_x86_64.whl
+
+
+
+SageMaker Distributed Model Parallel 1.10.0 Release Notes
+---------------------------------------------------------
+
 *Date: July. 19. 2022*
 
 **New Features**
@@ -62,10 +137,6 @@ Binary file of this version of the library for `custom container
 
     https://sagemaker-distributed-model-parallel.s3.us-west-2.amazonaws.com/pytorch-1.12.0/build-artifacts/2022-07-11-19-23/smdistributed_modelparallel-1.10.0-cp38-cp38-linux_x86_64.whl
 
-----
-
-Release History
-===============
 
 SageMaker Distributed Model Parallel 1.9.0 Release Notes
 --------------------------------------------------------

@@ -3,6 +3,7 @@
 .. toctree::
     :maxdepth: 1
 
+    v1_10_0.rst
     v1_9_0.rst
     v1_6_0.rst
     v1_5_0.rst

@@ -10,7 +10,7 @@ depending on which version of the library you need to use.
 To use the library, reference the
 **Common API** documentation alongside the framework specific API documentation.
 
-Version 1.10.0 (Latest)
+Version 1.11.0 (Latest)
 ===========================================
 
 To use the library, reference the Common API documentation alongside the framework specific API documentation.