However, the library does not support ``torch.compile`` in this release.
**New Features**

* Sharded data parallelism can now be used together with tensor parallelism
  for PyTorch 1.13.1. This allows you to train with smaller global batch
  sizes while scaling up to large clusters. For more information, see `Sharded
  data parallelism with tensor parallelism <https://docs.aws.amazon.com/sagemaker/latest/dg/model-parallel-extended-features-pytorch-sharded-data-parallelism.html#model-parallel-extended-features-pytorch-sharded-data-parallelism-with-tensor-parallelism>`_
  in the *Amazon SageMaker Developer Guide*. A configuration sketch follows
  this list.
* Added support for saving and loading full model checkpoints when using sharded
  data parallelism. This is enabled through the standard checkpointing API,
  ``smp.save_checkpoint`` with ``partial=False``; see the checkpointing sketch
  after this list. Previously, full checkpoints had to be created by merging
  partial checkpoint files after training finished.
* ``DistributedTransformer`` now supports ALiBi position embeddings. When using
  ``DistributedTransformer``, you can set the ``use_alibi`` parameter to ``True``
  to use the Triton-based flash attention kernels. This helps evaluate sequences
  longer than those used for training; a usage sketch follows this list.
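
The sharded data parallelism plus tensor parallelism combination is configured
when the training job is launched. The following is a minimal sketch, not a
definitive recipe: it assumes the SageMaker Python SDK ``PyTorch`` estimator and
the ``sharded_data_parallel_degree`` and ``tensor_parallel_degree`` parameters
described in the developer guide linked above; the degrees, instance settings,
and entry point are placeholders.

.. code-block:: python

    # Sketch: launching a job that combines sharded data parallelism with
    # tensor parallelism. Degrees, instance settings, and the entry point are
    # placeholders; adjust them to your cluster and model.
    from sagemaker.pytorch import PyTorch

    smp_options = {
        "enabled": True,
        "parameters": {
            "ddp": True,
            "sharded_data_parallel_degree": 16,  # placeholder sharding degree
            "tensor_parallel_degree": 4,         # placeholder TP degree
        },
    }

    estimator = PyTorch(
        entry_point="train.py",                  # placeholder training script
        role="<your-iam-role>",
        instance_type="ml.p4d.24xlarge",
        instance_count=8,
        framework_version="1.13.1",
        py_version="py39",
        distribution={
            "smdistributed": {"modelparallel": smp_options},
            "mpi": {"enabled": True, "processes_per_host": 8},
        },
    )
    estimator.fit()

The example degrees multiply to the total number of processes (8 instances with
8 GPUs each); pick values that match your own cluster.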
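
The new full-checkpoint support can be used roughly as follows. This is a
sketch rather than the library's canonical example: only ``smp.save_checkpoint``
with ``partial=False`` comes from this release note, while the keyword
arguments, the ``smp.resume_from_checkpoint`` call, and all paths and tags are
assumptions or placeholders.

.. code-block:: python

    import smdistributed.modelparallel.torch as smp

    def save_full_checkpoint(model, optimizer, step):
        # With partial=False, a single full checkpoint is written instead of
        # per-rank partial files, so no post-training merge step is needed.
        smp.save_checkpoint(
            path="/opt/ml/checkpoints",   # placeholder output location
            tag=f"step_{step}",           # placeholder checkpoint tag
            partial=False,
            model=model,
            optimizer=optimizer,
        )

    def load_full_checkpoint():
        # Assumed counterpart for loading a full checkpoint; check the library
        # documentation for the exact resume workflow and arguments.
        smp.resume_from_checkpoint(
            path="/opt/ml/checkpoints",
            tag="step_1000",              # placeholder tag
            partial=False,
        )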
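
For the ALiBi option, a usage sketch is shown below. It assumes the common
workflow of letting the library replace supported modules with
``DistributedTransformer`` inside the ``smp.tensor_parallelism`` context and
forwarding ``use_alibi=True`` through that context; only the ``use_alibi``
parameter itself comes from this release note, and the model-building helper
is hypothetical.

.. code-block:: python

    import smdistributed.modelparallel.torch as smp

    smp.init()

    # Keyword arguments passed to smp.tensor_parallelism are forwarded to the
    # distributed module constructors, so use_alibi=True reaches the
    # DistributedTransformer blocks that replace the original layers.
    with smp.tensor_parallelism(enabled=True, use_alibi=True):
        model = build_transformer_lm()  # hypothetical helper that builds your model

    model = smp.DistributedModel(model)

With ALiBi enabled, the Triton-based flash attention kernels are used, which
helps when evaluating sequences longer than those seen during training.
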
**Bug Fixes**

* When using tensor parallelism, parameters were unnecessarily initialized
  multiple times. This release fixes the multiple initialization so that each
  parameter is initialized exactly once. This not only saves time, but also
  ensures that the random number generator behavior matches the
  non-tensor-parallelism case.

**Known Issues**

* Model initialization might take longer with PyTorch 2.0 than with PyTorch 1.13.

**Migration to AWS Deep Learning Containers**
Release History
===============
SageMaker Distributed Model Parallel 1.14.0 Release Notes