Commit e57b1e3

Documentation: sm distributed model parallel doc versioning updating (#2097)
* documentation: adding docs dedicated to SageMaker distributed versions
* documentation: updating section names
* documentation: reverting makefile and sdp toc depth

Co-authored-by: Ahsan Khan <[email protected]>
1 parent e3c54e1 commit e57b1e3

10 files changed, +80 -20 lines changed

doc/api/training/smd_data_parallel.rst

Lines changed: 4 additions & 4 deletions
@@ -1,6 +1,6 @@
-###################################
+##########################
 Distributed data parallel
-###################################
+##########################
 
 SageMaker's distributed data parallel library extends SageMaker’s training
 capabilities on deep learning models with near-linear scaling efficiency,
@@ -68,5 +68,5 @@ model.
 .. toctree::
    :maxdepth: 2
 
-   smd_data_parallel_pytorch
-   smd_data_parallel_tensorflow
+   sdp_versions/smd_data_parallel_pytorch
+   sdp_versions/smd_data_parallel_tensorflow
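
For orientation: the renamed pages document how the data parallel library is switched on through a framework Estimator's ``distribution`` parameter. A minimal, hypothetical sketch of that configuration (entry point, role, instance settings, and versions are placeholders, not values from this commit):

    from sagemaker.pytorch import PyTorch

    # Placeholder values throughout; the `distribution` argument is the
    # documented switch for the SageMaker data parallel library.
    estimator = PyTorch(
        entry_point="train.py",
        role="<your-sagemaker-execution-role>",
        instance_count=2,
        instance_type="ml.p3.16xlarge",
        framework_version="1.8.0",
        py_version="py36",
        distribution={"smdistributed": {"dataparallel": {"enabled": True}}},
    )
    estimator.fit("s3://<bucket>/<training-data>")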

doc/api/training/smd_model_parallel.rst

Lines changed: 20 additions & 10 deletions
@@ -20,20 +20,30 @@ Use the following sections to learn more about the model parallelism and the lib
    <https://integ-docs-aws.amazon.com/sagemaker/latest/dg/model-parallel-use-api.html#model-parallel-customize-container>`__
    for more information.
 
-How to Use this Guide
-=====================
+Use with the SageMaker Python SDK
+=================================
+
+Use the following page to learn how to configure and enable distributed model parallel
+when you configure an Amazon SageMaker Python SDK `Estimator`.
+
+.. toctree::
+   :maxdepth: 1
+
+   smd_model_parallel_general
+
+API Documentation
+=================
 
 The library contains a Common API that is shared across frameworks, as well as APIs
-that are specific to supported frameworks, TensorFlow and PyTorch. To use the library, reference the
+that are specific to supported frameworks, TensorFlow and PyTorch.
+
+Select a version to see the API documentation for version. To use the library, reference the
 **Common API** documentation alongside the framework specific API documentation.
 
 .. toctree::
    :maxdepth: 1
 
-   smd_model_parallel_general
-   smd_model_parallel_common_api
-   smd_model_parallel_pytorch
-   smd_model_parallel_tensorflow
+   smp_versions/v1_1_0.rst
 
 It is recommended to use this documentation alongside `SageMaker Distributed Model Parallel
 <http://docs.aws.amazon.com/sagemaker/latest/dg/model-parallel.html>`__ in the Amazon SageMaker
@@ -49,11 +59,11 @@ developer guide. This developer guide documentation includes:
 - `Configuration tips and pitfalls
   <https://docs.aws.amazon.com/sagemaker/latest/dg/model-parallel-customize-tips-pitfalls.html>`__
 
-Latest Updates
-==============
 
-New features, bug fixes, and improvements are regularly made to the SageMaker distributed model parallel library.
+Release Notes
+=============
 
+New features, bug fixes, and improvements are regularly made to the SageMaker distributed model parallel library.
 To see the the latest changes made to the library, refer to the library
 `Release Notes
 <https://github.com/aws/sagemaker-python-sdk/blob/master/doc/api/training/smd_model_parallel_release_notes/>`_.
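
For orientation: the ``smd_model_parallel_general`` page moved under "Use with the SageMaker Python SDK" documents the ``distribution`` parameter that enables ``modelparallel``. A minimal, hypothetical sketch of that configuration (all parameter values here are illustrative assumptions, not recommendations from this commit):

    from sagemaker.pytorch import PyTorch

    smp_options = {
        "enabled": True,
        "parameters": {
            "partitions": 2,     # number of model partitions (illustrative)
            "microbatches": 4,   # pipeline microbatches (illustrative)
            "ddp": True,         # layer data parallelism on top
        },
    }

    estimator = PyTorch(
        entry_point="train.py",                  # placeholder script
        role="<your-sagemaker-execution-role>",  # placeholder role
        instance_count=1,
        instance_type="ml.p3.16xlarge",
        framework_version="1.6.0",
        py_version="py36",
        distribution={
            "smdistributed": {"modelparallel": smp_options},
            "mpi": {"enabled": True, "processes_per_host": 8},
        },
    )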

doc/api/training/smd_model_parallel_general.rst

Lines changed: 2 additions & 2 deletions
@@ -6,7 +6,7 @@
 .. _sm-sdk-modelparallel-params:
 
 SageMaker Python SDK ``modelparallel`` parameters
--------------------------------------------------
+=================================================
 
 The TensorFlow and PyTorch ``Estimator`` objects contains a ``distribution`` parameter,
 which is used to enable and specify parameters for the
@@ -306,7 +306,7 @@ table are optional.
 .. _ranking-basics:
 
 Ranking Basics
---------------
+==============
 
 The library maintains a one-to-one mapping between processes and available GPUs:
 for each GPU, there is a corresponding CPU process. Each CPU process
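
The Ranking Basics section under the renamed heading describes a one-to-one mapping between CPU processes and GPUs. A short sketch of how a training script typically uses the rank primitives under that model (assuming the PyTorch flavor of the library; the calls are from its documented MPI basics):

    import torch
    import smdistributed.modelparallel.torch as smp

    smp.init()  # must be called before any other smp API

    # One process per GPU: pin this process to its assigned device.
    torch.cuda.set_device(smp.local_rank())

    print(f"world rank {smp.rank()} of {smp.size()}, "
          f"mp_rank {smp.mp_rank()}, dp_rank {smp.dp_rank()}")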

doc/api/training/smd_model_parallel_common_api.rst renamed to doc/api/training/smp_versions/v1.1.0/smd_model_parallel_common_api.rst

Lines changed: 12 additions & 4 deletions
@@ -1,7 +1,12 @@
-Common SageMaker distributed model parallel library APIs
---------------------------------------------------------
+.. admonition:: Contents
 
-The following APIs are common across all frameworks.
+   - :ref:`communication_api`
+   - :ref:`mpi_basics`
+
+Common API
+==========
+
+The following SageMaker distribute model parallel APIs are common across all frameworks.
 
 **Important**: This API document assumes you use the following import statement in your training scripts.
 
@@ -243,6 +248,7 @@ The following APIs are common across all frameworks.
    variable. If ``method`` is not ``"variable"``, this argument is
    ignored.
 
+.. _mpi_basics:
 
 MPI Basics
 ^^^^^^^^^^
@@ -265,8 +271,10 @@ The library exposes the following basic MPI primitives to its Python API:
 - ``smp.get_dp_group()``: The list of ranks that hold different
   replicas of the same model partition.
 
+.. _communication_api:
+
 Communication API
-=================
+^^^^^^^^^^^^^^^^^
 
 The library provides a few communication primitives which can be helpful while
 developing the training script. These primitives use the following
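
To make the relocated MPI Basics and Communication API sections concrete, a hedged sketch of the primitives they name (``allgather``, ``broadcast``, the group handles, and ``recv_from`` follow the library's common API documentation; treat the exact signatures as assumptions):

    import smdistributed.modelparallel.torch as smp

    smp.init()

    # Collect one Python object from every rank in the world group.
    all_ranks = smp.allgather(smp.rank(), smp.WORLD)

    # Broadcast from dp_rank 0 of each data parallel group; receiving
    # ranks must post a matching recv_from.
    if smp.dp_rank() == 0:
        smp.broadcast({"seed": 42}, smp.DP_GROUP)
    else:
        cfg = smp.recv_from(0, smp.RankType.DP_RANK)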

doc/api/training/smd_model_parallel_pytorch.rst renamed to doc/api/training/smp_versions/v1.1.0/smd_model_parallel_pytorch.rst

Lines changed: 15 additions & 0 deletions
@@ -1,3 +1,8 @@
+.. admonition:: Contents
+
+   - :ref:`pytorch_saving_loading`
+   - :ref:`pytorch_saving_loading_instructions`
+
 PyTorch API
 ===========
 
@@ -10,6 +15,13 @@ This API document assumes you use the following import statements in your traini
    import smdistributed.modelparallel.torch as smp
 
 
+.. tip::
+
+   Refer to
+   `Modify a PyTorch Training Script
+   <https://docs.aws.amazon.com/sagemaker/latest/dg/model-parallel-customize-training-script.html#model-parallel-customize-training-script-pt>`_
+   to learn how to use the following API in your PyTorch training script.
+
 .. class:: smp.DistributedModel
 
    A sub-class of ``torch.nn.Module`` which specifies the model to be
@@ -354,6 +366,7 @@ This API document assumes you use the following import statements in your traini
    currently doesn’t work with the library. ``smp.amp.GradScaler`` replaces
    ``torch.amp.GradScaler`` and provides the same functionality.
 
+.. _pytorch_saving_loading:
 
 APIs for Saving and Loading
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^
@@ -404,6 +417,8 @@ APIs for Saving and Loading
    ``mp_rank`` loads the checkpoint corresponding to the ``mp_rank``.
    Should be used when loading a model trained with the library.
 
+.. _pytorch_saving_loading_instructions:
+
 General Instruction For Saving and Loading
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 
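
The new ``pytorch_saving_loading`` anchors point at the partial-checkpoint workflow. A minimal sketch of that workflow using the documented ``smp.save``/``local_state_dict`` API (the model and file name are placeholders):

    import torch
    import smdistributed.modelparallel.torch as smp

    class Net(torch.nn.Module):  # placeholder model
        def __init__(self):
            super().__init__()
            self.fc = torch.nn.Linear(128, 10)

        def forward(self, x):
            return self.fc(x)

    smp.init()
    model = smp.DistributedModel(Net())
    optimizer = smp.DistributedOptimizer(
        torch.optim.SGD(model.parameters(), lr=0.01)
    )

    # ... training loop elided ...

    # Partial checkpoint: each mp_rank saves only its own partition.
    # Save from one data parallel replica to avoid duplicate files.
    if smp.dp_rank() == 0:
        smp.save(
            {"model": model.local_state_dict(),
             "optimizer": optimizer.local_state_dict()},
            "checkpoint.pt",
            partial=True,
        )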

doc/api/training/smd_model_parallel_tensorflow.rst renamed to doc/api/training/smp_versions/v1.1.0/smd_model_parallel_tensorflow.rst

Lines changed: 6 additions & 0 deletions
@@ -9,6 +9,12 @@ TensorFlow API
 
    import smdistributed.modelparallel.tensorflow as smp
 
+.. tip::
+
+   Refer to
+   `Modify a TensorFlow Training Script
+   <https://docs.aws.amazon.com/sagemaker/latest/dg/model-parallel-customize-training-script.html#model-parallel-customize-training-script-tf>`_
+   to learn how to use the following API in your TensorFlow training script.
 
 .. class:: smp.DistributedModel
    :noindex:
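
For context on the TensorFlow page: the library is used by subclassing ``smp.DistributedModel`` and running the forward/backward pass inside an ``@smp.step``-decorated function. A minimal sketch following the library's documented TF2 pattern (layer sizes and the loss are illustrative assumptions):

    import tensorflow as tf
    import smdistributed.modelparallel.tensorflow as smp

    smp.init()

    class Net(smp.DistributedModel):  # Keras Model sub-class from the library
        def __init__(self):
            super().__init__()
            self.dense = tf.keras.layers.Dense(10)

        def call(self, x):
            return self.dense(x)

    model = Net()
    optimizer = tf.keras.optimizers.Adam()
    loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)

    @smp.step
    def get_grads(images, labels):
        predictions = model(images, training=True)
        loss = loss_fn(labels, predictions)
        grads = optimizer.get_gradients(loss, model.trainable_variables)
        return grads, loss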
doc/api/training/smp_versions/v1_1_0.rst

Lines changed: 12 additions & 0 deletions
@@ -0,0 +1,12 @@
+
+Version 1.1.0 (Latest)
+======================
+
+To use the library, reference the Common API documentation alongside the framework specific API documentation.
+
+.. toctree::
+   :maxdepth: 1
+
+   v1.1.0/smd_model_parallel_common_api
+   v1.1.0/smd_model_parallel_pytorch
+   v1.1.0/smd_model_parallel_tensorflow

doc/conf.py

Lines changed: 9 additions & 0 deletions
@@ -55,6 +55,15 @@
 
 html_theme = "sphinx_rtd_theme"
 
+html_theme_options = {
+    "collapse_navigation": True,
+    "sticky_navigation": True,
+    "navigation_depth": 6,
+    "includehidden": True,
+    "titles_only": False,
+}
+
+
 html_static_path = ["_static"]
 
 htmlhelp_basename = "%sdoc" % project
