Use this guide to learn how to use the SageMaker distributed
data parallel library API for PyTorch.

.. contents:: Topics

The distributed data parallel library works as a backend of the PyTorch distributed package.
See `SageMaker distributed data parallel PyTorch examples <https://sagemaker-examples.readthedocs.io/en/latest/training/distributed_training/index.html#pytorch-distributed>`__
for additional details on how to use the library.

1. Import the SageMaker distributed data parallel library’s PyTorch client.
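
   In the library's recent releases, this client ships as a single module whose
   import registers the ``smddp`` backend with ``torch.distributed``; the module
   path below reflects that packaging and may differ in other versions.

   .. code:: python

      # Importing the client registers "smddp" as a torch.distributed backend.
      # The module path may vary across library versions.
      import smdistributed.dataparallel.torch.torch_smddp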

2. Import the PyTorch distributed modules.

   .. code:: python

      import torch
      import torch.distributed as dist
      from torch.nn.parallel import DistributedDataParallel as DDP

3. Set the backend of ``torch.distributed`` as ``smddp``.

   .. code:: python

      dist.init_process_group(backend='smddp')

4. After parsing arguments and defining a batch size parameter
   (for example, ``batch_size=args.batch_size``), add two lines of code to
   resize the batch size per worker (GPU). PyTorch's DataLoader operation
   does not automatically handle the batch resizing for distributed training.

   .. code:: python

      batch_size //= dist.get_world_size()
      batch_size = max(batch_size, 1)

5. Pin each GPU to a single SageMaker data parallel library process with
   ``local_rank``. This refers to the relative rank of the process within a given node.

   You can retrieve the rank of the process from the ``LOCAL_RANK`` environment variable.

   .. code:: python

      import os

      # LOCAL_RANK is exported per process; cast it to an integer GPU index.
      local_rank = int(os.environ["LOCAL_RANK"])
      torch.cuda.set_device(local_rank)

6. After defining a model, wrap it with the PyTorch DDP.

   .. code:: python

      model = ...

      # Wrap the model with the PyTorch DistributedDataParallel API
      model = DDP(model)
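
   If the model is not already on the GPU pinned in step 5, you may also move it
   to the ``local_rank`` device before (or while) wrapping it; the line below is
   a minimal sketch of that pattern, not a required replacement for the code above.

   .. code:: python

      # Optional: place the model on this process's GPU before wrapping it.
      model = DDP(model.to(local_rank))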

7. When you call the ``torch.utils.data.distributed.DistributedSampler`` API,
   specify the total number of processes (GPUs) participating in training across
   all the nodes in the cluster. This is called ``world_size``, and you can retrieve
   the number from the ``torch.distributed.get_world_size()`` API. Also, specify
   the rank of each process among all processes using the ``torch.distributed.get_rank()`` API.

   .. code:: python

      from torch.utils.data.distributed import DistributedSampler

      train_sampler = DistributedSampler(
          train_dataset,
          num_replicas=dist.get_world_size(),
          rank=dist.get_rank()
      )
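
   You can then pass the sampler, together with the per-worker batch size from
   step 4, to a PyTorch ``DataLoader``. The snippet below is a minimal sketch;
   ``train_dataset`` and ``batch_size`` are the placeholder names used in the
   earlier steps.

   .. code:: python

      from torch.utils.data import DataLoader

      # Each process loads only the shard of data assigned by the sampler.
      train_loader = DataLoader(
          train_dataset,
          batch_size=batch_size,
          sampler=train_sampler
      )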

8. Modify your script to save checkpoints only on the leader process (rank 0).
   The leader process has a synchronized model. This also avoids other processes
   overwriting the checkpoints and possibly corrupting them.
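
   A minimal sketch of this guard is shown below; the file name is arbitrary, and
   ``model.module`` is used only to save the state dict without the DDP wrapper.

   .. code:: python

      # Save the checkpoint from the leader process only, so that workers do not
      # write to the same file concurrently.
      if dist.get_rank() == 0:
          torch.save(model.module.state_dict(), "model.pt")
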
The following example code shows the structure of a PyTorch training script with DDP and smddp as the backend.

.. code:: python

        test(...)
        scheduler.step()

    # SageMaker data parallel: Save model on the leader node (rank 0).
    if dist.get_rank() == 0:
        torch.save(...)

.. warning::

   The following APIs for the ``smdistributed`` implementation of the PyTorch
   distributed modules are deprecated.