
documentation: adding version 1.1.0 docs for smdistributed.dataparallel #2232


Merged · 4 commits · Mar 23, 2021
@@ -8,6 +8,7 @@ PyTorch Guide to SageMaker's distributed data parallel library
- :ref:`pytorch-sdp-api`

.. _pytorch-sdp-modify:
:noindex:

Modify a PyTorch training script to use SageMaker data parallel
======================================================================
@@ -149,6 +150,7 @@ you will have for distributed training with the distributed data parallel library


.. _pytorch-sdp-api:
:noindex:

PyTorch API
===========
Expand All @@ -159,6 +161,7 @@ PyTorch API


.. function:: smdistributed.dataparallel.torch.distributed.is_available()
:noindex:

Checks if the script was started as a distributed job. For local runs, users can
check that is_available returns False and run the training script
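
For reference (not part of this change), a minimal sketch of the guard this enables; the ``dist`` alias is an assumed convention:

.. code:: python

    import smdistributed.dataparallel.torch.distributed as dist

    # Only set up distributed training when the script was launched as a
    # distributed job; otherwise fall back to a plain single-process run.
    if dist.is_available():
        dist.init_process_group()
    else:
        print("Not a distributed launch; running single-process.")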
Expand All @@ -174,6 +177,7 @@ PyTorch API


.. function:: smdistributed.dataparallel.torch.distributed.init_process_group(*args, **kwargs)
:noindex:

Initialize ``smdistributed.dataparallel``. Must be called at the
beginning of the training script, before calling any other methods.
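
A minimal placement sketch (assumed ``dist`` alias; the GPU-pinning line is a common PyTorch follow-up, not something this API requires):

.. code:: python

    import torch
    import smdistributed.dataparallel.torch.distributed as dist

    # First smdistributed.dataparallel call in the script.
    dist.init_process_group()

    # Typical next step: bind this process to its GPU.
    torch.cuda.set_device(dist.get_local_rank())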
Expand All @@ -198,6 +202,7 @@ PyTorch API


.. function:: smdistributed.dataparallel.torch.distributed.is_initialized()
:noindex:

Checks if the default process group has been initialized.
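
A small sketch of the usual guard, assuming the same ``dist`` alias:

.. code:: python

    import smdistributed.dataparallel.torch.distributed as dist

    # Useful when setup code may be reached more than once.
    if not dist.is_initialized():
        dist.init_process_group()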

Expand All @@ -211,6 +216,7 @@ PyTorch API


.. function:: smdistributed.dataparallel.torch.distributed.get_world_size(group=smdistributed.dataparallel.torch.distributed.group.WORLD)
:noindex:

The total number of GPUs across all the nodes in the cluster. For
example, in an 8-node cluster with 8 GPUs each, size will be equal to 64.
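
A short sketch of one common use; the learning-rate scaling is a heuristic, not something this API prescribes:

.. code:: python

    import smdistributed.dataparallel.torch.distributed as dist

    dist.init_process_group()
    world_size = dist.get_world_size()   # e.g. 64 on 8 nodes x 8 GPUs
    base_lr = 0.001
    lr = base_lr * world_size            # scale LR with the number of workers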
Expand All @@ -230,6 +236,7 @@ PyTorch API


.. function:: smdistributed.dataparallel.torch.distributed.get_rank(group=smdistributed.dataparallel.torch.distributed.group.WORLD)
:noindex:

The rank of the node in the cluster. The rank ranges from 0 to the number of
nodes - 1. This is similar to MPI's World Rank.
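
A minimal sketch of the usual rank-0 gating pattern (assumed ``dist`` alias):

.. code:: python

    import smdistributed.dataparallel.torch.distributed as dist

    dist.init_process_group()
    # Rank 0 conventionally handles logging and checkpointing.
    if dist.get_rank() == 0:
        print("world size:", dist.get_world_size())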
Expand All @@ -249,6 +256,7 @@ PyTorch API


.. function:: smdistributed.dataparallel.torch.distributed.get_local_rank()
:noindex:

Local rank refers to the relative rank of
the ``smdistributed.dataparallel`` process within the node the current
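
A sketch of the typical device-pinning use; ``torch.cuda.set_device`` is standard PyTorch, not part of this library:

.. code:: python

    import torch
    import smdistributed.dataparallel.torch.distributed as dist

    dist.init_process_group()
    local_rank = dist.get_local_rank()
    torch.cuda.set_device(local_rank)          # one process per GPU on the node
    device = torch.device("cuda", local_rank)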
Expand All @@ -267,6 +275,7 @@ PyTorch API


.. function:: smdistributed.dataparallel.torch.distributed.all_reduce(tensor, op=smdistributed.dataparallel.torch.distributed.ReduceOp.SUM, group=smdistributed.dataparallel.torch.distributed.group.WORLD, async_op=False)
:noindex:

Performs an all-reduce operation on a tensor (torch.tensor) across
all ``smdistributed.dataparallel`` workers
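
A minimal sketch using the ``ReduceOp.SUM`` default from the signature above, assuming the reduction is applied in place as in ``torch.distributed``:

.. code:: python

    import torch
    import smdistributed.dataparallel.torch.distributed as dist

    dist.init_process_group()
    t = torch.ones(1).cuda()                  # each worker contributes 1.0
    dist.all_reduce(t, op=dist.ReduceOp.SUM)
    # t now equals the world size on every worker.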
@@ -311,6 +320,7 @@ PyTorch API


.. function:: smdistributed.dataparallel.torch.distributed.broadcast(tensor, src=0, group=smdistributed.dataparallel.torch.distributed.group.WORLD, async_op=False)
:noindex:

Broadcasts the tensor (torch.tensor) to the whole group.
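
A minimal sketch, assuming the broadcast fills the tensor in place as in ``torch.distributed``:

.. code:: python

    import torch
    import smdistributed.dataparallel.torch.distributed as dist

    dist.init_process_group()
    t = torch.zeros(3).cuda()
    if dist.get_rank() == 0:
        t += 1.0
    dist.broadcast(t, src=0)   # every worker now holds rank 0's values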

Expand All @@ -335,6 +345,7 @@ PyTorch API


.. function:: smdistributed.dataparallel.torch.distributed.all_gather(tensor_list, tensor, group=smdistributed.dataparallel.torch.distributed.group.WORLD, async_op=False)
:noindex:

Gathers tensors from the whole group in a list.
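
A minimal sketch following the ``(tensor_list, tensor)`` argument order in the signature above:

.. code:: python

    import torch
    import smdistributed.dataparallel.torch.distributed as dist

    dist.init_process_group()
    world_size = dist.get_world_size()
    mine = torch.tensor([float(dist.get_rank())]).cuda()
    gathered = [torch.zeros(1).cuda() for _ in range(world_size)]
    dist.all_gather(gathered, mine)   # gathered[i] holds rank i's tensor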

Expand All @@ -361,6 +372,7 @@ PyTorch API


.. function:: smdistributed.dataparallel.torch.distributed.all_to_all_single(output_t, input_t, output_split_sizes=None, input_split_sizes=None, group=group.WORLD, async_op=False)
:noindex:

Each process scatters the input tensor to all processes in a group and returns the gathered tensor in the output.
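
A minimal sketch with the default even split, where each rank sends one element to every other rank:

.. code:: python

    import torch
    import smdistributed.dataparallel.torch.distributed as dist

    dist.init_process_group()
    world_size = dist.get_world_size()
    input_t = torch.arange(world_size, dtype=torch.float32).cuda()
    output_t = torch.empty(world_size, dtype=torch.float32).cuda()
    dist.all_to_all_single(output_t, input_t)
    # output_t[i] is the element this rank received from rank i.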

Expand All @@ -385,6 +397,7 @@ PyTorch API


.. function:: smdistributed.dataparallel.torch.distributed.barrier(group=smdistributed.dataparallel.torch.distributed.group.WORLD, async_op=False)
:noindex:

Synchronizes all ``smdistributed.dataparallel`` processes.
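
A short sketch; ``prepare_dataset`` is a hypothetical rank-0-only setup step:

.. code:: python

    import smdistributed.dataparallel.torch.distributed as dist

    def prepare_dataset():
        # hypothetical one-time setup performed by rank 0 only
        pass

    dist.init_process_group()
    if dist.get_rank() == 0:
        prepare_dataset()
    dist.barrier()   # all workers wait here until rank 0 finishes setup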

Expand All @@ -410,6 +423,7 @@ PyTorch API


.. class:: smdistributed.dataparallel.torch.parallel.DistributedDataParallel(module, device_ids=None, output_device=None, broadcast_buffers=True, process_group=None, bucket_cap_mb=None)
:noindex:

``smdistributed.dataparallel``'s implementation of distributed data
parallelism for PyTorch. In most cases, wrapping your PyTorch Module
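
For context (not part of this diff), a minimal wrapping sketch; it assumes the class can be imported from the path shown in the directive above and omits data loading and the training loop:

.. code:: python

    import torch
    import smdistributed.dataparallel.torch.distributed as dist
    from smdistributed.dataparallel.torch.parallel import DistributedDataParallel as DDP

    dist.init_process_group()
    local_rank = dist.get_local_rank()
    torch.cuda.set_device(local_rank)

    model = torch.nn.Linear(10, 1).cuda()
    model = DDP(model, device_ids=[local_rank], broadcast_buffers=True)
    # Gradients are now averaged across all workers during backward().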
@@ -503,6 +517,7 @@ PyTorch API


.. class:: smdistributed.dataparallel.torch.distributed.ReduceOp
:noindex:

An enum-like class for supported reduction operations
in ``smdistributed.dataparallel``.
@@ -8,6 +8,7 @@ TensorFlow Guide to SageMaker's distributed data parallel library
- :ref:`tensorflow-sdp-api`

.. _tensorflow-sdp-modify:
:noindex:

Modify a TensorFlow 2.x training script to use SageMaker data parallel
======================================================================
@@ -150,6 +151,7 @@ script you will have for distributed training with the library.


.. _tensorflow-sdp-api:
:noindex:

TensorFlow API
==============
Expand All @@ -160,6 +162,7 @@ TensorFlow API


.. function:: smdistributed.dataparallel.tensorflow.init()
:noindex:

Initialize ``smdistributed.dataparallel``. Must be called at the
beginning of the training script.
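
A minimal placement sketch; the ``sdp`` alias is an assumed convention:

.. code:: python

    import smdistributed.dataparallel.tensorflow as sdp

    # Must run before any other smdistributed.dataparallel call.
    sdp.init()
    print("initialized rank", sdp.rank(), "of", sdp.size())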
Expand All @@ -183,6 +186,7 @@ TensorFlow API


.. function:: smdistributed.dataparallel.tensorflow.size()
:noindex:

The total number of GPUs across all the nodes in the cluster. For
example, in an 8-node cluster with 8 GPUs each, ``size`` will be equal
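
A one-line usage sketch (assumed ``sdp`` alias):

.. code:: python

    import smdistributed.dataparallel.tensorflow as sdp

    sdp.init()
    print("total GPUs in the cluster:", sdp.size())   # e.g. 64 for 8 x 8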
Expand All @@ -200,6 +204,7 @@ TensorFlow API


.. function:: smdistributed.dataparallel.tensorflow.local_size()
:noindex:

The total number of GPUs on a node. For example, on a node with 8
GPUs, ``local_size`` will be equal to 8.
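
Similarly minimal (assumed ``sdp`` alias):

.. code:: python

    import smdistributed.dataparallel.tensorflow as sdp

    sdp.init()
    print("GPUs on this node:", sdp.local_size())   # e.g. 8 on an 8-GPU node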
Expand All @@ -214,6 +219,7 @@ TensorFlow API


.. function:: smdistributed.dataparallel.tensorflow.rank()
:noindex:

The rank of the node in the cluster. The rank ranges from 0 to the number of
nodes - 1. This is similar to MPI's World Rank.
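
A small sketch of the usual rank-0 gating (assumed ``sdp`` alias):

.. code:: python

    import smdistributed.dataparallel.tensorflow as sdp

    sdp.init()
    if sdp.rank() == 0:               # rank 0 handles logging/checkpointing
        print("running on", sdp.size(), "GPUs")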
Expand All @@ -228,6 +234,7 @@ TensorFlow API


.. function:: smdistributed.dataparallel.tensorflow.local_rank()
:noindex:

Local rank refers to the relative rank of the
GPUs’ ``smdistributed.dataparallel`` processes within the node. For
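
A sketch of the typical GPU-pinning use; the ``tf.config`` calls are standard TensorFlow, not part of this library:

.. code:: python

    import tensorflow as tf
    import smdistributed.dataparallel.tensorflow as sdp

    sdp.init()
    # One process per GPU, selected by local rank (assumed typical setup).
    gpus = tf.config.experimental.list_physical_devices("GPU")
    if gpus:
        tf.config.experimental.set_visible_devices(gpus[sdp.local_rank()], "GPU")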
Expand All @@ -246,6 +253,7 @@ TensorFlow API


.. function:: smdistributed.dataparallel.tensorflow.allreduce(tensor, param_index, num_params, compression=Compression.none, op=ReduceOp.AVERAGE)
:noindex:

Performs an all-reduce operation on a tensor (``tf.Tensor``).
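
A minimal sketch; the meaning of ``param_index``/``num_params`` (the tensor's index within the set being reduced, and that set's size) is an assumption here, shown for a single tensor:

.. code:: python

    import tensorflow as tf
    import smdistributed.dataparallel.tensorflow as sdp

    sdp.init()
    t = tf.constant([1.0, 2.0])
    # Single tensor: index 0 out of 1 parameters being reduced.
    avg = sdp.allreduce(t, param_index=0, num_params=1)
    # avg holds the element-wise average across workers (ReduceOp.AVERAGE default).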

@@ -273,6 +281,7 @@ TensorFlow API


.. function:: smdistributed.dataparallel.tensorflow.broadcast_global_variables(root_rank)
:noindex:

Broadcasts all global variables from root rank to all other processes.
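
A minimal placement sketch (assumed ``sdp`` alias); typically called once after variables are created, so all workers start from rank 0's values:

.. code:: python

    import smdistributed.dataparallel.tensorflow as sdp

    sdp.init()
    # ... build the model / create variables ...
    sdp.broadcast_global_variables(0)   # broadcast from root rank 0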

Expand All @@ -287,6 +296,7 @@ TensorFlow API


.. function:: smdistributed.dataparallel.tensorflow.broadcast_variables(variables, root_rank)
:noindex:

Applicable for TensorFlow 2.x only.
Expand All @@ -309,6 +319,7 @@ TensorFlow API


.. function:: smdistributed.dataparallel.tensorflow.oob_allreduce(tensor, compression=Compression.none, op=ReduceOp.AVERAGE)
:noindex:

Out-of-band (oob) AllReduce is a simplified AllReduce function for use cases
such as calculating total loss across all the GPUs during training.
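
A minimal sketch of the stated use case, averaging a per-worker loss value:

.. code:: python

    import tensorflow as tf
    import smdistributed.dataparallel.tensorflow as sdp

    sdp.init()
    step_loss = tf.constant(0.25)                # this worker's loss for the step
    # Out-of-band average across all workers, e.g. for logging.
    global_loss = sdp.oob_allreduce(step_loss)   # default op is ReduceOp.AVERAGE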
@@ -342,6 +353,7 @@ TensorFlow API


.. function:: smdistributed.dataparallel.tensorflow.overlap(tensor)
:noindex:

This function is applicable only for models compiled with XLA. Use this
function to enable ``smdistributed.dataparallel`` to efficiently
@@ -379,6 +391,7 @@ TensorFlow API


.. function:: smdistributed.dataparallel.tensorflow.broadcast(tensor, root_rank)
:noindex:

Broadcasts the input tensor on root rank to the same input tensor on all
other ``smdistributed.dataparallel`` processes.
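
A minimal sketch, assuming the call returns the broadcast tensor as with the other TensorFlow collectives here:

.. code:: python

    import tensorflow as tf
    import smdistributed.dataparallel.tensorflow as sdp

    sdp.init()
    value = tf.constant(42 if sdp.rank() == 0 else -1)
    value = sdp.broadcast(value, root_rank=0)   # every worker now holds 42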
Expand All @@ -399,6 +412,7 @@ TensorFlow API


.. function:: smdistributed.dataparallel.tensorflow.shutdown()
:noindex:

Shuts down ``smdistributed.dataparallel``. Optional to call at the end
of the training script.
Expand All @@ -413,6 +427,7 @@ TensorFlow API


.. function:: smdistributed.dataparallel.tensorflow.DistributedOptimizer
:noindex:

Applicable if you use the ``tf.estimator`` API in TensorFlow 2.x (2.3.1).
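
A minimal wrapping sketch (assumed pattern, mirroring the optimizer-wrapper style of similar libraries):

.. code:: python

    import tensorflow as tf
    import smdistributed.dataparallel.tensorflow as sdp

    sdp.init()
    opt = tf.keras.optimizers.SGD(learning_rate=0.01 * sdp.size())
    opt = sdp.DistributedOptimizer(opt)   # gradients get averaged across workers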
@@ -453,6 +468,7 @@ TensorFlow API


.. function:: smdistributed.dataparallel.tensorflow.DistributedGradientTape
:noindex:

Applicable to TensorFlow 2.x only.
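
A minimal sketch of the tape-wrapping pattern for a custom TensorFlow 2.x training step:

.. code:: python

    import tensorflow as tf
    import smdistributed.dataparallel.tensorflow as sdp

    sdp.init()
    model = tf.keras.layers.Dense(1)
    x = tf.random.uniform([4, 3])

    with tf.GradientTape() as tape:
        loss = tf.reduce_mean(tf.square(model(x)))
    tape = sdp.DistributedGradientTape(tape)     # average gradients across workers
    grads = tape.gradient(loss, model.trainable_variables)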

@@ -488,6 +504,7 @@ TensorFlow API


.. function:: smdistributed.dataparallel.tensorflow.BroadcastGlobalVariablesHook
:noindex:

Applicable if you use the ``tf.estimator`` API in TensorFlow 2.x (2.3.1).
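
A minimal sketch of the assumed hook usage with ``tf.estimator``; the estimator itself is left hypothetical:

.. code:: python

    import smdistributed.dataparallel.tensorflow as sdp

    sdp.init()
    # Broadcast initial variable values from rank 0 when the session starts.
    hooks = [sdp.BroadcastGlobalVariablesHook(0)]
    # estimator.train(input_fn=train_input_fn, hooks=hooks)  # hypothetical estimator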

@@ -516,6 +533,7 @@ TensorFlow API


.. function:: smdistributed.dataparallel.tensorflow.Compression
:noindex:

Optional gradient compression algorithm that can be used in the AllReduce
operation.
Expand All @@ -527,6 +545,7 @@ TensorFlow API


.. function:: smdistributed.dataparallel.tensorflow.ReduceOp
:noindex:

Supported reduction operations in ``smdistributed.dataparallel``.
