
documentation: adding version 1.1.0 docs for smdistributed.dataparallel #2232


Merged · 4 commits · Mar 23, 2021
@@ -8,6 +8,7 @@ PyTorch Guide to SageMaker's distributed data parallel library
- :ref:`pytorch-sdp-api`

.. _pytorch-sdp-modify:
:noindex:

Modify a PyTorch training script to use SageMaker data parallel
======================================================================
@@ -149,6 +150,7 @@ you will have for distributed training with the distributed data parallel library


.. _pytorch-sdp-api:
:noindex:

PyTorch API
===========
Expand All @@ -159,6 +161,7 @@ PyTorch API


.. function:: smdistributed.dataparallel.torch.distributed.is_available()
:noindex:

Checks if the script was started as a distributed job. For local runs, users can
check that is_available returns False and run the training script
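
For reference (not part of this change), a minimal sketch of the guard this enables; the ``dist`` alias is an assumed convention:

.. code:: python

    import smdistributed.dataparallel.torch.distributed as dist

    # Only set up distributed training when the script was launched as a
    # distributed job; otherwise fall back to a plain single-process run.
    if dist.is_available():
        dist.init_process_group()
    else:
        print("Not a distributed launch; running single-process.")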
Expand All @@ -174,6 +177,7 @@ PyTorch API


.. function:: smdistributed.dataparallel.torch.distributed.init_process_group(*args, **kwargs)
:noindex:

Initialize ``smdistributed.dataparallel``. Must be called at the
beginning of the training script, before calling any other methods.
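
A minimal placement sketch (assumed ``dist`` alias; the GPU-pinning line is a common PyTorch follow-up, not something this API requires):

.. code:: python

    import torch
    import smdistributed.dataparallel.torch.distributed as dist

    # First smdistributed.dataparallel call in the script.
    dist.init_process_group()

    # Typical next step: bind this process to its GPU.
    torch.cuda.set_device(dist.get_local_rank())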
Expand All @@ -198,6 +202,7 @@ PyTorch API


.. function:: smdistributed.dataparallel.torch.distributed.is_initialized()
:noindex:

Checks if the default process group has been initialized.
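
A small sketch of the usual guard, assuming the same ``dist`` alias:

.. code:: python

    import smdistributed.dataparallel.torch.distributed as dist

    # Useful when setup code may be reached more than once.
    if not dist.is_initialized():
        dist.init_process_group()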

Expand All @@ -211,6 +216,7 @@ PyTorch API


.. function:: smdistributed.dataparallel.torch.distributed.get_world_size(group=smdistributed.dataparallel.torch.distributed.group.WORLD)
:noindex:

The total number of GPUs across all the nodes in the cluster. For
example, in an 8-node cluster with 8 GPUs each, size will be equal to 64.
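
A short sketch of one common use; the learning-rate scaling is a heuristic, not something this API prescribes:

.. code:: python

    import smdistributed.dataparallel.torch.distributed as dist

    dist.init_process_group()
    world_size = dist.get_world_size()   # e.g. 64 on 8 nodes x 8 GPUs
    base_lr = 0.001
    lr = base_lr * world_size            # scale LR with the number of workers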
Expand All @@ -230,6 +236,7 @@ PyTorch API


.. function:: smdistributed.dataparallel.torch.distributed.get_rank(group=smdistributed.dataparallel.torch.distributed.group.WORLD)
:noindex:

The rank of the node in the cluster. The rank ranges from 0 to the number of
nodes - 1. This is similar to MPI's World Rank.
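
A minimal sketch of the usual rank-0 gating pattern (assumed ``dist`` alias):

.. code:: python

    import smdistributed.dataparallel.torch.distributed as dist

    dist.init_process_group()
    # Rank 0 conventionally handles logging and checkpointing.
    if dist.get_rank() == 0:
        print("world size:", dist.get_world_size())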
Expand All @@ -249,6 +256,7 @@ PyTorch API


.. function:: smdistributed.dataparallel.torch.distributed.get_local_rank()
:noindex:

Local rank refers to the relative rank of
the ``smdistributed.dataparallel`` process within the node the current
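
A sketch of the typical device-pinning use; ``torch.cuda.set_device`` is standard PyTorch, not part of this library:

.. code:: python

    import torch
    import smdistributed.dataparallel.torch.distributed as dist

    dist.init_process_group()
    local_rank = dist.get_local_rank()
    torch.cuda.set_device(local_rank)          # one process per GPU on the node
    device = torch.device("cuda", local_rank)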
Expand All @@ -267,6 +275,7 @@ PyTorch API


.. function:: smdistributed.dataparallel.torch.distributed.all_reduce(tensor, op=smdistributed.dataparallel.torch.distributed.ReduceOp.SUM, group=smdistributed.dataparallel.torch.distributed.group.WORLD, async_op=False)
:noindex:

Performs an all-reduce operation on a tensor (torch.tensor) across
all ``smdistributed.dataparallel`` workers
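
A minimal sketch using the ``ReduceOp.SUM`` default from the signature above, assuming the reduction is applied in place as in ``torch.distributed``:

.. code:: python

    import torch
    import smdistributed.dataparallel.torch.distributed as dist

    dist.init_process_group()
    t = torch.ones(1).cuda()                  # each worker contributes 1.0
    dist.all_reduce(t, op=dist.ReduceOp.SUM)
    # t now equals the world size on every worker.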
@@ -311,6 +320,7 @@ PyTorch API


.. function:: smdistributed.dataparallel.torch.distributed.broadcast(tensor, src=0, group=smdistributed.dataparallel.torch.distributed.group.WORLD, async_op=False)
:noindex:

Broadcasts the tensor (torch.tensor) to the whole group.
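
A minimal sketch, assuming the broadcast fills the tensor in place as in ``torch.distributed``:

.. code:: python

    import torch
    import smdistributed.dataparallel.torch.distributed as dist

    dist.init_process_group()
    t = torch.zeros(3).cuda()
    if dist.get_rank() == 0:
        t += 1.0
    dist.broadcast(t, src=0)   # every worker now holds rank 0's values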

Expand All @@ -335,6 +345,7 @@ PyTorch API


.. function:: smdistributed.dataparallel.torch.distributed.all_gather(tensor_list, tensor, group=smdistributed.dataparallel.torch.distributed.group.WORLD, async_op=False)
:noindex:

Gathers tensors from the whole group in a list.
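
A minimal sketch following the ``(tensor_list, tensor)`` argument order in the signature above:

.. code:: python

    import torch
    import smdistributed.dataparallel.torch.distributed as dist

    dist.init_process_group()
    world_size = dist.get_world_size()
    mine = torch.tensor([float(dist.get_rank())]).cuda()
    gathered = [torch.zeros(1).cuda() for _ in range(world_size)]
    dist.all_gather(gathered, mine)   # gathered[i] holds rank i's tensor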

Expand All @@ -361,6 +372,7 @@ PyTorch API


.. function:: smdistributed.dataparallel.torch.distributed.all_to_all_single(output_t, input_t, output_split_sizes=None, input_split_sizes=None, group=group.WORLD, async_op=False)
:noindex:

Each process scatters the input tensor to all processes in a group and returns the gathered tensor in the output.
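
A minimal sketch with the default even split, where each rank sends one element to every other rank:

.. code:: python

    import torch
    import smdistributed.dataparallel.torch.distributed as dist

    dist.init_process_group()
    world_size = dist.get_world_size()
    input_t = torch.arange(world_size, dtype=torch.float32).cuda()
    output_t = torch.empty(world_size, dtype=torch.float32).cuda()
    dist.all_to_all_single(output_t, input_t)
    # output_t[i] is the element this rank received from rank i.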

Expand All @@ -385,6 +397,7 @@ PyTorch API


.. function:: smdistributed.dataparallel.torch.distributed.barrier(group=smdistributed.dataparallel.torch.distributed.group.WORLD, async_op=False)
:noindex:

Synchronizes all ``smdistributed.dataparallel`` processes.
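
A short sketch; ``prepare_dataset`` is a hypothetical rank-0-only setup step:

.. code:: python

    import smdistributed.dataparallel.torch.distributed as dist

    def prepare_dataset():
        # hypothetical one-time setup performed by rank 0 only
        pass

    dist.init_process_group()
    if dist.get_rank() == 0:
        prepare_dataset()
    dist.barrier()   # all workers wait here until rank 0 finishes setup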

Expand All @@ -410,6 +423,7 @@ PyTorch API


.. class:: smdistributed.dataparallel.torch.parallel.DistributedDataParallel(module, device_ids=None, output_device=None, broadcast_buffers=True, process_group=None, bucket_cap_mb=None)
:noindex:

``smdistributed.dataparallel``'s implementation of distributed data
parallelism for PyTorch. In most cases, wrapping your PyTorch Module
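
For context (not part of this diff), a minimal wrapping sketch; it assumes the class can be imported from the path shown in the directive above and omits data loading and the training loop:

.. code:: python

    import torch
    import smdistributed.dataparallel.torch.distributed as dist
    from smdistributed.dataparallel.torch.parallel import DistributedDataParallel as DDP

    dist.init_process_group()
    local_rank = dist.get_local_rank()
    torch.cuda.set_device(local_rank)

    model = torch.nn.Linear(10, 1).cuda()
    model = DDP(model, device_ids=[local_rank], broadcast_buffers=True)
    # Gradients are now averaged across all workers during backward().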
@@ -503,6 +517,7 @@ PyTorch API


.. class:: smdistributed.dataparallel.torch.distributed.ReduceOp
:noindex:

An enum-like class for supported reduction operations
in ``smdistributed.dataparallel``.
@@ -8,6 +8,7 @@ TensorFlow Guide to SageMaker's distributed data parallel library
- :ref:`tensorflow-sdp-api`

.. _tensorflow-sdp-modify:
:noindex:

Modify a TensorFlow 2.x training script to use SageMaker data parallel
======================================================================
@@ -150,6 +151,7 @@ script you will have for distributed training with the library.


.. _tensorflow-sdp-api:
:noindex:

TensorFlow API
==============
Expand All @@ -160,6 +162,7 @@ TensorFlow API


.. function:: smdistributed.dataparallel.tensorflow.init()
:noindex:

Initialize ``smdistributed.dataparallel``. Must be called at the
beginning of the training script.
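
A minimal placement sketch; the ``sdp`` alias is an assumed convention:

.. code:: python

    import smdistributed.dataparallel.tensorflow as sdp

    # Must run before any other smdistributed.dataparallel call.
    sdp.init()
    print("initialized rank", sdp.rank(), "of", sdp.size())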
Expand All @@ -183,6 +186,7 @@ TensorFlow API


.. function:: smdistributed.dataparallel.tensorflow.size()
:noindex:

The total number of GPUs across all the nodes in the cluster. For
example, in an 8-node cluster with 8 GPUs each, ``size`` will be equal
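
A one-line usage sketch (assumed ``sdp`` alias):

.. code:: python

    import smdistributed.dataparallel.tensorflow as sdp

    sdp.init()
    print("total GPUs in the cluster:", sdp.size())   # e.g. 64 for 8 x 8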
Expand All @@ -200,6 +204,7 @@ TensorFlow API


.. function:: smdistributed.dataparallel.tensorflow.local_size()
:noindex:

The total number of GPUs on a node. For example, on a node with 8
GPUs, ``local_size`` will be equal to 8.
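
Similarly minimal (assumed ``sdp`` alias):

.. code:: python

    import smdistributed.dataparallel.tensorflow as sdp

    sdp.init()
    print("GPUs on this node:", sdp.local_size())   # e.g. 8 on an 8-GPU node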
Expand All @@ -214,6 +219,7 @@ TensorFlow API


.. function:: smdistributed.dataparallel.tensorflow.rank()
:noindex:

The rank of the node in the cluster. The rank ranges from 0 to the number of
nodes - 1. This is similar to MPI's World Rank.
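
A small sketch of the usual rank-0 gating (assumed ``sdp`` alias):

.. code:: python

    import smdistributed.dataparallel.tensorflow as sdp

    sdp.init()
    if sdp.rank() == 0:               # rank 0 handles logging/checkpointing
        print("running on", sdp.size(), "GPUs")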
Expand All @@ -228,6 +234,7 @@ TensorFlow API


.. function:: smdistributed.dataparallel.tensorflow.local_rank()
:noindex:

Local rank refers to the relative rank of the
GPUs’ ``smdistributed.dataparallel`` processes within the node. For
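
A sketch of the typical GPU-pinning use; the ``tf.config`` calls are standard TensorFlow, not part of this library:

.. code:: python

    import tensorflow as tf
    import smdistributed.dataparallel.tensorflow as sdp

    sdp.init()
    # One process per GPU, selected by local rank (assumed typical setup).
    gpus = tf.config.experimental.list_physical_devices("GPU")
    if gpus:
        tf.config.experimental.set_visible_devices(gpus[sdp.local_rank()], "GPU")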
Expand All @@ -246,6 +253,7 @@ TensorFlow API


.. function:: smdistributed.dataparallel.tensorflow.allreduce(tensor, param_index, num_params, compression=Compression.none, op=ReduceOp.AVERAGE)
:noindex:

Performs an all-reduce operation on a tensor (``tf.Tensor``).
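
A minimal sketch; the meaning of ``param_index``/``num_params`` (the tensor's index within the set being reduced, and that set's size) is an assumption here, shown for a single tensor:

.. code:: python

    import tensorflow as tf
    import smdistributed.dataparallel.tensorflow as sdp

    sdp.init()
    t = tf.constant([1.0, 2.0])
    # Single tensor: index 0 out of 1 parameters being reduced.
    avg = sdp.allreduce(t, param_index=0, num_params=1)
    # avg holds the element-wise average across workers (ReduceOp.AVERAGE default).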

@@ -273,6 +281,7 @@ TensorFlow API


.. function:: smdistributed.dataparallel.tensorflow.broadcast_global_variables(root_rank)
:noindex:

Broadcasts all global variables from root rank to all other processes.
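
A minimal placement sketch (assumed ``sdp`` alias); typically called once after variables are created, so all workers start from rank 0's values:

.. code:: python

    import smdistributed.dataparallel.tensorflow as sdp

    sdp.init()
    # ... build the model / create variables ...
    sdp.broadcast_global_variables(0)   # broadcast from root rank 0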

Expand All @@ -287,6 +296,7 @@ TensorFlow API


.. function:: smdistributed.dataparallel.tensorflow.broadcast_variables(variables, root_rank)
:noindex:

Applicable for TensorFlow 2.x only.
Expand All @@ -309,6 +319,7 @@ TensorFlow API


.. function:: smdistributed.dataparallel.tensorflow.oob_allreduce(tensor, compression=Compression.none, op=ReduceOp.AVERAGE)
:noindex:

Out-of-band (oob) AllReduce is a simplified AllReduce function for use cases
such as calculating total loss across all the GPUs during training.
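
A minimal sketch of the stated use case, averaging a per-worker loss value:

.. code:: python

    import tensorflow as tf
    import smdistributed.dataparallel.tensorflow as sdp

    sdp.init()
    step_loss = tf.constant(0.25)                # this worker's loss for the step
    # Out-of-band average across all workers, e.g. for logging.
    global_loss = sdp.oob_allreduce(step_loss)   # default op is ReduceOp.AVERAGE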
@@ -342,6 +353,7 @@ TensorFlow API


.. function:: smdistributed.dataparallel.tensorflow.overlap(tensor)
:noindex:

This function is applicable only for models compiled with XLA. Use this
function to enable ``smdistributed.dataparallel`` to efficiently
@@ -379,6 +391,7 @@ TensorFlow API


.. function:: smdistributed.dataparallel.tensorflow.broadcast(tensor, root_rank)
:noindex:

Broadcasts the input tensor on root rank to the same input tensor on all
other ``smdistributed.dataparallel`` processes.
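
A minimal sketch, assuming the call returns the broadcast tensor as with the other TensorFlow collectives here:

.. code:: python

    import tensorflow as tf
    import smdistributed.dataparallel.tensorflow as sdp

    sdp.init()
    value = tf.constant(42 if sdp.rank() == 0 else -1)
    value = sdp.broadcast(value, root_rank=0)   # every worker now holds 42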
Expand All @@ -399,6 +412,7 @@ TensorFlow API


.. function:: smdistributed.dataparallel.tensorflow.shutdown()
:noindex:

Shuts down ``smdistributed.dataparallel``. Optional to call at the end
of the training script.
Expand All @@ -413,6 +427,7 @@ TensorFlow API


.. function:: smdistributed.dataparallel.tensorflow.DistributedOptimizer
:noindex:

Applicable if you use the ``tf.estimator`` API in TensorFlow 2.x (2.3.1).
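
A minimal wrapping sketch (assumed pattern, mirroring the optimizer-wrapper style of similar libraries):

.. code:: python

    import tensorflow as tf
    import smdistributed.dataparallel.tensorflow as sdp

    sdp.init()
    opt = tf.keras.optimizers.SGD(learning_rate=0.01 * sdp.size())
    opt = sdp.DistributedOptimizer(opt)   # gradients get averaged across workers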
@@ -453,6 +468,7 @@ TensorFlow API


.. function:: smdistributed.dataparallel.tensorflow.DistributedGradientTape
:noindex:

Applicable to TensorFlow 2.x only.
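
A minimal sketch of the tape-wrapping pattern for a custom TensorFlow 2.x training step:

.. code:: python

    import tensorflow as tf
    import smdistributed.dataparallel.tensorflow as sdp

    sdp.init()
    model = tf.keras.layers.Dense(1)
    x = tf.random.uniform([4, 3])

    with tf.GradientTape() as tape:
        loss = tf.reduce_mean(tf.square(model(x)))
    tape = sdp.DistributedGradientTape(tape)     # average gradients across workers
    grads = tape.gradient(loss, model.trainable_variables)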

@@ -488,6 +504,7 @@ TensorFlow API


.. function:: smdistributed.dataparallel.tensorflow.BroadcastGlobalVariablesHook
:noindex:

Applicable if you use the ``tf.estimator`` API in TensorFlow 2.x (2.3.1).
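
A minimal sketch of the assumed hook usage with ``tf.estimator``; the estimator itself is left hypothetical:

.. code:: python

    import smdistributed.dataparallel.tensorflow as sdp

    sdp.init()
    # Broadcast initial variable values from rank 0 when the session starts.
    hooks = [sdp.BroadcastGlobalVariablesHook(0)]
    # estimator.train(input_fn=train_input_fn, hooks=hooks)  # hypothetical estimator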

@@ -516,6 +533,7 @@ TensorFlow API


.. function:: smdistributed.dataparallel.tensorflow.Compression
:noindex:

Optional gradient compression algorithm that can be used in the AllReduce
operation.
Expand All @@ -527,6 +545,7 @@ TensorFlow API


.. function:: smdistributed.dataparallel.tensorflow.ReduceOp
:noindex:

Supported reduction operations in ``smdistributed.dataparallel``.
