Documentation: sm distributed model parallel doc versioning updating #2097


Merged · 6 commits · Jan 20, 2021
8 changes: 4 additions & 4 deletions doc/api/training/smd_data_parallel.rst
@@ -1,6 +1,6 @@
-###################################
+##########################
 Distributed data parallel
-###################################
+##########################

SageMaker's distributed data parallel library extends SageMaker’s training
capabilities on deep learning models with near-linear scaling efficiency,
@@ -68,5 +68,5 @@ model.
.. toctree::
:maxdepth: 2

-   smd_data_parallel_pytorch
-   smd_data_parallel_tensorflow
+   sdp_versions/smd_data_parallel_pytorch
+   sdp_versions/smd_data_parallel_tensorflow
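For context on what the pages in this toctree document: the data parallel library is enabled through the ``distribution`` argument of a SageMaker Python SDK ``Estimator``. A minimal sketch, assuming the SageMaker SDK's ``smdistributed`` dict convention (building the dict is plain Python; actually submitting a job needs the ``sagemaker`` package and AWS credentials, so the estimator call is shown only in a comment):

```python
# Sketch of the `distribution` argument that turns on the SageMaker
# distributed data parallel library for a training job.
distribution = {
    "smdistributed": {
        "dataparallel": {
            "enabled": True,
        }
    }
}

# It would then be passed when constructing an estimator, e.g.:
#   estimator = PyTorch(..., distribution=distribution)
```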
30 changes: 20 additions & 10 deletions doc/api/training/smd_model_parallel.rst
@@ -20,20 +20,30 @@ Use the following sections to learn more about the model parallelism and the lib
<https://integ-docs-aws.amazon.com/sagemaker/latest/dg/model-parallel-use-api.html#model-parallel-customize-container>`__
for more information.

-How to Use this Guide
-=====================
+Use with the SageMaker Python SDK
+=================================

Use the following page to learn how to configure and enable distributed model parallel
when you configure an Amazon SageMaker Python SDK `Estimator`.

.. toctree::
:maxdepth: 1

smd_model_parallel_general
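As a rough sketch of what that ``Estimator`` configuration looks like: the dict below follows the SageMaker SDK's ``smdistributed``/``modelparallel`` convention, and ``partitions`` and ``microbatches`` are library parameters, but the specific values here are illustrative only.

```python
# Sketch: a `distribution` dict enabling the model parallel library on a
# SageMaker Estimator. Values are examples, not recommendations.
distribution = {
    "smdistributed": {
        "modelparallel": {
            "enabled": True,
            "parameters": {
                "partitions": 2,    # number of model partitions
                "microbatches": 4,  # microbatches used for pipelining
            },
        }
    },
    "mpi": {
        "enabled": True,
        "processes_per_host": 8,  # typically one process per GPU
    },
}
```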

API Documentation
=================

The library contains a Common API that is shared across frameworks, as well as APIs
-that are specific to supported frameworks, TensorFlow and PyTorch. To use the library, reference the
+that are specific to supported frameworks, TensorFlow and PyTorch.
+
+Select a version to see the API documentation for that version. To use the library, reference the
**Common API** documentation alongside the framework-specific API documentation.

.. toctree::
:maxdepth: 1

-   smd_model_parallel_general
-   smd_model_parallel_common_api
-   smd_model_parallel_pytorch
-   smd_model_parallel_tensorflow
+   smp_versions/v1_1_0.rst

It is recommended to use this documentation alongside `SageMaker Distributed Model Parallel
<http://docs.aws.amazon.com/sagemaker/latest/dg/model-parallel.html>`__ in the Amazon SageMaker
@@ -49,11 +59,11 @@ developer guide. This developer guide documentation includes:
- `Configuration tips and pitfalls
<https://docs.aws.amazon.com/sagemaker/latest/dg/model-parallel-customize-tips-pitfalls.html>`__

-Latest Updates
-==============
-
-New features, bug fixes, and improvements are regularly made to the SageMaker distributed model parallel library.
+Release Notes
+=============
+
+New features, bug fixes, and improvements are regularly made to the SageMaker distributed model parallel library.
To see the latest changes made to the library, refer to the library
`Release Notes
<https://github.com/aws/sagemaker-python-sdk/blob/master/doc/api/training/smd_model_parallel_release_notes/>`_.
4 changes: 2 additions & 2 deletions doc/api/training/smd_model_parallel_general.rst
@@ -6,7 +6,7 @@
.. _sm-sdk-modelparallel-params:

SageMaker Python SDK ``modelparallel`` parameters
--------------------------------------------------
+=================================================

The TensorFlow and PyTorch ``Estimator`` objects contain a ``distribution`` parameter,
which is used to enable and specify parameters for the
@@ -306,7 +306,7 @@ table are optional.
.. _ranking-basics:

Ranking Basics
---------------
+==============

The library maintains a one-to-one mapping between processes and available GPUs:
for each GPU, there is a corresponding CPU process. Each CPU process
@@ -1,7 +1,12 @@
-Common SageMaker distributed model parallel library APIs
---------------------------------------------------------
+.. admonition:: Contents
+
+   - :ref:`communication_api`
+   - :ref:`mpi_basics`
+
+Common API
+==========

-The following APIs are common across all frameworks.
+The following SageMaker distributed model parallel APIs are common across all frameworks.

**Important**: This API document assumes you use the following import statement in your training scripts.

@@ -243,6 +248,7 @@ The following APIs are common across all frameworks.
variable. If ``method`` is not ``"variable"``, this argument is
ignored.

.. _mpi_basics:

MPI Basics
^^^^^^^^^^
@@ -265,8 +271,10 @@ The library exposes the following basic MPI primitives to its Python API:
- ``smp.get_dp_group()``: The list of ranks that hold different
replicas of the same model partition.
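To make the rank terminology concrete, here is a pure-Python illustration of how a global rank can decompose into a model parallel rank and a data parallel rank. This is not the library's implementation, and it assumes one simple contiguous placement; the library's actual process placement is configurable.

```python
# Illustrative only: with `partitions` model partitions and one process
# per GPU, split a global rank into (mp_rank, dp_rank).
def decompose_rank(global_rank, partitions):
    mp_rank = global_rank % partitions   # which model partition this process holds
    dp_rank = global_rank // partitions  # which model replica it belongs to
    return mp_rank, dp_rank

# 8 processes with 2 partitions -> 4 data parallel replicas
# of a 2-way partitioned model.
mapping = {rank: decompose_rank(rank, 2) for rank in range(8)}
```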

.. _communication_api:

Communication API
-=================
+^^^^^^^^^^^^^^^^^

The library provides a few communication primitives which can be helpful while
developing the training script. These primitives use the following
@@ -1,3 +1,8 @@
.. admonition:: Contents

- :ref:`pytorch_saving_loading`
- :ref:`pytorch_saving_loading_instructions`

PyTorch API
===========

@@ -10,6 +15,13 @@ This API document assumes you use the following import statements in your training script
import smdistributed.modelparallel.torch as smp


.. tip::

Refer to
`Modify a PyTorch Training Script
<https://docs.aws.amazon.com/sagemaker/latest/dg/model-parallel-customize-training-script.html#model-parallel-customize-training-script-pt>`_
to learn how to use the following API in your PyTorch training script.

.. class:: smp.DistributedModel

A sub-class of ``torch.nn.Module`` which specifies the model to be
@@ -354,6 +366,7 @@ This API document assumes you use the following import statements in your training script
currently doesn’t work with the library. ``smp.amp.GradScaler`` replaces
``torch.amp.GradScaler`` and provides the same functionality.

.. _pytorch_saving_loading:

APIs for Saving and Loading
^^^^^^^^^^^^^^^^^^^^^^^^^^^
@@ -404,6 +417,8 @@ APIs for Saving and Loading
``mp_rank`` loads the checkpoint corresponding to the ``mp_rank``.
Should be used when loading a model trained with the library.

.. _pytorch_saving_loading_instructions:

General Instruction For Saving and Loading
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

@@ -9,6 +9,12 @@ TensorFlow API

import smdistributed.modelparallel.tensorflow as smp

.. tip::

Refer to
`Modify a TensorFlow Training Script
<https://docs.aws.amazon.com/sagemaker/latest/dg/model-parallel-customize-training-script.html#model-parallel-customize-training-script-tf>`_
to learn how to use the following API in your TensorFlow training script.

.. class:: smp.DistributedModel
:noindex:
12 changes: 12 additions & 0 deletions doc/api/training/smp_versions/v1_1_0.rst
@@ -0,0 +1,12 @@

Version 1.1.0 (Latest)
======================

To use the library, reference the Common API documentation alongside the framework-specific API documentation.

.. toctree::
:maxdepth: 1

v1.1.0/smd_model_parallel_common_api
v1.1.0/smd_model_parallel_pytorch
v1.1.0/smd_model_parallel_tensorflow
9 changes: 9 additions & 0 deletions doc/conf.py
@@ -55,6 +55,15 @@

html_theme = "sphinx_rtd_theme"

html_theme_options = {
"collapse_navigation": True,
"sticky_navigation": True,
"navigation_depth": 6,
"includehidden": True,
"titles_only": False,
}


html_static_path = ["_static"]

htmlhelp_basename = "%sdoc" % project