Commit e57b1e3

Documentation: sm distributed model parallel doc versioning updating (#2097)
* documentation: adding docs dedicated to SageMaker distributed versions
* documentation: updating section names
* documentation: reverting makefile and sdp toc depth

Co-authored-by: Ahsan Khan <[email protected]>
1 parent e3c54e1 commit e57b1e3

10 files changed, +80 -20 lines changed

doc/api/training/smd_data_parallel.rst

Lines changed: 4 additions & 4 deletions
@@ -1,6 +1,6 @@
-###################################
+##########################
 Distributed data parallel
-###################################
+##########################
 
 SageMaker's distributed data parallel library extends SageMaker’s training
 capabilities on deep learning models with near-linear scaling efficiency,
@@ -68,5 +68,5 @@ model.
 .. toctree::
    :maxdepth: 2
 
-   smd_data_parallel_pytorch
-   smd_data_parallel_tensorflow
+   sdp_versions/smd_data_parallel_pytorch
+   sdp_versions/smd_data_parallel_tensorflow
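
For orientation: the renamed pages document how the data parallel library is switched on through a framework Estimator's ``distribution`` parameter. A minimal, hypothetical sketch of that configuration (entry point, role, instance settings, and versions are placeholders, not values from this commit):

    from sagemaker.pytorch import PyTorch

    # Placeholder values throughout; the `distribution` argument is the
    # documented switch for the SageMaker data parallel library.
    estimator = PyTorch(
        entry_point="train.py",
        role="<your-sagemaker-execution-role>",
        instance_count=2,
        instance_type="ml.p3.16xlarge",
        framework_version="1.8.0",
        py_version="py36",
        distribution={"smdistributed": {"dataparallel": {"enabled": True}}},
    )
    estimator.fit("s3://<bucket>/<training-data>")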

doc/api/training/smd_model_parallel.rst

Lines changed: 20 additions & 10 deletions
@@ -20,20 +20,30 @@ Use the following sections to learn more about the model parallelism and the lib
    <https://integ-docs-aws.amazon.com/sagemaker/latest/dg/model-parallel-use-api.html#model-parallel-customize-container>`__
    for more information.
 
-How to Use this Guide
-=====================
+Use with the SageMaker Python SDK
+=================================
+
+Use the following page to learn how to configure and enable distributed model parallel
+when you configure an Amazon SageMaker Python SDK `Estimator`.
+
+.. toctree::
+   :maxdepth: 1
+
+   smd_model_parallel_general
+
+API Documentation
+=================
 
 The library contains a Common API that is shared across frameworks, as well as APIs
-that are specific to supported frameworks, TensorFlow and PyTorch. To use the library, reference the
+that are specific to supported frameworks, TensorFlow and PyTorch.
+
+Select a version to see the API documentation for version. To use the library, reference the
 **Common API** documentation alongside the framework specific API documentation.
 
 .. toctree::
    :maxdepth: 1
 
-   smd_model_parallel_general
-   smd_model_parallel_common_api
-   smd_model_parallel_pytorch
-   smd_model_parallel_tensorflow
+   smp_versions/v1_1_0.rst
 
 It is recommended to use this documentation alongside `SageMaker Distributed Model Parallel
 <http://docs.aws.amazon.com/sagemaker/latest/dg/model-parallel.html>`__ in the Amazon SageMaker
@@ -49,11 +59,11 @@ developer guide. This developer guide documentation includes:
 - `Configuration tips and pitfalls
   <https://docs.aws.amazon.com/sagemaker/latest/dg/model-parallel-customize-tips-pitfalls.html>`__
 
-Latest Updates
-==============
 
-New features, bug fixes, and improvements are regularly made to the SageMaker distributed model parallel library.
+Release Notes
+=============
 
+New features, bug fixes, and improvements are regularly made to the SageMaker distributed model parallel library.
 To see the the latest changes made to the library, refer to the library
 `Release Notes
 <https://github.com/aws/sagemaker-python-sdk/blob/master/doc/api/training/smd_model_parallel_release_notes/>`_.
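
For orientation: the ``smd_model_parallel_general`` page moved under "Use with the SageMaker Python SDK" documents the ``distribution`` parameter that enables ``modelparallel``. A minimal, hypothetical sketch of that configuration (all parameter values here are illustrative assumptions, not recommendations from this commit):

    from sagemaker.pytorch import PyTorch

    smp_options = {
        "enabled": True,
        "parameters": {
            "partitions": 2,     # number of model partitions (illustrative)
            "microbatches": 4,   # pipeline microbatches (illustrative)
            "ddp": True,         # layer data parallelism on top
        },
    }

    estimator = PyTorch(
        entry_point="train.py",                  # placeholder script
        role="<your-sagemaker-execution-role>",  # placeholder role
        instance_count=1,
        instance_type="ml.p3.16xlarge",
        framework_version="1.6.0",
        py_version="py36",
        distribution={
            "smdistributed": {"modelparallel": smp_options},
            "mpi": {"enabled": True, "processes_per_host": 8},
        },
    )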

doc/api/training/smd_model_parallel_general.rst

Lines changed: 2 additions & 2 deletions
@@ -6,7 +6,7 @@
 .. _sm-sdk-modelparallel-params:
 
 SageMaker Python SDK ``modelparallel`` parameters
--------------------------------------------------
+=================================================
 
 The TensorFlow and PyTorch ``Estimator`` objects contains a ``distribution`` parameter,
 which is used to enable and specify parameters for the
@@ -306,7 +306,7 @@ table are optional.
 .. _ranking-basics:
 
 Ranking Basics
---------------
+==============
 
 The library maintains a one-to-one mapping between processes and available GPUs:
 for each GPU, there is a corresponding CPU process. Each CPU process
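
The Ranking Basics section under the renamed heading describes a one-to-one mapping between CPU processes and GPUs. A short sketch of how a training script typically uses the rank primitives under that model (assuming the PyTorch flavor of the library; the calls are from its documented MPI basics):

    import torch
    import smdistributed.modelparallel.torch as smp

    smp.init()  # must be called before any other smp API

    # One process per GPU: pin this process to its assigned device.
    torch.cuda.set_device(smp.local_rank())

    print(f"world rank {smp.rank()} of {smp.size()}, "
          f"mp_rank {smp.mp_rank()}, dp_rank {smp.dp_rank()}")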

doc/api/training/smd_model_parallel_common_api.rst renamed to doc/api/training/smp_versions/v1.1.0/smd_model_parallel_common_api.rst

Lines changed: 12 additions & 4 deletions
@@ -1,7 +1,12 @@
-Common SageMaker distributed model parallel library APIs
---------------------------------------------------------
+.. admonition:: Contents
 
-The following APIs are common across all frameworks.
+   - :ref:`communication_api`
+   - :ref:`mpi_basics`
+
+Common API
+==========
+
+The following SageMaker distribute model parallel APIs are common across all frameworks.
 
 **Important**: This API document assumes you use the following import statement in your training scripts.
 
@@ -243,6 +248,7 @@ The following APIs are common across all frameworks.
    variable. If ``method`` is not ``"variable"``, this argument is
    ignored.
 
+.. _mpi_basics:
 
 MPI Basics
 ^^^^^^^^^^
@@ -265,8 +271,10 @@ The library exposes the following basic MPI primitives to its Python API:
 - ``smp.get_dp_group()``: The list of ranks that hold different
   replicas of the same model partition.
 
+.. _communication_api:
+
 Communication API
-=================
+^^^^^^^^^^^^^^^^^
 
 The library provides a few communication primitives which can be helpful while
 developing the training script. These primitives use the following
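
To make the relocated MPI Basics and Communication API sections concrete, a hedged sketch of the primitives they name (``allgather``, ``broadcast``, the group handles, and ``recv_from`` follow the library's common API documentation; treat the exact signatures as assumptions):

    import smdistributed.modelparallel.torch as smp

    smp.init()

    # Collect one Python object from every rank in the world group.
    all_ranks = smp.allgather(smp.rank(), smp.WORLD)

    # Broadcast from dp_rank 0 of each data parallel group; receiving
    # ranks must post a matching recv_from.
    if smp.dp_rank() == 0:
        smp.broadcast({"seed": 42}, smp.DP_GROUP)
    else:
        cfg = smp.recv_from(0, smp.RankType.DP_RANK)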

doc/api/training/smd_model_parallel_pytorch.rst renamed to doc/api/training/smp_versions/v1.1.0/smd_model_parallel_pytorch.rst

Lines changed: 15 additions & 0 deletions
@@ -1,3 +1,8 @@
+.. admonition:: Contents
+
+   - :ref:`pytorch_saving_loading`
+   - :ref:`pytorch_saving_loading_instructions`
+
 PyTorch API
 ===========
 
@@ -10,6 +15,13 @@ This API document assumes you use the following import statements in your traini
    import smdistributed.modelparallel.torch as smp
 
 
+.. tip::
+
+   Refer to
+   `Modify a PyTorch Training Script
+   <https://docs.aws.amazon.com/sagemaker/latest/dg/model-parallel-customize-training-script.html#model-parallel-customize-training-script-pt>`_
+   to learn how to use the following API in your PyTorch training script.
+
 .. class:: smp.DistributedModel
 
    A sub-class of ``torch.nn.Module`` which specifies the model to be
@@ -354,6 +366,7 @@ This API document assumes you use the following import statements in your traini
    currently doesn’t work with the library. ``smp.amp.GradScaler`` replaces
    ``torch.amp.GradScaler`` and provides the same functionality.
 
+.. _pytorch_saving_loading:
 
 APIs for Saving and Loading
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^
@@ -404,6 +417,8 @@ APIs for Saving and Loading
    ``mp_rank`` loads the checkpoint corresponding to the ``mp_rank``.
    Should be used when loading a model trained with the library.
 
+.. _pytorch_saving_loading_instructions:
+
 General Instruction For Saving and Loading
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 
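
The new ``pytorch_saving_loading`` anchors point at the partial-checkpoint workflow. A minimal sketch of that workflow using the documented ``smp.save``/``local_state_dict`` API (the model and file name are placeholders):

    import torch
    import smdistributed.modelparallel.torch as smp

    class Net(torch.nn.Module):  # placeholder model
        def __init__(self):
            super().__init__()
            self.fc = torch.nn.Linear(128, 10)

        def forward(self, x):
            return self.fc(x)

    smp.init()
    model = smp.DistributedModel(Net())
    optimizer = smp.DistributedOptimizer(
        torch.optim.SGD(model.parameters(), lr=0.01)
    )

    # ... training loop elided ...

    # Partial checkpoint: each mp_rank saves only its own partition.
    # Save from one data parallel replica to avoid duplicate files.
    if smp.dp_rank() == 0:
        smp.save(
            {"model": model.local_state_dict(),
             "optimizer": optimizer.local_state_dict()},
            "checkpoint.pt",
            partial=True,
        )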

doc/api/training/smd_model_parallel_tensorflow.rst renamed to doc/api/training/smp_versions/v1.1.0/smd_model_parallel_tensorflow.rst

Lines changed: 6 additions & 0 deletions
@@ -9,6 +9,12 @@ TensorFlow API
 
    import smdistributed.modelparallel.tensorflow as smp
 
+.. tip::
+
+   Refer to
+   `Modify a TensorFlow Training Script
+   <https://docs.aws.amazon.com/sagemaker/latest/dg/model-parallel-customize-training-script.html#model-parallel-customize-training-script-tf>`_
+   to learn how to use the following API in your TensorFlow training script.
 
 .. class:: smp.DistributedModel
    :noindex:
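
For context on the TensorFlow page: the library is used by subclassing ``smp.DistributedModel`` and running the forward/backward pass inside an ``@smp.step``-decorated function. A minimal sketch following the library's documented TF2 pattern (layer sizes and the loss are illustrative assumptions):

    import tensorflow as tf
    import smdistributed.modelparallel.tensorflow as smp

    smp.init()

    class Net(smp.DistributedModel):  # Keras Model sub-class from the library
        def __init__(self):
            super().__init__()
            self.dense = tf.keras.layers.Dense(10)

        def call(self, x):
            return self.dense(x)

    model = Net()
    optimizer = tf.keras.optimizers.Adam()
    loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)

    @smp.step
    def get_grads(images, labels):
        predictions = model(images, training=True)
        loss = loss_fn(labels, predictions)
        grads = optimizer.get_gradients(loss, model.trainable_variables)
        return grads, loss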
doc/api/training/smp_versions/v1_1_0.rst

Lines changed: 12 additions & 0 deletions
@@ -0,0 +1,12 @@
+
+Version 1.1.0 (Latest)
+======================
+
+To use the library, reference the Common API documentation alongside the framework specific API documentation.
+
+.. toctree::
+   :maxdepth: 1
+
+   v1.1.0/smd_model_parallel_common_api
+   v1.1.0/smd_model_parallel_pytorch
+   v1.1.0/smd_model_parallel_tensorflow

doc/conf.py

Lines changed: 9 additions & 0 deletions
@@ -55,6 +55,15 @@
 
 html_theme = "sphinx_rtd_theme"
 
+html_theme_options = {
+    "collapse_navigation": True,
+    "sticky_navigation": True,
+    "navigation_depth": 6,
+    "includehidden": True,
+    "titles_only": False,
+}
+
+
 html_static_path = ["_static"]
 
 htmlhelp_basename = "%sdoc" % project
