Commit cb9d235

restructure the herring api doc
1 parent 3833310 commit cb9d235

7 files changed: +94 -89 lines changed
Lines changed: 4 additions & 6 deletions
@@ -1,10 +1,8 @@
-.. _smdmp-pt-version-archive:
+.. _smddp-version-archive:
 
 .. toctree::
    :maxdepth: 1
 
-   v1_5_0.rst
-   v1_4_0.rst
-   v1_3_0.rst
-   v1_2_0.rst
-   v1_1_0.rst
+   v1_2_x.rst
+   v1_1_x.rst
+   v1_0_0.rst
Lines changed: 38 additions & 3 deletions
@@ -1,9 +1,44 @@
+.. _sdp_api_docs:
 
-Version 1.2.x (Latest)
+###############################################
+Use the Library's API to Adapt Training Scripts
+###############################################
+
+This section contains the SageMaker distributed data parallel API documentation.
+If you are a new user of this library, it is recommended you use this guide alongside
+`SageMaker's Distributed Data Parallel Library
+<https://docs.aws.amazon.com/sagemaker/latest/dg/data-parallel.html>`_.
+
+The library provides framework-specific APIs for TensorFlow and PyTorch.
+
+Select the latest or one of the previous versions of the API documentation
+depending on the version of the library you use.
+
+.. important::
+   The distributed data parallel library only supports training jobs using CUDA 11. When you define a PyTorch or TensorFlow
+   ``Estimator`` with ``dataparallel`` parameter ``enabled`` set to ``True``,
+   it uses CUDA 11. When you extend or customize your own training image
+   you must use a CUDA 11 base image. See
+   `SageMaker Python SDK's distributed data parallel library APIs
+   <https://docs.aws.amazon.com/sagemaker/latest/dg/data-parallel-use-api.html#data-parallel-use-python-skd-api>`_
+   for more information.
+
+Version 1.4.0 (Latest)
 ======================
 
 .. toctree::
    :maxdepth: 1
 
-   latest/smd_data_parallel_pytorch.rst
-   latest/smd_data_parallel_tensorflow.rst
+   latest/smd_data_parallel_pytorch
+   latest/smd_data_parallel_tensorflow
+
+To find archived API documentation for the previous versions of the library,
+see the following link:
+
+Documentation Archive
+=====================
+
+.. toctree::
+   :maxdepth: 1
+
+   archives
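
The restructured page above points PyTorch users to latest/smd_data_parallel_pytorch for the script-adaptation steps. A minimal, hedged sketch of that adaptation pattern follows, assuming the v1.4.x usage in which the library is used as a ``torch.distributed`` backend; the import path, backend name, and ``LOCAL_RANK`` variable are assumptions to confirm against that page, not content from this commit:

.. code:: python

   # Hedged sketch: adapt a PyTorch training script to the data parallel
   # library by using it as a torch.distributed backend (assumed v1.4.x pattern).
   import os

   import torch
   import torch.distributed as dist
   # Importing this module is assumed to register the "smddp" backend.
   import smdistributed.dataparallel.torch.torch_smddp  # noqa: F401

   dist.init_process_group(backend="smddp")

   # LOCAL_RANK is assumed to be set by the SageMaker training launcher.
   local_rank = int(os.environ["LOCAL_RANK"])
   torch.cuda.set_device(local_rank)

   model = torch.nn.Linear(10, 10).to(local_rank)
   model = torch.nn.parallel.DistributedDataParallel(
       model, device_ids=[local_rank]
   )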

doc/api/training/sdp_versions/v1_2_x.rst

Lines changed: 2 additions & 2 deletions
@@ -1,6 +1,6 @@
 
-Version 1.2.x
-=============
+Version 1.2.x and 1.3.x
+=======================
 
 .. toctree::
    :maxdepth: 1

doc/api/training/smd_data_parallel.rst

Lines changed: 3 additions & 77 deletions
@@ -28,83 +28,9 @@ To learn more about the core features of this library, see
 <https://docs.aws.amazon.com/sagemaker/latest/dg/data-parallel-intro.html>`_
 in the SageMaker Developer Guide.
 
-Use with the SageMaker Python SDK
-=================================
-
-To use the SageMaker distributed data parallel library with the SageMaker Python SDK, you will need the following:
-
-- A TensorFlow or PyTorch training script that is
-  adapted to use the distributed data parallel library. The :ref:`sdp_api_docs` includes
-  framework specific examples of training scripts that are adapted to use this library.
-- Your input data must be in an S3 bucket or in FSx in the AWS region
-  that you will use to launch your training job. If you use the Jupyter
-  notebooks provided, create a SageMaker notebook instance in the same
-  region as the bucket that contains your input data. For more
-  information about storing your training data, refer to
-  the `SageMaker Python SDK data
-  inputs <https://sagemaker.readthedocs.io/en/stable/overview.html#use-file-systems-as-training-inputs>`__ documentation.
-
-When you define
-a Pytorch or TensorFlow ``Estimator`` using the SageMaker Python SDK,
-you must select ``dataparallel`` as your ``distribution`` strategy:
-
-.. code::
-
-   distribution = { "smdistributed": { "dataparallel": { "enabled": True } } }
-
-We recommend you use one of the example notebooks as your template to launch a training job. When
-you use an example notebook you’ll need to swap your training script with the one that came with the
-notebook and modify any input functions as necessary. For instructions on how to get started using a
-Jupyter Notebook example, see `Distributed Training Jupyter Notebook Examples
-<https://docs.aws.amazon.com/sagemaker/latest/dg/distributed-training-notebook-examples.html>`_.
-
-Once you have launched a training job, you can monitor it using CloudWatch. To learn more, see
-`Monitor and Analyze Training Jobs Using Metrics
-<https://docs.aws.amazon.com/sagemaker/latest/dg/training-metrics.html>`_.
-
-
-After you train a model, you can see how to deploy your trained model to an endpoint for inference by
-following one of the `example notebooks for deploying a model
-<https://sagemaker-examples.readthedocs.io/en/latest/inference/index.html>`_.
-For more information, see `Deploy Models for Inference
-<https://docs.aws.amazon.com/sagemaker/latest/dg/deploy-model.html>`_.
-
-.. _sdp_api_docs:
-
-API Documentation
-=================
-
-This section contains the SageMaker distributed data parallel API documentation. If you are a
-new user of this library, it is recommended you use this guide alongside
-`SageMaker's Distributed Data Parallel Library
-<https://docs.aws.amazon.com/sagemaker/latest/dg/data-parallel.html>`_.
-
-Select a version to see the API documentation for version.
-
-.. toctree::
-   :maxdepth: 1
-
-   sdp_versions/latest.rst
-   sdp_versions/v1_1_x.rst
-   sdp_versions/v1_0_0.rst
-
-.. important::
-   The distributed data parallel library only supports training jobs using CUDA 11. When you define a PyTorch or TensorFlow
-   ``Estimator`` with ``dataparallel`` parameter ``enabled`` set to ``True``,
-   it uses CUDA 11. When you extend or customize your own training image
-   you must use a CUDA 11 base image. See
-   `SageMaker Python SDK's distributed data parallel library APIs
-   <https://docs.aws.amazon.com/sagemaker/latest/dg/data-parallel-use-api.html#data-parallel-use-python-skd-api>`_
-   for more information.
-
-
-Release Notes
-=============
-
-New features, bug fixes, and improvements are regularly made to the SageMaker
-distributed data parallel library.
-
 .. toctree::
-   :maxdepth: 1
+   :maxdepth: 3
 
+   sdp_versions/latest
+   smd_data_parallel_use_sm_pysdk
    smd_data_parallel_release_notes/smd_data_parallel_change_log

doc/api/training/smd_data_parallel_release_notes/smd_data_parallel_change_log.rst

Lines changed: 7 additions & 0 deletions
@@ -1,5 +1,12 @@
 .. _sdp_1.2.2_release_note:
 
+#############
+Release Notes
+#############
+
+New features, bug fixes, and improvements are regularly made to the SageMaker
+distributed data parallel library.
+
 SageMaker Distributed Data Parallel 1.2.2 Release Notes
 =======================================================

Lines changed: 40 additions & 0 deletions
@@ -0,0 +1,40 @@
+Use with the SageMaker Python SDK
+=================================
+
+To use the SageMaker distributed data parallel library with the SageMaker Python SDK, you will need the following:
+
+- A TensorFlow or PyTorch training script that is
+  adapted to use the distributed data parallel library. The :ref:`sdp_api_docs` includes
+  framework specific examples of training scripts that are adapted to use this library.
+- Your input data must be in an S3 bucket or in FSx in the AWS region
+  that you will use to launch your training job. If you use the Jupyter
+  notebooks provided, create a SageMaker notebook instance in the same
+  region as the bucket that contains your input data. For more
+  information about storing your training data, refer to
+  the `SageMaker Python SDK data
+  inputs <https://sagemaker.readthedocs.io/en/stable/overview.html#use-file-systems-as-training-inputs>`__ documentation.
+
+When you define
+a Pytorch or TensorFlow ``Estimator`` using the SageMaker Python SDK,
+you must select ``dataparallel`` as your ``distribution`` strategy:
+
+.. code:: python
+
+   distribution = { "smdistributed": { "dataparallel": { "enabled": True } } }
+
+We recommend you use one of the example notebooks as your template to launch a training job. When
+you use an example notebook you’ll need to swap your training script with the one that came with the
+notebook and modify any input functions as necessary. For instructions on how to get started using a
+Jupyter Notebook example, see `Distributed Training Jupyter Notebook Examples
+<https://docs.aws.amazon.com/sagemaker/latest/dg/distributed-training-notebook-examples.html>`_.
+
+Once you have launched a training job, you can monitor it using CloudWatch. To learn more, see
+`Monitor and Analyze Training Jobs Using Metrics
+<https://docs.aws.amazon.com/sagemaker/latest/dg/training-metrics.html>`_.
+
+
+After you train a model, you can see how to deploy your trained model to an endpoint for inference by
+following one of the `example notebooks for deploying a model
+<https://sagemaker-examples.readthedocs.io/en/latest/inference/index.html>`_.
+For more information, see `Deploy Models for Inference
+<https://docs.aws.amazon.com/sagemaker/latest/dg/deploy-model.html>`_.
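
The new page above shows only the ``distribution`` dictionary; a fuller, hedged sketch of how that dictionary plugs into a SageMaker Python SDK estimator follows. The entry point, role ARN, instance settings, framework version, and S3 path are placeholder assumptions, not values from this commit:

.. code:: python

   # Hedged sketch: launching a training job with the data parallel library
   # enabled through the SageMaker Python SDK. All concrete values below are
   # placeholders; only the distribution dictionary comes from the page above.
   from sagemaker.pytorch import PyTorch

   estimator = PyTorch(
       entry_point="train.py",                      # assumed adapted training script
       role="arn:aws:iam::111122223333:role/SageMakerRole",  # placeholder IAM role
       instance_type="ml.p3dn.24xlarge",            # placeholder multi-GPU instance type
       instance_count=2,
       framework_version="1.8.1",                   # assumed PyTorch version
       py_version="py36",
       distribution={"smdistributed": {"dataparallel": {"enabled": True}}},
   )

   estimator.fit("s3://my-bucket/training-data")    # placeholder S3 input location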

doc/api/training/smp_versions/latest.rst

Lines changed: 0 additions & 1 deletion
@@ -26,7 +26,6 @@ To use the library, reference the Common API documentation alongside the framewo
 To find archived API documentation for the previous versions of the library,
 see the following link:
 
-
 Documentation Archive
 =====================
