@@ -28,83 +28,9 @@ To learn more about the core features of this library, see
 <https://docs.aws.amazon.com/sagemaker/latest/dg/data-parallel-intro.html>`_
 in the SageMaker Developer Guide.
 
-Use with the SageMaker Python SDK
-=================================
-
-To use the SageMaker distributed data parallel library with the SageMaker Python SDK, you will need the following:
-
-- A TensorFlow or PyTorch training script that is
-  adapted to use the distributed data parallel library. The :ref:`sdp_api_docs` includes
-  framework specific examples of training scripts that are adapted to use this library.
-- Your input data must be in an S3 bucket or in FSx in the AWS region
-  that you will use to launch your training job. If you use the Jupyter
-  notebooks provided, create a SageMaker notebook instance in the same
-  region as the bucket that contains your input data. For more
-  information about storing your training data, refer to
-  the `SageMaker Python SDK data
-  inputs <https://sagemaker.readthedocs.io/en/stable/overview.html#use-file-systems-as-training-inputs>`__ documentation.
-
-When you define
-a Pytorch or TensorFlow ``Estimator`` using the SageMaker Python SDK,
-you must select ``dataparallel`` as your ``distribution`` strategy:
-
-.. code::
-
-   distribution = { "smdistributed": { "dataparallel": { "enabled": True } } }
-
-We recommend you use one of the example notebooks as your template to launch a training job. When
-you use an example notebook you’ll need to swap your training script with the one that came with the
-notebook and modify any input functions as necessary. For instructions on how to get started using a
-Jupyter Notebook example, see `Distributed Training Jupyter Notebook Examples
-<https://docs.aws.amazon.com/sagemaker/latest/dg/distributed-training-notebook-examples.html>`_.
-
-Once you have launched a training job, you can monitor it using CloudWatch. To learn more, see
-`Monitor and Analyze Training Jobs Using Metrics
-<https://docs.aws.amazon.com/sagemaker/latest/dg/training-metrics.html>`_.
-
-
-After you train a model, you can see how to deploy your trained model to an endpoint for inference by
-following one of the `example notebooks for deploying a model
-<https://sagemaker-examples.readthedocs.io/en/latest/inference/index.html>`_.
-For more information, see `Deploy Models for Inference
-<https://docs.aws.amazon.com/sagemaker/latest/dg/deploy-model.html>`_.
-
-.. _sdp_api_docs:
-
-API Documentation
-=================
-
-This section contains the SageMaker distributed data parallel API documentation. If you are a
-new user of this library, it is recommended you use this guide alongside
-`SageMaker's Distributed Data Parallel Library
-<https://docs.aws.amazon.com/sagemaker/latest/dg/data-parallel.html>`_.
-
-Select a version to see the API documentation for version.
-
-.. toctree::
-   :maxdepth: 1
-
-   sdp_versions/latest.rst
-   sdp_versions/v1_1_x.rst
-   sdp_versions/v1_0_0.rst
-
-.. important::
-   The distributed data parallel library only supports training jobs using CUDA 11. When you define a PyTorch or TensorFlow
-   ``Estimator`` with ``dataparallel`` parameter ``enabled`` set to ``True``,
-   it uses CUDA 11. When you extend or customize your own training image
-   you must use a CUDA 11 base image. See
-   `SageMaker Python SDK's distributed data parallel library APIs
-   <https://docs.aws.amazon.com/sagemaker/latest/dg/data-parallel-use-api.html#data-parallel-use-python-skd-api>`_
-   for more information.
-
-
-Release Notes
-=============
-
-New features, bug fixes, and improvements are regularly made to the SageMaker
-distributed data parallel library.
-
 .. toctree::
-   :maxdepth: 1
+   :maxdepth: 3
 
+   sdp_versions/latest
+   smd_data_parallel_use_sm_pysdk
smd_data_parallel_release_notes/smd_data_parallel_change_log
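
The ``distribution`` setting described in the section removed above (now covered by the ``smd_data_parallel_use_sm_pysdk`` page added to the toctree) is passed when constructing a framework estimator with the SageMaker Python SDK. A minimal sketch, assuming a PyTorch training script named ``train.py``, a placeholder IAM role, and an illustrative S3 path; none of these values come from this repository:

.. code:: python

   # Sketch: launch a SageMaker training job with the distributed data
   # parallel library enabled. Role ARN, instance type, versions, and S3
   # paths are placeholders, not values taken from this repository.
   from sagemaker.pytorch import PyTorch

   estimator = PyTorch(
       entry_point="train.py",          # script adapted for the library
       role="arn:aws:iam::111122223333:role/ExampleSageMakerRole",
       instance_count=2,
       instance_type="ml.p3.16xlarge",  # multi-GPU instance type
       framework_version="1.8.1",
       py_version="py36",
       distribution={"smdistributed": {"dataparallel": {"enabled": True}}},
   )

   # Training data must already be in S3 (or FSx) in the same region.
   estimator.fit({"train": "s3://example-bucket/train"})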
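
Inside ``train.py``, the script adaptation mentioned in the removed prerequisites amounts to initializing the library's process group, pinning each process to a GPU, and wrapping the model. The calls below follow the library's v1.x PyTorch API; check the versioned pages in the toctree for the exact interface of your release:

.. code:: python

   # Sketch of the training-script side, based on the v1.x PyTorch API of
   # smdistributed.dataparallel; details vary by library version.
   import torch
   import smdistributed.dataparallel.torch.distributed as dist
   from smdistributed.dataparallel.torch.parallel.distributed import DistributedDataParallel as DDP

   dist.init_process_group()                     # one process per GPU
   torch.cuda.set_device(dist.get_local_rank())  # pin this process to its GPU

   model = torch.nn.Linear(10, 1).cuda()         # toy model as a stand-in
   model = DDP(model)                            # gradients averaged across all workers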