Commit abd71f4

add description for the new sdp degree param

1 parent c32e491

File tree

1 file changed (+3, -1 lines)

doc/api/training/smd_model_parallel_general.rst

Lines changed: 3 additions & 1 deletion
@@ -247,12 +247,14 @@ PyTorch-specific Parameters
      - int
      - 1
      - To run a training job using sharded data parallelism, add this parameter and specify a number greater than 1.
+       Sharded data parallelism is a memory-saving distributed training technique that splits the training state of a model (model parameters, gradients, and optimizer states) across GPUs in a data parallel group.
        For more information, see `Sharded Data Parallelism
        <https://docs.aws.amazon.com/sagemaker/latest/dg/model-parallel-extended-features-pytorch-sharded-data-parallelism.html>`_.
    * - ``sdp_reduce_bucket_size`` (**smdistributed-modelparallel**>=v1.11)
      - int
      - 5e8
-     - Specifies the size of PyTorch DDP gradient buckets in number of elements of the default dtype.
+     - Configuration parameter for sharded data parallelism (for ``sharded_data_parallel_degree > 2``).
+       Specifies the size of PyTorch DDP gradient buckets in number of elements of the default dtype.
    * - ``sdp_param_persistence_threshold`` (**smdistributed-modelparallel**>=v1.11)
      - int
      - 1e6
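
For context on where these options are supplied, below is a minimal, hypothetical sketch using the SageMaker Python SDK's ``PyTorch`` estimator. Only the parameter names in the ``smdistributed`` block come from the table edited above; the entry point, IAM role, instance settings, framework versions, and MPI settings are placeholder assumptions, not part of this commit.

```python
# Hypothetical sketch (not part of this commit): passing the sharded data
# parallelism options documented above through the SageMaker Python SDK.
# Script name, IAM role, instance settings, and versions are placeholders.
from sagemaker.pytorch import PyTorch

smp_parameters = {
    "ddp": True,  # assumed prerequisite for sharded data parallelism
    # Enable sharded data parallelism by setting a degree greater than 1.
    "sharded_data_parallel_degree": 8,
    # Optional knobs from the table above (smdistributed-modelparallel >= v1.11).
    "sdp_reduce_bucket_size": int(5e8),           # DDP gradient bucket size, in elements of the default dtype
    "sdp_param_persistence_threshold": int(1e6),  # see the parameter table for details
}

estimator = PyTorch(
    entry_point="train.py",                        # placeholder training script
    role="arn:aws:iam::111122223333:role/SMRole",  # placeholder IAM role
    instance_type="ml.p4d.24xlarge",
    instance_count=1,
    framework_version="1.12.0",
    py_version="py38",
    distribution={
        "smdistributed": {
            "modelparallel": {
                "enabled": True,
                "parameters": smp_parameters,
            }
        },
        "mpi": {"enabled": True, "processes_per_host": 8},
    },
)

# estimator.fit()  # launch the training job
```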
