Commit bde5c34
Author: Talia Chopra

documentation: adding new section for smdataparallel 1.1.0

1 parent 82e7f90 commit bde5c34

File tree

7 files changed: +1100 -36 lines changed


doc/api/training/sdp_versions/v1.0.0/smd_data_parallel_pytorch.rst

Lines changed: 15 additions & 15 deletions
@@ -8,7 +8,7 @@ PyTorch Guide to SageMaker's distributed data parallel library
 - :ref:`pytorch-sdp-api`

 .. _pytorch-sdp-modify:
-   :noindex:
+   :noindex:

 Modify a PyTorch training script to use SageMaker data parallel
 ======================================================================
@@ -150,7 +150,7 @@ you will have for distributed training with the distributed data parallel librar


 .. _pytorch-sdp-api:
-   :noindex:
+   :noindex:

 PyTorch API
 ===========
@@ -161,7 +161,7 @@ PyTorch API


 .. function:: smdistributed.dataparallel.torch.distributed.is_available()
-   :noindex:
+   :noindex:

 Check if script started as a distributed job. For local runs user can
 check that is_available returns False and run the training script
@@ -177,7 +177,7 @@ PyTorch API


 .. function:: smdistributed.dataparallel.torch.distributed.init_process_group(*args, **kwargs)
-   :noindex:
+   :noindex:

 Initialize ``smdistributed.dataparallel``. Must be called at the
 beginning of the training script, before calling any other methods.
@@ -202,7 +202,7 @@ PyTorch API


 .. function:: smdistributed.dataparallel.torch.distributed.is_initialized()
-   :noindex:
+   :noindex:

 Checks if the default process group has been initialized.

@@ -216,7 +216,7 @@ PyTorch API


 .. function:: smdistributed.dataparallel.torch.distributed.get_world_size(group=smdistributed.dataparallel.torch.distributed.group.WORLD)
-   :noindex:
+   :noindex:

 The total number of GPUs across all the nodes in the cluster. For
 example, in a 8 node cluster with 8 GPU each, size will be equal to 64.
@@ -236,7 +236,7 @@ PyTorch API


 .. function:: smdistributed.dataparallel.torch.distributed.get_rank(group=smdistributed.dataparallel.torch.distributed.group.WORLD)
-   :noindex:
+   :noindex:

 The rank of the node in the cluster. The rank ranges from 0 to number of
 nodes - 1. This is similar to MPI's World Rank.
@@ -256,7 +256,7 @@ PyTorch API


 .. function:: smdistributed.dataparallel.torch.distributed.get_local_rank()
-   :noindex:
+   :noindex:

 Local rank refers to the relative rank of
 the ``smdistributed.dataparallel`` process within the node the current
@@ -275,7 +275,7 @@ PyTorch API


 .. function:: smdistributed.dataparallel.torch.distributed.all_reduce(tensor, op=smdistributed.dataparallel.torch.distributed.ReduceOp.SUM, group=smdistributed.dataparallel.torch.distributed.group.WORLD, async_op=False)
-   :noindex:
+   :noindex:

 Performs an all-reduce operation on a tensor (torch.tensor) across
 all ``smdistributed.dataparallel`` workers
@@ -320,7 +320,7 @@ PyTorch API


 .. function:: smdistributed.dataparallel.torch.distributed.broadcast(tensor, src=0, group=smdistributed.dataparallel.torch.distributed.group.WORLD, async_op=False)
-   :noindex:
+   :noindex:

 Broadcasts the tensor (torch.tensor) to the whole group.

@@ -345,7 +345,7 @@ PyTorch API


 .. function:: smdistributed.dataparallel.torch.distributed.all_gather(tensor_list, tensor, group=smdistributed.dataparallel.torch.distributed.group.WORLD, async_op=False)
-   :noindex:
+   :noindex:

 Gathers tensors from the whole group in a list.

@@ -372,7 +372,7 @@ PyTorch API


 .. function:: smdistributed.dataparallel.torch.distributed.all_to_all_single(output_t, input_t, output_split_sizes=None, input_split_sizes=None, group=group.WORLD, async_op=False)
-   :noindex:
+   :noindex:

 Each process scatters input tensor to all processes in a group and return gathered tensor in output.

@@ -397,7 +397,7 @@ PyTorch API


 .. function:: smdistributed.dataparallel.torch.distributed.barrier(group=smdistributed.dataparallel.torch.distributed.group.WORLD, async_op=False)
-   :noindex:
+   :noindex:

 Synchronizes all ``smdistributed.dataparallel`` processes.

@@ -423,7 +423,7 @@ PyTorch API


 .. class:: smdistributed.dataparallel.torch.parallel.DistributedDataParallel(module, device_ids=None, output_device=None, broadcast_buffers=True, process_group=None, bucket_cap_mb=None)
-   :noindex:
+   :noindex:

 ``smdistributed.dataparallel's`` implementation of distributed data
 parallelism for PyTorch. In most cases, wrapping your PyTorch Module
@@ -517,7 +517,7 @@ PyTorch API


 .. class:: smdistributed.dataparallel.torch.distributed.ReduceOp
-   :noindex:
+   :noindex:

 An enum-like class for supported reduction operations
 in ``smdistributed.dataparallel``.
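
For orientation, here is a minimal sketch of how the PyTorch APIs documented in this file fit together in a training script. It is illustrative only: it assumes a SageMaker training job where smdistributed.dataparallel and CUDA GPUs are available, the nn.Linear model, synthetic batches, and "model.pt" checkpoint name are placeholders, and the DistributedDataParallel import simply follows the class path shown above. Only the documented calls (init_process_group, get_local_rank, get_rank, DistributedDataParallel) come from this file.

    import torch
    import torch.nn as nn

    import smdistributed.dataparallel.torch.distributed as dist
    from smdistributed.dataparallel.torch.parallel import DistributedDataParallel as DDP

    dist.init_process_group()                  # must be called before any other smdistributed call
    local_rank = dist.get_local_rank()         # relative GPU/process index on this node
    device = torch.device("cuda", local_rank)
    torch.cuda.set_device(local_rank)

    model = DDP(nn.Linear(10, 1).to(device))   # placeholder model wrapped for data parallelism
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    loss_fn = nn.MSELoss()

    for step in range(10):                     # stand-in for a real DataLoader loop
        x = torch.randn(32, 10, device=device)
        y = torch.randn(32, 1, device=device)
        optimizer.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()                        # gradients are all-reduced across workers here
        optimizer.step()

    if dist.get_rank() == 0:                   # checkpoint only from the leader process
        torch.save(model.state_dict(), "model.pt")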

doc/api/training/sdp_versions/v1.0.0/smd_data_parallel_tensorflow.rst

Lines changed: 19 additions & 19 deletions
@@ -8,7 +8,7 @@ TensorFlow Guide to SageMaker's distributed data parallel library
 - :ref:`tensorflow-sdp-api`

 .. _tensorflow-sdp-modify:
-   :noindex:
+   :noindex:

 Modify a TensorFlow 2.x training script to use SageMaker data parallel
 ======================================================================
@@ -151,7 +151,7 @@ script you will have for distributed training with the library.


 .. _tensorflow-sdp-api:
-   :noindex:
+   :noindex:

 TensorFlow API
 ==============
@@ -162,7 +162,7 @@ TensorFlow API


 .. function:: smdistributed.dataparallel.tensorflow.init()
-   :noindex:
+   :noindex:

 Initialize ``smdistributed.dataparallel``. Must be called at the
 beginning of the training script.
@@ -186,7 +186,7 @@ TensorFlow API


 .. function:: smdistributed.dataparallel.tensorflow.size()
-   :noindex:
+   :noindex:

 The total number of GPUs across all the nodes in the cluster. For
 example, in a 8 node cluster with 8 GPUs each, ``size`` will be equal
@@ -204,7 +204,7 @@ TensorFlow API


 .. function:: smdistributed.dataparallel.tensorflow.local_size()
-   :noindex:
+   :noindex:

 The total number of GPUs on a node. For example, on a node with 8
 GPUs, ``local_size`` will be equal to 8.
@@ -219,7 +219,7 @@ TensorFlow API


 .. function:: smdistributed.dataparallel.tensorflow.rank()
-   :noindex:
+   :noindex:

 The rank of the node in the cluster. The rank ranges from 0 to number of
 nodes - 1. This is similar to MPI's World Rank.
@@ -234,7 +234,7 @@ TensorFlow API


 .. function:: smdistributed.dataparallel.tensorflow.local_rank()
-   :noindex:
+   :noindex:

 Local rank refers to the relative rank of the
 GPUs’ ``smdistributed.dataparallel`` processes within the node. For
@@ -253,7 +253,7 @@ TensorFlow API


 .. function:: smdistributed.dataparallel.tensorflow.allreduce(tensor, param_index, num_params, compression=Compression.none, op=ReduceOp.AVERAGE)
-   :noindex:
+   :noindex:

 Performs an all-reduce operation on a tensor (``tf.Tensor``).

@@ -281,7 +281,7 @@ TensorFlow API


 .. function:: smdistributed.dataparallel.tensorflow.broadcast_global_variables(root_rank)
-   :noindex:
+   :noindex:

 Broadcasts all global variables from root rank to all other processes.

@@ -296,7 +296,7 @@ TensorFlow API


 .. function:: smdistributed.dataparallel.tensorflow.broadcast_variables(variables, root_rank)
-   :noindex:
+   :noindex:

 Applicable for TensorFlow 2.x only.

@@ -319,7 +319,7 @@ TensorFlow API


 .. function:: smdistributed.dataparallel.tensorflow.oob_allreduce(tensor, compression=Compression.none, op=ReduceOp.AVERAGE)
-   :noindex:
+   :noindex:

 OutOfBand (oob) AllReduce is simplified AllReduce function for use cases
 such as calculating total loss across all the GPUs in the training.
@@ -353,7 +353,7 @@ TensorFlow API


 .. function:: smdistributed.dataparallel.tensorflow.overlap(tensor)
-   :noindex:
+   :noindex:

 This function is applicable only for models compiled with XLA. Use this
 function to enable ``smdistributed.dataparallel`` to efficiently
@@ -391,7 +391,7 @@ TensorFlow API


 .. function:: smdistributed.dataparallel.tensorflow.broadcast(tensor, root_rank)
-   :noindex:
+   :noindex:

 Broadcasts the input tensor on root rank to the same input tensor on all
 other ``smdistributed.dataparallel`` processes.
@@ -412,7 +412,7 @@ TensorFlow API


 .. function:: smdistributed.dataparallel.tensorflow.shutdown()
-   :noindex:
+   :noindex:

 Shuts down ``smdistributed.dataparallel``. Optional to call at the end
 of the training script.
@@ -427,7 +427,7 @@ TensorFlow API


 .. function:: smdistributed.dataparallel.tensorflow.DistributedOptimizer
-   :noindex:
+   :noindex:

 Applicable if you use the ``tf.estimator`` API in TensorFlow 2.x (2.3.1).

@@ -468,7 +468,7 @@ TensorFlow API


 .. function:: smdistributed.dataparallel.tensorflow.DistributedGradientTape
-   :noindex:
+   :noindex:

 Applicable to TensorFlow 2.x only.

@@ -504,7 +504,7 @@ TensorFlow API


 .. function:: smdistributed.dataparallel.tensorflow.BroadcastGlobalVariablesHook
-   :noindex:
+   :noindex:

 Applicable if you use the ``tf.estimator`` API in TensorFlow 2.x (2.3.1).

@@ -533,7 +533,7 @@ TensorFlow API


 .. function:: smdistributed.dataparallel.tensorflow.Compression
-   :noindex:
+   :noindex:

 Optional Gradient Compression algorithm that can be used in AllReduce
 operation.
@@ -545,7 +545,7 @@ TensorFlow API


 .. function:: smdistributed.dataparallel.tensorflow.ReduceOp
-   :noindex:
+   :noindex:

 Supported reduction operations in ``smdistributed.dataparallel``.
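
Similarly, a minimal sketch of the TensorFlow 2.x pattern built from the APIs documented in this file. The Keras Dense model, synthetic tensors, learning-rate scaling choice, and checkpoint path are illustrative assumptions; only the smdistributed.dataparallel calls (init, local_rank, size, rank, DistributedGradientTape, broadcast_variables, shutdown) come from the documentation above.

    import tensorflow as tf
    import smdistributed.dataparallel.tensorflow as sdp

    sdp.init()                                            # must be called at the start of the script

    # Pin this process to one GPU based on its local rank.
    gpus = tf.config.experimental.list_physical_devices("GPU")
    if gpus:
        tf.config.experimental.set_visible_devices(gpus[sdp.local_rank()], "GPU")

    model = tf.keras.Sequential([tf.keras.layers.Dense(1)])               # placeholder model
    loss_fn = tf.keras.losses.MeanSquaredError()
    optimizer = tf.keras.optimizers.SGD(learning_rate=0.01 * sdp.size())  # scale LR with cluster size

    @tf.function
    def train_step(x, y, first_batch):
        with tf.GradientTape() as tape:
            loss = loss_fn(y, model(x, training=True))
        tape = sdp.DistributedGradientTape(tape)          # all-reduce gradients across workers
        grads = tape.gradient(loss, model.trainable_variables)
        optimizer.apply_gradients(zip(grads, model.trainable_variables))
        if first_batch:
            # Sync model and optimizer state from rank 0 after the first step.
            sdp.broadcast_variables(model.variables, root_rank=0)
            sdp.broadcast_variables(optimizer.variables(), root_rank=0)
        return loss

    x = tf.random.normal([32, 10])                        # synthetic data for illustration
    y = tf.random.normal([32, 1])
    for step in range(10):
        train_step(x, y, step == 0)

    if sdp.rank() == 0:                                   # checkpoint only from the leader
        model.save_weights("checkpoint")

    sdp.shutdown()                                        # optional at the end of training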
