@@ -8,6 +8,7 @@ TensorFlow Guide to SageMaker's distributed data parallel library
    - :ref:`tensorflow-sdp-api`

 .. _tensorflow-sdp-modify:
+   :noindex:

 Modify a TensorFlow 2.x training script to use SageMaker data parallel
 ======================================================================
@@ -150,6 +151,7 @@ script you will have for distributed training with the library.


 .. _tensorflow-sdp-api:
+   :noindex:

 TensorFlow API
 ==============
@@ -160,6 +162,7 @@ TensorFlow API


 .. function:: smdistributed.dataparallel.tensorflow.init()
+   :noindex:

    Initialize ``smdistributed.dataparallel``. Must be called at the
    beginning of the training script.
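As a minimal sketch of that initialization (it assumes the script runs inside a SageMaker training container where the ``smdistributed`` package is available; the ``sdp`` alias is just a local naming choice):

.. code:: python

   import smdistributed.dataparallel.tensorflow as sdp

   # Must run before any other smdistributed.dataparallel call.
   sdp.init()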
@@ -183,6 +186,7 @@ TensorFlow API


 .. function:: smdistributed.dataparallel.tensorflow.size()
+   :noindex:

    The total number of GPUs across all the nodes in the cluster. For
    example, in an 8-node cluster with 8 GPUs each, ``size`` will be equal
@@ -200,6 +204,7 @@ TensorFlow API


 .. function:: smdistributed.dataparallel.tensorflow.local_size()
+   :noindex:

    The total number of GPUs on a node. For example, on a node with 8
    GPUs, ``local_size`` will be equal to 8.
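A small sketch contrasting ``size()`` and ``local_size()`` (assuming ``sdp.init()`` has been called as shown above):

.. code:: python

   import smdistributed.dataparallel.tensorflow as sdp

   sdp.init()
   # On a 2-node cluster with 8 GPUs per node this prints 16 and 8.
   print("GPUs in the cluster:", sdp.size())
   print("GPUs on this node:", sdp.local_size())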
@@ -214,6 +219,7 @@ TensorFlow API


 .. function:: smdistributed.dataparallel.tensorflow.rank()
+   :noindex:

    The rank of the node in the cluster. The rank ranges from 0 to number of
    nodes - 1. This is similar to MPI's World Rank.
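``rank()`` is commonly used to restrict side effects such as logging or checkpointing to a single process. A hedged sketch, assuming ``sdp.init()`` has run and a Keras ``model`` object already exists:

.. code:: python

   import smdistributed.dataparallel.tensorflow as sdp

   # Only the rank-0 process writes checkpoints, so workers do not
   # overwrite each other's files.
   if sdp.rank() == 0:
       model.save_weights("checkpoint.h5")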
@@ -228,6 +234,7 @@ TensorFlow API


 .. function:: smdistributed.dataparallel.tensorflow.local_rank()
+   :noindex:

    Local rank refers to the relative rank of the
    GPUs’ ``smdistributed.dataparallel`` processes within the node. For
@@ -246,6 +253,7 @@ TensorFlow API


 .. function:: smdistributed.dataparallel.tensorflow.allreduce(tensor, param_index, num_params, compression=Compression.none, op=ReduceOp.AVERAGE)
+   :noindex:

    Performs an all-reduce operation on a tensor (``tf.Tensor``).
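A hedged sketch of reducing a single standalone tensor across all workers; the positional values ``0`` and ``1`` for ``param_index`` and ``num_params`` are an assumption for the case of one tensor:

.. code:: python

   import tensorflow as tf
   import smdistributed.dataparallel.tensorflow as sdp

   sdp.init()
   local_value = tf.constant([1.0, 2.0, 3.0])
   # param_index=0, num_params=1: treat this as the only tensor being reduced.
   averaged = sdp.allreduce(local_value, 0, 1)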
@@ -273,6 +281,7 @@ TensorFlow API


 .. function:: smdistributed.dataparallel.tensorflow.broadcast_global_variables(root_rank)
+   :noindex:

    Broadcasts all global variables from root rank to all other processes.
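Typical usage is a single call after the variables have been created, so every worker starts from the same state. A sketch, assuming ``sdp.init()`` has run and the model has been built:

.. code:: python

   import smdistributed.dataparallel.tensorflow as sdp

   # Send rank 0's initial variable values to every other process.
   sdp.broadcast_global_variables(0)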
@@ -287,6 +296,7 @@ TensorFlow API


 .. function:: smdistributed.dataparallel.tensorflow.broadcast_variables(variables, root_rank)
+   :noindex:

    Applicable for TensorFlow 2.x only.
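A hedged sketch of the common pattern of synchronizing a Keras model's variables (and the optimizer's) from rank 0 once they exist; the ``model`` and ``opt`` objects are assumed to be defined elsewhere in the script:

.. code:: python

   import smdistributed.dataparallel.tensorflow as sdp

   # After the first forward/backward pass has created all variables:
   sdp.broadcast_variables(model.variables, root_rank=0)
   sdp.broadcast_variables(opt.variables(), root_rank=0)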
@@ -309,6 +319,7 @@ TensorFlow API


 .. function:: smdistributed.dataparallel.tensorflow.oob_allreduce(tensor, compression=Compression.none, op=ReduceOp.AVERAGE)
+   :noindex:

    OutOfBand (oob) AllReduce is a simplified AllReduce function for use cases
    such as calculating total loss across all the GPUs in the training.
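For example, averaging a scalar loss across GPUs for reporting. A sketch, assuming ``sdp.init()`` has run and ``loss_value`` is a ``tf.Tensor`` computed by the training step:

.. code:: python

   import smdistributed.dataparallel.tensorflow as sdp

   # Average the per-GPU loss so every worker can report the global value.
   global_loss = sdp.oob_allreduce(loss_value)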
@@ -342,6 +353,7 @@ TensorFlow API


 .. function:: smdistributed.dataparallel.tensorflow.overlap(tensor)
+   :noindex:

    This function is applicable only for models compiled with XLA. Use this
    function to enable ``smdistributed.dataparallel`` to efficiently
@@ -379,6 +391,7 @@ TensorFlow API


 .. function:: smdistributed.dataparallel.tensorflow.broadcast(tensor, root_rank)
+   :noindex:

    Broadcasts the input tensor on root rank to the same input tensor on all
    other ``smdistributed.dataparallel`` processes.
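A sketch of broadcasting a small control tensor computed on rank 0 to every process; assigning the call's return value back to the tensor is an assumption modeled on the allreduce-style calls above:

.. code:: python

   import tensorflow as tf
   import smdistributed.dataparallel.tensorflow as sdp

   sdp.init()
   flag = tf.constant(1 if sdp.rank() == 0 else 0)
   # After this call every process holds rank 0's value of the tensor.
   flag = sdp.broadcast(flag, root_rank=0)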
@@ -399,6 +412,7 @@ TensorFlow API


 .. function:: smdistributed.dataparallel.tensorflow.shutdown()
+   :noindex:

    Shuts down ``smdistributed.dataparallel``. Optional to call at the end
    of the training script.
@@ -413,6 +427,7 @@ TensorFlow API


 .. function:: smdistributed.dataparallel.tensorflow.DistributedOptimizer
+   :noindex:

    Applicable if you use the ``tf.estimator`` API in TensorFlow 2.x (2.3.1).
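The usual pattern, mirroring comparable data-parallel libraries (so treat this as a hedged sketch rather than the canonical form), is to wrap an existing optimizer before handing it to the estimator:

.. code:: python

   import tensorflow as tf
   import smdistributed.dataparallel.tensorflow as sdp

   sdp.init()
   # Scaling the learning rate by the number of workers is a common practice.
   opt = tf.optimizers.Adam(0.001 * sdp.size())
   opt = sdp.DistributedOptimizer(opt)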
@@ -453,6 +468,7 @@ TensorFlow API


 .. function:: smdistributed.dataparallel.tensorflow.DistributedGradientTape
+   :noindex:

    Applicable to TensorFlow 2.x only.
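A hedged sketch of a TF 2.x training step using the wrapped tape; ``model``, ``loss_fn``, and ``opt`` are assumed to be defined elsewhere, and ``sdp.init()`` to have been called:

.. code:: python

   import tensorflow as tf
   import smdistributed.dataparallel.tensorflow as sdp

   @tf.function
   def training_step(images, labels):
       with tf.GradientTape() as tape:
           loss = loss_fn(labels, model(images, training=True))
       # Wrap the tape so gradients are averaged across all GPUs.
       tape = sdp.DistributedGradientTape(tape)
       grads = tape.gradient(loss, model.trainable_variables)
       opt.apply_gradients(zip(grads, model.trainable_variables))
       return loss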
@@ -488,6 +504,7 @@ TensorFlow API


 .. function:: smdistributed.dataparallel.tensorflow.BroadcastGlobalVariablesHook
+   :noindex:

    Applicable if you use the ``tf.estimator`` API in TensorFlow 2.x (2.3.1).
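With ``tf.estimator``, the hook is typically passed to ``train`` so that rank 0's initial variables reach every worker. A hedged sketch, assuming an ``estimator`` object and a ``train_input_fn`` already exist:

.. code:: python

   import smdistributed.dataparallel.tensorflow as sdp

   # Broadcast initial variables from rank 0 when training starts.
   hooks = [sdp.BroadcastGlobalVariablesHook(0)]
   estimator.train(input_fn=train_input_fn, hooks=hooks)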
@@ -516,6 +533,7 @@ TensorFlow API


 .. function:: smdistributed.dataparallel.tensorflow.Compression
+   :noindex:

    Optional Gradient Compression algorithm that can be used in AllReduce
    operation.
@@ -527,6 +545,7 @@ TensorFlow API


 .. function:: smdistributed.dataparallel.tensorflow.ReduceOp
+   :noindex:

    Supported reduction operations in ``smdistributed.dataparallel``.
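Both enumerations appear as defaults in the signatures above. A sketch passing them explicitly; only ``Compression.none`` and ``ReduceOp.AVERAGE`` are shown in this diff, so no other members are assumed:

.. code:: python

   import tensorflow as tf
   import smdistributed.dataparallel.tensorflow as sdp

   sdp.init()
   t = tf.constant([1.0, 2.0])
   # Equivalent to the defaults: no gradient compression, average reduction.
   averaged = sdp.oob_allreduce(
       t,
       compression=sdp.Compression.none,
       op=sdp.ReduceOp.AVERAGE,
   )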