@@ -8,7 +8,7 @@ TensorFlow Guide to SageMaker's distributed data parallel library
 - :ref:`tensorflow-sdp-api`

 .. _tensorflow-sdp-modify:
- :noindex:
+ :noindex:

 Modify a TensorFlow 2.x training script to use SageMaker data parallel
 ======================================================================
@@ -151,7 +151,7 @@ script you will have for distributed training with the library.


 .. _tensorflow-sdp-api:
- :noindex:
+ :noindex:

 TensorFlow API
 ==============
@@ -162,7 +162,7 @@ TensorFlow API


 .. function:: smdistributed.dataparallel.tensorflow.init()
- :noindex:
+ :noindex:

 Initialize ``smdistributed.dataparallel``. Must be called at the
 beginning of the training script.
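For reference, a minimal sketch of how the call documented above is typically placed at the top of a training script (assuming the conventional import alias ``sdp``; illustrative only, not part of the patch):

    # Hypothetical minimal training-script preamble.
    import smdistributed.dataparallel.tensorflow as sdp

    sdp.init()  # must run before any other sdp.* call, per the description above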
@@ -186,7 +186,7 @@ TensorFlow API


 .. function:: smdistributed.dataparallel.tensorflow.size()
- :noindex:
+ :noindex:

 The total number of GPUs across all the nodes in the cluster. For
 example, in an 8 node cluster with 8 GPUs each, ``size`` will be equal
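A hedged sketch of a common use of ``size()``: scaling the learning rate by the total GPU count. The linear-scaling rule is an assumption, not something stated in this patch:

    import smdistributed.dataparallel.tensorflow as sdp

    sdp.init()
    # In an 8-node x 8-GPU cluster, sdp.size() returns 64.
    base_lr = 0.001
    scaled_lr = base_lr * sdp.size()  # common linear-scaling heuristic (assumption)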
@@ -204,7 +204,7 @@ TensorFlow API


 .. function:: smdistributed.dataparallel.tensorflow.local_size()
- :noindex:
+ :noindex:

 The total number of GPUs on a node. For example, on a node with 8
 GPUs, ``local_size`` will be equal to 8.
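A small illustrative sketch contrasting ``local_size()`` with ``size()``; the 8-GPU figures echo the examples in this file:

    import smdistributed.dataparallel.tensorflow as sdp

    sdp.init()
    print("GPUs on this node:", sdp.local_size())  # e.g. 8
    print("GPUs in the cluster:", sdp.size())      # e.g. 64 for 8 such nodes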
@@ -219,7 +219,7 @@ TensorFlow API


 .. function:: smdistributed.dataparallel.tensorflow.rank()
- :noindex:
+ :noindex:

 The rank of the node in the cluster. The rank ranges from 0 to number of
 nodes - 1. This is similar to MPI's World Rank.
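A typical guard on the global rank, for example to do logging or checkpointing from a single process; the guarded work shown is a placeholder, not from the patch:

    import smdistributed.dataparallel.tensorflow as sdp

    sdp.init()
    if sdp.rank() == 0:
        # Only the lead process performs rank-0-only work,
        # e.g. saving checkpoints or reporting metrics.
        print("lead process; checkpointing would happen here")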
@@ -234,7 +234,7 @@ TensorFlow API


 .. function:: smdistributed.dataparallel.tensorflow.local_rank()
- :noindex:
+ :noindex:

 Local rank refers to the relative rank of the
 GPUs’ ``smdistributed.dataparallel`` processes within the node. For
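A sketch of the usual GPU-pinning pattern built on ``local_rank()``, using standard TensorFlow 2.x device APIs; the pinning idiom is an assumption layered on top of what the patch documents:

    import tensorflow as tf
    import smdistributed.dataparallel.tensorflow as sdp

    sdp.init()
    gpus = tf.config.experimental.list_physical_devices("GPU")
    if gpus:
        # Give each local process exactly one GPU, indexed by its local rank.
        tf.config.experimental.set_visible_devices(gpus[sdp.local_rank()], "GPU")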
@@ -253,7 +253,7 @@ TensorFlow API


 .. function:: smdistributed.dataparallel.tensorflow.allreduce(tensor, param_index, num_params, compression=Compression.none, op=ReduceOp.AVERAGE)
- :noindex:
+ :noindex:

 Performs an all-reduce operation on a tensor (``tf.Tensor``).

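A hedged sketch of calling ``allreduce`` on a single tensor. The meaning of ``param_index`` and ``num_params`` (identifying this tensor within the set being reduced) is inferred from the signature above, so treat it as an assumption:

    import tensorflow as tf
    import smdistributed.dataparallel.tensorflow as sdp

    sdp.init()
    grad = tf.constant([1.0, 2.0, 3.0])
    # Average the tensor across all processes; this is the only tensor here,
    # hence param_index=0 and num_params=1 (assumed semantics).
    avg_grad = sdp.allreduce(grad, param_index=0, num_params=1)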
@@ -281,7 +281,7 @@ TensorFlow API


 .. function:: smdistributed.dataparallel.tensorflow.broadcast_global_variables(root_rank)
- :noindex:
+ :noindex:

 Broadcasts all global variables from root rank to all other processes.

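A minimal sketch of the call documented above, broadcasting from rank 0 once variables exist; placing the call after variable creation is an assumption:

    import tensorflow as tf
    import smdistributed.dataparallel.tensorflow as sdp

    sdp.init()
    w = tf.Variable(tf.random.normal([10, 10]))  # create variables first
    # Make every process start from rank 0's values.
    sdp.broadcast_global_variables(0)  # root_rank=0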
@@ -296,7 +296,7 @@ TensorFlow API


 .. function:: smdistributed.dataparallel.tensorflow.broadcast_variables(variables, root_rank)
- :noindex:
+ :noindex:

 Applicable for TensorFlow 2.x only.

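A hedged TensorFlow 2.x sketch: once the model's variables have been created, broadcast them from rank 0 so every worker starts in sync. The Keras model is a stand-in, not part of the patch:

    import tensorflow as tf
    import smdistributed.dataparallel.tensorflow as sdp

    sdp.init()
    model = tf.keras.Sequential([tf.keras.layers.Dense(1)])
    model.build(input_shape=(None, 4))  # create the variables
    # Sync model state from rank 0 to every other process.
    sdp.broadcast_variables(model.variables, root_rank=0)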
@@ -319,7 +319,7 @@ TensorFlow API


 .. function:: smdistributed.dataparallel.tensorflow.oob_allreduce(tensor, compression=Compression.none, op=ReduceOp.AVERAGE)
- :noindex:
+ :noindex:

 OutOfBand (oob) AllReduce is a simplified AllReduce function for use cases
 such as calculating total loss across all the GPUs in the training.
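A sketch of the loss-averaging use case mentioned above; the loss value is a placeholder:

    import tensorflow as tf
    import smdistributed.dataparallel.tensorflow as sdp

    sdp.init()
    local_loss = tf.constant(0.42)  # per-GPU loss from the current step
    # Average the scalar across every GPU for reporting purposes.
    global_loss = sdp.oob_allreduce(local_loss)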
@@ -353,7 +353,7 @@ TensorFlow API


 .. function:: smdistributed.dataparallel.tensorflow.overlap(tensor)
- :noindex:
+ :noindex:

 This function is applicable only for models compiled with XLA. Use this
 function to enable ``smdistributed.dataparallel`` to efficiently
@@ -391,7 +391,7 @@ TensorFlow API


 .. function:: smdistributed.dataparallel.tensorflow.broadcast(tensor, root_rank)
- :noindex:
+ :noindex:

 Broadcasts the input tensor on root rank to the same input tensor on all
 other ``smdistributed.dataparallel`` processes.
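A minimal sketch of broadcasting a single tensor from rank 0; the tensor itself is an arbitrary example:

    import tensorflow as tf
    import smdistributed.dataparallel.tensorflow as sdp

    sdp.init()
    t = tf.constant([1, 2, 3])
    # Every process receives rank 0's copy of the tensor.
    t = sdp.broadcast(t, 0)  # root_rank=0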
@@ -412,7 +412,7 @@ TensorFlow API


 .. function:: smdistributed.dataparallel.tensorflow.shutdown()
- :noindex:
+ :noindex:

 Shuts down ``smdistributed.dataparallel``. Optional to call at the end
 of the training script.
@@ -427,7 +427,7 @@ TensorFlow API


 .. function:: smdistributed.dataparallel.tensorflow.DistributedOptimizer
- :noindex:
+ :noindex:

 Applicable if you use the ``tf.estimator`` API in TensorFlow 2.x (2.3.1).

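A hedged sketch of wrapping a ``tf.estimator``-style optimizer; the wrapping pattern mirrors the usual data-parallel idiom, and the optimizer choice and learning-rate scaling are assumptions:

    import tensorflow as tf
    import smdistributed.dataparallel.tensorflow as sdp

    sdp.init()
    # Inside a model_fn: wrap the base optimizer so gradients are
    # allreduced across GPUs before being applied.
    opt = tf.compat.v1.train.AdamOptimizer(learning_rate=0.001 * sdp.size())
    opt = sdp.DistributedOptimizer(opt)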
@@ -468,7 +468,7 @@ TensorFlow API


 .. function:: smdistributed.dataparallel.tensorflow.DistributedGradientTape
- :noindex:
+ :noindex:

 Applicable to TensorFlow 2.x only.

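A short TensorFlow 2.x training-step sketch wrapping ``tf.GradientTape``; the model, optimizer, and loss are placeholders chosen for illustration:

    import tensorflow as tf
    import smdistributed.dataparallel.tensorflow as sdp

    sdp.init()
    model = tf.keras.Sequential([tf.keras.layers.Dense(1)])
    opt = tf.keras.optimizers.Adam()

    def train_step(x, y):
        with tf.GradientTape() as tape:
            loss = tf.reduce_mean(tf.square(model(x) - y))
        # Wrap the tape so gradients come back already allreduced.
        tape = sdp.DistributedGradientTape(tape)
        grads = tape.gradient(loss, model.trainable_variables)
        opt.apply_gradients(zip(grads, model.trainable_variables))
        return loss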
@@ -504,7 +504,7 @@ TensorFlow API


 .. function:: smdistributed.dataparallel.tensorflow.BroadcastGlobalVariablesHook
- :noindex:
+ :noindex:

 Applicable if you use the ``tf.estimator`` API in TensorFlow 2.x (2.3.1).

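A hedged sketch of using the hook with ``tf.estimator``; passing the root rank as the constructor argument and attaching the hook via ``train(hooks=...)`` follow the usual session-hook pattern and are assumptions here:

    import smdistributed.dataparallel.tensorflow as sdp

    sdp.init()
    # Broadcast initial variables from rank 0 when the session starts.
    bcast_hook = sdp.BroadcastGlobalVariablesHook(0)
    # estimator.train(input_fn=..., hooks=[bcast_hook])  # attach to an estimator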
@@ -533,7 +533,7 @@ TensorFlow API


 .. function:: smdistributed.dataparallel.tensorflow.Compression
- :noindex:
+ :noindex:

 Optional gradient compression algorithm that can be used in the AllReduce
 operation.
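A hedged example of passing a compression setting to ``allreduce``; ``Compression.none`` appears in the signatures earlier in this file, while ``Compression.fp16`` is an assumed member:

    import tensorflow as tf
    import smdistributed.dataparallel.tensorflow as sdp

    sdp.init()
    grad = tf.constant([1.0, 2.0])
    # Compress tensors to fp16 on the wire (fp16 member assumed), or pass
    # sdp.Compression.none (from the allreduce signature) to disable compression.
    avg = sdp.allreduce(grad, param_index=0, num_params=1,
                        compression=sdp.Compression.fp16)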
@@ -545,7 +545,7 @@ TensorFlow API


 .. function:: smdistributed.dataparallel.tensorflow.ReduceOp
- :noindex:
+ :noindex:

 Supported reduction operations in ``smdistributed.dataparallel``.

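A short example of selecting the reduction; ``ReduceOp.AVERAGE`` appears in the signatures above, while ``ReduceOp.SUM`` is an assumed member:

    import tensorflow as tf
    import smdistributed.dataparallel.tensorflow as sdp

    sdp.init()
    t = tf.constant([1.0, 2.0])
    total = sdp.allreduce(t, param_index=0, num_params=1, op=sdp.ReduceOp.SUM)      # assumed member
    mean = sdp.allreduce(t, param_index=0, num_params=1, op=sdp.ReduceOp.AVERAGE)   # from the signature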