
Commit d17c0d9

Merge branch 'master' into fix-use-workflow-parameters-in-hyperparameters
2 parents 6419e4a + e7b9702 commit d17c0d9


46 files changed: +4,355 −73 lines

CHANGELOG.md

Lines changed: 35 additions & 0 deletions
@@ -1,5 +1,40 @@
 # Changelog
 
+## v2.31.1 (2021-03-23)
+
+### Bug Fixes and Other Changes
+
+ * added documentation for Hugging Face Estimator
+ * mark HuggingFace tests as release tests
+
+### Documentation Changes
+
+ * adding version 1.1.0 docs for smdistributed.dataparallel
+
+## v2.31.0 (2021-03-23)
+
+### Features
+
+ * add HuggingFace framework estimator
+ * update TF framework version support
+ * Support all processor types in ProcessingStep
+
+### Bug Fixes and Other Changes
+
+ * Add pipelines functions.
+
+## v2.30.0 (2021-03-17)
+
+### Features
+
+ * add support for PyTorch 1.8.0
+ * Allow users to send custom attributes to the model endpoint
+
+### Bug Fixes and Other Changes
+
+ * use ResolvedOutputS3Uir for Hive DDL LOCATION
+ * Do lazy initialization in predictor
+
 ## v2.29.2 (2021-03-11)
 
 ### Bug Fixes and Other Changes

VERSION

Lines changed: 1 addition & 1 deletion
@@ -1 +1 @@
-2.29.3.dev0
+2.31.2.dev0

doc/api/training/sdp_versions/v1.0.0/smd_data_parallel_pytorch.rst

Lines changed: 15 additions & 0 deletions
@@ -8,6 +8,7 @@ PyTorch Guide to SageMaker's distributed data parallel library
 - :ref:`pytorch-sdp-api`
 
 .. _pytorch-sdp-modify:
+   :noindex:
 
 Modify a PyTorch training script to use SageMaker data parallel
 ======================================================================
@@ -149,6 +150,7 @@ you will have for distributed training with the distributed data parallel librar
 
 
 .. _pytorch-sdp-api:
+   :noindex:
 
 PyTorch API
 ===========
@@ -159,6 +161,7 @@ PyTorch API
 
 
 .. function:: smdistributed.dataparallel.torch.distributed.is_available()
+   :noindex:
 
 Check if script started as a distributed job. For local runs user can
 check that is_available returns False and run the training script
@@ -174,6 +177,7 @@ PyTorch API
 
 
 .. function:: smdistributed.dataparallel.torch.distributed.init_process_group(*args, **kwargs)
+   :noindex:
 
 Initialize ``smdistributed.dataparallel``. Must be called at the
 beginning of the training script, before calling any other methods.
@@ -198,6 +202,7 @@ PyTorch API
 
 
 .. function:: smdistributed.dataparallel.torch.distributed.is_initialized()
+   :noindex:
 
 Checks if the default process group has been initialized.
 
@@ -211,6 +216,7 @@ PyTorch API
 
 
 .. function:: smdistributed.dataparallel.torch.distributed.get_world_size(group=smdistributed.dataparallel.torch.distributed.group.WORLD)
+   :noindex:
 
 The total number of GPUs across all the nodes in the cluster. For
 example, in a 8 node cluster with 8 GPU each, size will be equal to 64.
@@ -230,6 +236,7 @@ PyTorch API
 
 
 .. function:: smdistributed.dataparallel.torch.distributed.get_rank(group=smdistributed.dataparallel.torch.distributed.group.WORLD)
+   :noindex:
 
 The rank of the node in the cluster. The rank ranges from 0 to number of
 nodes - 1. This is similar to MPI's World Rank.
@@ -249,6 +256,7 @@ PyTorch API
 
 
 .. function:: smdistributed.dataparallel.torch.distributed.get_local_rank()
+   :noindex:
 
 Local rank refers to the relative rank of
 the ``smdistributed.dataparallel`` process within the node the current
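For illustration only (not part of this diff): the hunks above cover the setup and topology helpers of the PyTorch API. A minimal sketch of how they are typically combined at the top of a training script, assuming the conventional ``dist`` import alias and a SageMaker job where ``smdistributed`` is installed with one process per GPU:

```python
# Sketch only: assumes a SageMaker training job with smdistributed installed
# and one process per GPU (the library's usual launch model).
import torch
import smdistributed.dataparallel.torch.distributed as dist

dist.init_process_group()            # must be called before any other method
assert dist.is_initialized()

world_size = dist.get_world_size()   # total GPUs across all nodes
rank = dist.get_rank()               # this worker's rank in the cluster
local_rank = dist.get_local_rank()   # this worker's rank within its node

# Pin each process to one GPU on its node.
torch.cuda.set_device(local_rank)
print(f"rank {rank}/{world_size}, local rank {local_rank}")
```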
@@ -267,6 +275,7 @@ PyTorch API
 
 
 .. function:: smdistributed.dataparallel.torch.distributed.all_reduce(tensor, op=smdistributed.dataparallel.torch.distributed.ReduceOp.SUM, group=smdistributed.dataparallel.torch.distributed.group.WORLD, async_op=False)
+   :noindex:
 
 Performs an all-reduce operation on a tensor (torch.tensor) across
 all ``smdistributed.dataparallel`` workers
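For illustration only: a sketch of the in-place all-reduce pattern, using just the signature shown above; the tensor values are made up.

```python
# Sketch only: every worker contributes one value; after all_reduce each
# worker holds the sum (ReduceOp.SUM is the default in the signature above).
import torch
import smdistributed.dataparallel.torch.distributed as dist

dist.init_process_group()
device = torch.device("cuda", dist.get_local_rank())

metric = torch.tensor([float(dist.get_rank())], device=device)
dist.all_reduce(metric, op=dist.ReduceOp.SUM)
mean_metric = metric / dist.get_world_size()   # e.g. average a per-worker metric
```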
@@ -311,6 +320,7 @@ PyTorch API
 
 
 .. function:: smdistributed.dataparallel.torch.distributed.broadcast(tensor, src=0, group=smdistributed.dataparallel.torch.distributed.group.WORLD, async_op=False)
+   :noindex:
 
 Broadcasts the tensor (torch.tensor) to the whole group.
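For illustration only: a minimal broadcast sketch under the same assumptions as the setup sketch above; rank 0 is the source, matching the ``src=0`` default in the signature.

```python
# Sketch only: rank 0 fills the tensor; after broadcast every rank holds
# rank 0's values.
import torch
import smdistributed.dataparallel.torch.distributed as dist

dist.init_process_group()
device = torch.device("cuda", dist.get_local_rank())

t = torch.zeros(4, device=device)
if dist.get_rank() == 0:
    t += torch.arange(4, dtype=torch.float32, device=device)
dist.broadcast(t, src=0)
```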
 
@@ -335,6 +345,7 @@ PyTorch API
 
 
 .. function:: smdistributed.dataparallel.torch.distributed.all_gather(tensor_list, tensor, group=smdistributed.dataparallel.torch.distributed.group.WORLD, async_op=False)
+   :noindex:
 
 Gathers tensors from the whole group in a list.
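For illustration only: a sketch of the ``tensor_list``/``tensor`` gather pattern implied by the signature above, under the same assumptions as the earlier sketches.

```python
# Sketch only: each worker supplies one tensor; all_gather fills tensor_list
# with every rank's contribution, in rank order, on every worker.
import torch
import smdistributed.dataparallel.torch.distributed as dist

dist.init_process_group()
device = torch.device("cuda", dist.get_local_rank())

local = torch.tensor([float(dist.get_rank())], device=device)
gathered = [torch.zeros_like(local) for _ in range(dist.get_world_size())]
dist.all_gather(gathered, local)
```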
 
@@ -361,6 +372,7 @@ PyTorch API
 
 
 .. function:: smdistributed.dataparallel.torch.distributed.all_to_all_single(output_t, input_t, output_split_sizes=None, input_split_sizes=None, group=group.WORLD, async_op=False)
+   :noindex:
 
 Each process scatters input tensor to all processes in a group and return gathered tensor in output.
 
@@ -385,6 +397,7 @@ PyTorch API
 
 
 .. function:: smdistributed.dataparallel.torch.distributed.barrier(group=smdistributed.dataparallel.torch.distributed.group.WORLD, async_op=False)
+   :noindex:
 
 Synchronizes all ``smdistributed.dataparallel`` processes.
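For illustration only: a common barrier pattern, letting rank 0 finish one-time work before the other workers proceed; the marker file is a placeholder for that work.

```python
# Sketch only: let rank 0 do one-time work while the other workers wait.
import pathlib
import smdistributed.dataparallel.torch.distributed as dist

dist.init_process_group()

if dist.get_rank() == 0:
    # placeholder for one-time setup work (e.g. preparing a dataset)
    pathlib.Path("/tmp/data-ready").touch()
dist.barrier()   # everyone waits here until all workers arrive
```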
 
@@ -410,6 +423,7 @@ PyTorch API
 
 
 .. class:: smdistributed.dataparallel.torch.parallel.DistributedDataParallel(module, device_ids=None, output_device=None, broadcast_buffers=True, process_group=None, bucket_cap_mb=None)
+   :noindex:
 
 ``smdistributed.dataparallel's`` implementation of distributed data
 parallelism for PyTorch. In most cases, wrapping your PyTorch Module
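For illustration only: a rough sketch of the wrapping pattern the class description refers to. The import path follows the class directive above (an assumption about how the package exposes it); the model, optimizer, and data are placeholders.

```python
# Sketch only: wrap an ordinary module so gradients are averaged across
# workers during backward, as with torch's native DistributedDataParallel.
import torch
import torch.nn as nn
import smdistributed.dataparallel.torch.distributed as dist
from smdistributed.dataparallel.torch.parallel import DistributedDataParallel as DDP

dist.init_process_group()
local_rank = dist.get_local_rank()
torch.cuda.set_device(local_rank)

model = nn.Linear(10, 1).cuda()                      # placeholder model
model = DDP(model, device_ids=[local_rank], broadcast_buffers=True)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

x = torch.randn(32, 10, device="cuda")               # placeholder batch
loss = model(x).pow(2).mean()
loss.backward()
optimizer.step()
```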
@@ -503,6 +517,7 @@ PyTorch API
 
 
 .. class:: smdistributed.dataparallel.torch.distributed.ReduceOp
+   :noindex:
 
 An enum-like class for supported reduction operations
 in ``smdistributed.dataparallel``.

doc/api/training/sdp_versions/v1.0.0/smd_data_parallel_tensorflow.rst

Lines changed: 19 additions & 0 deletions
@@ -8,6 +8,7 @@ TensorFlow Guide to SageMaker's distributed data parallel library
 - :ref:`tensorflow-sdp-api`
 
 .. _tensorflow-sdp-modify:
+   :noindex:
 
 Modify a TensorFlow 2.x training script to use SageMaker data parallel
 ======================================================================
@@ -150,6 +151,7 @@ script you will have for distributed training with the library.
 
 
 .. _tensorflow-sdp-api:
+   :noindex:
 
 TensorFlow API
 ==============
@@ -160,6 +162,7 @@ TensorFlow API
 
 
 .. function:: smdistributed.dataparallel.tensorflow.init()
+   :noindex:
 
 Initialize ``smdistributed.dataparallel``. Must be called at the
 beginning of the training script.
@@ -183,6 +186,7 @@ TensorFlow API
 
 
 .. function:: smdistributed.dataparallel.tensorflow.size()
+   :noindex:
 
 The total number of GPUs across all the nodes in the cluster. For
 example, in a 8 node cluster with 8 GPUs each, ``size`` will be equal
@@ -200,6 +204,7 @@ TensorFlow API
 
 
 .. function:: smdistributed.dataparallel.tensorflow.local_size()
+   :noindex:
 
 The total number of GPUs on a node. For example, on a node with 8
 GPUs, ``local_size`` will be equal to 8.
@@ -214,6 +219,7 @@ TensorFlow API
 
 
 .. function:: smdistributed.dataparallel.tensorflow.rank()
+   :noindex:
 
 The rank of the node in the cluster. The rank ranges from 0 to number of
 nodes - 1. This is similar to MPI's World Rank.
@@ -228,6 +234,7 @@ TensorFlow API
 
 
 .. function:: smdistributed.dataparallel.tensorflow.local_rank()
+   :noindex:
 
 Local rank refers to the relative rank of the
 GPUs’ ``smdistributed.dataparallel`` processes within the node. For
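For illustration only: a minimal sketch combining the TensorFlow setup and topology helpers above, assuming the conventional ``sdp`` import alias and a SageMaker job where ``smdistributed`` is installed.

```python
# Sketch only: assumes a SageMaker training job with smdistributed installed.
import tensorflow as tf
import smdistributed.dataparallel.tensorflow as sdp

sdp.init()   # must be called at the beginning of the training script

# Pin each process to a single GPU on its node using local_rank().
gpus = tf.config.experimental.list_physical_devices("GPU")
if gpus:
    tf.config.experimental.set_visible_devices(gpus[sdp.local_rank()], "GPU")

print(f"worker {sdp.rank()} of {sdp.size()}, "
      f"local rank {sdp.local_rank()} of {sdp.local_size()}")
```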
@@ -246,6 +253,7 @@ TensorFlow API
 
 
 .. function:: smdistributed.dataparallel.tensorflow.allreduce(tensor, param_index, num_params, compression=Compression.none, op=ReduceOp.AVERAGE)
+   :noindex:
 
 Performs an all-reduce operation on a tensor (``tf.Tensor``).
 
@@ -273,6 +281,7 @@ TensorFlow API
 
 
 .. function:: smdistributed.dataparallel.tensorflow.broadcast_global_variables(root_rank)
+   :noindex:
 
 Broadcasts all global variables from root rank to all other processes.
 
@@ -287,6 +296,7 @@ TensorFlow API
 
 
 .. function:: smdistributed.dataparallel.tensorflow.broadcast_variables(variables, root_rank)
+   :noindex:
 
 Applicable for TensorFlow 2.x only.
 
@@ -309,6 +319,7 @@ TensorFlow API
 
 
 .. function:: smdistributed.dataparallel.tensorflow.oob_allreduce(tensor, compression=Compression.none, op=ReduceOp.AVERAGE)
+   :noindex:
 
 OutOfBand (oob) AllReduce is simplified AllReduce function for use cases
 such as calculating total loss across all the GPUs in the training.
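For illustration only: a sketch of the loss-averaging use case mentioned above, assuming the call returns the reduced tensor (``op=ReduceOp.AVERAGE`` is the default in the signature); the loss value is made up.

```python
# Sketch only: average a scalar metric (e.g. per-worker loss) across all GPUs,
# outside the gradient path. Assumes the call returns the reduced tensor.
import tensorflow as tf
import smdistributed.dataparallel.tensorflow as sdp

sdp.init()

local_loss = tf.constant(0.42)               # made-up per-worker value
global_loss = sdp.oob_allreduce(local_loss)  # averaged across sdp.size() workers
```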
@@ -342,6 +353,7 @@ TensorFlow API
 
 
 .. function:: smdistributed.dataparallel.tensorflow.overlap(tensor)
+   :noindex:
 
 This function is applicable only for models compiled with XLA. Use this
 function to enable ``smdistributed.dataparallel`` to efficiently
@@ -379,6 +391,7 @@ TensorFlow API
 
 
 .. function:: smdistributed.dataparallel.tensorflow.broadcast(tensor, root_rank)
+   :noindex:
 
 Broadcasts the input tensor on root rank to the same input tensor on all
 other ``smdistributed.dataparallel`` processes.
@@ -399,6 +412,7 @@ TensorFlow API
 
 
 .. function:: smdistributed.dataparallel.tensorflow.shutdown()
+   :noindex:
 
 Shuts down ``smdistributed.dataparallel``. Optional to call at the end
 of the training script.
@@ -413,6 +427,7 @@ TensorFlow API
 
 
 .. function:: smdistributed.dataparallel.tensorflow.DistributedOptimizer
+   :noindex:
 
 Applicable if you use the ``tf.estimator`` API in TensorFlow 2.x (2.3.1).
 
@@ -453,6 +468,7 @@ TensorFlow API
 
 
 .. function:: smdistributed.dataparallel.tensorflow.DistributedGradientTape
+   :noindex:
 
 Applicable to TensorFlow 2.x only.
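For illustration only: a condensed TF2 training-step sketch built around ``DistributedGradientTape``, with variables broadcast from rank 0 after the first step in the spirit of the ``broadcast_variables`` entry above. The tiny Keras model and random data are placeholders.

```python
# Sketch only: gradients are all-reduced by wrapping the GradientTape.
import tensorflow as tf
import smdistributed.dataparallel.tensorflow as sdp

sdp.init()
model = tf.keras.Sequential([tf.keras.layers.Dense(1)])   # placeholder model
optimizer = tf.keras.optimizers.SGD(0.01)
loss_fn = tf.keras.losses.MeanSquaredError()

@tf.function
def train_step(x, y, first_batch):
    with tf.GradientTape() as tape:
        loss = loss_fn(y, model(x, training=True))
    # Wrap the tape so gradients are averaged across workers.
    tape = sdp.DistributedGradientTape(tape)
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    if first_batch:
        # After the first step, sync variables from rank 0 to all workers.
        sdp.broadcast_variables(model.variables, root_rank=0)
        sdp.broadcast_variables(optimizer.variables(), root_rank=0)
    return loss

x = tf.random.normal([8, 4])   # placeholder batch
y = tf.random.normal([8, 1])
for step in range(3):
    loss = train_step(x, y, step == 0)
```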
 
@@ -488,6 +504,7 @@ TensorFlow API
 
 
 .. function:: smdistributed.dataparallel.tensorflow.BroadcastGlobalVariablesHook
+   :noindex:
 
 Applicable if you use the ``tf.estimator`` API in TensorFlow 2.x (2.3.1).
 
@@ -516,6 +533,7 @@ TensorFlow API
 
 
 .. function:: smdistributed.dataparallel.tensorflow.Compression
+   :noindex:
 
 Optional Gradient Compression algorithm that can be used in AllReduce
 operation.
@@ -527,6 +545,7 @@ TensorFlow API
 
 
 .. function:: smdistributed.dataparallel.tensorflow.ReduceOp
+   :noindex:
 
 Supported reduction operations in ``smdistributed.dataparallel``.