Updated release notes and API doc for smd model parallel 1.3.1 #2267


Merged · 8 commits · Apr 13, 2021
@@ -1,3 +1,33 @@
# SageMaker Distributed Model Parallel 1.3.1 Release Notes

- New Features
- Bug Fixes
- Known Issues

## New Features

### TensorFlow

- Exposes a new decorator, ``register_post_partition_hook``, which invokes the decorated function just after model partitioning but before the first step executes, for example to load a checkpoint, as sketched below. Refer to the [SageMaker distributed model parallel API documentation](https://sagemaker.readthedocs.io/en/stable/api/training/smp_versions/latest/smd_model_parallel_tensorflow.html) for more information.
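
A minimal sketch of the checkpoint-loading use case, assuming the standard ``smdistributed.modelparallel.tensorflow`` import. The toy ``MyModel``, the optimizer, and the ``/opt/ml/checkpoints`` path are illustrative assumptions, not part of this release; the decorator itself is the documented addition.

```python
import tensorflow as tf
import smdistributed.modelparallel.tensorflow as smp

smp.init()

# Toy model for illustration; in practice this is whatever
# smp.DistributedModel subclass your training script defines.
class MyModel(smp.DistributedModel):
    def __init__(self):
        super().__init__()
        self.dense = tf.keras.layers.Dense(10)

    def call(self, x):
        return self.dense(x)

model = MyModel()
optimizer = tf.keras.optimizers.Adam()
checkpoint = tf.train.Checkpoint(model=model, optimizer=optimizer)

@smp.register_post_partition_hook
def load_latest_checkpoint():
    # Runs once, right after the model is partitioned but before the
    # first forward pass of the first smp.step call.
    latest = tf.train.latest_checkpoint("/opt/ml/checkpoints")
    if latest is not None:
        checkpoint.restore(latest)
```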

## Bug Fixes

### PyTorch

- Improved memory efficiency when using active microbatches by clearing activations at the end of each microbatch.

### TensorFlow

- Fixed an issue that caused hangs when training some models with XLA enabled.

## Known Issues

### PyTorch

- A crash was observed when ``optimizer.step()`` was called for certain optimizers, such as AdaDelta, when the partition on which this method was called had no local parameters assigned to it after partitioning. This is due to a bug in PyTorch that [has since been fixed](https://github.com/pytorch/pytorch/pull/52944). Until that fix makes its way into a PyTorch release, only call ``optimizer.step()`` on processes that have at least one local parameter, which can be checked with ``len(list(model.local_parameters())) > 0`` (see the sketch after this list).

- A performance regression still exists when training on SMP with PyTorch 1.7.1 compared to PyTorch 1.6. The root cause was found to be a slowdown in ``.grad`` method calls in PyTorch 1.7.1 relative to 1.6; see the related discussion at https://github.com/pytorch/pytorch/issues/50636. This issue does not exist with PyTorch 1.8.
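
A sketch of the ``optimizer.step()`` workaround from the first known issue above; ``model`` and ``optimizer`` are placeholders for the ``smp.DistributedModel`` and the wrapped optimizer that your training script already defines.

```python
# Inside the training loop, after the backward pass for the current step.
# Call optimizer.step() only on processes whose partition received at
# least one parameter, to avoid the PyTorch bug described above.
if len(list(model.local_parameters())) > 0:
    optimizer.step()
```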

# SageMaker Distributed Model Parallel 1.3.0 Release Notes

- New Features
@@ -83,7 +83,21 @@ TensorFlow API
    with smp.partition(3):
        z = tf.reduce_sum(y)             # placed in partition 3


.. function:: register_post_partition_hook(hook)

   Registers a callable ``hook`` to be executed after the model is
   partitioned. This is useful in situations where an operation needs to be
   executed after the model partition during the first call to ``smp.step``,
   but before the actual execution of the first forward pass.

   .. code:: python

      @smp.register_post_partition_hook
      def test_eager():
          # All statements here will be executed right after partition but before the first forward pass
          tf.print("Entered hook through eager context")

.. class:: smp.CheckpointManager

@@ -102,13 +116,6 @@ TensorFlow API
                      max_to_keep=None,
                      checkpoint_name="ckpt")


**Important:** ``smp.CheckpointManager.restore()`` must be called after
the first training step. This is because the first call of the
``smp.step`` function constructs and partitions the model, which must
take place before the checkpoint restore. Calling it before the first
``smp.step`` call might result in hangs or unexpected behavior.

**Parameters**

- ``checkpoint``: A `tf.train.Checkpoint
@@ -154,7 +161,8 @@ TensorFlow API
.. code:: python

   for step, inputs in enumerate(train_ds):
       if step == 1:                    # NOTE: restore occurs on the second step
       if step == 0:
           ckpt_manager.restore()
       loss = train_step(inputs)