documentation: adding change log for smd model parallel #2056

Merged 4 commits on Dec 24, 2020
doc/api/training/smd_model_parallel.rst (32 changes: 21 additions & 11 deletions)
@@ -20,6 +20,21 @@ Use the following sections to learn more about the model parallelism and the library
<https://integ-docs-aws.amazon.com/sagemaker/latest/dg/model-parallel-use-api.html#model-parallel-customize-container>`__
for more information.

How to Use this Guide
=====================

The library contains a Common API that is shared across frameworks, as well as APIs
that are specific to the supported frameworks, TensorFlow and PyTorch. To use the library, reference the
**Common API** documentation alongside the framework-specific API documentation.

.. toctree::
   :maxdepth: 1

   smd_model_parallel_general
   smd_model_parallel_common_api
   smd_model_parallel_pytorch
   smd_model_parallel_tensorflow

It is recommended to use this documentation alongside `SageMaker Distributed Model Parallel
<http://docs.aws.amazon.com/sagemaker/latest/dg/model-parallel.html>`__ in the Amazon SageMaker
developer guide. This developer guide documentation includes:
@@ -34,17 +49,12 @@ developer guide. This developer guide documentation includes:
- `Configuration tips and pitfalls
<https://docs.aws.amazon.com/sagemaker/latest/dg/model-parallel-customize-tips-pitfalls.html>`__

**How to Use this Guide**

The library contains a Common API that is shared across frameworks, as well as APIs
that are specific to supported frameworks, TensorFlow and PyTorch. To use the library, reference the
**Common API** documentation alongside framework specific API documentation.
Latest Updates
==============

New features, bug fixes, and improvements are regularly made to the SageMaker distributed model parallel library.

.. toctree::
   :maxdepth: 1
To see the latest changes made to the library, refer to the library
`Release Notes
<https://github.com/aws/sagemaker-python-sdk/blob/master/doc/api/training/smd_model_parallel_release_notes/>`_.

   smd_model_parallel_general
   smd_model_parallel_common_api
   smd_model_parallel_pytorch
   smd_model_parallel_tensorflow
@@ -0,0 +1,95 @@
# Sagemaker Distributed Model Parallel 1.1.0 Release Notes

- New Features
- Bug Fixes
- Improvements
- Performance
- Known Issues

## New Features

The following sections describe new features that are common across frameworks and new features that are framework-specific.

### Common across frameworks

#### Custom slicing support (`smp_slice` method) for objects passed to `smp.step` decorated functions

To pass an object to `smp.step` that contains tensors that need to be split across
microbatches and is not an instance of list, dict, tuple, or set, implement the `smp_slice` method for the object.

Below is an example of how to use this with PyTorch:

```python
import torch
import smdistributed.modelparallel.torch as smp

class CustomType:
    def __init__(self, tensor):
        self.data = tensor

    # SMP will call this to invoke slicing on the object, passing in the total
    # number of microbatches (num_mb) and the current microbatch index (mb).
    def smp_slice(self, num_mb, mb, axis):
        dim_size = list(self.data.size())[axis]

        split_size = dim_size // num_mb
        sliced_tensor = self.data.narrow(axis, mb * split_size, split_size)
        return CustomType(sliced_tensor)

custom_obj = CustomType(torch.ones(4,))

# `model` is assumed to be an smp.DistributedModel created elsewhere in the script.
@smp.step()
def step(custom_obj):
    loss = model(custom_obj)
    model.backward(loss)
    return loss
```

### PyTorch

#### Add support for smp.DistributedModel.cpu()

`smp.DistributedModel.cpu()`
[allgathers](https://sagemaker.readthedocs.io/en/stable/api/training/smd_model_parallel_common_api.html#smp.allgather)
parameters and buffers across all `mp_ranks` and moves them to the CPU.
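
A minimal sketch of one way this could be used, for example to save a complete copy of a partitioned model from a single rank. The tiny `nn.Linear` model and the checkpoint filename are placeholders, and the snippet assumes a SageMaker training job where the library is available:

```python
# Sketch only: gather a partitioned model onto the CPU before saving.
import torch
import torch.nn as nn
import smdistributed.modelparallel.torch as smp

smp.init()
model = smp.DistributedModel(nn.Linear(16, 4))  # placeholder model

# ... training loop elided ...

# cpu() allgathers parameters and buffers across all mp_ranks and moves them
# to the CPU, so a single rank can write out a full copy of the model.
model.cpu()
if smp.rank() == 0:
    torch.save(model.state_dict(), "full_model.pt")
```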

#### Add `trace_memory_usage` option to `smp.DistributedModel` to measure memory usage per module

Adds a `trace_memory_usage` option to `smp.DistributedModel` that attempts to measure memory usage per module during
tracing. If this option is disabled, memory usage is estimated from the sizes of the tensors returned from the module.
The option is disabled by default.
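
As a minimal sketch, assuming `trace_memory_usage` is passed as a keyword argument to the `smp.DistributedModel` constructor (see the `smp.DistributedModel` API documentation for the exact signature); the small `nn.Sequential` model is a placeholder:

```python
# Sketch only: enable per-module memory measurement during tracing.
import torch.nn as nn
import smdistributed.modelparallel.torch as smp

smp.init()
model = smp.DistributedModel(
    nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 10)),  # placeholder model
    trace_memory_usage=True,  # measure memory usage per module while tracing
)
```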

## Bug Fixes

### PyTorch

- `torch.nn.Sequential`: Fix a bug with `torch.nn.Sequential` that causes a failure with the error message: `shouldnt go less than 0, there is a bug` when the inputs to the first module don't require grads (see the sketch after this list).

- `smp.DistributedModel`: Fix a bug with `DistributedModel` execution when a module has multiple parents. The bug surfaces with the error message: `actual_parent should be different than module_execution_stack parent only for torch.nn.ModuleList`

- `apex.optimizers.FusedNovoGrad`: Fix a bug with `apex.optimizers.FusedNovoGrad` which surfaces with the error message: `KeyError: 'exp_avg_sq'`
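
Below is a schematic sketch (not taken from the release notes) of the `torch.nn.Sequential` pattern referenced in the first bug fix above, where the inputs to the first module do not require grads. The model, shapes, and loss are illustrative only, and device placement is omitted:

```python
import torch
import torch.nn as nn
import smdistributed.modelparallel.torch as smp

smp.init()
model = smp.DistributedModel(
    nn.Sequential(nn.Linear(32, 16), nn.ReLU(), nn.Linear(16, 1))  # placeholder model
)

@smp.step()
def train_step(inputs, targets):
    # inputs.requires_grad is False here, the condition under which the
    # "shouldnt go less than 0, there is a bug" failure used to occur.
    loss = nn.functional.mse_loss(model(inputs), targets)
    model.backward(loss)
    return loss

loss = train_step(torch.randn(64, 32), torch.randn(64, 1))
```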

## Improvements

### Usability

#### PyTorch

- `smp.DistributedModel`: Improve the error message when the forward pass on `smp.DistributedModel` is called outside the `smp.step` decorated function.

- `smp.load`: Add user-friendly error messages when loading checkpoints with `smp.load`.

### Partitioning Algorithm

#### PyTorch

- Better memory balancing: when partitioning the children of a given module, the algorithm now takes into account the modules already assigned to the parent.

## Performance

### TensorFlow

- Addresses long pre-processing times introduced by the SMP XLA optimizer when dealing with large graphs and a large number of microbatches. BERT (large) preprocessing time goes down from 40 minutes to 6 minutes on p3.16xlarge.

## Known Issues

### PyTorch

- Serialization for Torch in SMP converts instances of dict subclasses into plain dict instances instead of preserving the subclass. One use case that fails because of this issue is a user-defined subclass of OrderedDict that implements `__getitem__`: after serialization/deserialization in SMP, indexing on the object will lead to errors. A workaround is to use the dict keys to access the underlying items.
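
A small, self-contained sketch of the issue and the workaround. The `PrefixedDict` class is purely hypothetical, and `dict(outputs)` stands in for the plain dict that SMP hands back after serialization/deserialization:

```python
from collections import OrderedDict

class PrefixedDict(OrderedDict):
    """Hypothetical dict subclass that rewrites keys in __getitem__."""
    def __getitem__(self, key):
        return super().__getitem__("metric/" + key)

outputs = PrefixedDict({"metric/loss": 0.25})
print(outputs["loss"])          # 0.25, the custom lookup works before serialization

# After serialization/deserialization in SMP the object is a plain dict, so
# outputs["loss"] would raise KeyError. Workaround: access items through the
# keys actually stored in the dict.
restored = dict(outputs)        # stands in for what SMP returns
print(restored["metric/loss"])  # 0.25
```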