
Commit 863a70c

Add specifics around DeepSpeed docs (#6142)
* Be more specific with DeepSpeed compatibility
* Better wording
1 parent 0456b45 commit 863a70c

File tree

1 file changed: +4 -4 lines changed


docs/source/advanced/multi_gpu.rst

Lines changed: 4 additions & 4 deletions
@@ -690,9 +690,9 @@ DeepSpeed
 .. note::
     The DeepSpeed plugin is in beta and the API is subject to change. Please create an `issue <https://github.com/PyTorchLightning/pytorch-lightning/issues>`_ if you run into any issues.

-`DeepSpeed <https://github.com/microsoft/DeepSpeed>`_ offers additional CUDA deep learning training optimizations, similar to `FairScale <https://github.com/facebookresearch/fairscale>`_. DeepSpeed offers lower level training optimizations, and useful efficient optimizers such as `1-bit Adam <https://www.deepspeed.ai/tutorials/onebit-adam/>`_.
-Using the plugin, we were able to **train model sizes of 10 Billion parameters and above**, with a lot of useful information in this `benchmark <https://github.com/huggingface/transformers/issues/9996>`_ and the DeepSpeed `docs <https://www.deepspeed.ai/tutorials/megatron/>`_.
-We recommend using DeepSpeed in environments where speed and memory optimizations are important (such as training large billion parameter models). In addition, we recommend trying :ref:`sharded` first before trying DeepSpeed's further optimizations, primarily due to FairScale Sharded ease of use in scenarios such as multiple optimizers/schedulers.
+`DeepSpeed <https://github.com/microsoft/DeepSpeed>`_ is a deep learning training optimization library, providing the means to train massive billion parameter models at scale.
+Using the DeepSpeed plugin, we were able to **train model sizes of 10 Billion parameters and above**, with a lot of useful information in this `benchmark <https://github.com/huggingface/transformers/issues/9996>`_ and the DeepSpeed `docs <https://www.deepspeed.ai/tutorials/megatron/>`_.
+DeepSpeed also offers lower level training optimizations, and efficient optimizers such as `1-bit Adam <https://www.deepspeed.ai/tutorials/onebit-adam/>`_. We recommend using DeepSpeed in environments where speed and memory optimizations are important (such as training large billion parameter models).

 To use DeepSpeed, you first need to install DeepSpeed using the commands below.

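For orientation, a minimal sketch of how the plugin described in this hunk is switched on from the ``Trainer`` (a hypothetical example: it assumes DeepSpeed is installed, e.g. via ``pip install deepspeed mpi4py``, and that this Lightning release accepts ``plugins="deepspeed"``; exact argument names may differ between versions):

.. code-block:: python

    # Hedged sketch only: assumes `pip install deepspeed mpi4py` has been run and that
    # this Lightning release exposes DeepSpeed through the Trainer `plugins` argument.
    import pytorch_lightning as pl

    trainer = pl.Trainer(
        gpus=4,               # DeepSpeed targets multi-GPU CUDA training
        precision=16,         # usually combined with mixed precision
        plugins="deepspeed",  # plugin identifier; exact string may vary by release
    )
    # trainer.fit(model)      # `model` is any LightningModule with a single optimizer/scheduler
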
@@ -706,7 +706,7 @@ Additionally if you run into any issues installing m4py, ensure you have openmpi
 .. note::
     Currently ``resume_from_checkpoint`` and manual optimization are not supported.

-    DeepSpeed only supports single optimizer, single scheduler.
+    DeepSpeed currently only supports single optimizer, single scheduler within the training loop.

 ZeRO-Offload
 """"""""""""

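To illustrate the constraint touched by the second hunk (a single optimizer and a single scheduler within the training loop), here is a hypothetical ``configure_optimizers`` that stays inside the supported configuration; the module, layer sizes, and hyperparameters are made up for illustration:

.. code-block:: python

    import torch
    from torch import nn
    import pytorch_lightning as pl


    class ToyModule(pl.LightningModule):
        """Illustrative module using one optimizer and one scheduler, as the note requires."""

        def __init__(self):
            super().__init__()
            self.layer = nn.Linear(32, 2)

        def training_step(self, batch, batch_idx):
            x, y = batch
            return nn.functional.cross_entropy(self.layer(x), y)

        def configure_optimizers(self):
            # One optimizer paired with one scheduler is within the supported setup;
            # multiple optimizers/schedulers are not, per the note above.
            optimizer = torch.optim.Adam(self.parameters(), lr=1e-3)
            scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=1)
            return [optimizer], [scheduler]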