3 files changed: +9 -3 lines changed
doc/api/training/smp_versions

@@ -265,7 +265,9 @@ This API document assumes you use the following import statements in your traini
 Returns the ``state_dict`` that contains optimizer state for the entire model.
 It first collects the ``local_state_dict`` and gathers and merges
 the ``local_state_dict`` from all ``mp_rank``s to create a full
-``state_dict``.
+``state_dict``. Please note that this needs to be called on all ranks with
+``dp_rank()==0`` to ensure the gather happens properly.
+If it is only called on some such ranks, it can hang.


 .. function:: load_state_dict( )
 :noindex:

@@ -232,7 +232,9 @@ This API document assumes you use the following import statements in your traini
 Returns the ``state_dict`` that contains parameters
 for the entire model. It first collects the \ ``local_state_dict`` and
 gathers and merges the \ ``local_state_dict`` from all ``mp_rank``\ s to
-create a full ``state_dict``.
+create a full ``state_dict``. Please note that this needs to be called on all ranks with
+``dp_rank()==0`` to ensure the gather happens properly.
+If it is only called on some such ranks, it can hang.


 .. function:: load_state_dict( )

@@ -232,7 +232,9 @@ This API document assumes you use the following import statements in your traini
 Returns the ``state_dict`` that contains parameters
 for the entire model. It first collects the \ ``local_state_dict`` and
 gathers and merges the \ ``local_state_dict`` from all ``mp_rank``\ s to
-create a full ``state_dict``.
+create a full ``state_dict``. Please note that this needs to be called on all ranks with
+``dp_rank()==0`` to ensure the gather happens properly.
+If it is only called on some such ranks, it can hang.


 .. function:: load_state_dict( )
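
The note added in these hunks describes a collective gather: every participating rank has to enter the call or the others wait indefinitely. The following is a toy simulation of that behavior, not the library's implementation. It uses only the Python standard library; ``MP_SIZE``, ``simulate`` and the inner ``state_dict`` stub are hypothetical names, and a ``threading.Barrier`` with a timeout stands in for the real gather so that the "hang" can be observed without blocking forever.

```python
import threading

MP_SIZE = 2  # hypothetical number of mp_ranks that share dp_rank() == 0


def simulate(callers, timeout=0.5):
    """Simulate the gather when only the mp_ranks in `callers` call state_dict().

    Returns a dict mapping each calling mp_rank to "gathered" or "hang".
    """
    # The barrier stands in for the collective: it completes only when
    # all MP_SIZE ranks have arrived.
    barrier = threading.Barrier(MP_SIZE)
    results = {}

    def state_dict(mp_rank):
        try:
            barrier.wait(timeout=timeout)
            results[mp_rank] = "gathered"
        except threading.BrokenBarrierError:
            # A real collective has no timeout; this rank would block forever.
            results[mp_rank] = "hang"

    threads = [threading.Thread(target=state_dict, args=(r,)) for r in callers]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    print(simulate([0, 1]))  # every rank takes part in the gather
    print(simulate([0]))     # one rank is missing, so the caller times out
```

In a real training script the practical consequence is to avoid guarding the ``state_dict()`` call with a condition that is true on only some of the ranks with ``dp_rank()==0``, for example a check on a single ``mp_rank``.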