Commit 69c6201

Merge branch 'main' into add-pypandoc

2 parents 35cfb9e + 219a9e3

4 files changed, +19 -3 lines changed

.devcontainer/requirements.txt

Lines changed: 1 addition & 1 deletion

@@ -24,7 +24,7 @@ ipython
 # to run examples
 pandas
 scikit-image
-pillow==10.0.1
+pillow==10.2.0
 wget

 # for codespaces env
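
Not part of the diff: a quick hedged check of the bumped pin, assuming the devcontainer installs this requirements file with pip and that the standard Pillow package is importable as PIL.

# Confirm the environment picked up the pillow bump from requirements.txt.
# Assumes .devcontainer/requirements.txt was installed via pip.
import PIL

expected = "10.2.0"  # new pin from this commit
print(f"Pillow {PIL.__version__} (expected {expected})")
assert PIL.__version__ == expected, "devcontainer requirements appear out of date"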

beginner_source/dist_overview.rst

Lines changed: 16 additions & 1 deletion

@@ -74,7 +74,10 @@ common development trajectory would be:
 4. Use multi-machine `DistributedDataParallel <https://pytorch.org/docs/stable/generated/torch.nn.parallel.DistributedDataParallel.html>`__
    and the `launching script <https://github.com/pytorch/examples/blob/master/distributed/ddp/README.md>`__,
    if the application needs to scale across machine boundaries.
-5. Use `torch.distributed.elastic <https://pytorch.org/docs/stable/distributed.elastic.html>`__
+5. Use multi-GPU `FullyShardedDataParallel <https://pytorch.org/docs/stable/fsdp.html>`__
+   training on a single machine or across multiple machines when the data and
+   model cannot fit on one GPU.
+6. Use `torch.distributed.elastic <https://pytorch.org/docs/stable/distributed.elastic.html>`__
    to launch distributed training if errors (e.g., out-of-memory) are expected or if
    resources can join and leave dynamically during training.

@@ -134,6 +137,18 @@ DDP materials are listed below:
 5. The `Distributed Training with Uneven Inputs Using the Join Context Manager <../advanced/generic_join.html>`__
    tutorial walks through using the generic join context for distributed training with uneven inputs.

+
+``torch.distributed.FullyShardedDataParallel``
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+`FullyShardedDataParallel <https://pytorch.org/docs/stable/fsdp.html>`__ (FSDP)
+is a data-parallelism paradigm that shards a model's parameters, gradients, and
+optimizer states across data-parallel workers instead of keeping a full per-GPU
+copy of them. Support for FSDP was added in PyTorch v1.11. The
+`Getting Started with FSDP <https://pytorch.org/tutorials/intermediate/FSDP_tutorial.html>`__
+tutorial provides an in-depth explanation and example of how FSDP works.
+
+
 torch.distributed.elastic
 ~~~~~~~~~~~~~~~~~~~~~~~~~
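
Not part of the diff: the new FSDP section describes the sharding behavior only in prose, so here is a minimal sketch of wrapping a toy model in FullyShardedDataParallel, assuming PyTorch 1.11+, CUDA GPUs, and a torchrun launch; the model, tensor sizes, and script name are placeholders.

# Minimal FSDP sketch. Assumes torchrun sets RANK/LOCAL_RANK/WORLD_SIZE
# and that NCCL-capable GPUs are available.
import os

import torch
import torch.distributed as dist
import torch.nn as nn
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP


def main():
    dist.init_process_group("nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    # Wrapping with FSDP shards parameters, gradients, and optimizer states
    # across the data-parallel workers instead of replicating them per GPU.
    model = nn.Sequential(nn.Linear(1024, 1024), nn.ReLU(), nn.Linear(1024, 10))
    model = FSDP(model.cuda())

    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

    # One dummy training step per worker.
    inputs = torch.randn(8, 1024, device="cuda")
    targets = torch.randint(0, 10, (8,), device="cuda")
    loss = nn.functional.cross_entropy(model(inputs), targets)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    dist.destroy_process_group()


if __name__ == "__main__":
    main()

Launched with something like: torchrun --nproc_per_node=2 fsdp_sketch.py (the script name is hypothetical).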

en-wordlist.txt

Lines changed: 1 addition & 0 deletions

@@ -95,6 +95,7 @@ ExportDB
 FC
 FGSM
 FLAVA
+FSDP
 FX
 FX's
 FloydHub

intermediate_source/reinforcement_q_learning.py

Lines changed: 1 addition & 1 deletion

@@ -403,7 +403,7 @@ def optimize_model():
 num_episodes = 50

 for i_episode in range(num_episodes):
-    # Initialize the environment and get it's state
+    # Initialize the environment and get its state
     state, info = env.reset()
     state = torch.tensor(state, dtype=torch.float32, device=device).unsqueeze(0)
     for t in count():
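
Not part of the diff: for context around the corrected comment, a stripped-down sketch of the surrounding episode loop, assuming gymnasium, the CartPole-v1 task the tutorial uses, a CPU device, and a random placeholder policy; the tutorial's DQN, replay memory, and optimize_model() are omitted.

# A minimal version of the loop around the corrected comment.
# Assumes: pip install gymnasium; random actions stand in for the DQN policy.
from itertools import count

import gymnasium as gym
import torch

device = torch.device("cpu")
env = gym.make("CartPole-v1")
num_episodes = 50

for i_episode in range(num_episodes):
    # Initialize the environment and get its state
    state, info = env.reset()
    state = torch.tensor(state, dtype=torch.float32, device=device).unsqueeze(0)
    for t in count():
        action = env.action_space.sample()  # placeholder for the policy network's choice
        observation, reward, terminated, truncated, info = env.step(action)
        if terminated or truncated:
            break
        state = torch.tensor(observation, dtype=torch.float32, device=device).unsqueeze(0)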

0 commit comments
