@@ -74,7 +74,10 @@ common development trajectory would be:
4. Use multi-machine `DistributedDataParallel <https://pytorch.org/docs/stable/generated/torch.nn.parallel.DistributedDataParallel.html>`__
and the `launching script <https://github.com/pytorch/examples/blob/master/distributed/ddp/README.md>`__,
if the application needs to scale across machine boundaries.
- 5. Use `torch.distributed.elastic <https://pytorch.org/docs/stable/distributed.elastic.html>`__
+ 5. Use multi-GPU `FullyShardedDataParallel <https://pytorch.org/docs/stable/fsdp.html>`__
+ training on a single machine or across multiple machines when the data and model
+ cannot fit on one GPU.
+ 6. Use `torch.distributed.elastic <https://pytorch.org/docs/stable/distributed.elastic.html>`__
to launch distributed training if errors (e.g., out-of-memory) are expected or if
resources can join and leave dynamically during training.
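To make the tail of this trajectory concrete, here is a minimal sketch (not drawn from the tutorials themselves) of where steps 4 and 6 land: the model is wrapped in ``DistributedDataParallel`` and every machine starts the script with ``torchrun``, the ``torch.distributed.elastic`` launcher. The script name, rendezvous endpoint, node and GPU counts, and the toy model are placeholders; step 5 swaps the ``DDP`` wrapper for ``FSDP``.

.. code-block:: python

   # Placeholder launch command, run on each of two machines:
   #   torchrun --nnodes=2 --nproc_per_node=8 \
   #            --rdzv_backend=c10d --rdzv_endpoint=node0.example.com:29500 train.py
   import os

   import torch
   import torch.distributed as dist
   import torch.nn as nn
   from torch.nn.parallel import DistributedDataParallel as DDP


   def main():
       # torchrun sets RANK, WORLD_SIZE, LOCAL_RANK, MASTER_ADDR, and MASTER_PORT,
       # so the default env:// initialization needs no extra arguments.
       dist.init_process_group(backend="nccl")
       local_rank = int(os.environ["LOCAL_RANK"])
       torch.cuda.set_device(local_rank)
       device = torch.device("cuda", local_rank)

       model = nn.Linear(10, 10).to(device)        # toy model, placeholder
       ddp_model = DDP(model, device_ids=[local_rank])
       optimizer = torch.optim.SGD(ddp_model.parameters(), lr=0.01)

       for _ in range(3):
           optimizer.zero_grad()
           ddp_model(torch.randn(20, 10, device=device)).sum().backward()
           optimizer.step()

       dist.destroy_process_group()


   if __name__ == "__main__":
       main()

Because ``torchrun`` is the elastic entry point, the same invocation also covers step 6: it can restart failed workers (``--max_restarts``) and accept an elastic node range (``--nnodes=MIN:MAX``).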
@@ -134,6 +137,18 @@ DDP materials are listed below:
5. The `Distributed Training with Uneven Inputs Using the Join Context Manager <../advanced/generic_join.html>`__
tutorial walks through using the generic join context for distributed training with uneven inputs.
+
+ ``torch.distributed.FullyShardedDataParallel``
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+ `FullyShardedDataParallel <https://pytorch.org/docs/stable/fsdp.html>`__
+ (FSDP) is a data parallelism paradigm that shards a model's parameters, gradients,
+ and optimizer states across data-parallel workers instead of maintaining a full
+ per-GPU copy of them as DDP does. Support for FSDP was added in PyTorch v1.11. The
+ `Getting Started with FSDP <https://pytorch.org/tutorials/intermediate/FSDP_tutorial.html>`__
+ tutorial provides an in-depth explanation and examples of how FSDP works.
+
+
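As a rough sketch of how the wrapper is used (this is not the tutorial's example; the process group is assumed to be set up by a ``torchrun`` launch, and the layer sizes are arbitrary):

.. code-block:: python

   import os

   import torch
   import torch.distributed as dist
   import torch.nn as nn
   from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

   # Assumes a torchrun launch: env:// init reads RANK/WORLD_SIZE, and
   # LOCAL_RANK selects one GPU per process (all values are placeholders).
   dist.init_process_group(backend="nccl")
   torch.cuda.set_device(int(os.environ["LOCAL_RANK"]))

   model = nn.Sequential(
       nn.Linear(1024, 4096), nn.ReLU(), nn.Linear(4096, 1024)
   ).cuda()

   # FSDP shards the parameters, gradients, and optimizer state across ranks,
   # gathering full parameters only when they are needed for computation.
   fsdp_model = FSDP(model)

   # Build the optimizer from the wrapped model so it tracks the sharded parameters.
   optimizer = torch.optim.AdamW(fsdp_model.parameters(), lr=1e-3)

   loss = fsdp_model(torch.randn(8, 1024, device="cuda")).sum()
   loss.backward()
   optimizer.step()

   dist.destroy_process_group()

The auto-wrapping policies and CPU offload options covered in the tutorial refine this basic pattern.
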
torch.distributed.elastic
~~~~~~~~~~~~~~~~~~~~~~~~~