torch.distributed.pipelining tutorial #2962

Merged

H-Huang merged 2 commits into pytorch:main from H-Huang:2.4-RC-TEST

Jul 9, 2024

Member

H-Huang commented Jul 3, 2024

Tutorial for the PyTorch 2.4 release of torch.distributed.pipelining. PyTorch docs: https://pytorch.org/docs/main/distributed.pipelining.html#pipeline-parallelism

https://fburl.com/workplace/9w8z3das

tutorial preview: https://docs-preview.pytorch.org/pytorch/tutorials/2962/intermediate/pipelining_tutorial.html

pytorch-bot bot commented Jul 3, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/tutorials/2962

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit de14717 with merge base cad4839:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

facebook-github-bot added the cla signed label

H-Huang mentioned this pull request

[WIP] pipelining tutorial #2961

Closed

H-Huang added the 2.4 label

H-Huang requested a review from svekars

July 3, 2024 16:44

H-Huang force-pushed the 2.4-RC-TEST branch from 0100538 to 7bc866f

July 3, 2024 18:14

svekars reviewed

intermediate_source/pipelining_tutorial.rst (outdated, resolved)

H-Huang force-pushed the 2.4-RC-TEST branch from 7bc866f to 8279959

July 8, 2024 16:59

H-Huang mentioned this pull request

remove old pipeline parallel tutorials #2964

Merged

H-Huang requested a review from wconstab

July 8, 2024 17:18

svekars reviewed

Contributor

svekars left a comment

Thank you @H-Huang - some editorial suggestions. Let me know if you have any questions.

intermediate_source/pipelining_tutorial.rst (resolved)

intermediate_source/pipelining_tutorial.rst (outdated, resolved)

intermediate_source/pipelining_tutorial.rst (outdated, resolved)

intermediate_source/pipelining_tutorial.rst (outdated, resolved)

intermediate_source/pipelining_tutorial.rst (outdated, resolved)

intermediate_source/pipelining_tutorial.rst (outdated, resolved)

intermediate_source/pipelining_tutorial.rst (outdated, resolved)

intermediate_source/pipelining_tutorial.rst (outdated, resolved)

intermediate_source/pipelining_tutorial.rst (outdated, resolved)

intermediate_source/pipelining_tutorial.rst (outdated, resolved)

Contributor

svekars commented Jul 8, 2024

Also, since it's an .rst file, we can target main and merge directly.

H-Huang force-pushed the 2.4-RC-TEST branch from 8279959 to c3dc90f

July 8, 2024 19:24

H-Huang changed the base branch from 2.4-RC-TEST to main

July 8, 2024 19:24

Member Author

H-Huang commented Jul 8, 2024

Thanks so much for the comments, @svekars; they were very helpful! I also updated the base branch to merge directly into main.

H-Huang requested a review from svekars

July 8, 2024 19:26

H-Huang force-pushed the 2.4-RC-TEST branch from c3dc90f to dace205

July 8, 2024 21:11

svekars reviewed

intermediate_source/pipelining_tutorial.rst (resolved)

intermediate_source/pipelining_tutorial.rst (outdated, resolved)

intermediate_source/pipelining_tutorial.rst (resolved)

wconstab reviewed

intermediate_source/pipelining_tutorial.rst

+                    device = torch.device(f"cuda:{rank}") if torch.cuda.is_available() else torch.device("cpu")
+                    dist.init_process_group()
+                    pp_group = dist.new_group()

Contributor

wconstab Jul 9, 2024

It's a little funny that we show creating a new group for pp usage without explaining why, especially since it's the same size as the default group. I think it's good to leave pp_group here, but maybe add a comment explaining that it's trivial in this example and could be a sub-group in N-D parallel cases.
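To make the N-D case concrete, here is a minimal sketch of how the ranks behind a pipeline sub-group could be chosen. It assumes a hypothetical 2-D layout in which ranks are numbered row-major over a (data-parallel, pipeline-parallel) grid; the helper names `pipeline_group_ranks` and `my_pipeline_group` are illustrative and are not part of the tutorial or of torch.distributed. The sketch only computes rank lists, so it runs without PyTorch or a process group.

```python
from typing import List


def pipeline_group_ranks(world_size: int, pp_size: int) -> List[List[int]]:
    """Partition ranks 0..world_size-1 into contiguous pipeline groups.

    Assumed layout (hypothetical): ranks are numbered row-major over a
    (dp, pp) grid, so each run of `pp_size` consecutive ranks is one pipeline.
    """
    if pp_size <= 0 or world_size % pp_size != 0:
        raise ValueError("world_size must be a positive multiple of pp_size")
    return [
        list(range(start, start + pp_size))
        for start in range(0, world_size, pp_size)
    ]


def my_pipeline_group(rank: int, world_size: int, pp_size: int) -> List[int]:
    """Return the rank list of the pipeline group that contains `rank`."""
    return pipeline_group_ranks(world_size, pp_size)[rank // pp_size]


# Trivial case, as in the snippet under review: one pipeline spans every rank,
# so the pipeline group has the same members as the default group.
print(pipeline_group_ranks(world_size=4, pp_size=4))

# 2-D case: 8 ranks split into two pipelines of 4 stages each.
print(pipeline_group_ranks(world_size=8, pp_size=4))
print(my_pipeline_group(rank=5, world_size=8, pp_size=4))

# In a real job (not run here), every rank would call
# dist.new_group(ranks=g) for each rank list g, in the same order, because
# new_group has to be entered by all processes in the default group. Each
# rank then keeps the group containing its own rank as pp_group.
```

With `pp_size == world_size` the result is a single group covering all ranks, which is the situation in the tutorial snippet; any smaller `pp_size` gives each pipeline its own sub-group.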

wconstab reviewed

intermediate_source/pipelining_tutorial.rst (outdated, resolved)

wconstab approved these changes

Contributor

wconstab left a comment

lgtm!


pipelining tutorials

df9c848

H-Huang force-pushed the 2.4-RC-TEST branch from bc489dd to df9c848

July 9, 2024 17:09


Merge branch 'main' into 2.4-RC-TEST

de14717

H-Huang merged commit ffc8efc into pytorch:main

19 checks passed
