Used functional all-reduce for amax reduction #219
Conversation
Dynamo cannot remap the in-place max all-reduce to its functional equivalent on the PyTorch `main` branch. pytorch/pytorch#120082
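For context, a minimal sketch of the two call patterns being compared, assuming `torch.distributed` is already initialized with a default process group; the `reduce_amax` wrapper is illustrative and not code from this repo:

```python
import torch
import torch.distributed as dist

def reduce_amax(amax: torch.Tensor) -> torch.Tensor:
    """Reduce a local amax across all ranks with MAX (illustrative helper)."""
    # In-place variant that graph-breaks under torch.compile on current `main`,
    # since Dynamo cannot yet remap it to a functional collective
    # (pytorch/pytorch#120082):
    #
    #   dist.all_reduce(amax, op=dist.ReduceOp.MAX)

    # Functional variant used by this PR: returns a new tensor and traces cleanly.
    return dist._functional_collectives.all_reduce(
        amax, "MAX", list(range(dist.get_world_size()))
    )
```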
Looks great!
I will hold off on landing this since @yifuwang will land the Dynamo rewrite fix soon.
# https://github.com/pytorch/pytorch/issues/120082
# Use functional all-reduce to avoid graph breaking.
amax = dist._functional_collectives.all_reduce(
    amax, "MAX", list(range(dist.get_world_size()))
)
Ranks + tag as the process group identifier has been deprecated. Can we pass `dist.group.WORLD` or `dist.group.WORLD.group_name` here?
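For concreteness, a minimal sketch of the two suggested spellings, assuming the default process group is initialized and `amax` is the tensor from the diff above:

```python
import torch.distributed as dist

# Pass the ProcessGroup object instead of ranks + tag:
amax = dist._functional_collectives.all_reduce(amax, "MAX", dist.group.WORLD)

# Or pass the group's registered name (a string):
amax = dist._functional_collectives.all_reduce(
    amax, "MAX", dist.group.WORLD.group_name
)
```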
If we are not landing this PR, then is it okay to leave the call as `dist.all_reduce(amax, op=dist.ReduceOp.MAX)` and wait for your Dynamo rewrite changes?
Oh, the changes you are referring to are for rewriting functional collectives. The options I mentioned above should already work :)
Let me know if they don't, though.
Stack from ghstack (oldest at bottom):
- module arg from fsdp_pre_all_gather #217
- use_activation_hooks: bool to swap #214
- amax_and_scale_synced unconditionally #220

Dynamo cannot remap the in-place max all-reduce to its functional equivalent on the PyTorch `main` branch. pytorch/pytorch#120082