Skip to content
This repository was archived by the owner on Aug 7, 2024. It is now read-only.

Used functional all-reduce for amax reduction #219

Closed
wants to merge 4 commits into from

Conversation

awgu
Copy link

@awgu awgu commented Feb 16, 2024

Andrew Gu added 3 commits February 16, 2024 07:36
Dynamo cannot remap the in-place max all-reduce to its functional equivalent on PyTorch `main` branch.
pytorch/pytorch#120082

[ghstack-poisoned]
@awgu awgu requested review from bdhirsh and drisspg February 16, 2024 16:06
Copy link
Contributor

@drisspg drisspg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great!

@awgu
Copy link
Author

awgu commented Feb 27, 2024

I will hold off on landing this since @yifuwang will land the Dynamo rewrite fix soon.

# https://github.com/pytorch/pytorch/issues/120082
# Use functional all-reduce to avoid graph breaking.
amax = dist._functional_collectives.all_reduce(
amax, "MAX", list(range(dist.get_world_size()))

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ranks + tag as the process group identifier has been deprecated. Can we pass dist.group.WORLD or dist.group.WORLD.group_name here?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we are not landing this PR, then is it okay to leave the call as dist.all_reduce(amax, op=dist.ReduceOp.MAX) and wait for your Dynamo rewrite changes?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh the changes you are referring to are for rewriting functional collective. The options I mentioned above should already work :)

Let me know if it doesn't though.

@yifuwang
Copy link

I will hold off on landing this since @yifuwang will land the Dynamo rewrite fix soon.

Thanks @awgu! The on-going PRs are for rewriting non-functional collectives. If you are already using functional collectives, I don't think (hopefully :) ) there are any blockers

@awgu awgu closed this Apr 22, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants