Skip to content

Revert "AMDGPU: Move insertion into V2SCopies map (#130776)" #133948

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

alex-t
Copy link
Contributor

@alex-t alex-t commented Apr 1, 2025

This reverts commit dea5aa7.
Reason: This commit causes a serious performance regression in Triton ( 256 vgprs and 47 spills after vs 218 vgprs before).
In SIFixSGPRCopies::analyzeVGPRToSGPRCopy the V2SCopyInfo data structure holds the analysis results collected in the loop that forms the function body. After gathering the data, we store the structure in a map. Moving this insertion to the beginning of the function—before the data is collected—results in storing an empty structure.
We decide whether to keep the V2S user chain in scalar form or convert it to vector form based on its length. Since we store empty structures, all chains have a length of 0, causing every V2S copy user chain to be converted to vector form. Consequently, this change effectively disables analysis and optimization. As a result, we use significantly more VGPRs, leading to a performance regression.

@alex-t alex-t requested a review from arsenm April 1, 2025 17:36
Copy link

github-actions bot commented Apr 1, 2025

⚠️ C/C++ code formatter, clang-format found issues in your code. ⚠️

You can test this locally with the following command:
git-clang-format --diff HEAD~1 HEAD --extensions cpp -- llvm/lib/Target/AMDGPU/SIFixSGPRCopies.cpp
View the diff from clang-format here.
diff --git a/llvm/lib/Target/AMDGPU/SIFixSGPRCopies.cpp b/llvm/lib/Target/AMDGPU/SIFixSGPRCopies.cpp
index 4342e7a36..3fdc436fa 100644
--- a/llvm/lib/Target/AMDGPU/SIFixSGPRCopies.cpp
+++ b/llvm/lib/Target/AMDGPU/SIFixSGPRCopies.cpp
@@ -927,7 +927,7 @@ void SIFixSGPRCopies::analyzeVGPRToSGPRCopy(MachineInstr* MI) {
   const TargetRegisterClass *DstRC = MRI->getRegClass(DstReg);
 
   V2SCopyInfo Info(getNextVGPRToSGPRCopyId(), MI,
-                      TRI->getRegSizeInBits(*DstRC));
+                   TRI->getRegSizeInBits(*DstRC));
   SmallVector<MachineInstr *, 8> AnalysisWorklist;
   // Needed because the SSA is not a tree but a graph and may have
   // forks and joins. We should not then go same way twice.

@kzhuravl
Copy link
Contributor

kzhuravl commented Apr 1, 2025

It is worth noting that @alex-t will investigate the issue and reapply the PR #130776 with the issue is fixed.

Copy link
Contributor

@arsenm arsenm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Performance regression is not an overriding concern for assertions. This fixed assertion failures

@piotrAMD
Copy link
Collaborator

piotrAMD commented Apr 7, 2025

We are also seeing ~5% regression in one graphics benchmark due to dea5aa7 with similar symptoms (more VALUs, more vgprs used).

@piotrAMD
Copy link
Collaborator

piotrAMD commented Apr 8, 2025

Is there a better fix available short-term than the revert? We would like to avoid having a stack of local reverts in the internal branch.

Would reverting #129464 on top of this revert make more sense or there are too many dependencies?

@arsenm
Copy link
Contributor

arsenm commented Apr 8, 2025

Is there a better fix available short-term than the revert?

#134153

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants