Revert "AMDGPU: Move insertion into V2SCopies map (#130776)" #133948
This reverts commit dea5aa7.
Reason: This commit causes a serious performance regression in Triton (256 VGPRs and 47 spills after, vs. 218 VGPRs before).
In SIFixSGPRCopies::analyzeVGPRToSGPRCopy, the V2SCopyInfo data structure holds the analysis results collected in the loop that forms the function body. After gathering the data, we store the structure in a map. Moving this insertion to the beginning of the function, before the data is collected, results in storing an empty structure.
We decide whether to keep the V2S user chain in scalar form or convert it to vector form based on its length. Since we store empty structures, all chains have a length of 0, causing every V2S copy user chain to be converted to vector form. Consequently, this change effectively disables analysis and optimization. As a result, we use significantly more VGPRs, leading to a performance regression.
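The failure mode described above can be reproduced in isolation. The following is a minimal sketch, not the actual SIFixSGPRCopies code: `CopyInfo`, `analyzeInsertEarly`, `analyzeInsertLate`, `keepScalar`, and the `MinLen` threshold are hypothetical stand-ins, and it assumes the map stores the structure by value, so an early insertion takes a snapshot before the loop fills anything in.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical stand-in for V2SCopyInfo; the real structure holds more state.
struct CopyInfo {
  unsigned ID = 0;
  std::vector<int> Users; // stand-in for the V2S copy user chain
};

// Insertion moved to the top: the map receives a copy of the still-empty
// structure, and the loop below only updates the local object.
void analyzeInsertEarly(std::map<unsigned, CopyInfo> &Copies, unsigned ID,
                        const std::vector<int> &Chain) {
  assert(Copies.find(ID) == Copies.end() && "ID is expected to be fresh");
  CopyInfo Info;
  Info.ID = ID;
  Copies[ID] = Info; // stored before any data is collected
  for (int User : Chain)
    Info.Users.push_back(User);
}

// Insertion after the loop: the map receives the populated structure.
void analyzeInsertLate(std::map<unsigned, CopyInfo> &Copies, unsigned ID,
                       const std::vector<int> &Chain) {
  assert(Copies.find(ID) == Copies.end() && "ID is expected to be fresh");
  CopyInfo Info;
  Info.ID = ID;
  for (int User : Chain)
    Info.Users.push_back(User);
  Copies[ID] = Info; // stored after the data is collected
}

// Hypothetical decision: keep the chain scalar only if it is long enough.
// The real pass uses its own heuristic; MinLen is an assumption.
bool keepScalar(const CopyInfo &Info, std::size_t MinLen) {
  return Info.Users.size() >= MinLen;
}
```

With the early insertion, every stored entry reports a chain length of 0, so a length-based check like `keepScalar` rejects every chain regardless of what the loop found.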