-
Notifications
You must be signed in to change notification settings - Fork 14.3k
[AMDGPU][True16][CodeGen] update wwm reg sorting check condition #135053
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[AMDGPU][True16][CodeGen] update wwm reg sorting check condition #135053
Conversation
f7e89d4
to
53e1dd4
Compare
@llvm/pr-subscribers-backend-amdgpu Author: Brox Chen (broxigarchen) ChangesWe currently just need to shift down 32bit wmm registers. Update check condition to skip the 16bit register in wmm reg sorting Full diff: https://github.com/llvm/llvm-project/pull/135053.diff 1 Files Affected:
diff --git a/llvm/lib/Target/AMDGPU/SIFrameLowering.cpp b/llvm/lib/Target/AMDGPU/SIFrameLowering.cpp
index 9c737b4f3e378..8f488f5154650 100644
--- a/llvm/lib/Target/AMDGPU/SIFrameLowering.cpp
+++ b/llvm/lib/Target/AMDGPU/SIFrameLowering.cpp
@@ -1650,7 +1650,7 @@ void SIFrameLowering::determineCalleeSaves(MachineFunction &MF,
// are of 32-bit size. SIPreAllocateWWMRegs pass can add tuples into WWM
// reserved registers.
const TargetRegisterClass *RC = TRI->getPhysRegBaseClass(Reg);
- if (TRI->getRegSizeInBits(*RC) > 32)
+ if (TRI->getRegSizeInBits(*RC) != 32)
continue;
SortedWWMVGPRs.push_back(Reg);
}
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add a test
I checked the wwm reg related test and it seems I could not find any thing that test for the sorting of the reserved reg. Hi @cdevadas are you comfortable with merging this as it, or is there a recommended test that I can look into? Thanks for the hint! |
This change you added to skip the 16-bit wwm-regs. Did you happen to encounter any errors earlier? If yes, add a test for the 16-bit register - nothing to be added for the 32-bit wwm-reg case. |
@@ -1650,7 +1650,7 @@ void SIFrameLowering::determineCalleeSaves(MachineFunction &MF, | |||
// are of 32-bit size. SIPreAllocateWWMRegs pass can add tuples into WWM | |||
// reserved registers. | |||
const TargetRegisterClass *RC = TRI->getPhysRegBaseClass(Reg); | |||
if (TRI->getRegSizeInBits(*RC) > 32) | |||
if (TRI->getRegSizeInBits(*RC) != 32) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know why this code needs to filter anything.
I don't understand this comment:
// SIPreAllocateWWMRegs pass can add tuples into WWM reserved registers.
It can, but why? This should only track the 32-bit element registers
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi Matt. I am not quite familiar with this. I think maybe @cdevadas can answer this question.
If we want to change something here I think we probably should do a seperate patch. Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SIPreallocateWWMRegs
pass handpicks VGPRs from the lower end. They might allocate VGPR tuples as well. However, the wwm-regalloc pass gets registers from the tail-end (we reserve them from the higher end to ensure the per-lane VGPR tuple allocation gets sufficient free contiguous registers from the initial scratch range.). This code inserted during PEI is trying to shift them down to the lowest range. Remember, the shifting is only required for those allocated during wwm-regalloc pass. Since we have the unified set WWMReservedRegs
that holds all sort of wwm-regs, this loop here is trying to identify only 32-bit registers that are allocated during wwm-regalloc pass (the VGPRs used for SGPR spilling, at the moment). We may also pick the 32-bit regs custom allocated during the SIPreallcoate pass. But that's ok. This filter avoids any VGPR tuple allocated for wwm-operands during the custom allocate pass. The shift-down logic currently Inserted here considers only 32-bit regclasses.
The 16-bit registers weren't enabled earlier. So, it makes sense to change the condition to exactly match 32-bit classes.
Added a test. I am not quite familar with wwm reg reserve so please let me know if the test seems wrong to you. Thanks! |
f21fd67
to
ed6bdc3
Compare
ed6bdc3
to
c9c76d5
Compare
; GCN-NEXT: $sgpr4 = ENTER_STRICT_WWM -1, implicit-def $exec, implicit-def $scc, implicit $exec | ||
; GCN-NEXT: $vgpr0_lo16 = IMPLICIT_DEF | ||
; GCN-NEXT: $vgpr0_lo16 = V_CNDMASK_B16_t16_e64 0, killed $vgpr0_hi16, 0, $vgpr0_lo16, $sgpr0, 0, implicit $exec | ||
; GCN-NEXT: $exec_lo = EXIT_STRICT_WWM killed renamable $sgpr4 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
wwm spill-restore missing for vgpr0_lo16 at the epilogue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I take a further check in this part.
The vulkan benchmark failure that this patch is trying to fix is caused by an entryFunction, and thus the wwm spill/restore are not happening. It's just the wwm reg sorting that is messing up with the 16bit reg. Update the test to just targetting the reg sorting fix.
For the non entry-function case, it seems the CSR reg spill/restore is default using 32bit pseudo which need to be updated to support 16bit registers, and the spill/restore builder need to be changed as well. However, those requires more changes and I think it's better to do those in a seperate patch.
c9c76d5
to
e2597cb
Compare
e2597cb
to
25ac05c
Compare
name: wwm_reg_skip_sort_16bit | ||
tracksRegLiveness: true | ||
machineFunctionInfo: | ||
isEntryFunction: true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There won't be any prolog epilog CSR spills inserted for entry functions. This test isn't relevant for this patch, especially the shifting won't happen for the entry functions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi Christ. Then this sounds like a bug. The shifting is happening before the entryfunction check
Should this entryFunction check being moved to early stage at line 1646? or even earlier to 1619?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah my bad. The shift-back is needed to ensure the wwm-regs are finally in the lowest range. This is needed for both entry functions and device functions. But their CSR spills/restores will be ignored for entry functions.
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/174/builds/16853 Here is the relevant piece of the build log for the reference
|
…m#135053) We currently just need to shift down 32bit wwm registers. Previous check condition mistakenly select 16bit registers in true16 mode. Update check condition to skip the 16bit register in wmm reg sorting
…m#135053) We currently just need to shift down 32bit wwm registers. Previous check condition mistakenly select 16bit registers in true16 mode. Update check condition to skip the 16bit register in wmm reg sorting
…m#135053) We currently just need to shift down 32bit wwm registers. Previous check condition mistakenly select 16bit registers in true16 mode. Update check condition to skip the 16bit register in wmm reg sorting
…m#135053) We currently just need to shift down 32bit wwm registers. Previous check condition mistakenly select 16bit registers in true16 mode. Update check condition to skip the 16bit register in wmm reg sorting
…m#135053) We currently just need to shift down 32bit wwm registers. Previous check condition mistakenly select 16bit registers in true16 mode. Update check condition to skip the 16bit register in wmm reg sorting
We currently just need to shift down 32bit wwm registers.
Previous check condition mistakenly select 16bit registers in true16 mode. Update check condition to skip the 16bit register in wmm reg sorting