[AMDGPU][True16][CodeGen] update wwm reg sorting check condition #135053

broxigarchen · 2025-04-09T17:11:14Z

We currently only need to shift down 32-bit wwm registers.

The previous check condition mistakenly selected 16-bit registers in True16 mode. Update the check condition to skip 16-bit registers in wwm reg sorting.

llvmbot · 2025-04-10T13:53:52Z

@llvm/pr-subscribers-backend-amdgpu

Author: Brox Chen (broxigarchen)

Changes

We currently only need to shift down 32-bit wwm registers.

Update the check condition to skip 16-bit registers in wwm reg sorting.

Full diff: https://github.com/llvm/llvm-project/pull/135053.diff

1 Files Affected:

(modified) llvm/lib/Target/AMDGPU/SIFrameLowering.cpp (+1-1)

diff --git a/llvm/lib/Target/AMDGPU/SIFrameLowering.cpp b/llvm/lib/Target/AMDGPU/SIFrameLowering.cpp
index 9c737b4f3e378..8f488f5154650 100644
--- a/llvm/lib/Target/AMDGPU/SIFrameLowering.cpp
+++ b/llvm/lib/Target/AMDGPU/SIFrameLowering.cpp
@@ -1650,7 +1650,7 @@ void SIFrameLowering::determineCalleeSaves(MachineFunction &MF,
     // are of 32-bit size. SIPreAllocateWWMRegs pass can add tuples into WWM
     // reserved registers.
     const TargetRegisterClass *RC = TRI->getPhysRegBaseClass(Reg);
-    if (TRI->getRegSizeInBits(*RC) > 32)
+    if (TRI->getRegSizeInBits(*RC) != 32)
       continue;
     SortedWWMVGPRs.push_back(Reg);
   }
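To illustrate why `> 32` lets a 16-bit register through while `!= 32` does not, here is a minimal, self-contained C++ sketch. It does not use the LLVM API: `PhysReg`, `keepOld`, `keepNew`, and `collectForSorting` are hypothetical stand-ins, and each register is reduced to the size in bits of its base class (the value `TRI->getRegSizeInBits(*RC)` yields in the real loop).

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a physical register: only a name and the size
// in bits of its base register class matter for this filter.
struct PhysReg {
  std::string Name;
  unsigned SizeInBits;
};

// Old predicate: skip only classes wider than 32 bits (VGPR tuples), so a
// 16-bit True16 half such as vgpr0_lo16 is still collected.
bool keepOld(const PhysReg &R) { return !(R.SizeInBits > 32); }

// New predicate: collect exactly the 32-bit classes.
bool keepNew(const PhysReg &R) { return R.SizeInBits == 32; }

// Collect the names of the registers accepted by Keep, preserving order.
std::vector<std::string> collectForSorting(const std::vector<PhysReg> &Regs,
                                           bool (*Keep)(const PhysReg &)) {
  std::vector<std::string> Out;
  for (const PhysReg &R : Regs)
    if (Keep(R))
      Out.push_back(R.Name);
  return Out;
}
```

For a hypothetical reserved set {vgpr0_lo16 (16 bits), vgpr40 (32 bits), vgpr2_vgpr3 (64 bits)}, `keepOld` collects `vgpr0_lo16` and `vgpr40`, whereas `keepNew` collects only `vgpr40`.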

Sisyph

Please add a test

broxigarchen · 2025-04-10T20:29:44Z


I checked the wwm-reg-related tests and could not find anything that tests the sorting of the reserved registers. Hi @cdevadas, are you comfortable with merging this as is, or is there a recommended test that I can look into? Thanks for the hint!

cdevadas · 2025-04-11T03:09:51Z


You added this change to skip the 16-bit wwm-regs. Did you happen to encounter any errors before it? If so, add a test for the 16-bit register; nothing needs to be added for the 32-bit wwm-reg case.

arsenm · 2025-04-13T07:36:25Z

llvm/lib/Target/AMDGPU/SIFrameLowering.cpp

@@ -1650,7 +1650,7 @@ void SIFrameLowering::determineCalleeSaves(MachineFunction &MF,
    // are of 32-bit size. SIPreAllocateWWMRegs pass can add tuples into WWM
    // reserved registers.
    const TargetRegisterClass *RC = TRI->getPhysRegBaseClass(Reg);
-    if (TRI->getRegSizeInBits(*RC) > 32)
+    if (TRI->getRegSizeInBits(*RC) != 32)


I don't know why this code needs to filter anything.

I don't understand this comment:

// SIPreAllocateWWMRegs pass can add tuples into WWM reserved registers.

It can, but why? This should only track the 32-bit element registers.

Hi Matt. I am not quite familiar with this. I think @cdevadas may be able to answer this question.

If we want to change something here, I think we should do it in a separate patch. Thanks!

The SIPreAllocateWWMRegs pass handpicks VGPRs from the lower end, and it may allocate VGPR tuples as well. However, the wwm-regalloc pass gets registers from the tail end (we reserve them from the higher end to ensure the per-lane VGPR tuple allocation gets sufficient free contiguous registers from the initial scratch range). The code inserted here during PEI shifts them down to the lowest range. Remember, the shifting is only required for registers allocated during the wwm-regalloc pass. Since the unified set WWMReservedRegs holds all sorts of wwm-regs, this loop tries to identify only the 32-bit registers allocated during the wwm-regalloc pass (at the moment, the VGPRs used for SGPR spilling). It may also pick up 32-bit registers custom-allocated during the SIPreAllocateWWMRegs pass, but that's OK. This filter avoids any VGPR tuple allocated for wwm-operands during the custom allocation pass. The shift-down logic currently inserted here considers only 32-bit register classes.
The 16-bit registers weren't enabled earlier, so it makes sense to change the condition to match exactly the 32-bit classes.
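The shift-down described above can be approximated by a small model. This is a hedged sketch under simplifying assumptions, not the actual SIFrameLowering or SIMachineFunctionInfo code: VGPRs are plain indices, `Used` is a hypothetical set of every VGPR index already in use (including the WWM registers themselves), and `shiftDownWWMVGPRs` is an illustrative name. Because each index stands for one whole 32-bit VGPR, the model only makes sense once 16-bit halves and tuples have been filtered out.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <map>
#include <set>
#include <vector>

// Hypothetical model of the shift-down step. VGPRs are plain indices and
// Used holds every VGPR index currently taken (WWM registers included).
// Returns a mapping from each original WWM VGPR index to its final index.
std::map<unsigned, unsigned> shiftDownWWMVGPRs(std::vector<unsigned> WWMRegs,
                                               std::set<unsigned> Used) {
  std::map<unsigned, unsigned> Remap;
  // Visit the highest registers first; they are furthest from the low range.
  std::sort(WWMRegs.begin(), WWMRegs.end(), std::greater<unsigned>());
  for (unsigned Old : WWMRegs) {
    // Find the lowest index that is not currently in use.
    unsigned New = 0;
    while (Used.count(New))
      ++New;
    if (New < Old) {
      // Move the register down and update the set of used indices.
      Used.erase(Old);
      Used.insert(New);
      Remap[Old] = New;
    } else {
      // Already as low as it can go; keep it in place.
      Remap[Old] = Old;
    }
  }
  return Remap;
}
```

For example, with WWM registers {254, 255} and used VGPRs {0, 1, 2, 254, 255}, the model maps 255 to 3 and 254 to 4; a register already at or below the lowest free index keeps its place.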

broxigarchen · 2025-04-14T16:19:27Z


Added a test. I am not quite familiar with wwm reg reservation, so please let me know if the test seems wrong to you. Thanks!

llvm/test/CodeGen/AMDGPU/wwm-reg-shift-down-gfx11plus.mir

cdevadas · 2025-04-22T05:17:55Z

llvm/test/CodeGen/AMDGPU/wwm-reg-shift-down-gfx11plus.mir

+    ; GCN-NEXT: $sgpr4 = ENTER_STRICT_WWM -1, implicit-def $exec, implicit-def $scc, implicit $exec
+    ; GCN-NEXT: $vgpr0_lo16 = IMPLICIT_DEF
+    ; GCN-NEXT: $vgpr0_lo16 = V_CNDMASK_B16_t16_e64 0, killed $vgpr0_hi16, 0, $vgpr0_lo16, $sgpr0, 0, implicit $exec
+    ; GCN-NEXT: $exec_lo = EXIT_STRICT_WWM killed renamable $sgpr4


The wwm spill/restore for vgpr0_lo16 is missing in the epilogue.

I took a further look at this part.

The Vulkan benchmark failure that this patch is trying to fix is caused by an entry function, so the wwm spill/restore does not happen there. It's only the wwm reg sorting that mishandles the 16-bit register. I updated the test to target just the reg sorting fix.

For the non-entry-function case, it seems the CSR spill/restore uses a 32-bit pseudo by default, which needs to be updated to support 16-bit registers, and the spill/restore builder needs to be changed as well. However, those require more changes, and I think it's better to do them in a separate patch.

cdevadas · 2025-04-26T15:05:56Z

llvm/test/CodeGen/AMDGPU/wwm-reg-shift-down-gfx11plus.mir

+name:            wwm_reg_skip_sort_16bit
+tracksRegLiveness: true
+machineFunctionInfo:
+  isEntryFunction: true


There won't be any prolog epilog CSR spills inserted for entry functions. This test isn't relevant for this patch, especially the shifting won't happen for the entry functions.

Hi Christ. Then this sounds like a bug: the shifting happens before the entry-function check.

Should this entry-function check be moved earlier, to line 1646, or even to line 1619?

Ah, my bad. The shift-back is needed to ensure the wwm-regs end up in the lowest range. This is needed for both entry functions and device functions, but their CSR spills/restores are skipped for entry functions.
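The resolution above can be summarized as a small decision: the shift-down runs whenever there are WWM-reserved registers, for entry and device functions alike, while prolog/epilog CSR save/restore of those registers is only emitted for non-entry functions. A hypothetical sketch of that decision (the `FrameSteps` struct and `planWWMHandling` function are illustrative only, not LLVM code):

```cpp
#include <cassert>

// Hypothetical summary of which WWM-related frame steps run for a function.
struct FrameSteps {
  bool ShiftWWMToLowestRange = false;
  bool SaveRestoreWWMInPrologEpilog = false;
};

// Illustrative decision logic based on the review discussion:
// - the shift-down applies to entry and device functions alike;
// - CSR spills/restores are skipped for entry functions.
FrameSteps planWWMHandling(bool IsEntryFunction, bool HasWWMReservedRegs) {
  FrameSteps S;
  if (!HasWWMReservedRegs)
    return S; // Nothing to shift or save.
  S.ShiftWWMToLowestRange = true;
  S.SaveRestoreWWMInPrologEpilog = !IsEntryFunction;
  return S;
}
```

This is why the MIR test with `isEntryFunction: true` still exercises the sorting fix even though no prolog/epilog spills appear in its output.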

llvm-ci · 2025-04-27T19:05:11Z

LLVM Buildbot has detected a new failure on builder llvm-clang-x86_64-gcc-ubuntu running on sie-linux-worker3 while building llvm at step 6 "test-build-unified-tree-check-all".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/174/builds/16853

Here is the relevant piece of the build log for the reference

Step 6 (test-build-unified-tree-check-all) failure: 1200 seconds without output running [b'ninja', b'check-all'], attempting to kill
...
PASS: lit :: shtest-not.py (90432 of 90442)
PASS: lit :: allow-retries.py (90433 of 90442)
PASS: lit :: discovery.py (90434 of 90442)
PASS: lit :: shtest-external-shell-kill.py (90435 of 90442)
PASS: lit :: googletest-timeout.py (90436 of 90442)
PASS: lit :: selecting.py (90437 of 90442)
PASS: lit :: shtest-timeout.py (90438 of 90442)
PASS: lit :: max-time.py (90439 of 90442)
PASS: lit :: shtest-shell.py (90440 of 90442)
PASS: lit :: shtest-define.py (90441 of 90442)
command timed out: 1200 seconds without output running [b'ninja', b'check-all'], attempting to kill
process killed by signal 9
program finished with exit code -1
elapsedTime=1845.341861


broxigarchen changed the title ~~skip 16bit register for wmm reg sorting~~ [AMDGPU][True16][CodeGen] skip 16bit register for wmm reg sorting Apr 10, 2025

broxigarchen changed the title ~~[AMDGPU][True16][CodeGen] skip 16bit register for wmm reg sorting~~ [AMDGPU][True16][CodeGen] update wmm reg sorting check condition Apr 10, 2025

broxigarchen force-pushed the main-merge-true16-fix-siframe branch 2 times, most recently from f7e89d4 to 53e1dd4 April 10, 2025 13:53

broxigarchen marked this pull request as ready for review April 10, 2025 13:53

broxigarchen requested review from arsenm, cdevadas and Sisyph April 10, 2025 13:53

llvmbot added the backend:AMDGPU label Apr 10, 2025

broxigarchen requested a review from kosarev April 10, 2025 13:53

broxigarchen changed the title ~~[AMDGPU][True16][CodeGen] update wmm reg sorting check condition~~ [AMDGPU][True16][CodeGen] update wwm reg sorting check condition Apr 10, 2025

cdevadas approved these changes Apr 10, 2025

Sisyph reviewed Apr 10, 2025

arsenm reviewed Apr 13, 2025

broxigarchen requested a review from Sisyph April 14, 2025 16:21

cdevadas reviewed Apr 16, 2025

llvm/test/CodeGen/AMDGPU/wwm-reg-shift-down-gfx11plus.mir (outdated)

broxigarchen force-pushed the main-merge-true16-fix-siframe branch from f21fd67 to ed6bdc3 April 18, 2025 20:48

cdevadas reviewed Apr 21, 2025

llvm/test/CodeGen/AMDGPU/wwm-reg-shift-down-gfx11plus.mir (outdated)

llvm/test/CodeGen/AMDGPU/wwm-reg-shift-down-gfx11plus.mir (outdated)

broxigarchen force-pushed the main-merge-true16-fix-siframe branch from ed6bdc3 to c9c76d5 April 21, 2025 15:00

cdevadas reviewed Apr 22, 2025

broxigarchen added 2 commits April 23, 2025 13:42

skip 16bit register for wmm reg sorting

3aeb9e4

test

6688c60

broxigarchen force-pushed the main-merge-true16-fix-siframe branch from c9c76d5 to e2597cb April 25, 2025 02:08

update test

25ac05c

broxigarchen force-pushed the main-merge-true16-fix-siframe branch from e2597cb to 25ac05c April 25, 2025 02:08

cdevadas reviewed Apr 26, 2025

cdevadas approved these changes Apr 27, 2025

broxigarchen merged commit 72bc052 into llvm:main Apr 27, 2025
11 checks passed
