Skip to content

[AMDGPU][True16][CodeGen] update wwm reg sorting check condition #135053

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Apr 27, 2025

Conversation

broxigarchen
Copy link
Contributor

@broxigarchen broxigarchen commented Apr 9, 2025

We currently just need to shift down 32bit wwm registers.

Previous check condition mistakenly select 16bit registers in true16 mode. Update check condition to skip the 16bit register in wmm reg sorting

@broxigarchen broxigarchen changed the title skip 16bit register for wmm reg sorting [AMDGPU][True16][CodeGen] skip 16bit register for wmm reg sorting Apr 10, 2025
@broxigarchen broxigarchen changed the title [AMDGPU][True16][CodeGen] skip 16bit register for wmm reg sorting [AMDGPU][True16][CodeGen] update wmm reg sorting check condition Apr 10, 2025
@broxigarchen broxigarchen force-pushed the main-merge-true16-fix-siframe branch 2 times, most recently from f7e89d4 to 53e1dd4 Compare April 10, 2025 13:53
@broxigarchen broxigarchen marked this pull request as ready for review April 10, 2025 13:53
@broxigarchen broxigarchen requested a review from kosarev April 10, 2025 13:53
@llvmbot
Copy link
Member

llvmbot commented Apr 10, 2025

@llvm/pr-subscribers-backend-amdgpu

Author: Brox Chen (broxigarchen)

Changes

We currently just need to shift down 32bit wmm registers.

Update check condition to skip the 16bit register in wmm reg sorting


Full diff: https://github.com/llvm/llvm-project/pull/135053.diff

1 Files Affected:

  • (modified) llvm/lib/Target/AMDGPU/SIFrameLowering.cpp (+1-1)
diff --git a/llvm/lib/Target/AMDGPU/SIFrameLowering.cpp b/llvm/lib/Target/AMDGPU/SIFrameLowering.cpp
index 9c737b4f3e378..8f488f5154650 100644
--- a/llvm/lib/Target/AMDGPU/SIFrameLowering.cpp
+++ b/llvm/lib/Target/AMDGPU/SIFrameLowering.cpp
@@ -1650,7 +1650,7 @@ void SIFrameLowering::determineCalleeSaves(MachineFunction &MF,
     // are of 32-bit size. SIPreAllocateWWMRegs pass can add tuples into WWM
     // reserved registers.
     const TargetRegisterClass *RC = TRI->getPhysRegBaseClass(Reg);
-    if (TRI->getRegSizeInBits(*RC) > 32)
+    if (TRI->getRegSizeInBits(*RC) != 32)
       continue;
     SortedWWMVGPRs.push_back(Reg);
   }

@broxigarchen broxigarchen changed the title [AMDGPU][True16][CodeGen] update wmm reg sorting check condition [AMDGPU][True16][CodeGen] update wwm reg sorting check condition Apr 10, 2025
Copy link
Contributor

@Sisyph Sisyph left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a test

@broxigarchen
Copy link
Contributor Author

Please add a test

I checked the wwm reg related test and it seems I could not find any thing that test for the sorting of the reserved reg. Hi @cdevadas are you comfortable with merging this as it, or is there a recommended test that I can look into? Thanks for the hint!

@cdevadas
Copy link
Collaborator

Please add a test

I checked the wwm reg related test and it seems I could not find any thing that test for the sorting of the reserved reg. Hi @cdevadas are you comfortable with merging this as it, or is there a recommended test that I can look into? Thanks for the hint!

This change you added to skip the 16-bit wwm-regs. Did you happen to encounter any errors earlier? If yes, add a test for the 16-bit register - nothing to be added for the 32-bit wwm-reg case.

@@ -1650,7 +1650,7 @@ void SIFrameLowering::determineCalleeSaves(MachineFunction &MF,
// are of 32-bit size. SIPreAllocateWWMRegs pass can add tuples into WWM
// reserved registers.
const TargetRegisterClass *RC = TRI->getPhysRegBaseClass(Reg);
if (TRI->getRegSizeInBits(*RC) > 32)
if (TRI->getRegSizeInBits(*RC) != 32)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know why this code needs to filter anything.

I don't understand this comment:

// SIPreAllocateWWMRegs pass can add tuples into WWM reserved registers.

It can, but why? This should only track the 32-bit element registers

Copy link
Contributor Author

@broxigarchen broxigarchen Apr 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi Matt. I am not quite familiar with this. I think maybe @cdevadas can answer this question.

If we want to change something here I think we probably should do a seperate patch. Thanks!

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SIPreallocateWWMRegs pass handpicks VGPRs from the lower end. They might allocate VGPR tuples as well. However, the wwm-regalloc pass gets registers from the tail-end (we reserve them from the higher end to ensure the per-lane VGPR tuple allocation gets sufficient free contiguous registers from the initial scratch range.). This code inserted during PEI is trying to shift them down to the lowest range. Remember, the shifting is only required for those allocated during wwm-regalloc pass. Since we have the unified set WWMReservedRegs that holds all sort of wwm-regs, this loop here is trying to identify only 32-bit registers that are allocated during wwm-regalloc pass (the VGPRs used for SGPR spilling, at the moment). We may also pick the 32-bit regs custom allocated during the SIPreallcoate pass. But that's ok. This filter avoids any VGPR tuple allocated for wwm-operands during the custom allocate pass. The shift-down logic currently Inserted here considers only 32-bit regclasses.
The 16-bit registers weren't enabled earlier. So, it makes sense to change the condition to exactly match 32-bit classes.

@broxigarchen
Copy link
Contributor Author

broxigarchen commented Apr 14, 2025

Please add a test

I checked the wwm reg related test and it seems I could not find any thing that test for the sorting of the reserved reg. Hi @cdevadas are you comfortable with merging this as it, or is there a recommended test that I can look into? Thanks for the hint!

This change you added to skip the 16-bit wwm-regs. Did you happen to encounter any errors earlier? If yes, add a test for the 16-bit register - nothing to be added for the 32-bit wwm-reg case.

Added a test. I am not quite familar with wwm reg reserve so please let me know if the test seems wrong to you. Thanks!

@broxigarchen broxigarchen requested a review from Sisyph April 14, 2025 16:21
@broxigarchen broxigarchen force-pushed the main-merge-true16-fix-siframe branch from f21fd67 to ed6bdc3 Compare April 18, 2025 20:48
@broxigarchen broxigarchen force-pushed the main-merge-true16-fix-siframe branch from ed6bdc3 to c9c76d5 Compare April 21, 2025 15:00
; GCN-NEXT: $sgpr4 = ENTER_STRICT_WWM -1, implicit-def $exec, implicit-def $scc, implicit $exec
; GCN-NEXT: $vgpr0_lo16 = IMPLICIT_DEF
; GCN-NEXT: $vgpr0_lo16 = V_CNDMASK_B16_t16_e64 0, killed $vgpr0_hi16, 0, $vgpr0_lo16, $sgpr0, 0, implicit $exec
; GCN-NEXT: $exec_lo = EXIT_STRICT_WWM killed renamable $sgpr4
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wwm spill-restore missing for vgpr0_lo16 at the epilogue.

Copy link
Contributor Author

@broxigarchen broxigarchen Apr 25, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I take a further check in this part.

The vulkan benchmark failure that this patch is trying to fix is caused by an entryFunction, and thus the wwm spill/restore are not happening. It's just the wwm reg sorting that is messing up with the 16bit reg. Update the test to just targetting the reg sorting fix.

For the non entry-function case, it seems the CSR reg spill/restore is default using 32bit pseudo which need to be updated to support 16bit registers, and the spill/restore builder need to be changed as well. However, those requires more changes and I think it's better to do those in a seperate patch.

@broxigarchen broxigarchen force-pushed the main-merge-true16-fix-siframe branch from c9c76d5 to e2597cb Compare April 25, 2025 02:08
@broxigarchen broxigarchen force-pushed the main-merge-true16-fix-siframe branch from e2597cb to 25ac05c Compare April 25, 2025 02:08
name: wwm_reg_skip_sort_16bit
tracksRegLiveness: true
machineFunctionInfo:
isEntryFunction: true
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There won't be any prolog epilog CSR spills inserted for entry functions. This test isn't relevant for this patch, especially the shifting won't happen for the entry functions.

Copy link
Contributor Author

@broxigarchen broxigarchen Apr 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi Christ. Then this sounds like a bug. The shifting is happening before the entryfunction check

Should this entryFunction check being moved to early stage at line 1646? or even earlier to 1619?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah my bad. The shift-back is needed to ensure the wwm-regs are finally in the lowest range. This is needed for both entry functions and device functions. But their CSR spills/restores will be ignored for entry functions.

@broxigarchen broxigarchen merged commit 72bc052 into llvm:main Apr 27, 2025
11 checks passed
@llvm-ci
Copy link
Collaborator

llvm-ci commented Apr 27, 2025

LLVM Buildbot has detected a new failure on builder llvm-clang-x86_64-gcc-ubuntu running on sie-linux-worker3 while building llvm at step 6 "test-build-unified-tree-check-all".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/174/builds/16853

Here is the relevant piece of the build log for the reference
Step 6 (test-build-unified-tree-check-all) failure: 1200 seconds without output running [b'ninja', b'check-all'], attempting to kill
...
PASS: lit :: shtest-not.py (90432 of 90442)
PASS: lit :: allow-retries.py (90433 of 90442)
PASS: lit :: discovery.py (90434 of 90442)
PASS: lit :: shtest-external-shell-kill.py (90435 of 90442)
PASS: lit :: googletest-timeout.py (90436 of 90442)
PASS: lit :: selecting.py (90437 of 90442)
PASS: lit :: shtest-timeout.py (90438 of 90442)
PASS: lit :: max-time.py (90439 of 90442)
PASS: lit :: shtest-shell.py (90440 of 90442)
PASS: lit :: shtest-define.py (90441 of 90442)
command timed out: 1200 seconds without output running [b'ninja', b'check-all'], attempting to kill
process killed by signal 9
program finished with exit code -1
elapsedTime=1845.341861

jyli0116 pushed a commit to jyli0116/llvm-project that referenced this pull request Apr 28, 2025
…m#135053)

We currently just need to shift down 32bit wwm registers. 

Previous check condition mistakenly select 16bit registers in true16
mode. Update check condition to skip the 16bit register in wmm reg
sorting
IanWood1 pushed a commit to IanWood1/llvm-project that referenced this pull request May 6, 2025
…m#135053)

We currently just need to shift down 32bit wwm registers. 

Previous check condition mistakenly select 16bit registers in true16
mode. Update check condition to skip the 16bit register in wmm reg
sorting
IanWood1 pushed a commit to IanWood1/llvm-project that referenced this pull request May 6, 2025
…m#135053)

We currently just need to shift down 32bit wwm registers. 

Previous check condition mistakenly select 16bit registers in true16
mode. Update check condition to skip the 16bit register in wmm reg
sorting
IanWood1 pushed a commit to IanWood1/llvm-project that referenced this pull request May 6, 2025
…m#135053)

We currently just need to shift down 32bit wwm registers. 

Previous check condition mistakenly select 16bit registers in true16
mode. Update check condition to skip the 16bit register in wmm reg
sorting
Ankur-0429 pushed a commit to Ankur-0429/llvm-project that referenced this pull request May 9, 2025
…m#135053)

We currently just need to shift down 32bit wwm registers. 

Previous check condition mistakenly select 16bit registers in true16
mode. Update check condition to skip the 16bit register in wmm reg
sorting
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants