[AMDGPU] Include unused preload kernarg in KD total SGPR count #104743

kerbowa · 2024-08-19T08:08:10Z

Unlike with implicitly preloaded data UserSGPRs, firmware is unable to handle cases where SGPRs for kernel arguments contain preloaded data but are not explicitly referenced in the kernel. We need to include these preloaded SGPRs in the GRANULATED_WAVEFRONT_SGPR_COUNT calculation so that SGPRs in adjacent waves are not clobbered.
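
As a rough illustration of the calculation described above, here is a minimal standalone sketch. It is not the LLVM implementation: it assumes the granulated count is the number of granules minus one, the granule size is passed in by the caller, and `alignUp`, `numGPRBlocks`, and `sgprBlocks` are hypothetical helper names chosen for this example.

```cpp
#include <algorithm>
#include <cassert>

// Round Value up to the next multiple of Granule (Granule must be non-zero).
unsigned alignUp(unsigned Value, unsigned Granule) {
  return (Value + Granule - 1) / Granule * Granule;
}

// Granulated count: number of granules needed for NumGPRs, minus one.
// At least one register is assumed, so zero registers maps to block count 0.
unsigned numGPRBlocks(unsigned NumGPRs, unsigned Granule) {
  return alignUp(std::max(1u, NumGPRs), Granule) / Granule - 1;
}

// Take the larger of the referenced SGPR count and the user SGPRs (which
// include preloaded kernargs) plus the SGPRs reserved by the hardware, then
// granulate that value.
unsigned sgprBlocks(unsigned NumSGPRsForWavesPerEU, unsigned NumUserSGPRs,
                    unsigned ExtraSGPRs, unsigned Granule) {
  unsigned Total =
      std::max(NumSGPRsForWavesPerEU, NumUserSGPRs + ExtraSGPRs);
  return numGPRBlocks(Total, Granule);
}
```

With the made-up values `sgprBlocks(6, 6, 6, 8)`, the user and reserved registers total 12, which needs two granules of 8 and gives a block count of 1, whereas `numGPRBlocks(6, 8)` on its own gives 0.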

llvmbot · 2024-08-19T08:08:44Z

@llvm/pr-subscribers-mc

@llvm/pr-subscribers-backend-amdgpu

Author: Austin Kerbow (kerbowa)

Changes

Unlike with implicitly preloaded data UserSGPRs, firmware is unable to handle cases where SGPRs for kernel arguments contain preloaded data but are not explicitly referenced in the kernel. We need to include these preloaded SGPRs in the GRANULATED_WAVEFRONT_SGPR_COUNT calculation so that SGPRs in adjacent waves are not clobbered.

Full diff: https://github.com/llvm/llvm-project/pull/104743.diff

2 Files Affected:

(modified) llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp (+9-2)
(added) llvm/test/CodeGen/AMDGPU/amdhsa-kernarg-preload-num-sgprs.ll (+14)

diff --git a/llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp b/llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp
index b90d245b7bd394..cfa5216c8c54b1 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp
@@ -970,8 +970,15 @@ void AMDGPUAsmPrinter::getSIProgramInfo(SIProgramInfo &ProgInfo,
     return SubGPR;
   };
 
-  ProgInfo.SGPRBlocks = GetNumGPRBlocks(ProgInfo.NumSGPRsForWavesPerEU,
-                                        IsaInfo::getSGPREncodingGranule(&STM));
+  // Consider cases where the total number of UserSGPRs plus extra SGPRs is
+  // greater than the number of explicitly referenced SGPRs.
+  const MCExpr *MaxUserSGPRs = MCBinaryExpr::createAdd(
+      CreateExpr(MFI->getNumUserSGPRs()), ExtraSGPRs, Ctx);
+
+  ProgInfo.SGPRBlocks =
+      GetNumGPRBlocks(AMDGPUMCExpr::createMax(
+                          {ProgInfo.NumSGPRsForWavesPerEU, MaxUserSGPRs}, Ctx),
+                      IsaInfo::getSGPREncodingGranule(&STM));
   ProgInfo.VGPRBlocks = GetNumGPRBlocks(ProgInfo.NumVGPRsForWavesPerEU,
                                         IsaInfo::getVGPREncodingGranule(&STM));
 
diff --git a/llvm/test/CodeGen/AMDGPU/amdhsa-kernarg-preload-num-sgprs.ll b/llvm/test/CodeGen/AMDGPU/amdhsa-kernarg-preload-num-sgprs.ll
new file mode 100644
index 00000000000000..34bef81171e812
--- /dev/null
+++ b/llvm/test/CodeGen/AMDGPU/amdhsa-kernarg-preload-num-sgprs.ll
@@ -0,0 +1,14 @@
+; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx940 -filetype=obj < %s > %t
+; RUN: llvm-objdump -s -j .rodata %t | FileCheck --check-prefix=OBJDUMP %s
+
+; OBJDUMP: Contents of section .rodata:
+; OBJDUMP-NEXT: 0000 00000000 00000000 10010000 00000000
+; OBJDUMP-NEXT: 0010 00000000 00000000 00000000 00000000
+; OBJDUMP-NEXT: 0020 00000000 00000000 00000000 00000000
+; OBJDUMP-NEXT: 0030 4000af00 94130000 1a000400 00000000
+; OBJDUMP-NOT: 0030 0000af00 94130000 1a000400 00000000
+
+; Include preloaded SGPRs that are not explicitly used in the kernel in
+; GRANULATED_WAVEFRONT_SGPR_COUNT.
+
+define amdgpu_kernel void @amdhsa_kernarg_preload_num_sgprs(i128 inreg) { ret void }

arsenm · 2024-08-19T08:19:10Z

llvm/test/CodeGen/AMDGPU/amdhsa-kernarg-preload-num-sgprs.ll

@@ -0,0 +1,14 @@
+; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx940 -filetype=obj < %s > %t
+; RUN: llvm-objdump -s -j .rodata %t | FileCheck --check-prefix=OBJDUMP %s


Don't need this temporary file

arsenm · 2024-08-19T08:19:38Z

llvm/test/CodeGen/AMDGPU/amdhsa-kernarg-preload-num-sgprs.ll

+; OBJDUMP-NEXT: 0030 4000af00 94130000 1a000400 00000000
+; OBJDUMP-NOT: 0030 0000af00 94130000 1a000400 00000000


We need a human readable asm output for reference, and should check the different kind of SGPR usage numbers

The annoying part about this bug is we don't directly output a directive for this field anywhere, it's totally derivative. To find the problem I was modifying the KD in the binary directly.
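
Since the field is only visible in the encoded descriptor, a small decoder can help when reading the `.rodata` dump in the test. The sketch below assumes that the 32-bit word at offset 0x30 is `compute_pgm_rsrc1`, stored little-endian, and that GRANULATED_WAVEFRONT_SGPR_COUNT occupies bits 9:6 of it; `readLE32` and `granulatedSGPRCount` are hypothetical names for this example.

```cpp
#include <cassert>
#include <cstdint>

// Assemble a 32-bit little-endian word from four bytes in the order that
// objdump prints them.
uint32_t readLE32(uint8_t B0, uint8_t B1, uint8_t B2, uint8_t B3) {
  return uint32_t(B0) | (uint32_t(B1) << 8) | (uint32_t(B2) << 16) |
         (uint32_t(B3) << 24);
}

// Extract bits 9:6 of the word (assumed to hold the granulated SGPR count).
uint32_t granulatedSGPRCount(uint32_t Rsrc1) {
  return (Rsrc1 >> 6) & 0xF;
}
```

Under those assumptions, the expected bytes `4000af00` decode to a count of 1, and the `OBJDUMP-NOT` bytes `0000af00` decode to 0.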

So we cannot go from .ll -> .s -> .o? (i.e., will AMDGPUAsmParser's calculation of SGPRBlocks still be correct?)

I think we should be changing the reported used register count, not just manipulating the encoding fields

arsenm · 2024-08-19T08:20:50Z

llvm/test/CodeGen/AMDGPU/amdhsa-kernarg-preload-num-sgprs.ll

+; Include preloaded SGPRs that are not explicitly used in the kernel in
+; GRANULATED_WAVEFRONT_SGPR_COUNT.
+
+define amdgpu_kernel void @amdhsa_kernarg_preload_num_sgprs(i128 inreg) { ret void }


Test cases with more interaction with ordinary user SGPRs?

arsenm · 2024-08-19T15:30:41Z

llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp

+      CreateExpr(MFI->getNumUserSGPRs()), ExtraSGPRs, Ctx);
+
+  ProgInfo.SGPRBlocks =
+      GetNumGPRBlocks(AMDGPUMCExpr::createMax(


I think fixing this up here is too late. We should have bumped up the SGPR count in the MFI tracked value to begin with. We have a similar round up for the unused inreg shader arguments, and this is essentially the same thing

github-actions · 2024-09-16T00:13:58Z

✅ With the latest revision this PR passed the C/C++ code formatter.

arsenm · 2024-09-18T14:18:18Z

llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp

+    const MCExpr *MaxUserSGPRs = MCBinaryExpr::createAdd(
+        CreateExpr(MFI->getNumUserSGPRs()), ExtraSGPRs, Ctx);


I would expect this to be already added to MFI->getNumUserSGPRs. I.e. handle this during the calling convention lowering, not code emission

ExtraSGPRs here doesn't refer to anything from my change. It is the extra SGPRs the HW reserves for architected flat scratch, XNACK, and debugger. I think the name MaxUserSGPRs is maybe a misnomer and confusing though. It's really calculating UserSGPRs including preloads+extra reserved by the HW.

These "extra" SGPRs are not UserSGPRs in the way the HW uses or sets them up, or in the way they are encoded in the KD. So they should not be included in our backends' tracking of them. Preload kernarg SGPRs are already included in MFI->getNumUserSGPRs.

The reason we add the ExtraSGPRs is because they are also added to ProgInfo.NumSGPR above so we need to consider them in the Max expr on the next line here. ExtraSGPRs do need to be included in the KD tally for total SGPR granules.
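
To make the point about the max expression concrete, here is a small sketch comparing the two ways of forming the total. The function names and the register counts are hypothetical; the only assumption taken from the discussion is that the referenced-SGPR side already has ExtraSGPRs added to it.

```cpp
#include <algorithm>
#include <cassert>

// Referenced side already includes ExtraSGPRs; user side does not.
unsigned totalWithoutExtraOnUserSide(unsigned Explicit, unsigned User,
                                     unsigned Extra) {
  return std::max(Explicit + Extra, User);
}

// Both sides include ExtraSGPRs, so the comparison is like for like.
unsigned totalWithExtraOnUserSide(unsigned Explicit, unsigned User,
                                  unsigned Extra) {
  return std::max(Explicit + Extra, User + Extra);
}
```

For a kernel that references no SGPRs but has 6 user SGPRs and 6 reserved ones (made-up numbers), the first form gives 6 and the second gives 12. When the kernel references more SGPRs than it has user SGPRs, both forms agree.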


This updates LLVM to pull in two fixes we need for AMD:

* llvm/llvm-project#110553
* llvm/llvm-project#104743

Fixed `LLVM::CallOp` and `LLVM::CallIntrinsicOp` builder API after

* llvm/llvm-project#108933

This reverts ad9afc8 since the issue was fixed by llvm/llvm-project#104743


kerbowa requested review from arsenm and JanekvO August 19, 2024 08:08

llvmbot added the backend:AMDGPU label Aug 19, 2024

arsenm reviewed Aug 19, 2024


kerbowa force-pushed the include-unused-preload-kernarg-in-KD branch from af950b6 to 3f5b993 Compare September 16, 2024 00:10

llvmbot added the mc Machine (object) code label Sep 16, 2024

arsenm reviewed Sep 18, 2024


kerbowa force-pushed the include-unused-preload-kernarg-in-KD branch from 3f5b993 to 2e85e2a Compare September 23, 2024 17:05

arsenm approved these changes Sep 23, 2024


kerbowa force-pushed the include-unused-preload-kernarg-in-KD branch from 2e85e2a to 1bd17ee Compare September 23, 2024 20:46

kerbowa merged commit 954ab83 into llvm:main Sep 23, 2024
5 of 6 checks passed

kerbowa deleted the include-unused-preload-kernarg-in-KD branch September 24, 2024 15:32

zhanglx13 mentioned this pull request Sep 30, 2024

[AMD] Add back "Hint compiler to preload kernel args" triton-lang/triton#4830

Merged

antiagainst mentioned this pull request Oct 3, 2024

Update llvm/llvm-project@61f8a7f61890 triton-lang/triton#4847

Merged

zhanglx13 added a commit to triton-lang/triton that referenced this pull request Oct 5, 2024

[AMD] Add back "Hint compiler to preload kernel args" (#4830)

2c498ee

This reverts ad9afc8 since the issue was fixed by llvm/llvm-project#104743

sfzhu93 pushed a commit to sfzhu93/triton that referenced this pull request Oct 11, 2024

[AMD] Add back "Hint compiler to preload kernel args" (triton-lang#4830)

697283b

This reverts ad9afc8 since the issue was fixed by llvm/llvm-project#104743

Luosuu pushed a commit to Luosuu/triton that referenced this pull request Nov 13, 2024

[AMD] Add back "Hint compiler to preload kernel args" (triton-lang#4830)

c3cbc3f

This reverts ad9afc8 since the issue was fixed by llvm/llvm-project#104743

bertmaher pushed a commit to bertmaher/triton that referenced this pull request Dec 10, 2024

[AMD] Add back "Hint compiler to preload kernel args" (triton-lang#4830)

9122cd5

This reverts ad9afc8 since the issue was fixed by llvm/llvm-project#104743
