[AMDGPU] Ignore inactive VGPRs in .vgpr_count #144855
Conversation
When using the amdgcn.init.whole.wave intrinsic, we add dummy VGPR arguments with the purpose of preserving their inactive lanes. The pattern may look something like this:

```
entry:
  call amdgcn.init.whole.wave
  branch to shader or tail

shader:
  $vInactive = IMPLICIT_DEF ; Tells regalloc it's safe to use the active lanes
  actual code...

tail:
  call amdgcn.cs.chain [...], implicit $vInactive
```

We should not report these VGPRs in the .vgpr_count metadata. This patch achieves that goal by ignoring IMPLICIT_DEFs and SI_TCRETURNs in functions that use the amdgcn.init.whole.wave intrinsic. It also simplifies the code in AMDGPUResourceUsageAnalysis to rely more on the TargetRegisterInfo for computing the number of used registers in the simple cases.

This is a reworked version of llvm#133242, which was reverted in llvm#144039.
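For readers skimming the conversation, here is a condensed sketch (not the verbatim code) of the new VGPR-counting loop the patch adds in AMDGPUResourceUsageAnalysis; all names come from the diff below, and the skip condition is hoisted out of the operand loop for brevity:

```cpp
// Condensed from the patch: when the function uses llvm.amdgcn.init.whole.wave,
// track the highest VGPR index by hand, but skip instructions whose VGPR
// operands only exist to preserve inactive lanes: IMPLICIT_DEFs and the chain
// tail call (SI_TCRETURN in chain functions).
int32_t MaxVGPR = -1;
for (const MachineBasicBlock &MBB : MF) {
  for (const MachineInstr &MI : MBB) {
    bool IsChainCall =
        MFI->isChainFunction() && MI.getOpcode() == AMDGPU::SI_TCRETURN;
    if (IsChainCall || MI.isImplicitDef())
      continue;
    for (const MachineOperand &MO : MI.operands()) {
      if (!MO.isReg())
        continue;
      const TargetRegisterClass *RC = TRI.getPhysRegBaseClass(MO.getReg());
      if (!RC || !TRI.isVGPRClass(RC))
        continue;
      // A register's width in 32-bit units tells us the highest lane it covers.
      unsigned Width = TRI.getRegSizeInBits(*RC) / 32;
      int MaxUsed = TRI.getHWRegIndex(MO.getReg()) + Width - 1;
      MaxVGPR = MaxUsed > MaxVGPR ? MaxUsed : MaxVGPR;
    }
  }
}
Info.NumVGPR = MaxVGPR + 1;
```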
@llvm/pr-subscribers-backend-amdgpu

Author: Diana Picus (rovka)

Patch is 39.74 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/144855.diff

10 Files Affected:
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp b/llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp
index 84b0f98554097..4f58e5f03d4ee 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp
@@ -991,7 +991,13 @@ void AMDGPUAsmPrinter::getSIProgramInfo(SIProgramInfo &ProgInfo,
// dispatch registers are function args.
unsigned WaveDispatchNumSGPR = 0, WaveDispatchNumVGPR = 0;
- if (isShader(F.getCallingConv())) {
+ // Entry functions need to count input arguments even if they're not used
+ // (i.e. not reported by AMDGPUResourceUsageAnalysis). Other functions can
+ // skip including them. This is especially important for shaders that use the
+ // init.whole.wave intrinsic, since they sometimes have VGPR arguments that
+ // are only added for the purpose of preserving their inactive lanes and
+ // should not be included in the vgpr-count.
+ if (isShader(F.getCallingConv()) && isEntryFunctionCC(F.getCallingConv())) {
bool IsPixelShader =
F.getCallingConv() == CallingConv::AMDGPU_PS && !STM.isAmdHsaOS();
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUResourceUsageAnalysis.cpp b/llvm/lib/Target/AMDGPU/AMDGPUResourceUsageAnalysis.cpp
index 9a609a1752de0..91836f1b33cd1 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUResourceUsageAnalysis.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUResourceUsageAnalysis.cpp
@@ -139,268 +139,56 @@ AMDGPUResourceUsageAnalysis::analyzeResourceUsage(
Info.UsesVCC =
MRI.isPhysRegUsed(AMDGPU::VCC_LO) || MRI.isPhysRegUsed(AMDGPU::VCC_HI);
+ Info.NumExplicitSGPR = TRI.getNumUsedPhysRegs(MRI, AMDGPU::SGPR_32RegClass,
+ /*IncludeCalls=*/false);
+ if (ST.hasMAIInsts())
+ Info.NumAGPR = TRI.getNumUsedPhysRegs(MRI, AMDGPU::AGPR_32RegClass,
+ /*IncludeCalls=*/false);
- // If there are no calls, MachineRegisterInfo can tell us the used register
- // count easily.
// A tail call isn't considered a call for MachineFrameInfo's purposes.
- if (!FrameInfo.hasCalls() && !FrameInfo.hasTailCall()) {
- Info.NumVGPR = TRI.getNumUsedPhysRegs(MRI, AMDGPU::VGPR_32RegClass);
- Info.NumExplicitSGPR = TRI.getNumUsedPhysRegs(MRI, AMDGPU::SGPR_32RegClass);
- if (ST.hasMAIInsts())
- Info.NumAGPR = TRI.getNumUsedPhysRegs(MRI, AMDGPU::AGPR_32RegClass);
+ bool HasCalls = FrameInfo.hasCalls() || FrameInfo.hasTailCall();
+ // Functions that use the llvm.amdgcn.init.whole.wave intrinsic often have
+ // VGPR arguments that are only added for the purpose of preserving the
+ // inactive lanes. These should not be included in the number of used VGPRs.
+ bool NeedsExplicitVGPRCount = MFI->hasInitWholeWave();
+ if (!HasCalls && !NeedsExplicitVGPRCount) {
+
+ Info.NumVGPR = TRI.getNumUsedPhysRegs(MRI, AMDGPU::VGPR_32RegClass,
+ /*IncludeCalls=*/false);
return Info;
}
int32_t MaxVGPR = -1;
- int32_t MaxAGPR = -1;
- int32_t MaxSGPR = -1;
Info.CalleeSegmentSize = 0;
for (const MachineBasicBlock &MBB : MF) {
for (const MachineInstr &MI : MBB) {
- // TODO: Check regmasks? Do they occur anywhere except calls?
- for (const MachineOperand &MO : MI.operands()) {
- unsigned Width = 0;
- bool IsSGPR = false;
- bool IsAGPR = false;
-
- if (!MO.isReg())
- continue;
-
- Register Reg = MO.getReg();
- switch (Reg) {
- case AMDGPU::EXEC:
- case AMDGPU::EXEC_LO:
- case AMDGPU::EXEC_HI:
- case AMDGPU::SCC:
- case AMDGPU::M0:
- case AMDGPU::M0_LO16:
- case AMDGPU::M0_HI16:
- case AMDGPU::SRC_SHARED_BASE_LO:
- case AMDGPU::SRC_SHARED_BASE:
- case AMDGPU::SRC_SHARED_LIMIT_LO:
- case AMDGPU::SRC_SHARED_LIMIT:
- case AMDGPU::SRC_PRIVATE_BASE_LO:
- case AMDGPU::SRC_PRIVATE_BASE:
- case AMDGPU::SRC_PRIVATE_LIMIT_LO:
- case AMDGPU::SRC_PRIVATE_LIMIT:
- case AMDGPU::SRC_POPS_EXITING_WAVE_ID:
- case AMDGPU::SGPR_NULL:
- case AMDGPU::SGPR_NULL64:
- case AMDGPU::MODE:
- continue;
-
- case AMDGPU::NoRegister:
- assert(MI.isDebugInstr() &&
- "Instruction uses invalid noreg register");
- continue;
-
- case AMDGPU::VCC:
- case AMDGPU::VCC_LO:
- case AMDGPU::VCC_HI:
- case AMDGPU::VCC_LO_LO16:
- case AMDGPU::VCC_LO_HI16:
- case AMDGPU::VCC_HI_LO16:
- case AMDGPU::VCC_HI_HI16:
- Info.UsesVCC = true;
- continue;
-
- case AMDGPU::FLAT_SCR:
- case AMDGPU::FLAT_SCR_LO:
- case AMDGPU::FLAT_SCR_HI:
- continue;
-
- case AMDGPU::XNACK_MASK:
- case AMDGPU::XNACK_MASK_LO:
- case AMDGPU::XNACK_MASK_HI:
- llvm_unreachable("xnack_mask registers should not be used");
-
- case AMDGPU::LDS_DIRECT:
- llvm_unreachable("lds_direct register should not be used");
-
- case AMDGPU::TBA:
- case AMDGPU::TBA_LO:
- case AMDGPU::TBA_HI:
- case AMDGPU::TMA:
- case AMDGPU::TMA_LO:
- case AMDGPU::TMA_HI:
- llvm_unreachable("trap handler registers should not be used");
-
- case AMDGPU::SRC_VCCZ:
- llvm_unreachable("src_vccz register should not be used");
-
- case AMDGPU::SRC_EXECZ:
- llvm_unreachable("src_execz register should not be used");
-
- case AMDGPU::SRC_SCC:
- llvm_unreachable("src_scc register should not be used");
-
- default:
- break;
- }
-
- if (AMDGPU::SGPR_32RegClass.contains(Reg) ||
- AMDGPU::SGPR_LO16RegClass.contains(Reg) ||
- AMDGPU::SGPR_HI16RegClass.contains(Reg)) {
- IsSGPR = true;
- Width = 1;
- } else if (AMDGPU::VGPR_32RegClass.contains(Reg) ||
- AMDGPU::VGPR_16RegClass.contains(Reg)) {
- IsSGPR = false;
- Width = 1;
- } else if (AMDGPU::AGPR_32RegClass.contains(Reg) ||
- AMDGPU::AGPR_LO16RegClass.contains(Reg)) {
- IsSGPR = false;
- IsAGPR = true;
- Width = 1;
- } else if (AMDGPU::SGPR_64RegClass.contains(Reg)) {
- IsSGPR = true;
- Width = 2;
- } else if (AMDGPU::VReg_64RegClass.contains(Reg)) {
- IsSGPR = false;
- Width = 2;
- } else if (AMDGPU::AReg_64RegClass.contains(Reg)) {
- IsSGPR = false;
- IsAGPR = true;
- Width = 2;
- } else if (AMDGPU::VReg_96RegClass.contains(Reg)) {
- IsSGPR = false;
- Width = 3;
- } else if (AMDGPU::SReg_96RegClass.contains(Reg)) {
- IsSGPR = true;
- Width = 3;
- } else if (AMDGPU::AReg_96RegClass.contains(Reg)) {
- IsSGPR = false;
- IsAGPR = true;
- Width = 3;
- } else if (AMDGPU::SGPR_128RegClass.contains(Reg)) {
- IsSGPR = true;
- Width = 4;
- } else if (AMDGPU::VReg_128RegClass.contains(Reg)) {
- IsSGPR = false;
- Width = 4;
- } else if (AMDGPU::AReg_128RegClass.contains(Reg)) {
- IsSGPR = false;
- IsAGPR = true;
- Width = 4;
- } else if (AMDGPU::VReg_160RegClass.contains(Reg)) {
- IsSGPR = false;
- Width = 5;
- } else if (AMDGPU::SReg_160RegClass.contains(Reg)) {
- IsSGPR = true;
- Width = 5;
- } else if (AMDGPU::AReg_160RegClass.contains(Reg)) {
- IsSGPR = false;
- IsAGPR = true;
- Width = 5;
- } else if (AMDGPU::VReg_192RegClass.contains(Reg)) {
- IsSGPR = false;
- Width = 6;
- } else if (AMDGPU::SReg_192RegClass.contains(Reg)) {
- IsSGPR = true;
- Width = 6;
- } else if (AMDGPU::AReg_192RegClass.contains(Reg)) {
- IsSGPR = false;
- IsAGPR = true;
- Width = 6;
- } else if (AMDGPU::VReg_224RegClass.contains(Reg)) {
- IsSGPR = false;
- Width = 7;
- } else if (AMDGPU::SReg_224RegClass.contains(Reg)) {
- IsSGPR = true;
- Width = 7;
- } else if (AMDGPU::AReg_224RegClass.contains(Reg)) {
- IsSGPR = false;
- IsAGPR = true;
- Width = 7;
- } else if (AMDGPU::SReg_256RegClass.contains(Reg)) {
- IsSGPR = true;
- Width = 8;
- } else if (AMDGPU::VReg_256RegClass.contains(Reg)) {
- IsSGPR = false;
- Width = 8;
- } else if (AMDGPU::AReg_256RegClass.contains(Reg)) {
- IsSGPR = false;
- IsAGPR = true;
- Width = 8;
- } else if (AMDGPU::VReg_288RegClass.contains(Reg)) {
- IsSGPR = false;
- Width = 9;
- } else if (AMDGPU::SReg_288RegClass.contains(Reg)) {
- IsSGPR = true;
- Width = 9;
- } else if (AMDGPU::AReg_288RegClass.contains(Reg)) {
- IsSGPR = false;
- IsAGPR = true;
- Width = 9;
- } else if (AMDGPU::VReg_320RegClass.contains(Reg)) {
- IsSGPR = false;
- Width = 10;
- } else if (AMDGPU::SReg_320RegClass.contains(Reg)) {
- IsSGPR = true;
- Width = 10;
- } else if (AMDGPU::AReg_320RegClass.contains(Reg)) {
- IsSGPR = false;
- IsAGPR = true;
- Width = 10;
- } else if (AMDGPU::VReg_352RegClass.contains(Reg)) {
- IsSGPR = false;
- Width = 11;
- } else if (AMDGPU::SReg_352RegClass.contains(Reg)) {
- IsSGPR = true;
- Width = 11;
- } else if (AMDGPU::AReg_352RegClass.contains(Reg)) {
- IsSGPR = false;
- IsAGPR = true;
- Width = 11;
- } else if (AMDGPU::VReg_384RegClass.contains(Reg)) {
- IsSGPR = false;
- Width = 12;
- } else if (AMDGPU::SReg_384RegClass.contains(Reg)) {
- IsSGPR = true;
- Width = 12;
- } else if (AMDGPU::AReg_384RegClass.contains(Reg)) {
- IsSGPR = false;
- IsAGPR = true;
- Width = 12;
- } else if (AMDGPU::SReg_512RegClass.contains(Reg)) {
- IsSGPR = true;
- Width = 16;
- } else if (AMDGPU::VReg_512RegClass.contains(Reg)) {
- IsSGPR = false;
- Width = 16;
- } else if (AMDGPU::AReg_512RegClass.contains(Reg)) {
- IsSGPR = false;
- IsAGPR = true;
- Width = 16;
- } else if (AMDGPU::SReg_1024RegClass.contains(Reg)) {
- IsSGPR = true;
- Width = 32;
- } else if (AMDGPU::VReg_1024RegClass.contains(Reg)) {
- IsSGPR = false;
- Width = 32;
- } else if (AMDGPU::AReg_1024RegClass.contains(Reg)) {
- IsSGPR = false;
- IsAGPR = true;
- Width = 32;
- } else {
- // We only expect TTMP registers or registers that do not belong to
- // any RC.
- assert((AMDGPU::TTMP_32RegClass.contains(Reg) ||
- AMDGPU::TTMP_64RegClass.contains(Reg) ||
- AMDGPU::TTMP_128RegClass.contains(Reg) ||
- AMDGPU::TTMP_256RegClass.contains(Reg) ||
- AMDGPU::TTMP_512RegClass.contains(Reg) ||
- !TRI.getPhysRegBaseClass(Reg)) &&
- "Unknown register class");
- }
- unsigned HWReg = TRI.getHWRegIndex(Reg);
- int MaxUsed = HWReg + Width - 1;
- if (IsSGPR) {
- MaxSGPR = MaxUsed > MaxSGPR ? MaxUsed : MaxSGPR;
- } else if (IsAGPR) {
- MaxAGPR = MaxUsed > MaxAGPR ? MaxUsed : MaxAGPR;
- } else {
+ if (NeedsExplicitVGPRCount) {
+ for (unsigned i = 0; i < MI.getNumOperands(); ++i) {
+ const MachineOperand &MO = MI.getOperand(i);
+
+ if (!MO.isReg())
+ continue;
+ Register Reg = MO.getReg();
+ const TargetRegisterClass *RC = TRI.getPhysRegBaseClass(Reg);
+
+ if (!RC || !TRI.isVGPRClass(RC))
+ continue;
+
+ // Skip inactive VGPRs in chain functions with the init.whole.wave
+ // intrinsic. These will only appear as implicit use operands on the
+ // chain call, and as the def of an IMPLICIT_DEF. We're going to skip
+ // implicit defs unconditionally though because if they're important
+ // in a different context then they will be counted when they are
+ // used.
+ bool IsChainCall =
+ MFI->isChainFunction() && MI.getOpcode() == AMDGPU::SI_TCRETURN;
+ if (IsChainCall || MI.isImplicitDef())
+ continue;
+
+ unsigned Width = TRI.getRegSizeInBits(*RC) / 32;
+ unsigned HWReg = TRI.getHWRegIndex(Reg);
+ int MaxUsed = HWReg + Width - 1;
MaxVGPR = MaxUsed > MaxVGPR ? MaxUsed : MaxVGPR;
}
}
@@ -464,9 +252,10 @@ AMDGPUResourceUsageAnalysis::analyzeResourceUsage(
}
}
- Info.NumExplicitSGPR = MaxSGPR + 1;
- Info.NumVGPR = MaxVGPR + 1;
- Info.NumAGPR = MaxAGPR + 1;
+ if (NeedsExplicitVGPRCount)
+ Info.NumVGPR = MaxVGPR + 1;
+ else
+ Info.NumVGPR = TRI.getNumUsedPhysRegs(MRI, AMDGPU::VGPR_32RegClass, false);
return Info;
}
diff --git a/llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp b/llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp
index e41189adfb46f..de75ce7552ef4 100644
--- a/llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp
+++ b/llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp
@@ -4046,11 +4046,11 @@ SIRegisterInfo::getSubRegAlignmentNumBits(const TargetRegisterClass *RC,
return 0;
}
-unsigned
-SIRegisterInfo::getNumUsedPhysRegs(const MachineRegisterInfo &MRI,
- const TargetRegisterClass &RC) const {
+unsigned SIRegisterInfo::getNumUsedPhysRegs(const MachineRegisterInfo &MRI,
+ const TargetRegisterClass &RC,
+ bool IncludeCalls) const {
for (MCPhysReg Reg : reverse(RC.getRegisters()))
- if (MRI.isPhysRegUsed(Reg))
+ if (MRI.isPhysRegUsed(Reg, /*SkipRegMaskTest=*/!IncludeCalls))
return getHWRegIndex(Reg) + 1;
return 0;
}
diff --git a/llvm/lib/Target/AMDGPU/SIRegisterInfo.h b/llvm/lib/Target/AMDGPU/SIRegisterInfo.h
index a4b135d5e0b59..f3ce80ac2864e 100644
--- a/llvm/lib/Target/AMDGPU/SIRegisterInfo.h
+++ b/llvm/lib/Target/AMDGPU/SIRegisterInfo.h
@@ -482,9 +482,11 @@ class SIRegisterInfo final : public AMDGPUGenRegisterInfo {
unsigned SubReg) const;
// \returns a number of registers of a given \p RC used in a function.
- // Does not go inside function calls.
+ // Does not go inside function calls. If \p IncludeCalls is true, it will
+ // include registers that may be clobbered by calls.
unsigned getNumUsedPhysRegs(const MachineRegisterInfo &MRI,
- const TargetRegisterClass &RC) const;
+ const TargetRegisterClass &RC,
+ bool IncludeCalls = true) const;
std::optional<uint8_t> getVRegFlagValue(StringRef Name) const override {
return Name == "WWM_REG" ? AMDGPU::VirtRegFlag::WWM_REG
diff --git a/llvm/test/CodeGen/AMDGPU/init-whole-wave-vgpr-count-large.ll b/llvm/test/CodeGen/AMDGPU/init-whole-wave-vgpr-count-large.ll
new file mode 100644
index 0000000000000..e47f5e25ead3a
--- /dev/null
+++ b/llvm/test/CodeGen/AMDGPU/init-whole-wave-vgpr-count-large.ll
@@ -0,0 +1,76 @@
+; RUN: llc -mtriple=amdgcn--amdpal -mcpu=gfx1200 < %s | FileCheck %s
+
+; CHECK-LABEL: .shader_functions:
+
+; Use VGPRs above the input arguments.
+; CHECK-LABEL: _miss_1:
+; CHECK: .vgpr_count:{{.*}}0x1d{{$}}
+
+define amdgpu_cs_chain void @_miss_1(ptr inreg %next.callee, i32 inreg %global.table, i32 inreg %max.outgoing.vgpr.count,
+ i32 %vcr, { i32 } %system.data,
+ i32 %inactive.vgpr, i32 %inactive.vgpr1, i32 %inactive.vgpr2, i32 %inactive.vgpr3,
+ i32 %inactive.vgpr4, i32 %inactive.vgpr5, i32 %inactive.vgpr6, i32 %inactive.vgpr7,
+ i32 %inactive.vgpr8, i32 %inactive.vgpr9)
+ local_unnamed_addr {
+entry:
+ %system.data.value = extractvalue { i32 } %system.data, 0
+ %dead.val = call i32 @llvm.amdgcn.dead.i32()
+ %is.whole.wave = call i1 @llvm.amdgcn.init.whole.wave()
+ br i1 %is.whole.wave, label %shader, label %tail
+
+shader:
+ %system.data.extract = extractvalue { i32 } %system.data, 0
+ %data.mul = mul i32 %system.data.extract, 2
+ %data.add = add i32 %data.mul, 1
+ call void asm sideeffect "; clobber v28", "~{v28}"()
+ br label %tail
+
+tail:
+ %final.vcr = phi i32 [ %vcr, %entry ], [ %data.mul, %shader ]
+ %final.sys.data = phi i32 [ %system.data.value, %entry ], [ %data.add, %shader ]
+ %final.inactive0 = phi i32 [ %inactive.vgpr, %entry ], [ %dead.val, %shader ]
+ %final.inactive1 = phi i32 [ %inactive.vgpr1, %entry ], [ %dead.val, %shader ]
+ %final.inactive2 = phi i32 [ %inactive.vgpr2, %entry ], [ %dead.val, %shader ]
+ %final.inactive3 = phi i32 [ %inactive.vgpr3, %entry ], [ %dead.val, %shader ]
+ %final.inactive4 = phi i32 [ %inactive.vgpr4, %entry ], [ %dead.val, %shader ]
+ %final.inactive5 = phi i32 [ %inactive.vgpr5, %entry ], [ %dead.val, %shader ]
+ %final.inactive6 = phi i32 [ %inactive.vgpr6, %entry ], [ %dead.val, %shader ]
+ %final.inactive7 = phi i32 [ %inactive.vgpr7, %entry ], [ %dead.val, %shader ]
+ %final.inactive8 = phi i32 [ %inactive.vgpr8, %entry ], [ %dead.val, %shader ]
+ %final.inactive9 = phi i32 [ %inactive.vgpr9, %entry ], [ %dead.val, %shader ]
+
+ %struct.init = insertvalue { i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32 } poison, i32 %final.vcr, 0
+ %struct.with.data = insertvalue { i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32 } %struct.init, i32 %final.sys.data, 1
+ %struct.with.inactive0 = insertvalue { i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32 } %struct.with.data, i32 %final.inactive0, 2
+ %struct.with.inactive1 = insertvalue { i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32 } %struct.with.inactive0, i32 %final.inactive1, 3
+ %struct.with.inactive2 = insertvalue { i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32 } %struct.with.inactive1, i32 %final.inactive2, 4
+ %struct.with.inactive3 = insertvalue { i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32 } %struct.with.inactive2, i32 %final.inactive3, 5
+ %struct.with.inactive4 = insertvalue { i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32 } %struct.with.inactive3, i32 %final.inactive4, 6
+ %struct.with.inactive5 = insertvalue { i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32 } %struct.with.inactive4, i32 %final.inactive5, 7
+ %struct.with.inactive6 = insertvalue { i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32 } %struct.with.inactive5, i32 %final.inactive6, 8
+ %struct.with.inactive7 = insertvalue { i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32 } %struct.with.inactive6, i32 %final.inactive7, 9
+ %struct.with.inactive8 = insertvalue { i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32 } %struct.with.inactive7, i32 %final.inactive8, 10
+ %final.struct = insertvalue { i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32 } %struct.with.inactive8, i32 %final.inactive9, 11
+
+ %vec.global = insertelement <4 x i32> poison, i32 %global.table, i64 0
+ %vec.max.vgpr = insertelement <4 x i32> %vec.global, i32 %max.outgoing.vgpr.count, i64 1
+ %vec.sys.data = insertelement <4 x i32> %vec.max.vgpr, i32 %final.sys.data, i64 2
+ %final.vec = insertelement <4 x i32> %vec.sys.data, i32 0, i64 3
+
+ call void (ptr, i32, <4 x i32>, { i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32 }, i32, ...)
+ @llvm.amdgcn.cs.chain.p0.i32.v4i32.sl_i32i32i32i32i32i32i32i32i32i32i32i32s(
+ ptr %next.callee, i32 0, <4 x i32> inreg %final.vec,
+ { i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32 } %final.struct,
+ i32 1, i32 %max.outgoing.vgpr.count, i32 -1, ptr @retry_vgpr_alloc.v4i32)
+ unreachable
+}
+
+declare i32 @llvm.amdgcn.dead.i32()
+declare i1 @llvm.amdgcn.init.whole.wave()
+declare void @llvm.amdgcn.cs.chain.p0.i32.v4i32.sl_i32i32i32i32i32i32i32i32i32i32i32i32s(ptr, i32, <4 x i32>, { i32, ...
[truncated]
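The new IncludeCalls parameter on getNumUsedPhysRegs simply forwards to MachineRegisterInfo::isPhysRegUsed's SkipRegMaskTest flag, so a register no longer counts as used merely because a call-site regmask may clobber it. A minimal sketch of a caller (the helper name countLocalVGPRs is hypothetical; it assumes the usual in-tree AMDGPU headers):

```cpp
// Count the VGPRs this function's body actually reads or writes, ignoring
// registers that only appear in call-site regmasks.
static unsigned countLocalVGPRs(const MachineFunction &MF,
                                const SIRegisterInfo &TRI) {
  const MachineRegisterInfo &MRI = MF.getRegInfo();
  return TRI.getNumUsedPhysRegs(MRI, AMDGPU::VGPR_32RegClass,
                                /*IncludeCalls=*/false);
}
```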
Ping
```cpp
// init.whole.wave intrinsic, since they sometimes have VGPR arguments that
// are only added for the purpose of preserving their inactive lanes and
// should not be included in the vgpr-count.
if (isShader(F.getCallingConv()) && isEntryFunctionCC(F.getCallingConv())) {
```
Can you fuse these into one function? I don't want to have to do the intersection to figure out the list of CCs this applies to.
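For concreteness, the fused helper being requested might look something like this (the name isShaderEntryFunction is hypothetical; isShader and isEntryFunctionCC are the existing AMDGPUBaseInfo helpers):

```cpp
// Illustrative only: a single predicate that spells out the calling-convention
// set in one place, instead of intersecting isShader() and isEntryFunctionCC()
// at the call site.
static bool isShaderEntryFunction(CallingConv::ID CC) {
  return AMDGPU::isShader(CC) && AMDGPU::isEntryFunctionCC(CC);
}
```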
```cpp
// arguments that are only added for the purpose of preserving their inactive
// lanes and should not be included in the vgpr-count.
LLVM_READNONE
constexpr bool shouldReportUnusedFuncArgs(CallingConv::ID CC) {
```
Name should express the reason, not the usage context. Although here I don't understand why you're going out of your way to exclude kernels. The same reasoning should apply when using preloaded arguments.
Can you suggest a better name? This is mostly just an implementation detail. Maybe it shouldn't be in AMDGPUBaseInfo in the first place. Should I just move it to AMDGPUAsmPrinter.cpp?
> Although here I don't understand why you're going out of your way to exclude kernels. The same reasoning should apply when using preloaded arguments.
Graphics and kernels handle hardware-initialized registers a bit differently. For graphics, we're putting them as arguments to the IR functions, and for compute we track them in SIMachineFunctionInfo instead. We do handle the preloaded arguments in the same place in AMDGPUAsmPrinter, just on the else branch of where this helper is used.