[AMDGPU] Dynamic VGPR support for llvm.amdgcn.cs.chain #130094
Conversation
This patch only adds the instruction for disassembly support. We have neither an intrinsic nor codegen support, and it is unclear whether we ever actually want an intrinsic, given the fragile semantics. For now, it will be generated only by the backend in very specific circumstances.
This represents a hardware mode supported only for wave32 compute shaders. When enabled, we set the `.dynamic_vgpr_en` field of `.compute_registers` to true in the PAL metadata.
In dynamic VGPR mode, waves must deallocate all VGPRs before exiting. If the shader program does not do this, the hardware inserts `S_ALLOC_VGPR 0` before `S_ENDPGM`, but this may incur some performance cost. Therefore it's better if the compiler proactively generates that instruction. This patch extends `si-insert-waitcnts` to deallocate the VGPRs via an `S_ALLOC_VGPR 0` before any `S_ENDPGM` when in dynamic VGPR mode.
The llvm.amdgcn.cs.chain intrinsic has a 'flags' operand which may indicate that we want to reallocate the VGPRs before performing the call. A call with the following arguments:

```
llvm.amdgcn.cs.chain %callee, %exec, %sgpr_args, %vgpr_args,
                     /*flags*/0x1, %num_vgprs, %fallback_exec, %fallback_callee
```

is supposed to do the following:

- copy the SGPR and VGPR args into their respective registers
- try to change the VGPR allocation
- if the allocation has succeeded, set EXEC to %exec and jump to %callee, otherwise set EXEC to %fallback_exec and jump to %fallback_callee

This patch implements the dynamic VGPR behaviour by generating an S_ALLOC_VGPR followed by S_CSELECT_B32/64 instructions for the EXEC and callee. The rest of the call sequence is left undisturbed (i.e. identical to the case where the flags are 0 and we don't use dynamic VGPRs). We achieve this by introducing some new pseudos (SI_CS_CHAIN_TC_Wn_DVGPR) which are expanded in the SILateBranchLowering pass, just like the simpler SI_CS_CHAIN_TC_Wn pseudos. The main reason is so that we don't risk other passes (particularly the PostRA scheduler) introducing instructions between the S_ALLOC_VGPR and the jump. Such instructions might end up using VGPRs that have been deallocated, or the wrong EXEC mask. Once the whole backend treats S_ALLOC_VGPR and changes to EXEC as barriers for instructions that use VGPRs, we could in principle move the expansion earlier (but in the absence of a good reason for that my personal preference is to keep it later in order to make debugging easier).

Since the expansion happens after register allocation, we're careful to select constants to immediate operands instead of letting ISel generate S_MOVs which could interfere with register allocation (i.e. make it look like we need more registers than we actually do).

For GFX12, S_ALLOC_VGPR only works in wave32 mode, so we bail out during ISel in wave64 mode. However, we can define the pseudos for wave64 too so it's easy to handle if future generations support it.

Co-authored-by: Ana Mihajlovic <[email protected]>
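For illustration, an IR-level use of the intrinsic with the dynamic-VGPR flag might look roughly like the sketch below. The callee names, argument types, and constant values are invented for the example, and the overloaded intrinsic's type-mangling suffix is omitted for readability; only the operand order and the flag value matter here.

```llvm
; Sketch: placeholder callees and argument types, simplified intrinsic name.
declare amdgpu_cs_chain void @callee(<3 x i32> inreg, { i32, i32 })
declare amdgpu_cs_chain void @retry_alloc(<3 x i32> inreg, { i32, i32 })

define amdgpu_cs_chain void @caller(<3 x i32> inreg %sgpr, { i32, i32 } %vgpr) {
  call void (ptr, i32, <3 x i32>, { i32, i32 }, i32, ...) @llvm.amdgcn.cs.chain(
      ptr @callee,            ; jump here if the VGPR reallocation succeeds
      i32 -1,                 ; EXEC mask to use on success
      <3 x i32> inreg %sgpr,  ; SGPR arguments
      { i32, i32 } %vgpr,     ; VGPR arguments
      i32 1,                  ; flags: bit 0 requests dynamic VGPR reallocation
      i32 64,                 ; number of VGPRs to allocate
      i32 -1,                 ; fallback EXEC mask on failure
      ptr @retry_alloc)       ; fallback callee on failure
  unreachable
}
```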
@llvm/pr-subscribers-backend-amdgpu
@llvm/pr-subscribers-llvm-selectiondag

Author: Diana Picus (rovka)

Patch is 94.66 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/130094.diff

11 Files Affected:
diff --git a/llvm/include/llvm/CodeGen/SelectionDAGISel.h b/llvm/include/llvm/CodeGen/SelectionDAGISel.h
index e9452a6dc6233..55f8f19d437a0 100644
--- a/llvm/include/llvm/CodeGen/SelectionDAGISel.h
+++ b/llvm/include/llvm/CodeGen/SelectionDAGISel.h
@@ -328,20 +328,21 @@ class SelectionDAGISel {
};
enum {
- OPFL_None = 0, // Node has no chain or glue input and isn't variadic.
- OPFL_Chain = 1, // Node has a chain input.
- OPFL_GlueInput = 2, // Node has a glue input.
- OPFL_GlueOutput = 4, // Node has a glue output.
- OPFL_MemRefs = 8, // Node gets accumulated MemRefs.
- OPFL_Variadic0 = 1<<4, // Node is variadic, root has 0 fixed inputs.
- OPFL_Variadic1 = 2<<4, // Node is variadic, root has 1 fixed inputs.
- OPFL_Variadic2 = 3<<4, // Node is variadic, root has 2 fixed inputs.
- OPFL_Variadic3 = 4<<4, // Node is variadic, root has 3 fixed inputs.
- OPFL_Variadic4 = 5<<4, // Node is variadic, root has 4 fixed inputs.
- OPFL_Variadic5 = 6<<4, // Node is variadic, root has 5 fixed inputs.
- OPFL_Variadic6 = 7<<4, // Node is variadic, root has 6 fixed inputs.
-
- OPFL_VariadicInfo = OPFL_Variadic6
+ OPFL_None = 0, // Node has no chain or glue input and isn't variadic.
+ OPFL_Chain = 1, // Node has a chain input.
+ OPFL_GlueInput = 2, // Node has a glue input.
+ OPFL_GlueOutput = 4, // Node has a glue output.
+ OPFL_MemRefs = 8, // Node gets accumulated MemRefs.
+ OPFL_Variadic0 = 1 << 4, // Node is variadic, root has 0 fixed inputs.
+ OPFL_Variadic1 = 2 << 4, // Node is variadic, root has 1 fixed inputs.
+ OPFL_Variadic2 = 3 << 4, // Node is variadic, root has 2 fixed inputs.
+ OPFL_Variadic3 = 4 << 4, // Node is variadic, root has 3 fixed inputs.
+ OPFL_Variadic4 = 5 << 4, // Node is variadic, root has 4 fixed inputs.
+ OPFL_Variadic5 = 6 << 4, // Node is variadic, root has 5 fixed inputs.
+ OPFL_Variadic6 = 7 << 4, // Node is variadic, root has 6 fixed inputs.
+ OPFL_Variadic7 = 8 << 4, // Node is variadic, root has 7 fixed inputs.
+
+ OPFL_VariadicInfo = 15 << 4 // Mask for extracting the OPFL_VariadicN bits.
};
/// getNumFixedFromVariadicInfo - Transform an EmitNode flags word into the
diff --git a/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp b/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
index d5a07e616236e..ba5f7e03d98e4 100644
--- a/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
@@ -7996,10 +7996,6 @@ void SelectionDAGBuilder::visitIntrinsicCall(const CallInst &I,
return;
}
case Intrinsic::amdgcn_cs_chain: {
- assert(I.arg_size() == 5 && "Additional args not supported yet");
- assert(cast<ConstantInt>(I.getOperand(4))->isZero() &&
- "Non-zero flags not supported yet");
-
// At this point we don't care if it's amdgpu_cs_chain or
// amdgpu_cs_chain_preserve.
CallingConv::ID CC = CallingConv::AMDGPU_CS_Chain;
@@ -8026,6 +8022,15 @@ void SelectionDAGBuilder::visitIntrinsicCall(const CallInst &I,
assert(!Args[1].IsInReg && "VGPR args should not be marked inreg");
Args[2].IsInReg = true; // EXEC should be inreg
+ // Forward the flags and any additional arguments.
+ for (unsigned Idx = 4; Idx < I.arg_size(); ++Idx) {
+ TargetLowering::ArgListEntry Arg;
+ Arg.Node = getValue(I.getOperand(Idx));
+ Arg.Ty = I.getOperand(Idx)->getType();
+ Arg.setAttributes(&I, Idx);
+ Args.push_back(Arg);
+ }
+
TargetLowering::CallLoweringInfo CLI(DAG);
CLI.setDebugLoc(getCurSDLoc())
.setChain(getRoot())
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUCallLowering.cpp b/llvm/lib/Target/AMDGPU/AMDGPUCallLowering.cpp
index 478a4c161fce7..a440617319228 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUCallLowering.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUCallLowering.cpp
@@ -953,8 +953,9 @@ getAssignFnsForCC(CallingConv::ID CC, const SITargetLowering &TLI) {
}
static unsigned getCallOpcode(const MachineFunction &CallerF, bool IsIndirect,
- bool IsTailCall, bool isWave32,
- CallingConv::ID CC) {
+ bool IsTailCall, bool IsWave32,
+ CallingConv::ID CC,
+ bool IsDynamicVGPRChainCall = false) {
// For calls to amdgpu_cs_chain functions, the address is known to be uniform.
assert((AMDGPU::isChainCC(CC) || !IsIndirect || !IsTailCall) &&
"Indirect calls can't be tail calls, "
@@ -962,8 +963,12 @@ static unsigned getCallOpcode(const MachineFunction &CallerF, bool IsIndirect,
if (!IsTailCall)
return AMDGPU::G_SI_CALL;
- if (AMDGPU::isChainCC(CC))
- return isWave32 ? AMDGPU::SI_CS_CHAIN_TC_W32 : AMDGPU::SI_CS_CHAIN_TC_W64;
+ if (AMDGPU::isChainCC(CC)) {
+ if (IsDynamicVGPRChainCall)
+ return IsWave32 ? AMDGPU::SI_CS_CHAIN_TC_W32_DVGPR
+ : AMDGPU::SI_CS_CHAIN_TC_W64_DVGPR;
+ return IsWave32 ? AMDGPU::SI_CS_CHAIN_TC_W32 : AMDGPU::SI_CS_CHAIN_TC_W64;
+ }
return CC == CallingConv::AMDGPU_Gfx ? AMDGPU::SI_TCRETURN_GFX :
AMDGPU::SI_TCRETURN;
@@ -972,7 +977,8 @@ static unsigned getCallOpcode(const MachineFunction &CallerF, bool IsIndirect,
// Add operands to call instruction to track the callee.
static bool addCallTargetOperands(MachineInstrBuilder &CallInst,
MachineIRBuilder &MIRBuilder,
- AMDGPUCallLowering::CallLoweringInfo &Info) {
+ AMDGPUCallLowering::CallLoweringInfo &Info,
+ bool IsDynamicVGPRChainCall = false) {
if (Info.Callee.isReg()) {
CallInst.addReg(Info.Callee.getReg());
CallInst.addImm(0);
@@ -983,7 +989,12 @@ static bool addCallTargetOperands(MachineInstrBuilder &CallInst,
auto Ptr = MIRBuilder.buildGlobalValue(
LLT::pointer(GV->getAddressSpace(), 64), GV);
CallInst.addReg(Ptr.getReg(0));
- CallInst.add(Info.Callee);
+
+ if (IsDynamicVGPRChainCall)
+ // DynamicVGPR chain calls are always indirect.
+ CallInst.addImm(0);
+ else
+ CallInst.add(Info.Callee);
} else
return false;
@@ -1177,6 +1188,18 @@ void AMDGPUCallLowering::handleImplicitCallArguments(
}
}
+namespace {
+// Chain calls have special arguments that we need to handle. These have the
+// same index as they do in the llvm.amdgcn.cs.chain intrinsic.
+enum ChainCallArgIdx {
+ Exec = 1,
+ Flags = 4,
+ NumVGPRs = 5,
+ FallbackExec = 6,
+ FallbackCallee = 7,
+};
+} // anonymous namespace
+
bool AMDGPUCallLowering::lowerTailCall(
MachineIRBuilder &MIRBuilder, CallLoweringInfo &Info,
SmallVectorImpl<ArgInfo> &OutArgs) const {
@@ -1185,6 +1208,8 @@ bool AMDGPUCallLowering::lowerTailCall(
SIMachineFunctionInfo *FuncInfo = MF.getInfo<SIMachineFunctionInfo>();
const Function &F = MF.getFunction();
MachineRegisterInfo &MRI = MF.getRegInfo();
+ const SIInstrInfo *TII = ST.getInstrInfo();
+ const SIRegisterInfo *TRI = ST.getRegisterInfo();
const SITargetLowering &TLI = *getTLI<SITargetLowering>();
// True when we're tail calling, but without -tailcallopt.
@@ -1200,34 +1225,78 @@ bool AMDGPUCallLowering::lowerTailCall(
if (!IsSibCall)
CallSeqStart = MIRBuilder.buildInstr(AMDGPU::ADJCALLSTACKUP);
- unsigned Opc =
- getCallOpcode(MF, Info.Callee.isReg(), true, ST.isWave32(), CalleeCC);
+ bool IsChainCall = AMDGPU::isChainCC(Info.CallConv);
+ bool IsDynamicVGPRChainCall = false;
+
+ if (IsChainCall) {
+ ArgInfo FlagsArg = Info.OrigArgs[ChainCallArgIdx::Flags];
+ const APInt &FlagsValue = cast<ConstantInt>(FlagsArg.OrigValue)->getValue();
+ if (FlagsValue.isZero()) {
+ if (Info.OrigArgs.size() != 5) {
+ LLVM_DEBUG(dbgs() << "No additional args allowed if flags == 0");
+ return false;
+ }
+ } else if (FlagsValue.isOneBitSet(0)) {
+ IsDynamicVGPRChainCall = true;
+
+ if (Info.OrigArgs.size() != 8) {
+ LLVM_DEBUG(dbgs() << "Expected 3 additional args");
+ return false;
+ }
+
+ // On GFX12, we can only change the VGPR allocation for wave32.
+ if (!ST.isWave32()) {
+ LLVM_DEBUG(dbgs() << "Dynamic VGPR mode is only supported for wave32");
+ return false;
+ }
+
+ ArgInfo FallbackExecArg = Info.OrigArgs[ChainCallArgIdx::FallbackExec];
+ assert(FallbackExecArg.Regs.size() == 1 &&
+ "Expected single register for fallback EXEC");
+ if (!FallbackExecArg.Ty->isIntegerTy(ST.getWavefrontSize())) {
+ LLVM_DEBUG(dbgs() << "Bad type for fallback EXEC");
+ return false;
+ }
+ }
+ }
+
+ unsigned Opc = getCallOpcode(MF, Info.Callee.isReg(), /*IsTailCall*/ true,
+ ST.isWave32(), CalleeCC, IsDynamicVGPRChainCall);
auto MIB = MIRBuilder.buildInstrNoInsert(Opc);
- if (!addCallTargetOperands(MIB, MIRBuilder, Info))
+ if (!addCallTargetOperands(MIB, MIRBuilder, Info, IsDynamicVGPRChainCall))
return false;
// Byte offset for the tail call. When we are sibcalling, this will always
// be 0.
MIB.addImm(0);
- // If this is a chain call, we need to pass in the EXEC mask.
- const SIRegisterInfo *TRI = ST.getRegisterInfo();
- if (AMDGPU::isChainCC(Info.CallConv)) {
- ArgInfo ExecArg = Info.OrigArgs[1];
+ // If this is a chain call, we need to pass in the EXEC mask as well as any
+ // other special args.
+ if (IsChainCall) {
+ auto AddRegOrImm = [&](const ArgInfo &Arg) {
+ if (auto CI = dyn_cast<ConstantInt>(Arg.OrigValue)) {
+ MIB.addImm(CI->getSExtValue());
+ } else {
+ MIB.addReg(Arg.Regs[0]);
+ unsigned Idx = MIB->getNumOperands() - 1;
+ MIB->getOperand(Idx).setReg(constrainOperandRegClass(
+ MF, *TRI, MRI, *TII, *ST.getRegBankInfo(), *MIB, MIB->getDesc(),
+ MIB->getOperand(Idx), Idx));
+ }
+ };
+
+ ArgInfo ExecArg = Info.OrigArgs[ChainCallArgIdx::Exec];
assert(ExecArg.Regs.size() == 1 && "Too many regs for EXEC");
- if (!ExecArg.Ty->isIntegerTy(ST.getWavefrontSize()))
+ if (!ExecArg.Ty->isIntegerTy(ST.getWavefrontSize())) {
+ LLVM_DEBUG(dbgs() << "Bad type for EXEC");
return false;
-
- if (const auto *CI = dyn_cast<ConstantInt>(ExecArg.OrigValue)) {
- MIB.addImm(CI->getSExtValue());
- } else {
- MIB.addReg(ExecArg.Regs[0]);
- unsigned Idx = MIB->getNumOperands() - 1;
- MIB->getOperand(Idx).setReg(constrainOperandRegClass(
- MF, *TRI, MRI, *ST.getInstrInfo(), *ST.getRegBankInfo(), *MIB,
- MIB->getDesc(), MIB->getOperand(Idx), Idx));
}
+
+ AddRegOrImm(ExecArg);
+ if (IsDynamicVGPRChainCall)
+ std::for_each(Info.OrigArgs.begin() + ChainCallArgIdx::NumVGPRs,
+ Info.OrigArgs.end(), AddRegOrImm);
}
// Tell the call which registers are clobbered.
@@ -1329,9 +1398,9 @@ bool AMDGPUCallLowering::lowerTailCall(
// FIXME: We should define regbankselectable call instructions to handle
// divergent call targets.
if (MIB->getOperand(0).isReg()) {
- MIB->getOperand(0).setReg(constrainOperandRegClass(
- MF, *TRI, MRI, *ST.getInstrInfo(), *ST.getRegBankInfo(), *MIB,
- MIB->getDesc(), MIB->getOperand(0), 0));
+ MIB->getOperand(0).setReg(
+ constrainOperandRegClass(MF, *TRI, MRI, *TII, *ST.getRegBankInfo(),
+ *MIB, MIB->getDesc(), MIB->getOperand(0), 0));
}
MF.getFrameInfo().setHasTailCall();
@@ -1345,11 +1414,6 @@ bool AMDGPUCallLowering::lowerChainCall(MachineIRBuilder &MIRBuilder,
ArgInfo Callee = Info.OrigArgs[0];
ArgInfo SGPRArgs = Info.OrigArgs[2];
ArgInfo VGPRArgs = Info.OrigArgs[3];
- ArgInfo Flags = Info.OrigArgs[4];
-
- assert(cast<ConstantInt>(Flags.OrigValue)->isZero() &&
- "Non-zero flags aren't supported yet.");
- assert(Info.OrigArgs.size() == 5 && "Additional args aren't supported yet.");
MachineFunction &MF = MIRBuilder.getMF();
const Function &F = MF.getFunction();
diff --git a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
index fe095414e5172..5438c6f50dea2 100644
--- a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
+++ b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
@@ -3657,6 +3657,19 @@ bool SITargetLowering::mayBeEmittedAsTailCall(const CallInst *CI) const {
return true;
}
+namespace {
+// Chain calls have special arguments that we need to handle. These are
+// tagging along at the end of the arguments list(s), after the SGPR and VGPR
+// arguments (index 0 and 1 respectively).
+enum ChainCallArgIdx {
+ Exec = 2,
+ Flags,
+ NumVGPRs,
+ FallbackExec,
+ FallbackCallee
+};
+} // anonymous namespace
+
// The wave scratch offset register is used as the global base pointer.
SDValue SITargetLowering::LowerCall(CallLoweringInfo &CLI,
SmallVectorImpl<SDValue> &InVals) const {
@@ -3665,37 +3678,67 @@ SDValue SITargetLowering::LowerCall(CallLoweringInfo &CLI,
SelectionDAG &DAG = CLI.DAG;
- TargetLowering::ArgListEntry RequestedExec;
- if (IsChainCallConv) {
- // The last argument should be the value that we need to put in EXEC.
- // Pop it out of CLI.Outs and CLI.OutVals before we do any processing so we
- // don't treat it like the rest of the arguments.
- RequestedExec = CLI.Args.back();
- assert(RequestedExec.Node && "No node for EXEC");
+ const SDLoc &DL = CLI.DL;
+ SDValue Chain = CLI.Chain;
+ SDValue Callee = CLI.Callee;
- if (!RequestedExec.Ty->isIntegerTy(Subtarget->getWavefrontSize()))
+ llvm::SmallVector<SDValue, 6> ChainCallSpecialArgs;
+ if (IsChainCallConv) {
+ // The last arguments should be the value that we need to put in EXEC,
+ // followed by the flags and any other arguments with special meanings.
+ // Pop them out of CLI.Outs and CLI.OutVals before we do any processing so
+ // we don't treat them like the "real" arguments.
+ auto RequestedExecIt = std::find_if(
+ CLI.Outs.begin(), CLI.Outs.end(),
+ [](const ISD::OutputArg &Arg) { return Arg.OrigArgIndex == 2; });
+ assert(RequestedExecIt != CLI.Outs.end() && "No node for EXEC");
+
+ size_t SpecialArgsBeginIdx = RequestedExecIt - CLI.Outs.begin();
+ CLI.OutVals.erase(CLI.OutVals.begin() + SpecialArgsBeginIdx,
+ CLI.OutVals.end());
+ CLI.Outs.erase(RequestedExecIt, CLI.Outs.end());
+
+ assert(CLI.Outs.back().OrigArgIndex < 2 &&
+ "Haven't popped all the special args");
+
+ TargetLowering::ArgListEntry RequestedExecArg =
+ CLI.Args[ChainCallArgIdx::Exec];
+ if (!RequestedExecArg.Ty->isIntegerTy(Subtarget->getWavefrontSize()))
return lowerUnhandledCall(CLI, InVals, "Invalid value for EXEC");
- assert(CLI.Outs.back().OrigArgIndex == 2 && "Unexpected last arg");
- CLI.Outs.pop_back();
- CLI.OutVals.pop_back();
+ // Convert constants into TargetConstants, so they become immediate operands
+ // instead of being selected into S_MOV.
+ auto PushNodeOrTargetConstant = [&](TargetLowering::ArgListEntry Arg) {
+ if (auto ArgNode = dyn_cast<ConstantSDNode>(Arg.Node))
+ ChainCallSpecialArgs.push_back(DAG.getTargetConstant(
+ ArgNode->getAPIntValue(), DL, ArgNode->getValueType(0)));
+ else
+ ChainCallSpecialArgs.push_back(Arg.Node);
+ };
- if (RequestedExec.Ty->isIntegerTy(64)) {
- assert(CLI.Outs.back().OrigArgIndex == 2 && "Exec wasn't split up");
- CLI.Outs.pop_back();
- CLI.OutVals.pop_back();
- }
+ PushNodeOrTargetConstant(RequestedExecArg);
+
+ // Process any other special arguments depending on the value of the flags.
+ TargetLowering::ArgListEntry Flags = CLI.Args[ChainCallArgIdx::Flags];
+
+ const APInt &FlagsValue = cast<ConstantSDNode>(Flags.Node)->getAPIntValue();
+ if (FlagsValue.isZero()) {
+ if (CLI.Args.size() > ChainCallArgIdx::Flags + 1)
+ return lowerUnhandledCall(CLI, InVals,
+ "No additional args allowed if flags == 0");
+ } else if (FlagsValue.isOneBitSet(0)) {
+ if (CLI.Args.size() != ChainCallArgIdx::FallbackCallee + 1) {
+ return lowerUnhandledCall(CLI, InVals, "Expected 3 additional args");
+ }
- assert(CLI.Outs.back().OrigArgIndex != 2 &&
- "Haven't popped all the pieces of the EXEC mask");
+ std::for_each(CLI.Args.begin() + ChainCallArgIdx::NumVGPRs,
+ CLI.Args.end(), PushNodeOrTargetConstant);
+ }
}
- const SDLoc &DL = CLI.DL;
SmallVector<ISD::OutputArg, 32> &Outs = CLI.Outs;
SmallVector<SDValue, 32> &OutVals = CLI.OutVals;
SmallVector<ISD::InputArg, 32> &Ins = CLI.Ins;
- SDValue Chain = CLI.Chain;
- SDValue Callee = CLI.Callee;
bool &IsTailCall = CLI.IsTailCall;
bool IsVarArg = CLI.IsVarArg;
bool IsSibCall = false;
@@ -3983,7 +4026,8 @@ SDValue SITargetLowering::LowerCall(CallLoweringInfo &CLI,
}
if (IsChainCallConv)
- Ops.push_back(RequestedExec.Node);
+ Ops.insert(Ops.end(), ChainCallSpecialArgs.begin(),
+ ChainCallSpecialArgs.end());
// Add argument registers to the end of the list so that they are known live
// into the call.
diff --git a/llvm/lib/Target/AMDGPU/SIInstructions.td b/llvm/lib/Target/AMDGPU/SIInstructions.td
index 63f66023837a2..e5af35f2da686 100644
--- a/llvm/lib/Target/AMDGPU/SIInstructions.td
+++ b/llvm/lib/Target/AMDGPU/SIInstructions.td
@@ -692,36 +692,42 @@ def : GCNPat<
(SI_TCRETURN_GFX Gfx_CCR_SGPR_64:$src0, (i64 0), i32imm:$fpdiff)
>;
-// Pseudo for the llvm.amdgcn.cs.chain intrinsic.
-// This is essentially a tail call, but it also takes a mask to put in EXEC
-// right before jumping to the callee.
-class SI_CS_CHAIN_TC<
+// Pseudos for the llvm.amdgcn.cs.chain intrinsic.
+multiclass SI_CS_CHAIN_TC<
ValueType execvt, Predicate wavesizepred,
- RegisterOperand execrc = getSOPSrcForVT<execvt>.ret>
- : SPseudoInstSI <(outs),
- (ins CCR_SGPR_64:$src0, unknown:$callee, i32imm:$fpdiff, execrc:$exec)> {
- let FixedSize = 0;
- let isCall = 1;
- let isTerminator = 1;
- let isBarrier = 1;
- let isReturn = 1;
- let UseNamedOperandTable = 1;
- let SchedRW = [WriteBranch];
- let isConvergent = 1;
-
- let WaveSizePredicate = wavesizepred;
-}
-
-def SI_CS_CHAIN_TC_W32 : SI_CS_CHAIN_TC<i32, isWave32>;
-def SI_CS_CHAIN_TC_W64 : SI_CS_CHAIN_TC<i64, isWave64>;
+ RegisterOperand execrc = getSOPSrcForVT<execvt>.ret> {
+ let FixedSize = 0,
+ isCall = 1,
+ isTerminator = 1,
+ isBarrier = 1,
+ isReturn = 1,
+ UseNamedOperandTable = 1,
+ SchedRW = [WriteBranch],
+ isConvergent = 1,
+ WaveSizePredicate = wavesizepred in {
+ // This is essentially a tail call, but it also takes a mask to put in EXEC
+ // right before jumping to the callee.
+ def NAME: SPseudoInstSI <(outs),
+ (ins CCR_SGPR_64:$src0, unknown:$callee, i32imm:$fpdiff, execrc:$exec)>;
+
+ // Same as above, but it will first try to reallocate the VGPRs, and choose an
+ // EXEC mask and a callee depending on the success of the reallocation attempt.
+ def _DVGPR : SPseudoInstSI <(outs),
+ (ins CCR_SGPR_64:$src0, i64imm:$callee, i32imm:$fpdiff, execrc:$exec,
+ SSrc_b32:$numvgprs, execrc:$fbexec, CCR_SGPR_64:$fbcallee)>;
+ } // End FixedSize = 0 etc
+}
+
+defm SI_CS_CHAIN_TC_W32 : SI_CS_CHAIN_TC<i32, isWave32>;
+defm SI_CS_CHAIN_TC_W64 : SI_CS_CHAIN_TC<i64, isWave64>;
// Handle selecting direct & indirect calls via SI_CS_CHAIN_TC_W32/64
multiclass si_cs_chain_tc_pattern<
dag callee, ValueType execvt, RegisterOperand execrc, Instruction tc> {
-def : GCNPat<
- (AMDGPUtc_return_chain i64:$src0, callee, (i32 timm:$fpdiff), execvt:$exec),
- (tc CCR_SGPR_64:$src0, callee, i32imm:$fpdiff, execrc:$exec)
->;
+ def : GCNPat<
+ (AMDGPUtc_return_chain i64:$src0, callee, (i32 timm:$fpdiff), execvt:$exec),
+ (tc CCR_SGPR_64:$src0, callee, i32imm:$fpdiff, execrc:$exec)
+ >;
}
multiclass si_cs_chain_tc_patterns<
@@ -736,6 +742,26 @@ multiclass si_cs_chain_tc_patterns<
defm : si_cs_chain_tc_patterns<i32>;
defm : si_cs_chain_tc_patterns<i64>;
+// Match dynamic VGPR case. This is always indirect since we choose the callee
+// dynamically based on the result of the VGPR reall...
[truncated]
Ping?
    if (!FallbackExecArg.Ty->isIntegerTy(ST.getWavefrontSize())) {
      LLVM_DEBUG(dbgs() << "Bad type for fallback EXEC");
      return false;
    }
For consistency with other contexts, should this accept a wave64 signature when executing wave32, with implicit truncate to i32?
Ok, but also for consistency we'd probably want to do that for all chain calls, not just when dynamic VGPRs are in use. I'll send a separate patch for that.
llvm/test/CodeGen/AMDGPU/amdgcn-cs-chain-intrinsic-dyn-vgpr-w32.ll (outdated; resolved)
    if (Op.isReg())
      MIB.addReg(Op.getReg());
    else
      MIB->addOperand(Op);
Don't understand this, the addOperand always works? MachineInstrBuilder::add is also the same as MIB->addOperand
This is because the same register may be used twice, and if it has a killed flag then that might get copied to an "earlier" instruction. This is what remove-register-flags.mir is testing. For now I renamed the helper, I should probably add a comment too...
Comment added. Hope it's clearer now.
    multiclass si_cs_chain_tc_dvgpr_patterns<
        ValueType execvt, RegisterOperand execrc = getSOPSrcForVT<execvt>.ret,
        Instruction tc = SI_CS_CHAIN_TC_W32_DVGPR> {
      let AddedComplexity = 90 in {
Why does this need AddedComplexity?
To be honest, I wrote this >1 year ago, so I don't really remember why I chose to do it this way. I reworked it to use a different ISD node for dynamic VGPRs, I hope it's cleaner this way.
✅ With the latest revision this PR passed the C/C++ code formatter.
    if (Op.isReg())
      MIB.addReg(Op.getReg());
    else
      MIB.add(Op);
There's probably a cleaner way to do this, but we should really finish the removal of kill flags
Force-pushed from 3d09c94 to 6b7d174.
LLVM Buildbot has detected a new failure on builder. Full details are available at: https://lab.llvm.org/buildbot/#/builders/51/builds/12893

Here is the relevant piece of the build log for the reference.
Thanks @kazutakahirata, you beat me to it!