[AMDGPU] Promote uniform ops to I32 in DAGISel #106383

Merged 4 commits on Sep 19, 2024

Pierre-vh (Contributor) commented Aug 28, 2024

Promote uniform binops, selects and setcc between 2 and 16 bits to 32 bits in DAGISel

Solves #64591
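
A minimal sketch, in plain C++ rather than LLVM code, of why widening a narrow integer operation to 32 bits and truncating the result preserves its value. The helper names (`add16`, `add16Promoted`, `slt16Promoted`, `promotionPreservesAdd`) are hypothetical and do not appear in the patch. The sketch assumes the usual two's-complement behaviour of fixed-width integer conversions.

```cpp
#include <cassert>
#include <cstdint>

// Reference: a 16-bit add, wrapping modulo 2^16.
inline uint16_t add16(uint16_t a, uint16_t b) {
  return static_cast<uint16_t>(a + b);
}

// The same add in "promoted" form: zero-extend both operands to 32 bits,
// add in 32 bits, then truncate the result back to 16 bits.
inline uint16_t add16Promoted(uint16_t a, uint16_t b) {
  uint32_t wideA = a;                 // zext i16 -> i32
  uint32_t wideB = b;                 // zext i16 -> i32
  uint32_t wide = wideA + wideB;      // 32-bit add
  return static_cast<uint16_t>(wide); // trunc i32 -> i16
}

// A signed compare has to sign-extend its operands; zero-extending would
// turn -1 (0xFFFF) into 65535 and flip the result.
inline bool slt16Promoted(int16_t a, int16_t b) {
  int32_t wideA = a; // sext i16 -> i32
  int32_t wideB = b; // sext i16 -> i32
  return wideA < wideB;
}

// Spot-check a grid of operand pairs, including values that overflow 16 bits.
inline bool promotionPreservesAdd() {
  for (uint32_t a = 0; a <= 0xFFFF; a += 257)
    for (uint32_t b = 0; b <= 0xFFFF; b += 251)
      if (add16(static_cast<uint16_t>(a), static_cast<uint16_t>(b)) !=
          add16Promoted(static_cast<uint16_t>(a), static_cast<uint16_t>(b)))
        return false;
  return true;
}
```

The truncation discards whatever carried into the upper half, which is why the 32-bit form can stand in for the 16-bit one.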

llvmbot (Member) commented Aug 28, 2024

@llvm/pr-subscribers-backend-x86

@llvm/pr-subscribers-llvm-selectiondag

Author: Pierre van Houtryve (Pierre-vh)

Changes

See #106382 for NFC test updates.

Promote uniform binops, selects and setcc in GlobalISel and DAGISel instead of in CodeGenPrepare (CGP).

Solves #64591


Patch is 1.35 MiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/106383.diff

88 Files Affected:

  • (modified) llvm/include/llvm/CodeGen/TargetLowering.h (+1-1)
  • (modified) llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp (+10-9)
  • (modified) llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp (+6-4)
  • (modified) llvm/lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp (+4-4)
  • (modified) llvm/lib/Target/AMDGPU/AMDGPUCombine.td (+27-1)
  • (modified) llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp (+33-2)
  • (modified) llvm/lib/Target/AMDGPU/AMDGPUISelLowering.h (+1-1)
  • (modified) llvm/lib/Target/AMDGPU/AMDGPURegBankCombiner.cpp (+113)
  • (modified) llvm/lib/Target/AMDGPU/SIISelLowering.cpp (+149-7)
  • (modified) llvm/lib/Target/AMDGPU/SIISelLowering.h (+1-1)
  • (modified) llvm/lib/Target/X86/X86ISelLowering.cpp (+2-1)
  • (modified) llvm/lib/Target/X86/X86ISelLowering.h (+1-1)
  • (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/add.v2i16.ll (+33-37)
  • (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/andn2.ll (+60-54)
  • (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/ashr.ll (+100-63)
  • (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/fshl.ll (+72-48)
  • (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/fshr.ll (+78-52)
  • (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.div.fmas.ll (+442-412)
  • (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/lshr.ll (+107-42)
  • (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/mul.ll (+15-62)
  • (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/orn2.ll (+60-54)
  • (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/sext_inreg.ll (+68-101)
  • (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/shl-ext-reduce.ll (+6-4)
  • (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/shl.ll (+49-39)
  • (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/sub.v2i16.ll (+25-29)
  • (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/xnor.ll (+4-22)
  • (modified) llvm/test/CodeGen/AMDGPU/add.v2i16.ll (+11-11)
  • (modified) llvm/test/CodeGen/AMDGPU/amdgpu-codegenprepare-fold-binop-select.ll (+3-4)
  • (modified) llvm/test/CodeGen/AMDGPU/amdgpu-codegenprepare-i16-to-i32.ll (+2-2)
  • (modified) llvm/test/CodeGen/AMDGPU/amdgpu-simplify-libcall-pow-codegen.ll (+2-650)
  • (modified) llvm/test/CodeGen/AMDGPU/amdgpu.private-memory.ll (+1-1)
  • (modified) llvm/test/CodeGen/AMDGPU/anyext.ll (+2-6)
  • (modified) llvm/test/CodeGen/AMDGPU/bitreverse.ll (+2-5)
  • (modified) llvm/test/CodeGen/AMDGPU/branch-folding-implicit-def-subreg.ll (+2-2)
  • (modified) llvm/test/CodeGen/AMDGPU/bug-sdag-emitcopyfromreg.ll (+2-62)
  • (modified) llvm/test/CodeGen/AMDGPU/calling-conventions.ll (+900-839)
  • (modified) llvm/test/CodeGen/AMDGPU/cgp-bitfield-extract.ll (+4-7)
  • (modified) llvm/test/CodeGen/AMDGPU/copy-illegal-type.ll (+1-1)
  • (modified) llvm/test/CodeGen/AMDGPU/ctlz.ll (+5-21)
  • (modified) llvm/test/CodeGen/AMDGPU/ctlz_zero_undef.ll (+14-11)
  • (modified) llvm/test/CodeGen/AMDGPU/cttz.ll (+3-11)
  • (modified) llvm/test/CodeGen/AMDGPU/cttz_zero_undef.ll (+14-24)
  • (modified) llvm/test/CodeGen/AMDGPU/dagcombine-select.ll (+2-3)
  • (modified) llvm/test/CodeGen/AMDGPU/extract_vector_dynelt.ll (+1010-309)
  • (modified) llvm/test/CodeGen/AMDGPU/extract_vector_elt-i8.ll (+539-119)
  • (modified) llvm/test/CodeGen/AMDGPU/fcopysign.f16.ll (+51-50)
  • (modified) llvm/test/CodeGen/AMDGPU/fneg.ll (+3-10)
  • (modified) llvm/test/CodeGen/AMDGPU/fsqrt.f64.ll (-532)
  • (modified) llvm/test/CodeGen/AMDGPU/gfx-callable-argument-types.ll (+21-12)
  • (modified) llvm/test/CodeGen/AMDGPU/idiv-licm.ll (+235-228)
  • (modified) llvm/test/CodeGen/AMDGPU/imm16.ll (+9-9)
  • (modified) llvm/test/CodeGen/AMDGPU/insert-delay-alu-bug.ll (+57-49)
  • (modified) llvm/test/CodeGen/AMDGPU/insert_vector_dynelt.ll (+910-993)
  • (modified) llvm/test/CodeGen/AMDGPU/insert_vector_elt.ll (+74-86)
  • (modified) llvm/test/CodeGen/AMDGPU/llvm.is.fpclass.bf16.ll (+20-20)
  • (modified) llvm/test/CodeGen/AMDGPU/load-constant-i1.ll (+3212-3431)
  • (modified) llvm/test/CodeGen/AMDGPU/load-constant-i8.ll (+2431-2404)
  • (modified) llvm/test/CodeGen/AMDGPU/load-global-i8.ll (+5-10)
  • (modified) llvm/test/CodeGen/AMDGPU/load-local-i8.ll (+5-10)
  • (modified) llvm/test/CodeGen/AMDGPU/lower-lds-struct-aa-memcpy.ll (+2-2)
  • (modified) llvm/test/CodeGen/AMDGPU/lshr.v2i16.ll (+7-7)
  • (modified) llvm/test/CodeGen/AMDGPU/min.ll (+225-170)
  • (modified) llvm/test/CodeGen/AMDGPU/mul.ll (+27-24)
  • (modified) llvm/test/CodeGen/AMDGPU/permute_i8.ll (+44-29)
  • (modified) llvm/test/CodeGen/AMDGPU/preload-kernargs.ll (+108-119)
  • (modified) llvm/test/CodeGen/AMDGPU/scalar_to_vector.ll (+21-14)
  • (modified) llvm/test/CodeGen/AMDGPU/sdwa-peephole.ll (+100-99)
  • (modified) llvm/test/CodeGen/AMDGPU/select-i1.ll (+4-9)
  • (modified) llvm/test/CodeGen/AMDGPU/select-vectors.ll (+2-3)
  • (modified) llvm/test/CodeGen/AMDGPU/setcc-opt.ll (+5-12)
  • (modified) llvm/test/CodeGen/AMDGPU/sext-in-reg.ll (+4-10)
  • (modified) llvm/test/CodeGen/AMDGPU/shl.ll (+3-2)
  • (modified) llvm/test/CodeGen/AMDGPU/shl.v2i16.ll (+3-3)
  • (modified) llvm/test/CodeGen/AMDGPU/sign_extend.ll (+9-10)
  • (modified) llvm/test/CodeGen/AMDGPU/smed3.ll (+17-3)
  • (modified) llvm/test/CodeGen/AMDGPU/sminmax.v2i16.ll (+1013-83)
  • (modified) llvm/test/CodeGen/AMDGPU/sra.ll (+40-40)
  • (modified) llvm/test/CodeGen/AMDGPU/srem.ll (+19-17)
  • (modified) llvm/test/CodeGen/AMDGPU/sub.v2i16.ll (+16-18)
  • (modified) llvm/test/CodeGen/AMDGPU/trunc-combine.ll (+2-2)
  • (modified) llvm/test/CodeGen/AMDGPU/trunc-store.ll (+80-56)
  • (modified) llvm/test/CodeGen/AMDGPU/uaddo.ll (+9-6)
  • (modified) llvm/test/CodeGen/AMDGPU/usubo.ll (+9-6)
  • (modified) llvm/test/CodeGen/AMDGPU/v_sat_pk_u8_i16.ll (+11-10)
  • (modified) llvm/test/CodeGen/AMDGPU/vector-alloca-bitcast.ll (+1-2)
  • (modified) llvm/test/CodeGen/AMDGPU/vgpr-spill-placement-issue61083.ll (+4-2)
  • (modified) llvm/test/CodeGen/AMDGPU/widen-smrd-loads.ll (+35-26)
  • (modified) llvm/test/CodeGen/AMDGPU/zero_extend.ll (+6-5)
diff --git a/llvm/include/llvm/CodeGen/TargetLowering.h b/llvm/include/llvm/CodeGen/TargetLowering.h
index eda38cd8a564d6..85310a4911b8ed 100644
--- a/llvm/include/llvm/CodeGen/TargetLowering.h
+++ b/llvm/include/llvm/CodeGen/TargetLowering.h
@@ -3299,7 +3299,7 @@ class TargetLoweringBase {
   /// Return true if it's profitable to narrow operations of type SrcVT to
   /// DestVT. e.g. on x86, it's profitable to narrow from i32 to i8 but not from
   /// i32 to i16.
-  virtual bool isNarrowingProfitable(EVT SrcVT, EVT DestVT) const {
+  virtual bool isNarrowingProfitable(SDNode *N, EVT SrcVT, EVT DestVT) const {
     return false;
   }
 
diff --git a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
index b0a906743f29ff..513ad392cb360a 100644
--- a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
@@ -7031,7 +7031,7 @@ SDValue DAGCombiner::visitAND(SDNode *N) {
     if (N1C->getAPIntValue().countLeadingZeros() >= (BitWidth - SrcBitWidth) &&
         TLI.isTruncateFree(VT, SrcVT) && TLI.isZExtFree(SrcVT, VT) &&
         TLI.isTypeDesirableForOp(ISD::AND, SrcVT) &&
-        TLI.isNarrowingProfitable(VT, SrcVT))
+        TLI.isNarrowingProfitable(N, VT, SrcVT))
       return DAG.getNode(ISD::ZERO_EXTEND, DL, VT,
                          DAG.getNode(ISD::AND, DL, SrcVT, N0Op0,
                                      DAG.getZExtOrTrunc(N1, DL, SrcVT)));
@@ -14574,7 +14574,7 @@ SDValue DAGCombiner::reduceLoadWidth(SDNode *N) {
   // ShLeftAmt will indicate how much a narrowed load should be shifted left.
   unsigned ShLeftAmt = 0;
   if (ShAmt == 0 && N0.getOpcode() == ISD::SHL && N0.hasOneUse() &&
-      ExtVT == VT && TLI.isNarrowingProfitable(N0.getValueType(), VT)) {
+      ExtVT == VT && TLI.isNarrowingProfitable(N, N0.getValueType(), VT)) {
     if (ConstantSDNode *N01 = dyn_cast<ConstantSDNode>(N0.getOperand(1))) {
       ShLeftAmt = N01->getZExtValue();
       N0 = N0.getOperand(0);
@@ -15118,9 +15118,11 @@ SDValue DAGCombiner::visitTRUNCATE(SDNode *N) {
   }
 
   // trunc (select c, a, b) -> select c, (trunc a), (trunc b)
-  if (N0.getOpcode() == ISD::SELECT && N0.hasOneUse()) {
-    if ((!LegalOperations || TLI.isOperationLegal(ISD::SELECT, SrcVT)) &&
-        TLI.isTruncateFree(SrcVT, VT)) {
+  if (N0.getOpcode() == ISD::SELECT && N0.hasOneUse() &&
+      TLI.isTruncateFree(SrcVT, VT)) {
+    if (!LegalOperations ||
+        (TLI.isOperationLegal(ISD::SELECT, SrcVT) &&
+         TLI.isNarrowingProfitable(N0.getNode(), N0.getValueType(), VT))) {
       SDLoc SL(N0);
       SDValue Cond = N0.getOperand(0);
       SDValue TruncOp0 = DAG.getNode(ISD::TRUNCATE, SL, VT, N0.getOperand(1));
@@ -20061,10 +20063,9 @@ SDValue DAGCombiner::ReduceLoadOpStoreWidth(SDNode *N) {
     EVT NewVT = EVT::getIntegerVT(*DAG.getContext(), NewBW);
     // The narrowing should be profitable, the load/store operation should be
     // legal (or custom) and the store size should be equal to the NewVT width.
-    while (NewBW < BitWidth &&
-           (NewVT.getStoreSizeInBits() != NewBW ||
-            !TLI.isOperationLegalOrCustom(Opc, NewVT) ||
-            !TLI.isNarrowingProfitable(VT, NewVT))) {
+    while (NewBW < BitWidth && (NewVT.getStoreSizeInBits() != NewBW ||
+                                !TLI.isOperationLegalOrCustom(Opc, NewVT) ||
+                                !TLI.isNarrowingProfitable(N, VT, NewVT))) {
       NewBW = NextPowerOf2(NewBW);
       NewVT = EVT::getIntegerVT(*DAG.getContext(), NewBW);
     }
diff --git a/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp b/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
index 4e796289cff0a1..97e10b3551db1a 100644
--- a/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
@@ -1841,7 +1841,7 @@ bool TargetLowering::SimplifyDemandedBits(
         for (unsigned SmallVTBits = llvm::bit_ceil(DemandedSize);
              SmallVTBits < BitWidth; SmallVTBits = NextPowerOf2(SmallVTBits)) {
           EVT SmallVT = EVT::getIntegerVT(*TLO.DAG.getContext(), SmallVTBits);
-          if (isNarrowingProfitable(VT, SmallVT) &&
+          if (isNarrowingProfitable(Op.getNode(), VT, SmallVT) &&
               isTypeDesirableForOp(ISD::SHL, SmallVT) &&
               isTruncateFree(VT, SmallVT) && isZExtFree(SmallVT, VT) &&
               (!TLO.LegalOperations() || isOperationLegal(ISD::SHL, SmallVT))) {
@@ -1865,7 +1865,7 @@ bool TargetLowering::SimplifyDemandedBits(
       if ((BitWidth % 2) == 0 && !VT.isVector() && ShAmt < HalfWidth &&
           DemandedBits.countLeadingOnes() >= HalfWidth) {
         EVT HalfVT = EVT::getIntegerVT(*TLO.DAG.getContext(), HalfWidth);
-        if (isNarrowingProfitable(VT, HalfVT) &&
+        if (isNarrowingProfitable(Op.getNode(), VT, HalfVT) &&
             isTypeDesirableForOp(ISD::SHL, HalfVT) &&
             isTruncateFree(VT, HalfVT) && isZExtFree(HalfVT, VT) &&
             (!TLO.LegalOperations() || isOperationLegal(ISD::SHL, HalfVT))) {
@@ -1984,7 +1984,7 @@ bool TargetLowering::SimplifyDemandedBits(
       if ((BitWidth % 2) == 0 && !VT.isVector()) {
         APInt HiBits = APInt::getHighBitsSet(BitWidth, BitWidth / 2);
         EVT HalfVT = EVT::getIntegerVT(*TLO.DAG.getContext(), BitWidth / 2);
-        if (isNarrowingProfitable(VT, HalfVT) &&
+        if (isNarrowingProfitable(Op.getNode(), VT, HalfVT) &&
             isTypeDesirableForOp(ISD::SRL, HalfVT) &&
             isTruncateFree(VT, HalfVT) && isZExtFree(HalfVT, VT) &&
             (!TLO.LegalOperations() || isOperationLegal(ISD::SRL, HalfVT)) &&
@@ -4762,9 +4762,11 @@ SDValue TargetLowering::SimplifySetCC(EVT VT, SDValue N0, SDValue N1,
       case ISD::SETULT:
       case ISD::SETULE: {
         EVT newVT = N0.getOperand(0).getValueType();
+        // FIXME: Should use isNarrowingProfitable.
         if (DCI.isBeforeLegalizeOps() ||
             (isOperationLegal(ISD::SETCC, newVT) &&
-             isCondCodeLegal(Cond, newVT.getSimpleVT()))) {
+             isCondCodeLegal(Cond, newVT.getSimpleVT()) &&
+             isTypeDesirableForOp(ISD::SETCC, newVT))) {
           EVT NewSetCCVT = getSetCCResultType(Layout, *DAG.getContext(), newVT);
           SDValue NewConst = DAG.getConstant(C1.trunc(InSize), dl, newVT);
 
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp b/llvm/lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp
index 052e1140533f3f..f689fcf62fe8eb 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp
@@ -46,10 +46,10 @@ static cl::opt<bool> WidenLoads(
   cl::init(false));
 
 static cl::opt<bool> Widen16BitOps(
-  "amdgpu-codegenprepare-widen-16-bit-ops",
-  cl::desc("Widen uniform 16-bit instructions to 32-bit in AMDGPUCodeGenPrepare"),
-  cl::ReallyHidden,
-  cl::init(true));
+    "amdgpu-codegenprepare-widen-16-bit-ops",
+    cl::desc(
+        "Widen uniform 16-bit instructions to 32-bit in AMDGPUCodeGenPrepare"),
+    cl::ReallyHidden, cl::init(false));
 
 static cl::opt<bool>
     BreakLargePHIs("amdgpu-codegenprepare-break-large-phis",
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUCombine.td b/llvm/lib/Target/AMDGPU/AMDGPUCombine.td
index b2a3f9392157d1..01e96159babd03 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUCombine.td
+++ b/llvm/lib/Target/AMDGPU/AMDGPUCombine.td
@@ -145,6 +145,31 @@ def expand_promoted_fmed3 : GICombineRule<
 
 } // End Predicates = [NotHasMed3_16]
 
+def promote_i16_uniform_binops_frag : GICombinePatFrag<
+  (outs root:$dst), (ins),
+  !foreach(op, [G_ADD, G_SUB, G_SHL, G_ASHR, G_LSHR, G_AND, G_XOR, G_OR, G_MUL],
+          (pattern (op i16:$dst, i16:$lhs, i16:$rhs)))>;
+
+def promote_i16_uniform_binops : GICombineRule<
+  (defs root:$dst),
+  (match (promote_i16_uniform_binops_frag i16:$dst):$mi,
+    [{ return matchPromote16to32(*${mi}); }]),
+  (apply [{ applyPromote16to32(*${mi}); }])
+>;
+
+def promote_i16_uniform_ternary_frag : GICombinePatFrag<
+  (outs root:$dst), (ins),
+  !foreach(op, [G_ICMP, G_SELECT],
+          (pattern (op i16:$dst, $first, i16:$lhs, i16:$rhs)))>;
+
+def promote_i16_uniform_ternary : GICombineRule<
+  (defs root:$dst),
+  (match (promote_i16_uniform_ternary_frag i16:$dst):$mi,
+    [{ return matchPromote16to32(*${mi}); }]),
+  (apply [{ applyPromote16to32(*${mi}); }])
+>;
+
+
 // Combines which should only apply on SI/CI
 def gfx6gfx7_combines : GICombineGroup<[fcmp_select_to_fmin_fmax_legacy]>;
 
@@ -169,5 +194,6 @@ def AMDGPURegBankCombiner : GICombiner<
   "AMDGPURegBankCombinerImpl",
   [unmerge_merge, unmerge_cst, unmerge_undef,
    zext_trunc_fold, int_minmax_to_med3, ptr_add_immed_chain,
-   fp_minmax_to_clamp, fp_minmax_to_med3, fmed3_intrinsic_to_clamp]> {
+   fp_minmax_to_clamp, fp_minmax_to_med3, fmed3_intrinsic_to_clamp,
+   promote_i16_uniform_binops, promote_i16_uniform_ternary]> {
 }
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp b/llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp
index 96143d688801aa..1a596cc80c0c9c 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp
@@ -1017,14 +1017,45 @@ bool AMDGPUTargetLowering::isZExtFree(EVT Src, EVT Dest) const {
   return Src == MVT::i32 && Dest == MVT::i64;
 }
 
-bool AMDGPUTargetLowering::isNarrowingProfitable(EVT SrcVT, EVT DestVT) const {
+bool AMDGPUTargetLowering::isNarrowingProfitable(SDNode *N, EVT SrcVT,
+                                                 EVT DestVT) const {
+  switch (N->getOpcode()) {
+  case ISD::ADD:
+  case ISD::SUB:
+  case ISD::SHL:
+  case ISD::SRL:
+  case ISD::SRA:
+  case ISD::AND:
+  case ISD::OR:
+  case ISD::XOR:
+  case ISD::MUL:
+  case ISD::SETCC:
+  case ISD::SELECT:
+    if (Subtarget->has16BitInsts() &&
+        (DestVT.isVector() ? !Subtarget->hasVOP3PInsts() : true)) {
+      // Don't narrow back down to i16 if promoted to i32 already.
+      if (!N->isDivergent() && DestVT.isInteger() &&
+          DestVT.getScalarSizeInBits() > 1 &&
+          DestVT.getScalarSizeInBits() <= 16 &&
+          SrcVT.getScalarSizeInBits() > 16) {
+        return false;
+      }
+    }
+    return true;
+  default:
+    break;
+  }
+
   // There aren't really 64-bit registers, but pairs of 32-bit ones and only a
   // limited number of native 64-bit operations. Shrinking an operation to fit
   // in a single 32-bit register should always be helpful. As currently used,
   // this is much less general than the name suggests, and is only used in
   // places trying to reduce the sizes of loads. Shrinking loads to < 32-bits is
   // not profitable, and may actually be harmful.
-  return SrcVT.getSizeInBits() > 32 && DestVT.getSizeInBits() == 32;
+  if (isa<LoadSDNode>(N))
+    return SrcVT.getSizeInBits() > 32 && DestVT.getSizeInBits() == 32;
+
+  return true;
 }
 
 bool AMDGPUTargetLowering::isDesirableToCommuteWithShift(
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUISelLowering.h b/llvm/lib/Target/AMDGPU/AMDGPUISelLowering.h
index 59f640ea99de3e..4dfa7ac052a5ba 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUISelLowering.h
+++ b/llvm/lib/Target/AMDGPU/AMDGPUISelLowering.h
@@ -201,7 +201,7 @@ class AMDGPUTargetLowering : public TargetLowering {
                                NegatibleCost &Cost,
                                unsigned Depth) const override;
 
-  bool isNarrowingProfitable(EVT SrcVT, EVT DestVT) const override;
+  bool isNarrowingProfitable(SDNode *N, EVT SrcVT, EVT DestVT) const override;
 
   bool isDesirableToCommuteWithShift(const SDNode *N,
                                      CombineLevel Level) const override;
diff --git a/llvm/lib/Target/AMDGPU/AMDGPURegBankCombiner.cpp b/llvm/lib/Target/AMDGPU/AMDGPURegBankCombiner.cpp
index e236a5d7522e02..3b4faa35b93738 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPURegBankCombiner.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPURegBankCombiner.cpp
@@ -89,6 +89,9 @@ class AMDGPURegBankCombinerImpl : public Combiner {
   void applyMed3(MachineInstr &MI, Med3MatchInfo &MatchInfo) const;
   void applyClamp(MachineInstr &MI, Register &Reg) const;
 
+  bool matchPromote16to32(MachineInstr &MI) const;
+  void applyPromote16to32(MachineInstr &MI) const;
+
 private:
   SIModeRegisterDefaults getMode() const;
   bool getIEEE() const;
@@ -348,6 +351,116 @@ bool AMDGPURegBankCombinerImpl::matchFPMed3ToClamp(MachineInstr &MI,
   return false;
 }
 
+bool AMDGPURegBankCombinerImpl::matchPromote16to32(MachineInstr &MI) const {
+  Register Dst = MI.getOperand(0).getReg();
+  LLT DstTy = MRI.getType(Dst);
+  const auto *RB = MRI.getRegBankOrNull(Dst);
+
+  // Only promote uniform instructions.
+  if (RB->getID() != AMDGPU::SGPRRegBankID)
+    return false;
+
+  // Promote only if:
+  //    - We have 16 bit insts (not true 16 bit insts).
+  //    - We don't have packed instructions (for vector types only).
+  // TODO: For vector types, the set of packed operations is more limited, so
+  // may want to promote some anyway.
+  return STI.has16BitInsts() &&
+         (DstTy.isVector() ? !STI.hasVOP3PInsts() : true);
+}
+
+static unsigned getExtOpcodeForPromotedOp(MachineInstr &MI) {
+  switch (MI.getOpcode()) {
+  case AMDGPU::G_ASHR:
+    return AMDGPU::G_SEXT;
+  case AMDGPU::G_ADD:
+  case AMDGPU::G_SUB:
+  case AMDGPU::G_FSHR:
+    return AMDGPU::G_ZEXT;
+  case AMDGPU::G_AND:
+  case AMDGPU::G_OR:
+  case AMDGPU::G_XOR:
+  case AMDGPU::G_SHL:
+  case AMDGPU::G_SELECT:
+  case AMDGPU::G_MUL:
+    // operation result won't be influenced by garbage high bits.
+    // TODO: are all of those cases correct, and are there more?
+    return AMDGPU::G_ANYEXT;
+  case AMDGPU::G_ICMP: {
+    return CmpInst::isSigned(cast<GICmp>(MI).getCond()) ? AMDGPU::G_SEXT
+                                                        : AMDGPU::G_ZEXT;
+  }
+  default:
+    llvm_unreachable("unexpected opcode!");
+  }
+}
+
+void AMDGPURegBankCombinerImpl::applyPromote16to32(MachineInstr &MI) const {
+  const unsigned Opc = MI.getOpcode();
+  assert(Opc == AMDGPU::G_ADD || Opc == AMDGPU::G_SUB || Opc == AMDGPU::G_SHL ||
+         Opc == AMDGPU::G_LSHR || Opc == AMDGPU::G_ASHR ||
+         Opc == AMDGPU::G_AND || Opc == AMDGPU::G_OR || Opc == AMDGPU::G_XOR ||
+         Opc == AMDGPU::G_MUL || Opc == AMDGPU::G_SELECT ||
+         Opc == AMDGPU::G_ICMP);
+
+  Register Dst = MI.getOperand(0).getReg();
+
+  bool IsSelectOrCmp = (Opc == AMDGPU::G_SELECT || Opc == AMDGPU::G_ICMP);
+  Register LHS = MI.getOperand(IsSelectOrCmp + 1).getReg();
+  Register RHS = MI.getOperand(IsSelectOrCmp + 2).getReg();
+
+  assert(MRI.getType(Dst) == LLT::scalar(16));
+  assert(MRI.getType(LHS) == LLT::scalar(16));
+  assert(MRI.getType(RHS) == LLT::scalar(16));
+
+  assert(MRI.getRegBankOrNull(Dst)->getID() == AMDGPU::SGPRRegBankID);
+  assert(MRI.getRegBankOrNull(LHS)->getID() == AMDGPU::SGPRRegBankID);
+  assert(MRI.getRegBankOrNull(RHS)->getID() == AMDGPU::SGPRRegBankID);
+  const RegisterBank &RB = *MRI.getRegBankOrNull(Dst);
+
+  LLT S32 = LLT::scalar(32);
+
+  B.setInstrAndDebugLoc(MI);
+  const unsigned ExtOpc = getExtOpcodeForPromotedOp(MI);
+  LHS = B.buildInstr(ExtOpc, {S32}, {LHS}).getReg(0);
+  RHS = B.buildInstr(ExtOpc, {S32}, {RHS}).getReg(0);
+
+  MRI.setRegBank(LHS, RB);
+  MRI.setRegBank(RHS, RB);
+
+  MachineInstr *NewInst;
+  if (IsSelectOrCmp)
+    NewInst = B.buildInstr(Opc, {Dst}, {MI.getOperand(1), LHS, RHS});
+  else
+    NewInst = B.buildInstr(Opc, {S32}, {LHS, RHS});
+
+  if (Opc != AMDGPU::G_ICMP) {
+    Register Dst32 = NewInst->getOperand(0).getReg();
+    MRI.setRegBank(Dst32, RB);
+    B.buildTrunc(Dst, Dst32);
+  }
+
+  switch (Opc) {
+  case AMDGPU::G_ADD:
+  case AMDGPU::G_SHL:
+    NewInst->setFlag(MachineInstr::NoUWrap);
+    NewInst->setFlag(MachineInstr::NoSWrap);
+    break;
+  case AMDGPU::G_SUB:
+    if (MI.getFlag(MachineInstr::NoUWrap))
+      NewInst->setFlag(MachineInstr::NoUWrap);
+    NewInst->setFlag(MachineInstr::NoSWrap);
+    break;
+  case AMDGPU::G_MUL:
+    NewInst->setFlag(MachineInstr::NoUWrap);
+    if (MI.getFlag(MachineInstr::NoUWrap))
+      NewInst->setFlag(MachineInstr::NoUWrap);
+    break;
+  }
+
+  MI.eraseFromParent();
+}
+
 void AMDGPURegBankCombinerImpl::applyClamp(MachineInstr &MI,
                                            Register &Reg) const {
   B.buildInstr(AMDGPU::G_AMDGPU_CLAMP, {MI.getOperand(0)}, {Reg},
diff --git a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
index 1437f3d58b5e79..96a59acd751a62 100644
--- a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
+++ b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
@@ -894,6 +894,7 @@ SITargetLowering::SITargetLowering(const TargetMachine &TM,
                        ISD::UADDO_CARRY,
                        ISD::SUB,
                        ISD::USUBO_CARRY,
+                       ISD::MUL,
                        ISD::FADD,
                        ISD::FSUB,
                        ISD::FDIV,
@@ -909,9 +910,17 @@ SITargetLowering::SITargetLowering(const TargetMachine &TM,
                        ISD::UMIN,
                        ISD::UMAX,
                        ISD::SETCC,
+                       ISD::SELECT,
+                       ISD::SMIN,
+                       ISD::SMAX,
+                       ISD::UMIN,
+                       ISD::UMAX,
                        ISD::AND,
                        ISD::OR,
                        ISD::XOR,
+                       ISD::SHL,
+                       ISD::SRL,
+                       ISD::SRA,
                        ISD::FSHR,
                        ISD::SINT_TO_FP,
                        ISD::UINT_TO_FP,
@@ -1935,13 +1944,6 @@ bool SITargetLowering::isTypeDesirableForOp(unsigned Op, EVT VT) const {
     switch (Op) {
     case ISD::LOAD:
     case ISD::STORE:
-
-    // These operations are done with 32-bit instructions anyway.
-    case ISD::AND:
-    case ISD::OR:
-    case ISD::XOR:
-    case ISD::SELECT:
-      // TODO: Extensions?
       return true;
     default:
       return false;
@@ -6746,6 +6748,122 @@ SDValue SITargetLowering::lowerFLDEXP(SDValue Op, SelectionDAG &DAG) const {
   return DAG.getNode(ISD::FLDEXP, DL, VT, Op.getOperand(0), TruncExp);
 }
 
+static unsigned getExtOpcodeForPromotedOp(SDValue Op) {
+  switch (Op->getOpcode()) {
+  case ISD::SRA:
+  case ISD::SMIN:
+  case ISD::SMAX:
+    return ISD::SIGN_EXTEND;
+  case ISD::ADD:
+  case ISD::SUB:
+  case ISD::SRL:
+  case ISD::UMIN:
+  case ISD::UMAX:
+    return ISD::ZERO_EXTEND;
+  case ISD::AND:
+  case ISD::OR:
+  case ISD::XOR:
+  case ISD::SHL:
+  case ISD::SELECT:
+  case ISD::MUL:
+    // operation result won't be influenced by garbage high bits.
+    // TODO: are all of those cases correct, and are there more?
+    return ISD::ANY_EXTEND;
+  case ISD::SETCC: {
+    ISD::CondCode CC = cast<CondCodeSDNode>(Op.getOperand(2))->get();
+    return ISD::isSignedIntSetCC(CC) ? ISD::SIGN_EXTEND : ISD::ZERO_EXTEND;
+  }
+  default:
+    llvm_unreachable("unexpected opcode!");
+  }
+}
+
+SDValue SITargetLowering::promoteUniformOpToI32(SDValue Op,
+                                                DAGCombinerInfo &DCI) const {
+  const unsigned Opc = Op.getOpcode();
+  assert(Opc == ISD::ADD || Opc == ISD::SUB || Opc == ISD::SHL ||
+         Opc == ISD::SRL || Opc == ISD::SRA || Opc == ISD::AND ||
+         Opc == ISD::OR || Opc == ISD::XOR || Opc == ISD::MUL ||
+         Opc == ISD::SETCC || Opc == ISD::SELECT || Opc == ISD::SMIN ||
+         Opc == ISD::SMAX || Opc == ISD::UMIN || Opc == ISD::UMAX);
+
+  EVT OpTy = (Opc != ISD::SETCC) ? Op.getValueType()
+                                 : Op->getOperand(0).getValueType();
+
+  if (DCI.isBeforeLegalizeOps())
+    return SDValue();
+
+  // Promote only if:
+  //    - We have 16 bit insts (not true 16 bit insts).
+  //    - We don't have packed instructions (for vector types only).
+  // TODO: For vector types, the set of packed operations is more limited, so
+  // may want to promote some anyway.
+  if (!Subtarget->has16BitInsts() ||
+      (OpTy.isVector() ? Subtarget->hasVOP3PInsts() : false))
+    return SDValue();
+
+  // Promote uniform scalar and vector integers between 2 and 16 bits.
+  if (Op->isDivergent() || !OpTy.isInteger() ||
+      OpTy.getScalarSizeInBits() == 1 || OpTy.getScalarSizeInBits() > 16)
+    return SDValue();
+
+  auto &DAG = DCI.DAG;
+
+  SDLoc DL(Op);
+  SDValue LHS;
+  SDValue RHS;
+  if (Opc == ISD::SELECT) {
+    LHS = Op->getOperand(1);
+    RHS = Op->getOperand(2);
+  } else {
+    LHS = Op->getOperand(0)...
[truncated]
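
The `getExtOpcodeForPromotedOp` helpers in the diff above pick a sign, zero, or any extension per opcode. The following is a minimal sketch, in plain C++ rather than LLVM code, of why that choice matters. All names here (`Ext`, `extend`, `trunc16`, `mulPromoted`, `lshrPromoted`, `ashrPromoted`) are hypothetical. "Any" extension is modelled by filling the upper 16 bits with an arbitrary garbage pattern, on the assumption that those bits are unspecified. Shift amounts are assumed to be below 16, and signed right shift is assumed to be arithmetic (guaranteed from C++20, typical before that).

```cpp
#include <cassert>
#include <cstdint>

enum class Ext { Sign, Zero, Any };

// Widen a 16-bit value to 32 bits. Ext::Any fills the high half with
// garbage to model bits the consumer must not rely on.
inline uint32_t extend(uint16_t v, Ext kind, uint16_t garbage = 0xDEAD) {
  switch (kind) {
  case Ext::Sign:
    return static_cast<uint32_t>(
        static_cast<int32_t>(static_cast<int16_t>(v)));
  case Ext::Zero:
    return v;
  case Ext::Any:
    return (static_cast<uint32_t>(garbage) << 16) | v;
  }
  return v;
}

inline uint16_t trunc16(uint32_t v) { return static_cast<uint16_t>(v); }

// Multiply: the low 16 bits of the product depend only on the low 16 bits
// of the operands, so any extension works.
inline uint16_t mulPromoted(uint16_t a, uint16_t b, Ext kind) {
  return trunc16(extend(a, kind) * extend(b, kind));
}

// Logical shift right: high bits move down into the result, so they must
// be zero.
inline uint16_t lshrPromoted(uint16_t a, unsigned amt, Ext kind) {
  return trunc16(extend(a, kind) >> amt);
}

// Arithmetic shift right: high bits must be copies of the sign bit.
inline uint16_t ashrPromoted(uint16_t a, unsigned amt, Ext kind) {
  int32_t wide = static_cast<int32_t>(extend(a, kind));
  return trunc16(static_cast<uint32_t>(wide >> amt));
}
```

With these helpers, `lshrPromoted(0x8000, 4, Ext::Zero)` gives the 16-bit answer `0x0800`, while the same call with `Ext::Any` leaks garbage into the result. Likewise `ashrPromoted(0x8000, 4, ...)` only gives `0xF800` with `Ext::Sign`.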

llvmbot (Member) commented Aug 28, 2024

@llvm/pr-subscribers-llvm-globalisel

+++ b/llvm/include/llvm/CodeGen/TargetLowering.h
@@ -3299,7 +3299,7 @@ class TargetLoweringBase {
   /// Return true if it's profitable to narrow operations of type SrcVT to
   /// DestVT. e.g. on x86, it's profitable to narrow from i32 to i8 but not from
   /// i32 to i16.
-  virtual bool isNarrowingProfitable(EVT SrcVT, EVT DestVT) const {
+  virtual bool isNarrowingProfitable(SDNode *N, EVT SrcVT, EVT DestVT) const {
     return false;
   }
 
diff --git a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
index b0a906743f29ff..513ad392cb360a 100644
--- a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
@@ -7031,7 +7031,7 @@ SDValue DAGCombiner::visitAND(SDNode *N) {
     if (N1C->getAPIntValue().countLeadingZeros() >= (BitWidth - SrcBitWidth) &&
         TLI.isTruncateFree(VT, SrcVT) && TLI.isZExtFree(SrcVT, VT) &&
         TLI.isTypeDesirableForOp(ISD::AND, SrcVT) &&
-        TLI.isNarrowingProfitable(VT, SrcVT))
+        TLI.isNarrowingProfitable(N, VT, SrcVT))
       return DAG.getNode(ISD::ZERO_EXTEND, DL, VT,
                          DAG.getNode(ISD::AND, DL, SrcVT, N0Op0,
                                      DAG.getZExtOrTrunc(N1, DL, SrcVT)));
@@ -14574,7 +14574,7 @@ SDValue DAGCombiner::reduceLoadWidth(SDNode *N) {
   // ShLeftAmt will indicate how much a narrowed load should be shifted left.
   unsigned ShLeftAmt = 0;
   if (ShAmt == 0 && N0.getOpcode() == ISD::SHL && N0.hasOneUse() &&
-      ExtVT == VT && TLI.isNarrowingProfitable(N0.getValueType(), VT)) {
+      ExtVT == VT && TLI.isNarrowingProfitable(N, N0.getValueType(), VT)) {
     if (ConstantSDNode *N01 = dyn_cast<ConstantSDNode>(N0.getOperand(1))) {
       ShLeftAmt = N01->getZExtValue();
       N0 = N0.getOperand(0);
@@ -15118,9 +15118,11 @@ SDValue DAGCombiner::visitTRUNCATE(SDNode *N) {
   }
 
   // trunc (select c, a, b) -> select c, (trunc a), (trunc b)
-  if (N0.getOpcode() == ISD::SELECT && N0.hasOneUse()) {
-    if ((!LegalOperations || TLI.isOperationLegal(ISD::SELECT, SrcVT)) &&
-        TLI.isTruncateFree(SrcVT, VT)) {
+  if (N0.getOpcode() == ISD::SELECT && N0.hasOneUse() &&
+      TLI.isTruncateFree(SrcVT, VT)) {
+    if (!LegalOperations ||
+        (TLI.isOperationLegal(ISD::SELECT, SrcVT) &&
+         TLI.isNarrowingProfitable(N0.getNode(), N0.getValueType(), VT))) {
       SDLoc SL(N0);
       SDValue Cond = N0.getOperand(0);
       SDValue TruncOp0 = DAG.getNode(ISD::TRUNCATE, SL, VT, N0.getOperand(1));
@@ -20061,10 +20063,9 @@ SDValue DAGCombiner::ReduceLoadOpStoreWidth(SDNode *N) {
     EVT NewVT = EVT::getIntegerVT(*DAG.getContext(), NewBW);
     // The narrowing should be profitable, the load/store operation should be
     // legal (or custom) and the store size should be equal to the NewVT width.
-    while (NewBW < BitWidth &&
-           (NewVT.getStoreSizeInBits() != NewBW ||
-            !TLI.isOperationLegalOrCustom(Opc, NewVT) ||
-            !TLI.isNarrowingProfitable(VT, NewVT))) {
+    while (NewBW < BitWidth && (NewVT.getStoreSizeInBits() != NewBW ||
+                                !TLI.isOperationLegalOrCustom(Opc, NewVT) ||
+                                !TLI.isNarrowingProfitable(N, VT, NewVT))) {
       NewBW = NextPowerOf2(NewBW);
       NewVT = EVT::getIntegerVT(*DAG.getContext(), NewBW);
     }
diff --git a/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp b/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
index 4e796289cff0a1..97e10b3551db1a 100644
--- a/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
@@ -1841,7 +1841,7 @@ bool TargetLowering::SimplifyDemandedBits(
         for (unsigned SmallVTBits = llvm::bit_ceil(DemandedSize);
              SmallVTBits < BitWidth; SmallVTBits = NextPowerOf2(SmallVTBits)) {
           EVT SmallVT = EVT::getIntegerVT(*TLO.DAG.getContext(), SmallVTBits);
-          if (isNarrowingProfitable(VT, SmallVT) &&
+          if (isNarrowingProfitable(Op.getNode(), VT, SmallVT) &&
               isTypeDesirableForOp(ISD::SHL, SmallVT) &&
               isTruncateFree(VT, SmallVT) && isZExtFree(SmallVT, VT) &&
               (!TLO.LegalOperations() || isOperationLegal(ISD::SHL, SmallVT))) {
@@ -1865,7 +1865,7 @@ bool TargetLowering::SimplifyDemandedBits(
       if ((BitWidth % 2) == 0 && !VT.isVector() && ShAmt < HalfWidth &&
           DemandedBits.countLeadingOnes() >= HalfWidth) {
         EVT HalfVT = EVT::getIntegerVT(*TLO.DAG.getContext(), HalfWidth);
-        if (isNarrowingProfitable(VT, HalfVT) &&
+        if (isNarrowingProfitable(Op.getNode(), VT, HalfVT) &&
             isTypeDesirableForOp(ISD::SHL, HalfVT) &&
             isTruncateFree(VT, HalfVT) && isZExtFree(HalfVT, VT) &&
             (!TLO.LegalOperations() || isOperationLegal(ISD::SHL, HalfVT))) {
@@ -1984,7 +1984,7 @@ bool TargetLowering::SimplifyDemandedBits(
       if ((BitWidth % 2) == 0 && !VT.isVector()) {
         APInt HiBits = APInt::getHighBitsSet(BitWidth, BitWidth / 2);
         EVT HalfVT = EVT::getIntegerVT(*TLO.DAG.getContext(), BitWidth / 2);
-        if (isNarrowingProfitable(VT, HalfVT) &&
+        if (isNarrowingProfitable(Op.getNode(), VT, HalfVT) &&
             isTypeDesirableForOp(ISD::SRL, HalfVT) &&
             isTruncateFree(VT, HalfVT) && isZExtFree(HalfVT, VT) &&
             (!TLO.LegalOperations() || isOperationLegal(ISD::SRL, HalfVT)) &&
@@ -4762,9 +4762,11 @@ SDValue TargetLowering::SimplifySetCC(EVT VT, SDValue N0, SDValue N1,
       case ISD::SETULT:
       case ISD::SETULE: {
         EVT newVT = N0.getOperand(0).getValueType();
+        // FIXME: Should use isNarrowingProfitable.
         if (DCI.isBeforeLegalizeOps() ||
             (isOperationLegal(ISD::SETCC, newVT) &&
-             isCondCodeLegal(Cond, newVT.getSimpleVT()))) {
+             isCondCodeLegal(Cond, newVT.getSimpleVT()) &&
+             isTypeDesirableForOp(ISD::SETCC, newVT))) {
           EVT NewSetCCVT = getSetCCResultType(Layout, *DAG.getContext(), newVT);
           SDValue NewConst = DAG.getConstant(C1.trunc(InSize), dl, newVT);
 
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp b/llvm/lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp
index 052e1140533f3f..f689fcf62fe8eb 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp
@@ -46,10 +46,10 @@ static cl::opt<bool> WidenLoads(
   cl::init(false));
 
 static cl::opt<bool> Widen16BitOps(
-  "amdgpu-codegenprepare-widen-16-bit-ops",
-  cl::desc("Widen uniform 16-bit instructions to 32-bit in AMDGPUCodeGenPrepare"),
-  cl::ReallyHidden,
-  cl::init(true));
+    "amdgpu-codegenprepare-widen-16-bit-ops",
+    cl::desc(
+        "Widen uniform 16-bit instructions to 32-bit in AMDGPUCodeGenPrepare"),
+    cl::ReallyHidden, cl::init(false));
 
 static cl::opt<bool>
     BreakLargePHIs("amdgpu-codegenprepare-break-large-phis",
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUCombine.td b/llvm/lib/Target/AMDGPU/AMDGPUCombine.td
index b2a3f9392157d1..01e96159babd03 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUCombine.td
+++ b/llvm/lib/Target/AMDGPU/AMDGPUCombine.td
@@ -145,6 +145,31 @@ def expand_promoted_fmed3 : GICombineRule<
 
 } // End Predicates = [NotHasMed3_16]
 
+def promote_i16_uniform_binops_frag : GICombinePatFrag<
+  (outs root:$dst), (ins),
+  !foreach(op, [G_ADD, G_SUB, G_SHL, G_ASHR, G_LSHR, G_AND, G_XOR, G_OR, G_MUL],
+          (pattern (op i16:$dst, i16:$lhs, i16:$rhs)))>;
+
+def promote_i16_uniform_binops : GICombineRule<
+  (defs root:$dst),
+  (match (promote_i16_uniform_binops_frag i16:$dst):$mi,
+    [{ return matchPromote16to32(*${mi}); }]),
+  (apply [{ applyPromote16to32(*${mi}); }])
+>;
+
+def promote_i16_uniform_ternary_frag : GICombinePatFrag<
+  (outs root:$dst), (ins),
+  !foreach(op, [G_ICMP, G_SELECT],
+          (pattern (op i16:$dst, $first, i16:$lhs, i16:$rhs)))>;
+
+def promote_i16_uniform_ternary : GICombineRule<
+  (defs root:$dst),
+  (match (promote_i16_uniform_ternary_frag i16:$dst):$mi,
+    [{ return matchPromote16to32(*${mi}); }]),
+  (apply [{ applyPromote16to32(*${mi}); }])
+>;
+
+
 // Combines which should only apply on SI/CI
 def gfx6gfx7_combines : GICombineGroup<[fcmp_select_to_fmin_fmax_legacy]>;
 
@@ -169,5 +194,6 @@ def AMDGPURegBankCombiner : GICombiner<
   "AMDGPURegBankCombinerImpl",
   [unmerge_merge, unmerge_cst, unmerge_undef,
    zext_trunc_fold, int_minmax_to_med3, ptr_add_immed_chain,
-   fp_minmax_to_clamp, fp_minmax_to_med3, fmed3_intrinsic_to_clamp]> {
+   fp_minmax_to_clamp, fp_minmax_to_med3, fmed3_intrinsic_to_clamp,
+   promote_i16_uniform_binops, promote_i16_uniform_ternary]> {
 }
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp b/llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp
index 96143d688801aa..1a596cc80c0c9c 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp
@@ -1017,14 +1017,45 @@ bool AMDGPUTargetLowering::isZExtFree(EVT Src, EVT Dest) const {
   return Src == MVT::i32 && Dest == MVT::i64;
 }
 
-bool AMDGPUTargetLowering::isNarrowingProfitable(EVT SrcVT, EVT DestVT) const {
+bool AMDGPUTargetLowering::isNarrowingProfitable(SDNode *N, EVT SrcVT,
+                                                 EVT DestVT) const {
+  switch (N->getOpcode()) {
+  case ISD::ADD:
+  case ISD::SUB:
+  case ISD::SHL:
+  case ISD::SRL:
+  case ISD::SRA:
+  case ISD::AND:
+  case ISD::OR:
+  case ISD::XOR:
+  case ISD::MUL:
+  case ISD::SETCC:
+  case ISD::SELECT:
+    if (Subtarget->has16BitInsts() &&
+        (DestVT.isVector() ? !Subtarget->hasVOP3PInsts() : true)) {
+      // Don't narrow back down to i16 if promoted to i32 already.
+      if (!N->isDivergent() && DestVT.isInteger() &&
+          DestVT.getScalarSizeInBits() > 1 &&
+          DestVT.getScalarSizeInBits() <= 16 &&
+          SrcVT.getScalarSizeInBits() > 16) {
+        return false;
+      }
+    }
+    return true;
+  default:
+    break;
+  }
+
   // There aren't really 64-bit registers, but pairs of 32-bit ones and only a
   // limited number of native 64-bit operations. Shrinking an operation to fit
   // in a single 32-bit register should always be helpful. As currently used,
   // this is much less general than the name suggests, and is only used in
   // places trying to reduce the sizes of loads. Shrinking loads to < 32-bits is
   // not profitable, and may actually be harmful.
-  return SrcVT.getSizeInBits() > 32 && DestVT.getSizeInBits() == 32;
+  if (isa<LoadSDNode>(N))
+    return SrcVT.getSizeInBits() > 32 && DestVT.getSizeInBits() == 32;
+
+  return true;
 }
 
 bool AMDGPUTargetLowering::isDesirableToCommuteWithShift(
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUISelLowering.h b/llvm/lib/Target/AMDGPU/AMDGPUISelLowering.h
index 59f640ea99de3e..4dfa7ac052a5ba 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUISelLowering.h
+++ b/llvm/lib/Target/AMDGPU/AMDGPUISelLowering.h
@@ -201,7 +201,7 @@ class AMDGPUTargetLowering : public TargetLowering {
                                NegatibleCost &Cost,
                                unsigned Depth) const override;
 
-  bool isNarrowingProfitable(EVT SrcVT, EVT DestVT) const override;
+  bool isNarrowingProfitable(SDNode *N, EVT SrcVT, EVT DestVT) const override;
 
   bool isDesirableToCommuteWithShift(const SDNode *N,
                                      CombineLevel Level) const override;
diff --git a/llvm/lib/Target/AMDGPU/AMDGPURegBankCombiner.cpp b/llvm/lib/Target/AMDGPU/AMDGPURegBankCombiner.cpp
index e236a5d7522e02..3b4faa35b93738 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPURegBankCombiner.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPURegBankCombiner.cpp
@@ -89,6 +89,9 @@ class AMDGPURegBankCombinerImpl : public Combiner {
   void applyMed3(MachineInstr &MI, Med3MatchInfo &MatchInfo) const;
   void applyClamp(MachineInstr &MI, Register &Reg) const;
 
+  bool matchPromote16to32(MachineInstr &MI) const;
+  void applyPromote16to32(MachineInstr &MI) const;
+
 private:
   SIModeRegisterDefaults getMode() const;
   bool getIEEE() const;
@@ -348,6 +351,116 @@ bool AMDGPURegBankCombinerImpl::matchFPMed3ToClamp(MachineInstr &MI,
   return false;
 }
 
+bool AMDGPURegBankCombinerImpl::matchPromote16to32(MachineInstr &MI) const {
+  Register Dst = MI.getOperand(0).getReg();
+  LLT DstTy = MRI.getType(Dst);
+  const auto *RB = MRI.getRegBankOrNull(Dst);
+
+  // Only promote uniform instructions.
+  if (RB->getID() != AMDGPU::SGPRRegBankID)
+    return false;
+
+  // Promote only if:
+  //    - We have 16 bit insts (not true 16 bit insts).
+  //    - We don't have packed instructions (for vector types only).
+  // TODO: For vector types, the set of packed operations is more limited, so
+  // may want to promote some anyway.
+  return STI.has16BitInsts() &&
+         (DstTy.isVector() ? !STI.hasVOP3PInsts() : true);
+}
+
+static unsigned getExtOpcodeForPromotedOp(MachineInstr &MI) {
+  switch (MI.getOpcode()) {
+  case AMDGPU::G_ASHR:
+    return AMDGPU::G_SEXT;
+  case AMDGPU::G_ADD:
+  case AMDGPU::G_SUB:
+  case AMDGPU::G_FSHR:
+    return AMDGPU::G_ZEXT;
+  case AMDGPU::G_AND:
+  case AMDGPU::G_OR:
+  case AMDGPU::G_XOR:
+  case AMDGPU::G_SHL:
+  case AMDGPU::G_SELECT:
+  case AMDGPU::G_MUL:
+    // operation result won't be influenced by garbage high bits.
+    // TODO: are all of those cases correct, and are there more?
+    return AMDGPU::G_ANYEXT;
+  case AMDGPU::G_ICMP: {
+    return CmpInst::isSigned(cast<GICmp>(MI).getCond()) ? AMDGPU::G_SEXT
+                                                        : AMDGPU::G_ZEXT;
+  }
+  default:
+    llvm_unreachable("unexpected opcode!");
+  }
+}
+
+void AMDGPURegBankCombinerImpl::applyPromote16to32(MachineInstr &MI) const {
+  const unsigned Opc = MI.getOpcode();
+  assert(Opc == AMDGPU::G_ADD || Opc == AMDGPU::G_SUB || Opc == AMDGPU::G_SHL ||
+         Opc == AMDGPU::G_LSHR || Opc == AMDGPU::G_ASHR ||
+         Opc == AMDGPU::G_AND || Opc == AMDGPU::G_OR || Opc == AMDGPU::G_XOR ||
+         Opc == AMDGPU::G_MUL || Opc == AMDGPU::G_SELECT ||
+         Opc == AMDGPU::G_ICMP);
+
+  Register Dst = MI.getOperand(0).getReg();
+
+  bool IsSelectOrCmp = (Opc == AMDGPU::G_SELECT || Opc == AMDGPU::G_ICMP);
+  Register LHS = MI.getOperand(IsSelectOrCmp + 1).getReg();
+  Register RHS = MI.getOperand(IsSelectOrCmp + 2).getReg();
+
+  assert(MRI.getType(Dst) == LLT::scalar(16));
+  assert(MRI.getType(LHS) == LLT::scalar(16));
+  assert(MRI.getType(RHS) == LLT::scalar(16));
+
+  assert(MRI.getRegBankOrNull(Dst)->getID() == AMDGPU::SGPRRegBankID);
+  assert(MRI.getRegBankOrNull(LHS)->getID() == AMDGPU::SGPRRegBankID);
+  assert(MRI.getRegBankOrNull(RHS)->getID() == AMDGPU::SGPRRegBankID);
+  const RegisterBank &RB = *MRI.getRegBankOrNull(Dst);
+
+  LLT S32 = LLT::scalar(32);
+
+  B.setInstrAndDebugLoc(MI);
+  const unsigned ExtOpc = getExtOpcodeForPromotedOp(MI);
+  LHS = B.buildInstr(ExtOpc, {S32}, {LHS}).getReg(0);
+  RHS = B.buildInstr(ExtOpc, {S32}, {RHS}).getReg(0);
+
+  MRI.setRegBank(LHS, RB);
+  MRI.setRegBank(RHS, RB);
+
+  MachineInstr *NewInst;
+  if (IsSelectOrCmp)
+    NewInst = B.buildInstr(Opc, {Dst}, {MI.getOperand(1), LHS, RHS});
+  else
+    NewInst = B.buildInstr(Opc, {S32}, {LHS, RHS});
+
+  if (Opc != AMDGPU::G_ICMP) {
+    Register Dst32 = NewInst->getOperand(0).getReg();
+    MRI.setRegBank(Dst32, RB);
+    B.buildTrunc(Dst, Dst32);
+  }
+
+  switch (Opc) {
+  case AMDGPU::G_ADD:
+  case AMDGPU::G_SHL:
+    NewInst->setFlag(MachineInstr::NoUWrap);
+    NewInst->setFlag(MachineInstr::NoSWrap);
+    break;
+  case AMDGPU::G_SUB:
+    if (MI.getFlag(MachineInstr::NoUWrap))
+      NewInst->setFlag(MachineInstr::NoUWrap);
+    NewInst->setFlag(MachineInstr::NoSWrap);
+    break;
+  case AMDGPU::G_MUL:
+    NewInst->setFlag(MachineInstr::NoUWrap);
+    if (MI.getFlag(MachineInstr::NoUWrap))
+      NewInst->setFlag(MachineInstr::NoUWrap);
+    break;
+  }
+
+  MI.eraseFromParent();
+}
+
 void AMDGPURegBankCombinerImpl::applyClamp(MachineInstr &MI,
                                            Register &Reg) const {
   B.buildInstr(AMDGPU::G_AMDGPU_CLAMP, {MI.getOperand(0)}, {Reg},
diff --git a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
index 1437f3d58b5e79..96a59acd751a62 100644
--- a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
+++ b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
@@ -894,6 +894,7 @@ SITargetLowering::SITargetLowering(const TargetMachine &TM,
                        ISD::UADDO_CARRY,
                        ISD::SUB,
                        ISD::USUBO_CARRY,
+                       ISD::MUL,
                        ISD::FADD,
                        ISD::FSUB,
                        ISD::FDIV,
@@ -909,9 +910,17 @@ SITargetLowering::SITargetLowering(const TargetMachine &TM,
                        ISD::UMIN,
                        ISD::UMAX,
                        ISD::SETCC,
+                       ISD::SELECT,
+                       ISD::SMIN,
+                       ISD::SMAX,
+                       ISD::UMIN,
+                       ISD::UMAX,
                        ISD::AND,
                        ISD::OR,
                        ISD::XOR,
+                       ISD::SHL,
+                       ISD::SRL,
+                       ISD::SRA,
                        ISD::FSHR,
                        ISD::SINT_TO_FP,
                        ISD::UINT_TO_FP,
@@ -1935,13 +1944,6 @@ bool SITargetLowering::isTypeDesirableForOp(unsigned Op, EVT VT) const {
     switch (Op) {
     case ISD::LOAD:
     case ISD::STORE:
-
-    // These operations are done with 32-bit instructions anyway.
-    case ISD::AND:
-    case ISD::OR:
-    case ISD::XOR:
-    case ISD::SELECT:
-      // TODO: Extensions?
       return true;
     default:
       return false;
@@ -6746,6 +6748,122 @@ SDValue SITargetLowering::lowerFLDEXP(SDValue Op, SelectionDAG &DAG) const {
   return DAG.getNode(ISD::FLDEXP, DL, VT, Op.getOperand(0), TruncExp);
 }
 
+static unsigned getExtOpcodeForPromotedOp(SDValue Op) {
+  switch (Op->getOpcode()) {
+  case ISD::SRA:
+  case ISD::SMIN:
+  case ISD::SMAX:
+    return ISD::SIGN_EXTEND;
+  case ISD::ADD:
+  case ISD::SUB:
+  case ISD::SRL:
+  case ISD::UMIN:
+  case ISD::UMAX:
+    return ISD::ZERO_EXTEND;
+  case ISD::AND:
+  case ISD::OR:
+  case ISD::XOR:
+  case ISD::SHL:
+  case ISD::SELECT:
+  case ISD::MUL:
+    // operation result won't be influenced by garbage high bits.
+    // TODO: are all of those cases correct, and are there more?
+    return ISD::ANY_EXTEND;
+  case ISD::SETCC: {
+    ISD::CondCode CC = cast<CondCodeSDNode>(Op.getOperand(2))->get();
+    return ISD::isSignedIntSetCC(CC) ? ISD::SIGN_EXTEND : ISD::ZERO_EXTEND;
+  }
+  default:
+    llvm_unreachable("unexpected opcode!");
+  }
+}
+
+SDValue SITargetLowering::promoteUniformOpToI32(SDValue Op,
+                                                DAGCombinerInfo &DCI) const {
+  const unsigned Opc = Op.getOpcode();
+  assert(Opc == ISD::ADD || Opc == ISD::SUB || Opc == ISD::SHL ||
+         Opc == ISD::SRL || Opc == ISD::SRA || Opc == ISD::AND ||
+         Opc == ISD::OR || Opc == ISD::XOR || Opc == ISD::MUL ||
+         Opc == ISD::SETCC || Opc == ISD::SELECT || Opc == ISD::SMIN ||
+         Opc == ISD::SMAX || Opc == ISD::UMIN || Opc == ISD::UMAX);
+
+  EVT OpTy = (Opc != ISD::SETCC) ? Op.getValueType()
+                                 : Op->getOperand(0).getValueType();
+
+  if (DCI.isBeforeLegalizeOps())
+    return SDValue();
+
+  // Promote only if:
+  //    - We have 16 bit insts (not true 16 bit insts).
+  //    - We don't have packed instructions (for vector types only).
+  // TODO: For vector types, the set of packed operations is more limited, so
+  // may want to promote some anyway.
+  if (!Subtarget->has16BitInsts() ||
+      (OpTy.isVector() ? Subtarget->hasVOP3PInsts() : false))
+    return SDValue();
+
+  // Promote uniform scalar and vector integers between 2 and 16 bits.
+  if (Op->isDivergent() || !OpTy.isInteger() ||
+      OpTy.getScalarSizeInBits() == 1 || OpTy.getScalarSizeInBits() > 16)
+    return SDValue();
+
+  auto &DAG = DCI.DAG;
+
+  SDLoc DL(Op);
+  SDValue LHS;
+  SDValue RHS;
+  if (Opc == ISD::SELECT) {
+    LHS = Op->getOperand(1);
+    RHS = Op->getOperand(2);
+  } else {
+    LHS = Op->getOperand(0)...
[truncated]
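
The `getExtOpcodeForPromotedOp` helpers in the diff above pick an extension per opcode, noting that for some operations the "result won't be influenced by garbage high bits". A minimal standalone sketch of that reasoning, using plain integers rather than LLVM types, might look like the following. The helper names (`anyExt`, `zeroExt`, `signExt`, `trunc16`, `ashr32`) are hypothetical and are not LLVM APIs; the `Garbage` argument stands in for the unspecified upper bits of an any-extend.

```cpp
#include <cstdint>

// Hypothetical helpers (not LLVM APIs): widen a 16-bit value to 32 bits.
// anyExt fills the upper half with an arbitrary pattern to model the
// unspecified high bits of an any-extend.
constexpr uint32_t anyExt(uint16_t V, uint16_t Garbage) {
  return (static_cast<uint32_t>(Garbage) << 16) | V;
}
constexpr uint32_t zeroExt(uint16_t V) { return V; }
constexpr uint32_t signExt(uint16_t V) {
  return (V & 0x8000u) ? (0xFFFF0000u | V) : V;
}
constexpr uint16_t trunc16(uint32_t V) { return static_cast<uint16_t>(V); }

// Arithmetic shift right on a 32-bit pattern, valid for 0 <= S < 32.
constexpr uint32_t ashr32(uint32_t V, unsigned S) {
  return (V >> S) | ((V & 0x80000000u) ? ~(0xFFFFFFFFu >> S) : 0u);
}

// add, mul, xor, shl: the low 16 result bits ignore the upper input bits.
static_assert(trunc16(anyExt(0x1234, 0xDEAD) + anyExt(0x0FFF, 0xBEEF)) == 0x2233,
              "add: low half unaffected by garbage");
static_assert(trunc16(anyExt(0x1234, 0xDEAD) * anyExt(0x0003, 0xBEEF)) == 0x369C,
              "mul: low half unaffected by garbage");
static_assert(trunc16(anyExt(0x00FF, 0xDEAD) ^ anyExt(0x0F0F, 0xBEEF)) == 0x0FF0,
              "xor: low half unaffected by garbage");
static_assert(trunc16(anyExt(0x8001, 0xFFFF) << 1) == 0x0002,
              "shl: low half unaffected by garbage");

// Logical shift right pulls upper bits down, so the input must be zero-extended.
static_assert(trunc16(anyExt(0x8000, 0xFFFF) >> 1) == 0xC000,
              "lshr on garbage gives the wrong 16-bit answer");
static_assert(trunc16(zeroExt(0x8000) >> 1) == 0x4000,
              "lshr on a zero-extended value is correct");

// Arithmetic shift right needs the sign replicated into the upper half.
static_assert(trunc16(ashr32(signExt(0x8000), 1)) == 0xC000,
              "ashr on a sign-extended value is correct");

// Unsigned comparison can flip when the upper halves are arbitrary.
static_assert(zeroExt(0x0001) < zeroExt(0xFFFF), "ult with zext is correct");
static_assert(!(anyExt(0x0001, 0xFFFF) < anyExt(0xFFFF, 0x0000)),
              "ult with garbage high bits gives the wrong answer");
```

The checks run at compile time, so compiling the file is enough to exercise them. This only illustrates the bit-level argument; it does not model the wrap flags or the operand handling in the truncated part of the patch.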

@Pierre-vh Pierre-vh force-pushed the i16-to-i32-in-isel branch 2 times, most recently from 3f23227 to dc9f16d Compare August 29, 2024 13:27
@Pierre-vh Pierre-vh changed the title [AMDGPU] Promote uniform ops to I32 in ISel [AMDGPU] Promote uniform ops to I32 in DAGISel Aug 29, 2024
Pierre-vh added a commit to Pierre-vh/llvm-project that referenced this pull request Aug 29, 2024
Please only review the last commit; see llvm#106383 for DAGISel changes.

GlobalISel counterpart of llvm#106383

See llvm#64591
Contributor

@jayfoad jayfoad left a comment

Some of my comments on #106557 also apply here.

@Pierre-vh Pierre-vh requested review from arsenm and jayfoad September 5, 2024 09:19
Promote uniform binops, selects and setcc in Global & DAGISel instead of CGP.

Solves llvm#64591
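
As a rough illustration of which operations this change targets, the guard conditions visible in the truncated `promoteUniformOpToI32` hunk can be restated as a single predicate. This is a sketch under assumptions: `SubtargetModel`, `OpModel`, and `shouldPromoteToI32` are hypothetical names, not LLVM types or functions, and the legalization-phase check from the patch is not modeled.

```cpp
// Hypothetical stand-ins for the subtarget queries used in the patch.
struct SubtargetModel {
  bool Has16BitInsts;  // models Subtarget->has16BitInsts()
  bool HasVOP3PInsts;  // models Subtarget->hasVOP3PInsts()
};

// Hypothetical description of the operation's type and divergence.
struct OpModel {
  bool IsDivergent;
  bool IsInteger;
  bool IsVector;
  unsigned ScalarBits;
};

// Promote only uniform integer ops whose scalar width is between 2 and 16
// bits, on subtargets with 16-bit instructions, and skip vector types when
// packed (VOP3P) instructions are available.
constexpr bool shouldPromoteToI32(const SubtargetModel &ST, const OpModel &Op) {
  return ST.Has16BitInsts && !(Op.IsVector && ST.HasVOP3PInsts) &&
         !Op.IsDivergent && Op.IsInteger && Op.ScalarBits > 1 &&
         Op.ScalarBits <= 16;
}

static_assert(shouldPromoteToI32(SubtargetModel{true, false},
                                 OpModel{false, true, false, 16}),
              "uniform i16 is promoted");
static_assert(!shouldPromoteToI32(SubtargetModel{true, false},
                                  OpModel{true, true, false, 16}),
              "divergent i16 is left alone");
static_assert(!shouldPromoteToI32(SubtargetModel{true, false},
                                  OpModel{false, true, false, 1}),
              "i1 is left alone");
static_assert(!shouldPromoteToI32(SubtargetModel{true, true},
                                  OpModel{false, true, true, 16}),
              "vector types are left alone when packed instructions exist");
```

Keeping the condition as one boolean expression makes it easy to see that divergence, type width, and subtarget features are all required at once.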
@Pierre-vh Pierre-vh requested a review from arsenm September 17, 2024 05:27
@Pierre-vh Pierre-vh merged commit 758444c into llvm:main Sep 19, 2024
8 checks passed
@Pierre-vh Pierre-vh deleted the i16-to-i32-in-isel branch September 19, 2024 07:00
@llvm-ci
Collaborator

llvm-ci commented Sep 19, 2024

LLVM Buildbot has detected a new failure on builder openmp-offload-sles-build-only running on rocm-worker-hw-04-sles while building llvm at step 8 "Add check check-llvm".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/140/builds/6913

Here is the relevant piece of the build log for reference:
Step 8 (Add check check-llvm) failure: test (failure)
******************** TEST 'LLVM :: CodeGen/AMDGPU/load-constant-i1.ll' FAILED ********************
Exit Code: 1

Command Output (stderr):
--
RUN: at line 2: /home/botworker/bbot/builds/openmp-offload-sles-build/llvm.build/bin/llc -mtriple=amdgcn-- -verify-machineinstrs < /home/botworker/bbot/builds/openmp-offload-sles-build/llvm.src/llvm/test/CodeGen/AMDGPU/load-constant-i1.ll | /home/botworker/bbot/builds/openmp-offload-sles-build/llvm.build/bin/FileCheck -check-prefix=GFX6 /home/botworker/bbot/builds/openmp-offload-sles-build/llvm.src/llvm/test/CodeGen/AMDGPU/load-constant-i1.ll
+ /home/botworker/bbot/builds/openmp-offload-sles-build/llvm.build/bin/llc -mtriple=amdgcn-- -verify-machineinstrs
+ /home/botworker/bbot/builds/openmp-offload-sles-build/llvm.build/bin/FileCheck -check-prefix=GFX6 /home/botworker/bbot/builds/openmp-offload-sles-build/llvm.src/llvm/test/CodeGen/AMDGPU/load-constant-i1.ll
RUN: at line 3: /home/botworker/bbot/builds/openmp-offload-sles-build/llvm.build/bin/llc -mtriple=amdgcn-- -mcpu=tonga -verify-machineinstrs < /home/botworker/bbot/builds/openmp-offload-sles-build/llvm.src/llvm/test/CodeGen/AMDGPU/load-constant-i1.ll | /home/botworker/bbot/builds/openmp-offload-sles-build/llvm.build/bin/FileCheck -check-prefix=GFX8 /home/botworker/bbot/builds/openmp-offload-sles-build/llvm.src/llvm/test/CodeGen/AMDGPU/load-constant-i1.ll
+ /home/botworker/bbot/builds/openmp-offload-sles-build/llvm.build/bin/FileCheck -check-prefix=GFX8 /home/botworker/bbot/builds/openmp-offload-sles-build/llvm.src/llvm/test/CodeGen/AMDGPU/load-constant-i1.ll
+ /home/botworker/bbot/builds/openmp-offload-sles-build/llvm.build/bin/llc -mtriple=amdgcn-- -mcpu=tonga -verify-machineinstrs
/home/botworker/bbot/builds/openmp-offload-sles-build/llvm.src/llvm/test/CodeGen/AMDGPU/load-constant-i1.ll:8972:14: error: GFX8-NEXT: is not on the line after the previous match
; GFX8-NEXT: s_waitcnt vmcnt(0)
             ^
<stdin>:4279:2: note: 'next' match was here
 s_waitcnt vmcnt(0)
 ^
<stdin>:4228:64: note: previous match ended here
 buffer_store_dword v12, off, s[88:91], 0 ; 4-byte Folded Spill
                                                               ^
<stdin>:4229:1: note: non-matching line after previous match is here
 buffer_store_dword v13, off, s[88:91], 0 offset:4 ; 4-byte Folded Spill
^

Input file: <stdin>
Check file: /home/botworker/bbot/builds/openmp-offload-sles-build/llvm.src/llvm/test/CodeGen/AMDGPU/load-constant-i1.ll

-dump-input=help explains the following input dump.

Input was:
<<<<<<
           .
           .
           .
        4274:  v_mov_b32_e32 v2, s10 
        4275:  v_mov_b32_e32 v3, s11 
        4276:  v_mov_b32_e32 v9, s13 
        4277:  v_mov_b32_e32 v10, s14 
        4278:  v_mov_b32_e32 v11, s15 
        4279:  s_waitcnt vmcnt(0) 
next:8972      !~~~~~~~~~~~~~~~~~  error: match on wrong line
        4280:  flat_store_dwordx4 v[18:19], v[28:31] 
        4281:  flat_store_dwordx4 v[59:60], v[32:35] 
        4282:  flat_store_dwordx4 v[61:62], v[36:39] 
        4283:  flat_store_dwordx4 v[45:46], v[40:43] 
        4284:  flat_store_dwordx4 v[12:13], v[4:7] 
           .
           .
           .
>>>>>>
...
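
The "is not on the line after the previous match" diagnostic in the log above comes from a `-NEXT` check directive, which requires its match to sit on the line immediately following the previous match. A simplified model of that rule is sketched below. It assumes plain substring matching and a hypothetical `checkNext` function; it is not FileCheck's actual implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

enum class NextResult { Ok, WrongLine, NotFound };

// Simplified model of a "-NEXT" directive (hypothetical, substring matching
// only): search for Pattern after the line of the previous match and report
// whether the first hit is on the immediately following line.
NextResult checkNext(const std::vector<std::string> &Lines,
                     std::size_t PrevMatchLine, const std::string &Pattern) {
  for (std::size_t I = PrevMatchLine + 1; I < Lines.size(); ++I) {
    if (Lines[I].find(Pattern) != std::string::npos)
      return I == PrevMatchLine + 1 ? NextResult::Ok : NextResult::WrongLine;
  }
  return NextResult::NotFound;
}

// Mirrors the shape of the failure above: an extra spill store sits between
// the previous match and the expected wait instruction.
inline void demoFromLog() {
  const std::vector<std::string> Out = {
      "buffer_store_dword v12, off, s[88:91], 0",
      "buffer_store_dword v13, off, s[88:91], 0 offset:4",
      "s_waitcnt vmcnt(0)"};
  assert(checkNext(Out, 0, "s_waitcnt vmcnt(0)") == NextResult::WrongLine);
}
```

In this model, an additional instruction emitted between two checked lines is enough to turn a passing `-NEXT` check into a `WrongLine` result, which matches the kind of mismatch reported for `load-constant-i1.ll`.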

@llvm-ci
Collaborator

llvm-ci commented Sep 19, 2024

LLVM Buildbot has detected a new failure on builder clang-hip-vega20 running on hip-vega20-0 while building llvm at step 3 "annotate".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/123/builds/5908

Here is the relevant piece of the build log for reference:
Step 3 (annotate) failure: '../llvm-zorg/zorg/buildbot/builders/annotated/hip-build.sh --jobs=' (failure)
...
[38/40] : && /buildbot/hip-vega20-0/clang-hip-vega20/llvm/bin/clang++ -O3 -DNDEBUG  External/HIP/CMakeFiles/memmove-hip-6.0.2.dir/memmove.hip.o -o External/HIP/memmove-hip-6.0.2  --rocm-path=/buildbot/Externals/hip/rocm-6.0.2 --hip-link -rtlib=compiler-rt -unwindlib=libgcc -frtlib-add-rpath && cd /buildbot/hip-vega20-0/clang-hip-vega20/test-suite-build/External/HIP && /usr/local/bin/cmake -E create_symlink /buildbot/llvm-test-suite/External/HIP/memmove.reference_output /buildbot/hip-vega20-0/clang-hip-vega20/test-suite-build/External/HIP/memmove.reference_output-hip-6.0.2
[39/40] /buildbot/hip-vega20-0/clang-hip-vega20/llvm/bin/clang++ -DNDEBUG  -O3 -DNDEBUG   -w -Werror=date-time --rocm-path=/buildbot/Externals/hip/rocm-6.0.2 --offload-arch=gfx908 --offload-arch=gfx90a --offload-arch=gfx1030 --offload-arch=gfx1100 -xhip -mfma -MD -MT External/HIP/CMakeFiles/TheNextWeek-hip-6.0.2.dir/workload/ray-tracing/TheNextWeek/main.cc.o -MF External/HIP/CMakeFiles/TheNextWeek-hip-6.0.2.dir/workload/ray-tracing/TheNextWeek/main.cc.o.d -o External/HIP/CMakeFiles/TheNextWeek-hip-6.0.2.dir/workload/ray-tracing/TheNextWeek/main.cc.o -c /buildbot/llvm-test-suite/External/HIP/workload/ray-tracing/TheNextWeek/main.cc
[40/40] : && /buildbot/hip-vega20-0/clang-hip-vega20/llvm/bin/clang++ -O3 -DNDEBUG  External/HIP/CMakeFiles/TheNextWeek-hip-6.0.2.dir/workload/ray-tracing/TheNextWeek/main.cc.o -o External/HIP/TheNextWeek-hip-6.0.2  --rocm-path=/buildbot/Externals/hip/rocm-6.0.2 --hip-link -rtlib=compiler-rt -unwindlib=libgcc -frtlib-add-rpath && cd /buildbot/hip-vega20-0/clang-hip-vega20/test-suite-build/External/HIP && /usr/local/bin/cmake -E create_symlink /buildbot/llvm-test-suite/External/HIP/TheNextWeek.reference_output /buildbot/hip-vega20-0/clang-hip-vega20/test-suite-build/External/HIP/TheNextWeek.reference_output-hip-6.0.2
+ build_step 'Testing HIP test-suite'
+ echo '@@@BUILD_STEP Testing HIP test-suite@@@'
@@@BUILD_STEP Testing HIP test-suite@@@
+ ninja -v check-hip-simple
[0/1] cd /buildbot/hip-vega20-0/clang-hip-vega20/test-suite-build/External/HIP && /buildbot/hip-vega20-0/clang-hip-vega20/llvm/bin/llvm-lit -sv empty-hip-6.0.2.test with-fopenmp-hip-6.0.2.test saxpy-hip-6.0.2.test memmove-hip-6.0.2.test InOneWeekend-hip-6.0.2.test TheNextWeek-hip-6.0.2.test blender.test
-- Testing: 7 tests, 7 workers --
Testing:  0.. 10.. 20.. 30.. 40
FAIL: test-suite :: External/HIP/InOneWeekend-hip-6.0.2.test (4 of 7)
******************** TEST 'test-suite :: External/HIP/InOneWeekend-hip-6.0.2.test' FAILED ********************

/buildbot/hip-vega20-0/clang-hip-vega20/test-suite-build/tools/timeit-target --timeout 7200 --limit-core 0 --limit-cpu 7200 --limit-file-size 209715200 --limit-rss-size 838860800 --append-exitstatus --redirect-output /buildbot/hip-vega20-0/clang-hip-vega20/test-suite-build/External/HIP/Output/InOneWeekend-hip-6.0.2.test.out --redirect-input /dev/null --summary /buildbot/hip-vega20-0/clang-hip-vega20/test-suite-build/External/HIP/Output/InOneWeekend-hip-6.0.2.test.time /buildbot/hip-vega20-0/clang-hip-vega20/test-suite-build/External/HIP/InOneWeekend-hip-6.0.2
cd /buildbot/hip-vega20-0/clang-hip-vega20/test-suite-build/External/HIP ; /buildbot/hip-vega20-0/clang-hip-vega20/test-suite-build/tools/fpcmp-target /buildbot/hip-vega20-0/clang-hip-vega20/test-suite-build/External/HIP/Output/InOneWeekend-hip-6.0.2.test.out InOneWeekend.reference_output-hip-6.0.2

+ cd /buildbot/hip-vega20-0/clang-hip-vega20/test-suite-build/External/HIP
+ /buildbot/hip-vega20-0/clang-hip-vega20/test-suite-build/tools/fpcmp-target /buildbot/hip-vega20-0/clang-hip-vega20/test-suite-build/External/HIP/Output/InOneWeekend-hip-6.0.2.test.out InOneWeekend.reference_output-hip-6.0.2
/buildbot/hip-vega20-0/clang-hip-vega20/test-suite-build/tools/fpcmp-target: Comparison failed, textual difference between 'M' and 'i'

Input 1:
Memory access fault by GPU node-1 (Agent handle: 0x561df895fac0) on address (nil). Reason: Page not present or supervisor privilege.
exit 134

Input 2:
image width = 1200 height = 675
block size = (16, 16) grid size = (75, 43)
Start rendering by GPU.
Done.
gpu.ppm and ref.ppm are the same.
exit 0

********************
/usr/bin/strip: /bin/bash.stripped: Bad file descriptor
Testing:  0.. 10.. 20.. 30.. 40.. 50.. 60.. 70.. 80.. 90.. 
********************
Failed Tests (1):
  test-suite :: External/HIP/InOneWeekend-hip-6.0.2.test


Testing Time: 325.13s

Total Discovered Tests: 7
  Passed: 6 (85.71%)
  Failed: 1 (14.29%)
FAILED: External/HIP/CMakeFiles/check-hip-simple-hip-6.0.2 
cd /buildbot/hip-vega20-0/clang-hip-vega20/test-suite-build/External/HIP && /buildbot/hip-vega20-0/clang-hip-vega20/llvm/bin/llvm-lit -sv empty-hip-6.0.2.test with-fopenmp-hip-6.0.2.test saxpy-hip-6.0.2.test memmove-hip-6.0.2.test InOneWeekend-hip-6.0.2.test TheNextWeek-hip-6.0.2.test blender.test
ninja: build stopped: subcommand failed.
Step 12 (Testing HIP test-suite) failure: Testing HIP test-suite (failure)
@@@BUILD_STEP Testing HIP test-suite@@@
+ ninja -v check-hip-simple
[0/1] cd /buildbot/hip-vega20-0/clang-hip-vega20/test-suite-build/External/HIP && /buildbot/hip-vega20-0/clang-hip-vega20/llvm/bin/llvm-lit -sv empty-hip-6.0.2.test with-fopenmp-hip-6.0.2.test saxpy-hip-6.0.2.test memmove-hip-6.0.2.test InOneWeekend-hip-6.0.2.test TheNextWeek-hip-6.0.2.test blender.test
-- Testing: 7 tests, 7 workers --
Testing:  0.. 10.. 20.. 30.. 40
program finished with exit code 1
elapsedTime=605.823405

@llvm-ci
Collaborator

llvm-ci commented Sep 19, 2024

LLVM Buildbot has detected a new failure on builder llvm-clang-x86_64-gcc-ubuntu running on sie-linux-worker3 while building llvm at step 6 "test-build-unified-tree-check-all".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/174/builds/5380

Here is the relevant piece of the build log for the reference
Step 6 (test-build-unified-tree-check-all) failure: test (failure)
******************** TEST 'LLVM :: CodeGen/AMDGPU/load-constant-i1.ll' FAILED ********************
Exit Code: 1

Command Output (stderr):
--
RUN: at line 2: /home/buildbot/buildbot-root/llvm-clang-x86_64-gcc-ubuntu/build/bin/llc -mtriple=amdgcn-- -verify-machineinstrs < /home/buildbot/buildbot-root/llvm-clang-x86_64-gcc-ubuntu/llvm-project/llvm/test/CodeGen/AMDGPU/load-constant-i1.ll | /home/buildbot/buildbot-root/llvm-clang-x86_64-gcc-ubuntu/build/bin/FileCheck -check-prefix=GFX6 /home/buildbot/buildbot-root/llvm-clang-x86_64-gcc-ubuntu/llvm-project/llvm/test/CodeGen/AMDGPU/load-constant-i1.ll
+ /home/buildbot/buildbot-root/llvm-clang-x86_64-gcc-ubuntu/build/bin/FileCheck -check-prefix=GFX6 /home/buildbot/buildbot-root/llvm-clang-x86_64-gcc-ubuntu/llvm-project/llvm/test/CodeGen/AMDGPU/load-constant-i1.ll
+ /home/buildbot/buildbot-root/llvm-clang-x86_64-gcc-ubuntu/build/bin/llc -mtriple=amdgcn-- -verify-machineinstrs
RUN: at line 3: /home/buildbot/buildbot-root/llvm-clang-x86_64-gcc-ubuntu/build/bin/llc -mtriple=amdgcn-- -mcpu=tonga -verify-machineinstrs < /home/buildbot/buildbot-root/llvm-clang-x86_64-gcc-ubuntu/llvm-project/llvm/test/CodeGen/AMDGPU/load-constant-i1.ll | /home/buildbot/buildbot-root/llvm-clang-x86_64-gcc-ubuntu/build/bin/FileCheck -check-prefix=GFX8 /home/buildbot/buildbot-root/llvm-clang-x86_64-gcc-ubuntu/llvm-project/llvm/test/CodeGen/AMDGPU/load-constant-i1.ll
+ /home/buildbot/buildbot-root/llvm-clang-x86_64-gcc-ubuntu/build/bin/FileCheck -check-prefix=GFX8 /home/buildbot/buildbot-root/llvm-clang-x86_64-gcc-ubuntu/llvm-project/llvm/test/CodeGen/AMDGPU/load-constant-i1.ll
+ /home/buildbot/buildbot-root/llvm-clang-x86_64-gcc-ubuntu/build/bin/llc -mtriple=amdgcn-- -mcpu=tonga -verify-machineinstrs
/home/buildbot/buildbot-root/llvm-clang-x86_64-gcc-ubuntu/llvm-project/llvm/test/CodeGen/AMDGPU/load-constant-i1.ll:8972:14: error: GFX8-NEXT: is not on the line after the previous match
; GFX8-NEXT: s_waitcnt vmcnt(0)
             ^
<stdin>:4279:2: note: 'next' match was here
 s_waitcnt vmcnt(0)
 ^
<stdin>:4228:64: note: previous match ended here
 buffer_store_dword v12, off, s[88:91], 0 ; 4-byte Folded Spill
                                                               ^
<stdin>:4229:1: note: non-matching line after previous match is here
 buffer_store_dword v13, off, s[88:91], 0 offset:4 ; 4-byte Folded Spill
^

Input file: <stdin>
Check file: /home/buildbot/buildbot-root/llvm-clang-x86_64-gcc-ubuntu/llvm-project/llvm/test/CodeGen/AMDGPU/load-constant-i1.ll

-dump-input=help explains the following input dump.

Input was:
<<<<<<
              1:  .text 
              2:  .section .AMDGPU.config,"",@progbits 
              3:  .long 47176 
              4:  .long 11469504 
              5:  .long 47180 
              6:  .long 5004 
              7:  .long 47200 
              8:  .long 0 
              9:  .long 4 
             10:  .long 0 
             11:  .long 8 
             12:  .long 0 
             13:  .text 
             14:  .globl constant_load_i1 ; -- Begin function constant_load_i1 
             15:  .p2align 8 
             16:  .type constant_load_i1,@function 
             17: constant_load_i1: ; @constant_load_i1 
label:26'0       ^~~~~~~~~~~~~~~~~
label:26'1       ^~~~~~~~~~~~~~~~~
...

@llvm-ci
Collaborator

llvm-ci commented Sep 19, 2024

LLVM Buildbot has detected a new failure on builder ml-opt-devrel-x86-64 running on ml-opt-devrel-x86-64-b1 while building llvm at step 6 "test-build-unified-tree-check-all".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/175/builds/5468

Here is the relevant piece of the build log for the reference
Step 6 (test-build-unified-tree-check-all) failure: test (failure)
******************** TEST 'LLVM :: CodeGen/AMDGPU/load-constant-i1.ll' FAILED ********************
Exit Code: 1

Command Output (stderr):
--
RUN: at line 2: /b/ml-opt-devrel-x86-64-b1/build/bin/llc -mtriple=amdgcn-- -verify-machineinstrs < /b/ml-opt-devrel-x86-64-b1/llvm-project/llvm/test/CodeGen/AMDGPU/load-constant-i1.ll | /b/ml-opt-devrel-x86-64-b1/build/bin/FileCheck -check-prefix=GFX6 /b/ml-opt-devrel-x86-64-b1/llvm-project/llvm/test/CodeGen/AMDGPU/load-constant-i1.ll
+ /b/ml-opt-devrel-x86-64-b1/build/bin/llc -mtriple=amdgcn-- -verify-machineinstrs
+ /b/ml-opt-devrel-x86-64-b1/build/bin/FileCheck -check-prefix=GFX6 /b/ml-opt-devrel-x86-64-b1/llvm-project/llvm/test/CodeGen/AMDGPU/load-constant-i1.ll
RUN: at line 3: /b/ml-opt-devrel-x86-64-b1/build/bin/llc -mtriple=amdgcn-- -mcpu=tonga -verify-machineinstrs < /b/ml-opt-devrel-x86-64-b1/llvm-project/llvm/test/CodeGen/AMDGPU/load-constant-i1.ll | /b/ml-opt-devrel-x86-64-b1/build/bin/FileCheck -check-prefix=GFX8 /b/ml-opt-devrel-x86-64-b1/llvm-project/llvm/test/CodeGen/AMDGPU/load-constant-i1.ll
+ /b/ml-opt-devrel-x86-64-b1/build/bin/llc -mtriple=amdgcn-- -mcpu=tonga -verify-machineinstrs
+ /b/ml-opt-devrel-x86-64-b1/build/bin/FileCheck -check-prefix=GFX8 /b/ml-opt-devrel-x86-64-b1/llvm-project/llvm/test/CodeGen/AMDGPU/load-constant-i1.ll
/b/ml-opt-devrel-x86-64-b1/llvm-project/llvm/test/CodeGen/AMDGPU/load-constant-i1.ll:8972:14: error: GFX8-NEXT: is not on the line after the previous match
; GFX8-NEXT: s_waitcnt vmcnt(0)
             ^
<stdin>:4279:2: note: 'next' match was here
 s_waitcnt vmcnt(0)
 ^
<stdin>:4228:64: note: previous match ended here
 buffer_store_dword v12, off, s[88:91], 0 ; 4-byte Folded Spill
                                                               ^
<stdin>:4229:1: note: non-matching line after previous match is here
 buffer_store_dword v13, off, s[88:91], 0 offset:4 ; 4-byte Folded Spill
^

Input file: <stdin>
Check file: /b/ml-opt-devrel-x86-64-b1/llvm-project/llvm/test/CodeGen/AMDGPU/load-constant-i1.ll

-dump-input=help explains the following input dump.

Input was:
<<<<<<
           .
           .
           .
        4274:  v_mov_b32_e32 v2, s10 
        4275:  v_mov_b32_e32 v3, s11 
        4276:  v_mov_b32_e32 v9, s13 
        4277:  v_mov_b32_e32 v10, s14 
        4278:  v_mov_b32_e32 v11, s15 
        4279:  s_waitcnt vmcnt(0) 
next:8972      !~~~~~~~~~~~~~~~~~  error: match on wrong line
        4280:  flat_store_dwordx4 v[18:19], v[28:31] 
        4281:  flat_store_dwordx4 v[59:60], v[32:35] 
        4282:  flat_store_dwordx4 v[61:62], v[36:39] 
        4283:  flat_store_dwordx4 v[45:46], v[40:43] 
        4284:  flat_store_dwordx4 v[12:13], v[4:7] 
           .
           .
           .
>>>>>>
...

@llvm-ci
Collaborator

llvm-ci commented Sep 19, 2024

LLVM Buildbot has detected a new failure on builder ml-opt-rel-x86-64 running on ml-opt-rel-x86-64-b1 while building llvm at step 6 "test-build-unified-tree-check-all".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/185/builds/5450

Here is the relevant piece of the build log for the reference
Step 6 (test-build-unified-tree-check-all) failure: test (failure)
******************** TEST 'LLVM :: CodeGen/AMDGPU/load-constant-i1.ll' FAILED ********************
Exit Code: 1

Command Output (stderr):
--
RUN: at line 2: /b/ml-opt-rel-x86-64-b1/build/bin/llc -mtriple=amdgcn-- -verify-machineinstrs < /b/ml-opt-rel-x86-64-b1/llvm-project/llvm/test/CodeGen/AMDGPU/load-constant-i1.ll | /b/ml-opt-rel-x86-64-b1/build/bin/FileCheck -check-prefix=GFX6 /b/ml-opt-rel-x86-64-b1/llvm-project/llvm/test/CodeGen/AMDGPU/load-constant-i1.ll
+ /b/ml-opt-rel-x86-64-b1/build/bin/FileCheck -check-prefix=GFX6 /b/ml-opt-rel-x86-64-b1/llvm-project/llvm/test/CodeGen/AMDGPU/load-constant-i1.ll
+ /b/ml-opt-rel-x86-64-b1/build/bin/llc -mtriple=amdgcn-- -verify-machineinstrs
RUN: at line 3: /b/ml-opt-rel-x86-64-b1/build/bin/llc -mtriple=amdgcn-- -mcpu=tonga -verify-machineinstrs < /b/ml-opt-rel-x86-64-b1/llvm-project/llvm/test/CodeGen/AMDGPU/load-constant-i1.ll | /b/ml-opt-rel-x86-64-b1/build/bin/FileCheck -check-prefix=GFX8 /b/ml-opt-rel-x86-64-b1/llvm-project/llvm/test/CodeGen/AMDGPU/load-constant-i1.ll
+ /b/ml-opt-rel-x86-64-b1/build/bin/FileCheck -check-prefix=GFX8 /b/ml-opt-rel-x86-64-b1/llvm-project/llvm/test/CodeGen/AMDGPU/load-constant-i1.ll
+ /b/ml-opt-rel-x86-64-b1/build/bin/llc -mtriple=amdgcn-- -mcpu=tonga -verify-machineinstrs
/b/ml-opt-rel-x86-64-b1/llvm-project/llvm/test/CodeGen/AMDGPU/load-constant-i1.ll:8972:14: error: GFX8-NEXT: is not on the line after the previous match
; GFX8-NEXT: s_waitcnt vmcnt(0)
             ^
<stdin>:4279:2: note: 'next' match was here
 s_waitcnt vmcnt(0)
 ^
<stdin>:4228:64: note: previous match ended here
 buffer_store_dword v12, off, s[88:91], 0 ; 4-byte Folded Spill
                                                               ^
<stdin>:4229:1: note: non-matching line after previous match is here
 buffer_store_dword v13, off, s[88:91], 0 offset:4 ; 4-byte Folded Spill
^

Input file: <stdin>
Check file: /b/ml-opt-rel-x86-64-b1/llvm-project/llvm/test/CodeGen/AMDGPU/load-constant-i1.ll

-dump-input=help explains the following input dump.

Input was:
<<<<<<
           .
           .
           .
        4274:  v_mov_b32_e32 v2, s10 
        4275:  v_mov_b32_e32 v3, s11 
        4276:  v_mov_b32_e32 v9, s13 
        4277:  v_mov_b32_e32 v10, s14 
        4278:  v_mov_b32_e32 v11, s15 
        4279:  s_waitcnt vmcnt(0) 
next:8972      !~~~~~~~~~~~~~~~~~  error: match on wrong line
        4280:  flat_store_dwordx4 v[18:19], v[28:31] 
        4281:  flat_store_dwordx4 v[59:60], v[32:35] 
        4282:  flat_store_dwordx4 v[61:62], v[36:39] 
        4283:  flat_store_dwordx4 v[45:46], v[40:43] 
        4284:  flat_store_dwordx4 v[12:13], v[4:7] 
           .
           .
           .
>>>>>>
...

@llvm-ci
Collaborator

llvm-ci commented Sep 19, 2024

LLVM Buildbot has detected a new failure on builder ml-opt-dev-x86-64 running on ml-opt-dev-x86-64-b1 while building llvm at step 6 "test-build-unified-tree-check-all".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/137/builds/5507

Here is the relevant piece of the build log for the reference
Step 6 (test-build-unified-tree-check-all) failure: test (failure)
******************** TEST 'LLVM :: CodeGen/AMDGPU/load-constant-i1.ll' FAILED ********************
Exit Code: 1

Command Output (stderr):
--
RUN: at line 2: /b/ml-opt-dev-x86-64-b1/build/bin/llc -mtriple=amdgcn-- -verify-machineinstrs < /b/ml-opt-dev-x86-64-b1/llvm-project/llvm/test/CodeGen/AMDGPU/load-constant-i1.ll | /b/ml-opt-dev-x86-64-b1/build/bin/FileCheck -check-prefix=GFX6 /b/ml-opt-dev-x86-64-b1/llvm-project/llvm/test/CodeGen/AMDGPU/load-constant-i1.ll
+ /b/ml-opt-dev-x86-64-b1/build/bin/llc -mtriple=amdgcn-- -verify-machineinstrs
+ /b/ml-opt-dev-x86-64-b1/build/bin/FileCheck -check-prefix=GFX6 /b/ml-opt-dev-x86-64-b1/llvm-project/llvm/test/CodeGen/AMDGPU/load-constant-i1.ll
RUN: at line 3: /b/ml-opt-dev-x86-64-b1/build/bin/llc -mtriple=amdgcn-- -mcpu=tonga -verify-machineinstrs < /b/ml-opt-dev-x86-64-b1/llvm-project/llvm/test/CodeGen/AMDGPU/load-constant-i1.ll | /b/ml-opt-dev-x86-64-b1/build/bin/FileCheck -check-prefix=GFX8 /b/ml-opt-dev-x86-64-b1/llvm-project/llvm/test/CodeGen/AMDGPU/load-constant-i1.ll
+ /b/ml-opt-dev-x86-64-b1/build/bin/llc -mtriple=amdgcn-- -mcpu=tonga -verify-machineinstrs
+ /b/ml-opt-dev-x86-64-b1/build/bin/FileCheck -check-prefix=GFX8 /b/ml-opt-dev-x86-64-b1/llvm-project/llvm/test/CodeGen/AMDGPU/load-constant-i1.ll
/b/ml-opt-dev-x86-64-b1/llvm-project/llvm/test/CodeGen/AMDGPU/load-constant-i1.ll:8972:14: error: GFX8-NEXT: is not on the line after the previous match
; GFX8-NEXT: s_waitcnt vmcnt(0)
             ^
<stdin>:4279:2: note: 'next' match was here
 s_waitcnt vmcnt(0)
 ^
<stdin>:4228:64: note: previous match ended here
 buffer_store_dword v12, off, s[88:91], 0 ; 4-byte Folded Spill
                                                               ^
<stdin>:4229:1: note: non-matching line after previous match is here
 buffer_store_dword v13, off, s[88:91], 0 offset:4 ; 4-byte Folded Spill
^

Input file: <stdin>
Check file: /b/ml-opt-dev-x86-64-b1/llvm-project/llvm/test/CodeGen/AMDGPU/load-constant-i1.ll

-dump-input=help explains the following input dump.

Input was:
<<<<<<
           .
           .
           .
        4274:  v_mov_b32_e32 v2, s10 
        4275:  v_mov_b32_e32 v3, s11 
        4276:  v_mov_b32_e32 v9, s13 
        4277:  v_mov_b32_e32 v10, s14 
        4278:  v_mov_b32_e32 v11, s15 
        4279:  s_waitcnt vmcnt(0) 
next:8972      !~~~~~~~~~~~~~~~~~  error: match on wrong line
        4280:  flat_store_dwordx4 v[18:19], v[28:31] 
        4281:  flat_store_dwordx4 v[59:60], v[32:35] 
        4282:  flat_store_dwordx4 v[61:62], v[36:39] 
        4283:  flat_store_dwordx4 v[45:46], v[40:43] 
        4284:  flat_store_dwordx4 v[12:13], v[4:7] 
           .
           .
           .
>>>>>>
...

@llvm-ci
Collaborator

llvm-ci commented Sep 19, 2024

LLVM Buildbot has detected a new failure on builder openmp-offload-libc-amdgpu-runtime running on omp-vega20-1 while building llvm at step 8 "Add check check-llvm".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/73/builds/5762

Here is the relevant piece of the build log for the reference
Step 8 (Add check check-llvm) failure: test (failure)
******************** TEST 'LLVM :: CodeGen/AMDGPU/load-constant-i1.ll' FAILED ********************
Exit Code: 1

Command Output (stderr):
--
RUN: at line 2: /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/bin/llc -mtriple=amdgcn-- -verify-machineinstrs < /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.src/llvm/test/CodeGen/AMDGPU/load-constant-i1.ll | /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/bin/FileCheck -check-prefix=GFX6 /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.src/llvm/test/CodeGen/AMDGPU/load-constant-i1.ll
+ /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/bin/llc -mtriple=amdgcn-- -verify-machineinstrs
+ /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/bin/FileCheck -check-prefix=GFX6 /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.src/llvm/test/CodeGen/AMDGPU/load-constant-i1.ll
RUN: at line 3: /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/bin/llc -mtriple=amdgcn-- -mcpu=tonga -verify-machineinstrs < /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.src/llvm/test/CodeGen/AMDGPU/load-constant-i1.ll | /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/bin/FileCheck -check-prefix=GFX8 /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.src/llvm/test/CodeGen/AMDGPU/load-constant-i1.ll
+ /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/bin/llc -mtriple=amdgcn-- -mcpu=tonga -verify-machineinstrs
+ /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/bin/FileCheck -check-prefix=GFX8 /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.src/llvm/test/CodeGen/AMDGPU/load-constant-i1.ll
/home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.src/llvm/test/CodeGen/AMDGPU/load-constant-i1.ll:8972:14: error: GFX8-NEXT: is not on the line after the previous match
; GFX8-NEXT: s_waitcnt vmcnt(0)
             ^
<stdin>:4279:2: note: 'next' match was here
 s_waitcnt vmcnt(0)
 ^
<stdin>:4228:64: note: previous match ended here
 buffer_store_dword v12, off, s[88:91], 0 ; 4-byte Folded Spill
                                                               ^
<stdin>:4229:1: note: non-matching line after previous match is here
 buffer_store_dword v13, off, s[88:91], 0 offset:4 ; 4-byte Folded Spill
^

Input file: <stdin>
Check file: /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.src/llvm/test/CodeGen/AMDGPU/load-constant-i1.ll

-dump-input=help explains the following input dump.

Input was:
<<<<<<
           .
           .
           .
        4274:  v_mov_b32_e32 v2, s10 
        4275:  v_mov_b32_e32 v3, s11 
        4276:  v_mov_b32_e32 v9, s13 
        4277:  v_mov_b32_e32 v10, s14 
        4278:  v_mov_b32_e32 v11, s15 
        4279:  s_waitcnt vmcnt(0) 
next:8972      !~~~~~~~~~~~~~~~~~  error: match on wrong line
        4280:  flat_store_dwordx4 v[18:19], v[28:31] 
        4281:  flat_store_dwordx4 v[59:60], v[32:35] 
        4282:  flat_store_dwordx4 v[61:62], v[36:39] 
        4283:  flat_store_dwordx4 v[45:46], v[40:43] 
        4284:  flat_store_dwordx4 v[12:13], v[4:7] 
           .
           .
           .
>>>>>>
...

Pierre-vh added a commit that referenced this pull request Sep 19, 2024
@Pierre-vh
Contributor Author

da1a222

@llvm-ci
Collaborator

llvm-ci commented Sep 19, 2024

LLVM Buildbot has detected a new failure on builder premerge-monolithic-linux running on premerge-linux-1 while building llvm at step 7 "test-build-unified-tree-check-all".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/153/builds/9376

Here is the relevant piece of the build log for the reference
Step 7 (test-build-unified-tree-check-all) failure: test (failure)
******************** TEST 'LLVM :: CodeGen/AMDGPU/load-constant-i1.ll' FAILED ********************
Exit Code: 1

Command Output (stderr):
--
RUN: at line 2: /build/buildbot/premerge-monolithic-linux/build/bin/llc -mtriple=amdgcn-- -verify-machineinstrs < /build/buildbot/premerge-monolithic-linux/llvm-project/llvm/test/CodeGen/AMDGPU/load-constant-i1.ll | /build/buildbot/premerge-monolithic-linux/build/bin/FileCheck -check-prefix=GFX6 /build/buildbot/premerge-monolithic-linux/llvm-project/llvm/test/CodeGen/AMDGPU/load-constant-i1.ll
+ /build/buildbot/premerge-monolithic-linux/build/bin/llc -mtriple=amdgcn-- -verify-machineinstrs
+ /build/buildbot/premerge-monolithic-linux/build/bin/FileCheck -check-prefix=GFX6 /build/buildbot/premerge-monolithic-linux/llvm-project/llvm/test/CodeGen/AMDGPU/load-constant-i1.ll
RUN: at line 3: /build/buildbot/premerge-monolithic-linux/build/bin/llc -mtriple=amdgcn-- -mcpu=tonga -verify-machineinstrs < /build/buildbot/premerge-monolithic-linux/llvm-project/llvm/test/CodeGen/AMDGPU/load-constant-i1.ll | /build/buildbot/premerge-monolithic-linux/build/bin/FileCheck -check-prefix=GFX8 /build/buildbot/premerge-monolithic-linux/llvm-project/llvm/test/CodeGen/AMDGPU/load-constant-i1.ll
+ /build/buildbot/premerge-monolithic-linux/build/bin/llc -mtriple=amdgcn-- -mcpu=tonga -verify-machineinstrs
+ /build/buildbot/premerge-monolithic-linux/build/bin/FileCheck -check-prefix=GFX8 /build/buildbot/premerge-monolithic-linux/llvm-project/llvm/test/CodeGen/AMDGPU/load-constant-i1.ll
/build/buildbot/premerge-monolithic-linux/llvm-project/llvm/test/CodeGen/AMDGPU/load-constant-i1.ll:8972:14: error: GFX8-NEXT: is not on the line after the previous match
; GFX8-NEXT: s_waitcnt vmcnt(0)
             ^
<stdin>:4279:2: note: 'next' match was here
 s_waitcnt vmcnt(0)
 ^
<stdin>:4228:64: note: previous match ended here
 buffer_store_dword v12, off, s[88:91], 0 ; 4-byte Folded Spill
                                                               ^
<stdin>:4229:1: note: non-matching line after previous match is here
 buffer_store_dword v13, off, s[88:91], 0 offset:4 ; 4-byte Folded Spill
^

Input file: <stdin>
Check file: /build/buildbot/premerge-monolithic-linux/llvm-project/llvm/test/CodeGen/AMDGPU/load-constant-i1.ll

-dump-input=help explains the following input dump.

Input was:
<<<<<<
           .
           .
           .
        4274:  v_mov_b32_e32 v2, s10 
        4275:  v_mov_b32_e32 v3, s11 
        4276:  v_mov_b32_e32 v9, s13 
        4277:  v_mov_b32_e32 v10, s14 
        4278:  v_mov_b32_e32 v11, s15 
        4279:  s_waitcnt vmcnt(0) 
next:8972      !~~~~~~~~~~~~~~~~~  error: match on wrong line
        4280:  flat_store_dwordx4 v[18:19], v[28:31] 
        4281:  flat_store_dwordx4 v[59:60], v[32:35] 
        4282:  flat_store_dwordx4 v[61:62], v[36:39] 
        4283:  flat_store_dwordx4 v[45:46], v[40:43] 
        4284:  flat_store_dwordx4 v[12:13], v[4:7] 
           .
           .
           .
>>>>>>
...

@llvm-ci
Collaborator

llvm-ci commented Sep 19, 2024

LLVM Buildbot has detected a new failure on builder clang-ppc64le-linux-test-suite running on ppc64le-clang-test-suite while building llvm at step 6 "test-build-unified-tree-check-all".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/95/builds/4003

Here is the relevant piece of the build log for the reference
Step 6 (test-build-unified-tree-check-all) failure: test (failure)
******************** TEST 'LLVM :: CodeGen/AMDGPU/load-constant-i1.ll' FAILED ********************
Exit Code: 1

Command Output (stderr):
--
RUN: at line 2: /home/buildbots/llvm-external-buildbots/workers/ppc64le-clang-test-suite/clang-ppc64le-test-suite/build/bin/llc -mtriple=amdgcn-- -verify-machineinstrs < /home/buildbots/llvm-external-buildbots/workers/ppc64le-clang-test-suite/clang-ppc64le-test-suite/llvm-project/llvm/test/CodeGen/AMDGPU/load-constant-i1.ll | /home/buildbots/llvm-external-buildbots/workers/ppc64le-clang-test-suite/clang-ppc64le-test-suite/build/bin/FileCheck -check-prefix=GFX6 /home/buildbots/llvm-external-buildbots/workers/ppc64le-clang-test-suite/clang-ppc64le-test-suite/llvm-project/llvm/test/CodeGen/AMDGPU/load-constant-i1.ll
+ /home/buildbots/llvm-external-buildbots/workers/ppc64le-clang-test-suite/clang-ppc64le-test-suite/build/bin/llc -mtriple=amdgcn-- -verify-machineinstrs
+ /home/buildbots/llvm-external-buildbots/workers/ppc64le-clang-test-suite/clang-ppc64le-test-suite/build/bin/FileCheck -check-prefix=GFX6 /home/buildbots/llvm-external-buildbots/workers/ppc64le-clang-test-suite/clang-ppc64le-test-suite/llvm-project/llvm/test/CodeGen/AMDGPU/load-constant-i1.ll
RUN: at line 3: /home/buildbots/llvm-external-buildbots/workers/ppc64le-clang-test-suite/clang-ppc64le-test-suite/build/bin/llc -mtriple=amdgcn-- -mcpu=tonga -verify-machineinstrs < /home/buildbots/llvm-external-buildbots/workers/ppc64le-clang-test-suite/clang-ppc64le-test-suite/llvm-project/llvm/test/CodeGen/AMDGPU/load-constant-i1.ll | /home/buildbots/llvm-external-buildbots/workers/ppc64le-clang-test-suite/clang-ppc64le-test-suite/build/bin/FileCheck -check-prefix=GFX8 /home/buildbots/llvm-external-buildbots/workers/ppc64le-clang-test-suite/clang-ppc64le-test-suite/llvm-project/llvm/test/CodeGen/AMDGPU/load-constant-i1.ll
+ /home/buildbots/llvm-external-buildbots/workers/ppc64le-clang-test-suite/clang-ppc64le-test-suite/build/bin/llc -mtriple=amdgcn-- -mcpu=tonga -verify-machineinstrs
+ /home/buildbots/llvm-external-buildbots/workers/ppc64le-clang-test-suite/clang-ppc64le-test-suite/build/bin/FileCheck -check-prefix=GFX8 /home/buildbots/llvm-external-buildbots/workers/ppc64le-clang-test-suite/clang-ppc64le-test-suite/llvm-project/llvm/test/CodeGen/AMDGPU/load-constant-i1.ll
/home/buildbots/llvm-external-buildbots/workers/ppc64le-clang-test-suite/clang-ppc64le-test-suite/llvm-project/llvm/test/CodeGen/AMDGPU/load-constant-i1.ll:8972:14: error: GFX8-NEXT: is not on the line after the previous match
; GFX8-NEXT: s_waitcnt vmcnt(0)
             ^
<stdin>:4279:2: note: 'next' match was here
 s_waitcnt vmcnt(0)
 ^
<stdin>:4228:64: note: previous match ended here
 buffer_store_dword v12, off, s[88:91], 0 ; 4-byte Folded Spill
                                                               ^
<stdin>:4229:1: note: non-matching line after previous match is here
 buffer_store_dword v13, off, s[88:91], 0 offset:4 ; 4-byte Folded Spill
^

Input file: <stdin>
Check file: /home/buildbots/llvm-external-buildbots/workers/ppc64le-clang-test-suite/clang-ppc64le-test-suite/llvm-project/llvm/test/CodeGen/AMDGPU/load-constant-i1.ll

-dump-input=help explains the following input dump.

Input was:
<<<<<<
           .
           .
           .
        4274:  v_mov_b32_e32 v2, s10 
        4275:  v_mov_b32_e32 v3, s11 
        4276:  v_mov_b32_e32 v9, s13 
        4277:  v_mov_b32_e32 v10, s14 
        4278:  v_mov_b32_e32 v11, s15 
        4279:  s_waitcnt vmcnt(0) 
next:8972      !~~~~~~~~~~~~~~~~~  error: match on wrong line
        4280:  flat_store_dwordx4 v[18:19], v[28:31] 
        4281:  flat_store_dwordx4 v[59:60], v[32:35] 
        4282:  flat_store_dwordx4 v[61:62], v[36:39] 
        4283:  flat_store_dwordx4 v[45:46], v[40:43] 
        4284:  flat_store_dwordx4 v[12:13], v[4:7] 
           .
           .
           .
>>>>>>
...

@llvm-ci
Collaborator

llvm-ci commented Sep 19, 2024

LLVM Buildbot has detected a new failure on builder clang-debian-cpp20 running on clang-debian-cpp20 while building llvm at step 6 "test-build-unified-tree-check-all".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/108/builds/3894

Here is the relevant piece of the build log for the reference
Step 6 (test-build-unified-tree-check-all) failure: test (failure)
******************** TEST 'LLVM :: CodeGen/AMDGPU/load-constant-i1.ll' FAILED ********************
Exit Code: 1

Command Output (stderr):
--
RUN: at line 2: /vol/worker/clang-debian-cpp20/clang-debian-cpp20/build/bin/llc -mtriple=amdgcn-- -verify-machineinstrs < /vol/worker/clang-debian-cpp20/clang-debian-cpp20/llvm-project/llvm/test/CodeGen/AMDGPU/load-constant-i1.ll | /vol/worker/clang-debian-cpp20/clang-debian-cpp20/build/bin/FileCheck -check-prefix=GFX6 /vol/worker/clang-debian-cpp20/clang-debian-cpp20/llvm-project/llvm/test/CodeGen/AMDGPU/load-constant-i1.ll
+ /vol/worker/clang-debian-cpp20/clang-debian-cpp20/build/bin/llc -mtriple=amdgcn-- -verify-machineinstrs
+ /vol/worker/clang-debian-cpp20/clang-debian-cpp20/build/bin/FileCheck -check-prefix=GFX6 /vol/worker/clang-debian-cpp20/clang-debian-cpp20/llvm-project/llvm/test/CodeGen/AMDGPU/load-constant-i1.ll
RUN: at line 3: /vol/worker/clang-debian-cpp20/clang-debian-cpp20/build/bin/llc -mtriple=amdgcn-- -mcpu=tonga -verify-machineinstrs < /vol/worker/clang-debian-cpp20/clang-debian-cpp20/llvm-project/llvm/test/CodeGen/AMDGPU/load-constant-i1.ll | /vol/worker/clang-debian-cpp20/clang-debian-cpp20/build/bin/FileCheck -check-prefix=GFX8 /vol/worker/clang-debian-cpp20/clang-debian-cpp20/llvm-project/llvm/test/CodeGen/AMDGPU/load-constant-i1.ll
+ /vol/worker/clang-debian-cpp20/clang-debian-cpp20/build/bin/llc -mtriple=amdgcn-- -mcpu=tonga -verify-machineinstrs
+ /vol/worker/clang-debian-cpp20/clang-debian-cpp20/build/bin/FileCheck -check-prefix=GFX8 /vol/worker/clang-debian-cpp20/clang-debian-cpp20/llvm-project/llvm/test/CodeGen/AMDGPU/load-constant-i1.ll
/vol/worker/clang-debian-cpp20/clang-debian-cpp20/llvm-project/llvm/test/CodeGen/AMDGPU/load-constant-i1.ll:8972:14: error: GFX8-NEXT: is not on the line after the previous match
; GFX8-NEXT: s_waitcnt vmcnt(0)
             ^
<stdin>:4279:2: note: 'next' match was here
 s_waitcnt vmcnt(0)
 ^
<stdin>:4228:64: note: previous match ended here
 buffer_store_dword v12, off, s[88:91], 0 ; 4-byte Folded Spill
                                                               ^
<stdin>:4229:1: note: non-matching line after previous match is here
 buffer_store_dword v13, off, s[88:91], 0 offset:4 ; 4-byte Folded Spill
^

Input file: <stdin>
Check file: /vol/worker/clang-debian-cpp20/clang-debian-cpp20/llvm-project/llvm/test/CodeGen/AMDGPU/load-constant-i1.ll

-dump-input=help explains the following input dump.

Input was:
<<<<<<
           .
           .
           .
        4274:  v_mov_b32_e32 v2, s10 
        4275:  v_mov_b32_e32 v3, s11 
        4276:  v_mov_b32_e32 v9, s13 
        4277:  v_mov_b32_e32 v10, s14 
        4278:  v_mov_b32_e32 v11, s15 
        4279:  s_waitcnt vmcnt(0) 
next:8972      !~~~~~~~~~~~~~~~~~  error: match on wrong line
        4280:  flat_store_dwordx4 v[18:19], v[28:31] 
        4281:  flat_store_dwordx4 v[59:60], v[32:35] 
        4282:  flat_store_dwordx4 v[61:62], v[36:39] 
        4283:  flat_store_dwordx4 v[45:46], v[40:43] 
        4284:  flat_store_dwordx4 v[12:13], v[4:7] 
           .
           .
           .
>>>>>>
...

@llvm-ci
Collaborator

llvm-ci commented Sep 19, 2024

LLVM Buildbot has detected a new failure on builder ppc64le-lld-multistage-test running on ppc64le-lld-multistage-test while building llvm at step 7 "test-build-stage1-unified-tree-check-all".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/168/builds/3495

Here is the relevant piece of the build log for the reference
Step 7 (test-build-stage1-unified-tree-check-all) failure: test (failure)
******************** TEST 'LLVM :: CodeGen/AMDGPU/load-constant-i1.ll' FAILED ********************
Exit Code: 1

Command Output (stderr):
--
RUN: at line 2: /home/buildbots/llvm-external-buildbots/workers/ppc64le-lld-multistage-test/ppc64le-lld-multistage-test/build/stage1/bin/llc -mtriple=amdgcn-- -verify-machineinstrs < /home/buildbots/llvm-external-buildbots/workers/ppc64le-lld-multistage-test/ppc64le-lld-multistage-test/llvm-project/llvm/test/CodeGen/AMDGPU/load-constant-i1.ll | /home/buildbots/llvm-external-buildbots/workers/ppc64le-lld-multistage-test/ppc64le-lld-multistage-test/build/stage1/bin/FileCheck -check-prefix=GFX6 /home/buildbots/llvm-external-buildbots/workers/ppc64le-lld-multistage-test/ppc64le-lld-multistage-test/llvm-project/llvm/test/CodeGen/AMDGPU/load-constant-i1.ll
+ /home/buildbots/llvm-external-buildbots/workers/ppc64le-lld-multistage-test/ppc64le-lld-multistage-test/build/stage1/bin/llc -mtriple=amdgcn-- -verify-machineinstrs
+ /home/buildbots/llvm-external-buildbots/workers/ppc64le-lld-multistage-test/ppc64le-lld-multistage-test/build/stage1/bin/FileCheck -check-prefix=GFX6 /home/buildbots/llvm-external-buildbots/workers/ppc64le-lld-multistage-test/ppc64le-lld-multistage-test/llvm-project/llvm/test/CodeGen/AMDGPU/load-constant-i1.ll
RUN: at line 3: /home/buildbots/llvm-external-buildbots/workers/ppc64le-lld-multistage-test/ppc64le-lld-multistage-test/build/stage1/bin/llc -mtriple=amdgcn-- -mcpu=tonga -verify-machineinstrs < /home/buildbots/llvm-external-buildbots/workers/ppc64le-lld-multistage-test/ppc64le-lld-multistage-test/llvm-project/llvm/test/CodeGen/AMDGPU/load-constant-i1.ll | /home/buildbots/llvm-external-buildbots/workers/ppc64le-lld-multistage-test/ppc64le-lld-multistage-test/build/stage1/bin/FileCheck -check-prefix=GFX8 /home/buildbots/llvm-external-buildbots/workers/ppc64le-lld-multistage-test/ppc64le-lld-multistage-test/llvm-project/llvm/test/CodeGen/AMDGPU/load-constant-i1.ll
+ /home/buildbots/llvm-external-buildbots/workers/ppc64le-lld-multistage-test/ppc64le-lld-multistage-test/build/stage1/bin/llc -mtriple=amdgcn-- -mcpu=tonga -verify-machineinstrs
+ /home/buildbots/llvm-external-buildbots/workers/ppc64le-lld-multistage-test/ppc64le-lld-multistage-test/build/stage1/bin/FileCheck -check-prefix=GFX8 /home/buildbots/llvm-external-buildbots/workers/ppc64le-lld-multistage-test/ppc64le-lld-multistage-test/llvm-project/llvm/test/CodeGen/AMDGPU/load-constant-i1.ll
/home/buildbots/llvm-external-buildbots/workers/ppc64le-lld-multistage-test/ppc64le-lld-multistage-test/llvm-project/llvm/test/CodeGen/AMDGPU/load-constant-i1.ll:8972:14: error: GFX8-NEXT: is not on the line after the previous match
; GFX8-NEXT: s_waitcnt vmcnt(0)
             ^
<stdin>:4279:2: note: 'next' match was here
 s_waitcnt vmcnt(0)
 ^
<stdin>:4228:64: note: previous match ended here
 buffer_store_dword v12, off, s[88:91], 0 ; 4-byte Folded Spill
                                                               ^
<stdin>:4229:1: note: non-matching line after previous match is here
 buffer_store_dword v13, off, s[88:91], 0 offset:4 ; 4-byte Folded Spill
^

Input file: <stdin>
Check file: /home/buildbots/llvm-external-buildbots/workers/ppc64le-lld-multistage-test/ppc64le-lld-multistage-test/llvm-project/llvm/test/CodeGen/AMDGPU/load-constant-i1.ll

-dump-input=help explains the following input dump.

Input was:
<<<<<<
           .
           .
           .
        4274:  v_mov_b32_e32 v2, s10 
        4275:  v_mov_b32_e32 v3, s11 
        4276:  v_mov_b32_e32 v9, s13 
        4277:  v_mov_b32_e32 v10, s14 
        4278:  v_mov_b32_e32 v11, s15 
        4279:  s_waitcnt vmcnt(0) 
next:8972      !~~~~~~~~~~~~~~~~~  error: match on wrong line
        4280:  flat_store_dwordx4 v[18:19], v[28:31] 
        4281:  flat_store_dwordx4 v[59:60], v[32:35] 
        4282:  flat_store_dwordx4 v[61:62], v[36:39] 
        4283:  flat_store_dwordx4 v[45:46], v[40:43] 
        4284:  flat_store_dwordx4 v[12:13], v[4:7] 
           .
           .
           .
>>>>>>
...
Step 13 (test-build-stage2-unified-tree-check-all) failure: test (failure)
******************** TEST 'LLVM :: CodeGen/AMDGPU/load-constant-i1.ll' FAILED ********************
Exit Code: 1

Command Output (stderr):
--
RUN: at line 2: /home/buildbots/llvm-external-buildbots/workers/ppc64le-lld-multistage-test/ppc64le-lld-multistage-test/build/stage2/bin/llc -mtriple=amdgcn-- -verify-machineinstrs < /home/buildbots/llvm-external-buildbots/workers/ppc64le-lld-multistage-test/ppc64le-lld-multistage-test/llvm-project/llvm/test/CodeGen/AMDGPU/load-constant-i1.ll | /home/buildbots/llvm-external-buildbots/workers/ppc64le-lld-multistage-test/ppc64le-lld-multistage-test/build/stage2/bin/FileCheck -check-prefix=GFX6 /home/buildbots/llvm-external-buildbots/workers/ppc64le-lld-multistage-test/ppc64le-lld-multistage-test/llvm-project/llvm/test/CodeGen/AMDGPU/load-constant-i1.ll
+ /home/buildbots/llvm-external-buildbots/workers/ppc64le-lld-multistage-test/ppc64le-lld-multistage-test/build/stage2/bin/llc -mtriple=amdgcn-- -verify-machineinstrs
+ /home/buildbots/llvm-external-buildbots/workers/ppc64le-lld-multistage-test/ppc64le-lld-multistage-test/build/stage2/bin/FileCheck -check-prefix=GFX6 /home/buildbots/llvm-external-buildbots/workers/ppc64le-lld-multistage-test/ppc64le-lld-multistage-test/llvm-project/llvm/test/CodeGen/AMDGPU/load-constant-i1.ll
RUN: at line 3: /home/buildbots/llvm-external-buildbots/workers/ppc64le-lld-multistage-test/ppc64le-lld-multistage-test/build/stage2/bin/llc -mtriple=amdgcn-- -mcpu=tonga -verify-machineinstrs < /home/buildbots/llvm-external-buildbots/workers/ppc64le-lld-multistage-test/ppc64le-lld-multistage-test/llvm-project/llvm/test/CodeGen/AMDGPU/load-constant-i1.ll | /home/buildbots/llvm-external-buildbots/workers/ppc64le-lld-multistage-test/ppc64le-lld-multistage-test/build/stage2/bin/FileCheck -check-prefix=GFX8 /home/buildbots/llvm-external-buildbots/workers/ppc64le-lld-multistage-test/ppc64le-lld-multistage-test/llvm-project/llvm/test/CodeGen/AMDGPU/load-constant-i1.ll
+ /home/buildbots/llvm-external-buildbots/workers/ppc64le-lld-multistage-test/ppc64le-lld-multistage-test/build/stage2/bin/llc -mtriple=amdgcn-- -mcpu=tonga -verify-machineinstrs
+ /home/buildbots/llvm-external-buildbots/workers/ppc64le-lld-multistage-test/ppc64le-lld-multistage-test/build/stage2/bin/FileCheck -check-prefix=GFX8 /home/buildbots/llvm-external-buildbots/workers/ppc64le-lld-multistage-test/ppc64le-lld-multistage-test/llvm-project/llvm/test/CodeGen/AMDGPU/load-constant-i1.ll
/home/buildbots/llvm-external-buildbots/workers/ppc64le-lld-multistage-test/ppc64le-lld-multistage-test/llvm-project/llvm/test/CodeGen/AMDGPU/load-constant-i1.ll:8972:14: error: GFX8-NEXT: is not on the line after the previous match
; GFX8-NEXT: s_waitcnt vmcnt(0)
             ^
<stdin>:4279:2: note: 'next' match was here
 s_waitcnt vmcnt(0)
 ^
<stdin>:4228:64: note: previous match ended here
 buffer_store_dword v12, off, s[88:91], 0 ; 4-byte Folded Spill
                                                               ^
<stdin>:4229:1: note: non-matching line after previous match is here
 buffer_store_dword v13, off, s[88:91], 0 offset:4 ; 4-byte Folded Spill
^

Input file: <stdin>
Check file: /home/buildbots/llvm-external-buildbots/workers/ppc64le-lld-multistage-test/ppc64le-lld-multistage-test/llvm-project/llvm/test/CodeGen/AMDGPU/load-constant-i1.ll

-dump-input=help explains the following input dump.

Input was:
<<<<<<
           .
           .
           .
        4274:  v_mov_b32_e32 v2, s10 
        4275:  v_mov_b32_e32 v3, s11 
        4276:  v_mov_b32_e32 v9, s13 
        4277:  v_mov_b32_e32 v10, s14 
        4278:  v_mov_b32_e32 v11, s15 
        4279:  s_waitcnt vmcnt(0) 
next:8972      !~~~~~~~~~~~~~~~~~  error: match on wrong line
        4280:  flat_store_dwordx4 v[18:19], v[28:31] 
        4281:  flat_store_dwordx4 v[59:60], v[32:35] 
        4282:  flat_store_dwordx4 v[61:62], v[36:39] 
        4283:  flat_store_dwordx4 v[45:46], v[40:43] 
        4284:  flat_store_dwordx4 v[12:13], v[4:7] 
           .
           .
           .
>>>>>>
...

@llvm-ci
Collaborator

llvm-ci commented Sep 19, 2024

LLVM Buildbot has detected a new failure on builder llvm-clang-x86_64-expensive-checks-ubuntu running on as-builder-4 while building llvm at step 6 "test-build-unified-tree-check-all".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/187/builds/1287

Here is the relevant piece of the build log for the reference
Step 6 (test-build-unified-tree-check-all) failure: test (failure)
******************** TEST 'LLVM :: CodeGen/AMDGPU/load-constant-i1.ll' FAILED ********************
Exit Code: 1

Command Output (stderr):
--
RUN: at line 2: /home/buildbot/worker/as-builder-4/ramdisk/expensive-checks/build/bin/llc -mtriple=amdgcn-- -verify-machineinstrs < /home/buildbot/worker/as-builder-4/ramdisk/expensive-checks/llvm-project/llvm/test/CodeGen/AMDGPU/load-constant-i1.ll | /home/buildbot/worker/as-builder-4/ramdisk/expensive-checks/build/bin/FileCheck -check-prefix=GFX6 /home/buildbot/worker/as-builder-4/ramdisk/expensive-checks/llvm-project/llvm/test/CodeGen/AMDGPU/load-constant-i1.ll
+ /home/buildbot/worker/as-builder-4/ramdisk/expensive-checks/build/bin/llc -mtriple=amdgcn-- -verify-machineinstrs
+ /home/buildbot/worker/as-builder-4/ramdisk/expensive-checks/build/bin/FileCheck -check-prefix=GFX6 /home/buildbot/worker/as-builder-4/ramdisk/expensive-checks/llvm-project/llvm/test/CodeGen/AMDGPU/load-constant-i1.ll
RUN: at line 3: /home/buildbot/worker/as-builder-4/ramdisk/expensive-checks/build/bin/llc -mtriple=amdgcn-- -mcpu=tonga -verify-machineinstrs < /home/buildbot/worker/as-builder-4/ramdisk/expensive-checks/llvm-project/llvm/test/CodeGen/AMDGPU/load-constant-i1.ll | /home/buildbot/worker/as-builder-4/ramdisk/expensive-checks/build/bin/FileCheck -check-prefix=GFX8 /home/buildbot/worker/as-builder-4/ramdisk/expensive-checks/llvm-project/llvm/test/CodeGen/AMDGPU/load-constant-i1.ll
+ /home/buildbot/worker/as-builder-4/ramdisk/expensive-checks/build/bin/FileCheck -check-prefix=GFX8 /home/buildbot/worker/as-builder-4/ramdisk/expensive-checks/llvm-project/llvm/test/CodeGen/AMDGPU/load-constant-i1.ll
+ /home/buildbot/worker/as-builder-4/ramdisk/expensive-checks/build/bin/llc -mtriple=amdgcn-- -mcpu=tonga -verify-machineinstrs
/home/buildbot/worker/as-builder-4/ramdisk/expensive-checks/llvm-project/llvm/test/CodeGen/AMDGPU/load-constant-i1.ll:8972:14: error: GFX8-NEXT: is not on the line after the previous match
; GFX8-NEXT: s_waitcnt vmcnt(0)
             ^
<stdin>:4279:2: note: 'next' match was here
 s_waitcnt vmcnt(0)
 ^
<stdin>:4228:64: note: previous match ended here
 buffer_store_dword v12, off, s[88:91], 0 ; 4-byte Folded Spill
                                                               ^
<stdin>:4229:1: note: non-matching line after previous match is here
 buffer_store_dword v13, off, s[88:91], 0 offset:4 ; 4-byte Folded Spill
^

Input file: <stdin>
Check file: /home/buildbot/worker/as-builder-4/ramdisk/expensive-checks/llvm-project/llvm/test/CodeGen/AMDGPU/load-constant-i1.ll

-dump-input=help explains the following input dump.

Input was:
<<<<<<
           .
           .
           .
        4274:  v_mov_b32_e32 v2, s10 
        4275:  v_mov_b32_e32 v3, s11 
        4276:  v_mov_b32_e32 v9, s13 
        4277:  v_mov_b32_e32 v10, s14 
        4278:  v_mov_b32_e32 v11, s15 
        4279:  s_waitcnt vmcnt(0) 
next:8972      !~~~~~~~~~~~~~~~~~  error: match on wrong line
        4280:  flat_store_dwordx4 v[18:19], v[28:31] 
        4281:  flat_store_dwordx4 v[59:60], v[32:35] 
        4282:  flat_store_dwordx4 v[61:62], v[36:39] 
        4283:  flat_store_dwordx4 v[45:46], v[40:43] 
        4284:  flat_store_dwordx4 v[12:13], v[4:7] 
           .
           .
           .
>>>>>>
...

@llvm-ci
Collaborator

llvm-ci commented Sep 19, 2024

LLVM Buildbot has detected a new failure on builder lld-x86_64-ubuntu-fast running on as-builder-4 while building llvm at step 6 "test-build-unified-tree-check-all".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/33/builds/3284

Here is the relevant piece of the build log for the reference
Step 6 (test-build-unified-tree-check-all) failure: test (failure)
******************** TEST 'LLVM :: CodeGen/AMDGPU/load-constant-i1.ll' FAILED ********************
Exit Code: 1

Command Output (stderr):
--
RUN: at line 2: /home/buildbot/worker/as-builder-4/ramdisk/lld-x86_64/build/bin/llc -mtriple=amdgcn-- -verify-machineinstrs < /home/buildbot/worker/as-builder-4/ramdisk/lld-x86_64/llvm-project/llvm/test/CodeGen/AMDGPU/load-constant-i1.ll | /home/buildbot/worker/as-builder-4/ramdisk/lld-x86_64/build/bin/FileCheck -check-prefix=GFX6 /home/buildbot/worker/as-builder-4/ramdisk/lld-x86_64/llvm-project/llvm/test/CodeGen/AMDGPU/load-constant-i1.ll
+ /home/buildbot/worker/as-builder-4/ramdisk/lld-x86_64/build/bin/llc -mtriple=amdgcn-- -verify-machineinstrs
+ /home/buildbot/worker/as-builder-4/ramdisk/lld-x86_64/build/bin/FileCheck -check-prefix=GFX6 /home/buildbot/worker/as-builder-4/ramdisk/lld-x86_64/llvm-project/llvm/test/CodeGen/AMDGPU/load-constant-i1.ll
RUN: at line 3: /home/buildbot/worker/as-builder-4/ramdisk/lld-x86_64/build/bin/llc -mtriple=amdgcn-- -mcpu=tonga -verify-machineinstrs < /home/buildbot/worker/as-builder-4/ramdisk/lld-x86_64/llvm-project/llvm/test/CodeGen/AMDGPU/load-constant-i1.ll | /home/buildbot/worker/as-builder-4/ramdisk/lld-x86_64/build/bin/FileCheck -check-prefix=GFX8 /home/buildbot/worker/as-builder-4/ramdisk/lld-x86_64/llvm-project/llvm/test/CodeGen/AMDGPU/load-constant-i1.ll
+ /home/buildbot/worker/as-builder-4/ramdisk/lld-x86_64/build/bin/llc -mtriple=amdgcn-- -mcpu=tonga -verify-machineinstrs
+ /home/buildbot/worker/as-builder-4/ramdisk/lld-x86_64/build/bin/FileCheck -check-prefix=GFX8 /home/buildbot/worker/as-builder-4/ramdisk/lld-x86_64/llvm-project/llvm/test/CodeGen/AMDGPU/load-constant-i1.ll
/home/buildbot/worker/as-builder-4/ramdisk/lld-x86_64/llvm-project/llvm/test/CodeGen/AMDGPU/load-constant-i1.ll:8972:14: error: GFX8-NEXT: is not on the line after the previous match
; GFX8-NEXT: s_waitcnt vmcnt(0)
             ^
<stdin>:4279:2: note: 'next' match was here
 s_waitcnt vmcnt(0)
 ^
<stdin>:4228:64: note: previous match ended here
 buffer_store_dword v12, off, s[88:91], 0 ; 4-byte Folded Spill
                                                               ^
<stdin>:4229:1: note: non-matching line after previous match is here
 buffer_store_dword v13, off, s[88:91], 0 offset:4 ; 4-byte Folded Spill
^

Input file: <stdin>
Check file: /home/buildbot/worker/as-builder-4/ramdisk/lld-x86_64/llvm-project/llvm/test/CodeGen/AMDGPU/load-constant-i1.ll

-dump-input=help explains the following input dump.

Input was:
<<<<<<
           .
           .
           .
        4274:  v_mov_b32_e32 v2, s10 
        4275:  v_mov_b32_e32 v3, s11 
        4276:  v_mov_b32_e32 v9, s13 
        4277:  v_mov_b32_e32 v10, s14 
        4278:  v_mov_b32_e32 v11, s15 
        4279:  s_waitcnt vmcnt(0) 
next:8972      !~~~~~~~~~~~~~~~~~  error: match on wrong line
        4280:  flat_store_dwordx4 v[18:19], v[28:31] 
        4281:  flat_store_dwordx4 v[59:60], v[32:35] 
        4282:  flat_store_dwordx4 v[61:62], v[36:39] 
        4283:  flat_store_dwordx4 v[45:46], v[40:43] 
        4284:  flat_store_dwordx4 v[12:13], v[4:7] 
           .
           .
           .
>>>>>>
...

@llvm-ci
Collaborator

llvm-ci commented Sep 19, 2024

LLVM Buildbot has detected a new failure on builder clang-aarch64-sve-vls-2stage running on linaro-g3-04 while building llvm at step 12 "ninja check 2".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/4/builds/2253

Here is the relevant piece of the build log for the reference
Step 12 (ninja check 2) failure: stage 2 checked (failure)
******************** TEST 'LLVM :: CodeGen/AMDGPU/load-constant-i1.ll' FAILED ********************
Exit Code: 1

Command Output (stderr):
--
RUN: at line 2: /home/tcwg-buildbot/worker/clang-aarch64-sve-vls-2stage/stage2/bin/llc -mtriple=amdgcn-- -verify-machineinstrs < /home/tcwg-buildbot/worker/clang-aarch64-sve-vls-2stage/llvm/llvm/test/CodeGen/AMDGPU/load-constant-i1.ll | /home/tcwg-buildbot/worker/clang-aarch64-sve-vls-2stage/stage2/bin/FileCheck -check-prefix=GFX6 /home/tcwg-buildbot/worker/clang-aarch64-sve-vls-2stage/llvm/llvm/test/CodeGen/AMDGPU/load-constant-i1.ll
+ /home/tcwg-buildbot/worker/clang-aarch64-sve-vls-2stage/stage2/bin/llc -mtriple=amdgcn-- -verify-machineinstrs
+ /home/tcwg-buildbot/worker/clang-aarch64-sve-vls-2stage/stage2/bin/FileCheck -check-prefix=GFX6 /home/tcwg-buildbot/worker/clang-aarch64-sve-vls-2stage/llvm/llvm/test/CodeGen/AMDGPU/load-constant-i1.ll
RUN: at line 3: /home/tcwg-buildbot/worker/clang-aarch64-sve-vls-2stage/stage2/bin/llc -mtriple=amdgcn-- -mcpu=tonga -verify-machineinstrs < /home/tcwg-buildbot/worker/clang-aarch64-sve-vls-2stage/llvm/llvm/test/CodeGen/AMDGPU/load-constant-i1.ll | /home/tcwg-buildbot/worker/clang-aarch64-sve-vls-2stage/stage2/bin/FileCheck -check-prefix=GFX8 /home/tcwg-buildbot/worker/clang-aarch64-sve-vls-2stage/llvm/llvm/test/CodeGen/AMDGPU/load-constant-i1.ll
+ /home/tcwg-buildbot/worker/clang-aarch64-sve-vls-2stage/stage2/bin/llc -mtriple=amdgcn-- -mcpu=tonga -verify-machineinstrs
+ /home/tcwg-buildbot/worker/clang-aarch64-sve-vls-2stage/stage2/bin/FileCheck -check-prefix=GFX8 /home/tcwg-buildbot/worker/clang-aarch64-sve-vls-2stage/llvm/llvm/test/CodeGen/AMDGPU/load-constant-i1.ll
/home/tcwg-buildbot/worker/clang-aarch64-sve-vls-2stage/llvm/llvm/test/CodeGen/AMDGPU/load-constant-i1.ll:8972:14: error: GFX8-NEXT: is not on the line after the previous match
; GFX8-NEXT: s_waitcnt vmcnt(0)
             ^
<stdin>:4279:2: note: 'next' match was here
 s_waitcnt vmcnt(0)
 ^
<stdin>:4228:64: note: previous match ended here
 buffer_store_dword v12, off, s[88:91], 0 ; 4-byte Folded Spill
                                                               ^
<stdin>:4229:1: note: non-matching line after previous match is here
 buffer_store_dword v13, off, s[88:91], 0 offset:4 ; 4-byte Folded Spill
^

Input file: <stdin>
Check file: /home/tcwg-buildbot/worker/clang-aarch64-sve-vls-2stage/llvm/llvm/test/CodeGen/AMDGPU/load-constant-i1.ll

-dump-input=help explains the following input dump.

Input was:
<<<<<<
           .
           .
           .
        4274:  v_mov_b32_e32 v2, s10 
        4275:  v_mov_b32_e32 v3, s11 
        4276:  v_mov_b32_e32 v9, s13 
        4277:  v_mov_b32_e32 v10, s14 
        4278:  v_mov_b32_e32 v11, s15 
        4279:  s_waitcnt vmcnt(0) 
next:8972      !~~~~~~~~~~~~~~~~~  error: match on wrong line
        4280:  flat_store_dwordx4 v[18:19], v[28:31] 
        4281:  flat_store_dwordx4 v[59:60], v[32:35] 
        4282:  flat_store_dwordx4 v[61:62], v[36:39] 
        4283:  flat_store_dwordx4 v[45:46], v[40:43] 
        4284:  flat_store_dwordx4 v[12:13], v[4:7] 
           .
           .
           .
>>>>>>
...

@llvm-ci
Collaborator

llvm-ci commented Sep 19, 2024

LLVM Buildbot has detected a new failure on builder sanitizer-x86_64-linux-bootstrap-ubsan running on sanitizer-buildbot4 while building llvm at step 2 "annotate".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/25/builds/2594

Here is the relevant piece of the build log for the reference
Step 2 (annotate) failure: 'python ../sanitizer_buildbot/sanitizers/zorg/buildbot/builders/sanitizers/buildbot_selector.py' (failure)
...
llvm-lit: /home/b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:506: note: using lld-link: /home/b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm_build_ubsan/bin/lld-link
llvm-lit: /home/b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:506: note: using ld64.lld: /home/b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm_build_ubsan/bin/ld64.lld
llvm-lit: /home/b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:506: note: using wasm-ld: /home/b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm_build_ubsan/bin/wasm-ld
llvm-lit: /home/b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:506: note: using ld.lld: /home/b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm_build_ubsan/bin/ld.lld
llvm-lit: /home/b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:506: note: using lld-link: /home/b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm_build_ubsan/bin/lld-link
llvm-lit: /home/b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:506: note: using ld64.lld: /home/b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm_build_ubsan/bin/ld64.lld
llvm-lit: /home/b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:506: note: using wasm-ld: /home/b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm_build_ubsan/bin/wasm-ld
llvm-lit: /home/b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm-project/llvm/utils/lit/lit/main.py:72: note: The test suite configuration requested an individual test timeout of 0 seconds but a timeout of 900 seconds was requested on the command line. Forcing timeout to be 900 seconds.
-- Testing: 85713 tests, 88 workers --
Testing:  0.. 10.. 20.. 30.
FAIL: LLVM :: CodeGen/AMDGPU/load-constant-i1.ll (31446 of 85713)
******************** TEST 'LLVM :: CodeGen/AMDGPU/load-constant-i1.ll' FAILED ********************
Exit Code: 1

Command Output (stderr):
--
RUN: at line 2: /home/b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm_build_ubsan/bin/llc -mtriple=amdgcn-- -verify-machineinstrs < /home/b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm-project/llvm/test/CodeGen/AMDGPU/load-constant-i1.ll | /home/b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm_build_ubsan/bin/FileCheck -check-prefix=GFX6 /home/b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm-project/llvm/test/CodeGen/AMDGPU/load-constant-i1.ll
+ /home/b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm_build_ubsan/bin/llc -mtriple=amdgcn-- -verify-machineinstrs
+ /home/b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm_build_ubsan/bin/FileCheck -check-prefix=GFX6 /home/b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm-project/llvm/test/CodeGen/AMDGPU/load-constant-i1.ll
RUN: at line 3: /home/b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm_build_ubsan/bin/llc -mtriple=amdgcn-- -mcpu=tonga -verify-machineinstrs < /home/b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm-project/llvm/test/CodeGen/AMDGPU/load-constant-i1.ll | /home/b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm_build_ubsan/bin/FileCheck -check-prefix=GFX8 /home/b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm-project/llvm/test/CodeGen/AMDGPU/load-constant-i1.ll
+ /home/b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm_build_ubsan/bin/FileCheck -check-prefix=GFX8 /home/b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm-project/llvm/test/CodeGen/AMDGPU/load-constant-i1.ll
+ /home/b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm_build_ubsan/bin/llc -mtriple=amdgcn-- -mcpu=tonga -verify-machineinstrs
/home/b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm-project/llvm/test/CodeGen/AMDGPU/load-constant-i1.ll:8972:14: error: GFX8-NEXT: is not on the line after the previous match
; GFX8-NEXT: s_waitcnt vmcnt(0)
             ^
<stdin>:4279:2: note: 'next' match was here
 s_waitcnt vmcnt(0)
 ^
<stdin>:4228:64: note: previous match ended here
 buffer_store_dword v12, off, s[88:91], 0 ; 4-byte Folded Spill
                                                               ^
<stdin>:4229:1: note: non-matching line after previous match is here
 buffer_store_dword v13, off, s[88:91], 0 offset:4 ; 4-byte Folded Spill
^

Input file: <stdin>
Check file: /home/b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm-project/llvm/test/CodeGen/AMDGPU/load-constant-i1.ll

-dump-input=help explains the following input dump.

Input was:
<<<<<<
           .
           .
           .
        4274:  v_mov_b32_e32 v2, s10 
        4275:  v_mov_b32_e32 v3, s11 
        4276:  v_mov_b32_e32 v9, s13 
        4277:  v_mov_b32_e32 v10, s14 
Step 10 (stage2/ubsan check) failure: stage2/ubsan check (failure)
...
llvm-lit: /home/b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:506: note: using lld-link: /home/b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm_build_ubsan/bin/lld-link
llvm-lit: /home/b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:506: note: using ld64.lld: /home/b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm_build_ubsan/bin/ld64.lld
llvm-lit: /home/b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:506: note: using wasm-ld: /home/b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm_build_ubsan/bin/wasm-ld
llvm-lit: /home/b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:506: note: using ld.lld: /home/b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm_build_ubsan/bin/ld.lld
llvm-lit: /home/b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:506: note: using lld-link: /home/b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm_build_ubsan/bin/lld-link
llvm-lit: /home/b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:506: note: using ld64.lld: /home/b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm_build_ubsan/bin/ld64.lld
llvm-lit: /home/b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:506: note: using wasm-ld: /home/b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm_build_ubsan/bin/wasm-ld
llvm-lit: /home/b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm-project/llvm/utils/lit/lit/main.py:72: note: The test suite configuration requested an individual test timeout of 0 seconds but a timeout of 900 seconds was requested on the command line. Forcing timeout to be 900 seconds.
-- Testing: 85713 tests, 88 workers --
Testing:  0.. 10.. 20.. 30.
FAIL: LLVM :: CodeGen/AMDGPU/load-constant-i1.ll (31446 of 85713)
******************** TEST 'LLVM :: CodeGen/AMDGPU/load-constant-i1.ll' FAILED ********************
Exit Code: 1

Command Output (stderr):
--
RUN: at line 2: /home/b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm_build_ubsan/bin/llc -mtriple=amdgcn-- -verify-machineinstrs < /home/b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm-project/llvm/test/CodeGen/AMDGPU/load-constant-i1.ll | /home/b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm_build_ubsan/bin/FileCheck -check-prefix=GFX6 /home/b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm-project/llvm/test/CodeGen/AMDGPU/load-constant-i1.ll
+ /home/b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm_build_ubsan/bin/llc -mtriple=amdgcn-- -verify-machineinstrs
+ /home/b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm_build_ubsan/bin/FileCheck -check-prefix=GFX6 /home/b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm-project/llvm/test/CodeGen/AMDGPU/load-constant-i1.ll
RUN: at line 3: /home/b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm_build_ubsan/bin/llc -mtriple=amdgcn-- -mcpu=tonga -verify-machineinstrs < /home/b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm-project/llvm/test/CodeGen/AMDGPU/load-constant-i1.ll | /home/b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm_build_ubsan/bin/FileCheck -check-prefix=GFX8 /home/b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm-project/llvm/test/CodeGen/AMDGPU/load-constant-i1.ll
+ /home/b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm_build_ubsan/bin/FileCheck -check-prefix=GFX8 /home/b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm-project/llvm/test/CodeGen/AMDGPU/load-constant-i1.ll
+ /home/b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm_build_ubsan/bin/llc -mtriple=amdgcn-- -mcpu=tonga -verify-machineinstrs
/home/b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm-project/llvm/test/CodeGen/AMDGPU/load-constant-i1.ll:8972:14: error: GFX8-NEXT: is not on the line after the previous match
; GFX8-NEXT: s_waitcnt vmcnt(0)
             ^
<stdin>:4279:2: note: 'next' match was here
 s_waitcnt vmcnt(0)
 ^
<stdin>:4228:64: note: previous match ended here
 buffer_store_dword v12, off, s[88:91], 0 ; 4-byte Folded Spill
                                                               ^
<stdin>:4229:1: note: non-matching line after previous match is here
 buffer_store_dword v13, off, s[88:91], 0 offset:4 ; 4-byte Folded Spill
^

Input file: <stdin>
Check file: /home/b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm-project/llvm/test/CodeGen/AMDGPU/load-constant-i1.ll

-dump-input=help explains the following input dump.

Input was:
<<<<<<
           .
           .
           .
        4274:  v_mov_b32_e32 v2, s10 
        4275:  v_mov_b32_e32 v3, s11 
        4276:  v_mov_b32_e32 v9, s13 
        4277:  v_mov_b32_e32 v10, s14 
Step 13 (stage3/ubsan check) failure: stage3/ubsan check (failure)
...
llvm-lit: /home/b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:506: note: using lld-link: /home/b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm_build2_ubsan/bin/lld-link
llvm-lit: /home/b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:506: note: using ld64.lld: /home/b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm_build2_ubsan/bin/ld64.lld
llvm-lit: /home/b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:506: note: using wasm-ld: /home/b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm_build2_ubsan/bin/wasm-ld
llvm-lit: /home/b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:506: note: using ld.lld: /home/b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm_build2_ubsan/bin/ld.lld
llvm-lit: /home/b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:506: note: using lld-link: /home/b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm_build2_ubsan/bin/lld-link
llvm-lit: /home/b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:506: note: using ld64.lld: /home/b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm_build2_ubsan/bin/ld64.lld
llvm-lit: /home/b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:506: note: using wasm-ld: /home/b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm_build2_ubsan/bin/wasm-ld
llvm-lit: /home/b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm-project/llvm/utils/lit/lit/main.py:72: note: The test suite configuration requested an individual test timeout of 0 seconds but a timeout of 900 seconds was requested on the command line. Forcing timeout to be 900 seconds.
-- Testing: 82866 tests, 88 workers --
Testing:  0.. 10.. 20.. 30.
FAIL: LLVM :: CodeGen/AMDGPU/load-constant-i1.ll (31446 of 82866)
******************** TEST 'LLVM :: CodeGen/AMDGPU/load-constant-i1.ll' FAILED ********************
Exit Code: 1

Command Output (stderr):
--
RUN: at line 2: /home/b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm_build2_ubsan/bin/llc -mtriple=amdgcn-- -verify-machineinstrs < /home/b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm-project/llvm/test/CodeGen/AMDGPU/load-constant-i1.ll | /home/b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm_build2_ubsan/bin/FileCheck -check-prefix=GFX6 /home/b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm-project/llvm/test/CodeGen/AMDGPU/load-constant-i1.ll
+ /home/b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm_build2_ubsan/bin/FileCheck -check-prefix=GFX6 /home/b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm-project/llvm/test/CodeGen/AMDGPU/load-constant-i1.ll
+ /home/b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm_build2_ubsan/bin/llc -mtriple=amdgcn-- -verify-machineinstrs
RUN: at line 3: /home/b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm_build2_ubsan/bin/llc -mtriple=amdgcn-- -mcpu=tonga -verify-machineinstrs < /home/b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm-project/llvm/test/CodeGen/AMDGPU/load-constant-i1.ll | /home/b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm_build2_ubsan/bin/FileCheck -check-prefix=GFX8 /home/b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm-project/llvm/test/CodeGen/AMDGPU/load-constant-i1.ll
+ /home/b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm_build2_ubsan/bin/llc -mtriple=amdgcn-- -mcpu=tonga -verify-machineinstrs
+ /home/b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm_build2_ubsan/bin/FileCheck -check-prefix=GFX8 /home/b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm-project/llvm/test/CodeGen/AMDGPU/load-constant-i1.ll
/home/b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm-project/llvm/test/CodeGen/AMDGPU/load-constant-i1.ll:8972:14: error: GFX8-NEXT: is not on the line after the previous match
; GFX8-NEXT: s_waitcnt vmcnt(0)
             ^
<stdin>:4279:2: note: 'next' match was here
 s_waitcnt vmcnt(0)
 ^
<stdin>:4228:64: note: previous match ended here
 buffer_store_dword v12, off, s[88:91], 0 ; 4-byte Folded Spill
                                                               ^
<stdin>:4229:1: note: non-matching line after previous match is here
 buffer_store_dword v13, off, s[88:91], 0 offset:4 ; 4-byte Folded Spill
^

Input file: <stdin>
Check file: /home/b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm-project/llvm/test/CodeGen/AMDGPU/load-constant-i1.ll

-dump-input=help explains the following input dump.

Input was:
<<<<<<
           .
           .
           .
        4274:  v_mov_b32_e32 v2, s10 
        4275:  v_mov_b32_e32 v3, s11 
        4276:  v_mov_b32_e32 v9, s13 
        4277:  v_mov_b32_e32 v10, s14 

llvm-ci commented Sep 19, 2024

LLVM Buildbot has detected a new failure on builder clang-x86_64-debian-fast running on gribozavr4 while building llvm at step 6 "test-build-unified-tree-check-all".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/56/builds/7778

Here is the relevant piece of the build log for the reference
Step 6 (test-build-unified-tree-check-all) failure: test (failure)
******************** TEST 'LLVM :: CodeGen/AMDGPU/load-constant-i1.ll' FAILED ********************
Exit Code: 1

Command Output (stderr):
--
RUN: at line 2: /b/1/clang-x86_64-debian-fast/llvm.obj/bin/llc -mtriple=amdgcn-- -verify-machineinstrs < /b/1/clang-x86_64-debian-fast/llvm.src/llvm/test/CodeGen/AMDGPU/load-constant-i1.ll | /b/1/clang-x86_64-debian-fast/llvm.obj/bin/FileCheck -check-prefix=GFX6 /b/1/clang-x86_64-debian-fast/llvm.src/llvm/test/CodeGen/AMDGPU/load-constant-i1.ll
+ /b/1/clang-x86_64-debian-fast/llvm.obj/bin/llc -mtriple=amdgcn-- -verify-machineinstrs
+ /b/1/clang-x86_64-debian-fast/llvm.obj/bin/FileCheck -check-prefix=GFX6 /b/1/clang-x86_64-debian-fast/llvm.src/llvm/test/CodeGen/AMDGPU/load-constant-i1.ll
RUN: at line 3: /b/1/clang-x86_64-debian-fast/llvm.obj/bin/llc -mtriple=amdgcn-- -mcpu=tonga -verify-machineinstrs < /b/1/clang-x86_64-debian-fast/llvm.src/llvm/test/CodeGen/AMDGPU/load-constant-i1.ll | /b/1/clang-x86_64-debian-fast/llvm.obj/bin/FileCheck -check-prefix=GFX8 /b/1/clang-x86_64-debian-fast/llvm.src/llvm/test/CodeGen/AMDGPU/load-constant-i1.ll
+ /b/1/clang-x86_64-debian-fast/llvm.obj/bin/FileCheck -check-prefix=GFX8 /b/1/clang-x86_64-debian-fast/llvm.src/llvm/test/CodeGen/AMDGPU/load-constant-i1.ll
+ /b/1/clang-x86_64-debian-fast/llvm.obj/bin/llc -mtriple=amdgcn-- -mcpu=tonga -verify-machineinstrs
/b/1/clang-x86_64-debian-fast/llvm.src/llvm/test/CodeGen/AMDGPU/load-constant-i1.ll:8972:14: error: GFX8-NEXT: is not on the line after the previous match
; GFX8-NEXT: s_waitcnt vmcnt(0)
             ^
<stdin>:4279:2: note: 'next' match was here
 s_waitcnt vmcnt(0)
 ^
<stdin>:4228:64: note: previous match ended here
 buffer_store_dword v12, off, s[88:91], 0 ; 4-byte Folded Spill
                                                               ^
<stdin>:4229:1: note: non-matching line after previous match is here
 buffer_store_dword v13, off, s[88:91], 0 offset:4 ; 4-byte Folded Spill
^

Input file: <stdin>
Check file: /b/1/clang-x86_64-debian-fast/llvm.src/llvm/test/CodeGen/AMDGPU/load-constant-i1.ll

-dump-input=help explains the following input dump.

Input was:
<<<<<<
           .
           .
           .
        4274:  v_mov_b32_e32 v2, s10 
        4275:  v_mov_b32_e32 v3, s11 
        4276:  v_mov_b32_e32 v9, s13 
        4277:  v_mov_b32_e32 v10, s14 
        4278:  v_mov_b32_e32 v11, s15 
        4279:  s_waitcnt vmcnt(0) 
next:8972      !~~~~~~~~~~~~~~~~~  error: match on wrong line
        4280:  flat_store_dwordx4 v[18:19], v[28:31] 
        4281:  flat_store_dwordx4 v[59:60], v[32:35] 
        4282:  flat_store_dwordx4 v[61:62], v[36:39] 
        4283:  flat_store_dwordx4 v[45:46], v[40:43] 
        4284:  flat_store_dwordx4 v[12:13], v[4:7] 
           .
           .
           .
>>>>>>
...
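
The failure reported above comes from a `GFX8-NEXT` directive: the pattern `s_waitcnt vmcnt(0)` was found, but on input line 4279 rather than on the line directly after the previous match (line 4228). As a rough illustration of that rule, here is a simplified, hypothetical model of `CHECK` and `CHECK-NEXT` matching. It assumes plain substring patterns and whole-line granularity; `Kind`, `Directive` and `runChecks` are made-up names and this is not FileCheck's implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Hypothetical, simplified model of two FileCheck directive kinds.
enum class Kind { Check, CheckNext };

struct Directive {
  Kind kind;
  std::string pattern; // plain substring; no regexes or variables
};

// Returns the index of the first failing directive, or std::nullopt if
// every directive matches.
std::optional<std::size_t> runChecks(const std::vector<std::string> &input,
                                     const std::vector<Directive> &checks) {
  std::size_t searchFrom = 0; // first input line not yet consumed by a match
  std::size_t prevLine = 0;   // line of the previous match, if any
  bool havePrev = false;

  for (std::size_t i = 0; i < checks.size(); ++i) {
    // Scan forward for the first line containing the pattern.
    std::size_t line = searchFrom;
    while (line < input.size() &&
           input[line].find(checks[i].pattern) == std::string::npos)
      ++line;
    assert(line <= input.size());

    if (line == input.size())
      return i; // pattern not found at all

    // A -NEXT directive needs a previous match and must land exactly one
    // line after it; a later match is the "match on wrong line" case.
    if (checks[i].kind == Kind::CheckNext &&
        (!havePrev || line != prevLine + 1))
      return i;

    prevLine = line;
    havePrev = true;
    searchFrom = line + 1;
  }
  return std::nullopt;
}
```

With an input where an extra `buffer_store_dword v13` line sits between the `v12` spill and `s_waitcnt vmcnt(0)`, this model reports the `CheckNext` directive as failing, which mirrors the shape of the buildbot error.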

llvm-ci commented Sep 19, 2024

LLVM Buildbot has detected a new failure on builder llvm-clang-x86_64-expensive-checks-debian running on gribozavr4 while building llvm at step 6 "test-build-unified-tree-check-all".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/16/builds/5621

Here is the relevant piece of the build log for the reference
Step 6 (test-build-unified-tree-check-all) failure: test (failure)
******************** TEST 'LLVM :: CodeGen/AMDGPU/load-constant-i1.ll' FAILED ********************
Exit Code: 1

Command Output (stderr):
--
RUN: at line 2: /b/1/llvm-clang-x86_64-expensive-checks-debian/build/bin/llc -mtriple=amdgcn-- -verify-machineinstrs < /b/1/llvm-clang-x86_64-expensive-checks-debian/llvm-project/llvm/test/CodeGen/AMDGPU/load-constant-i1.ll | /b/1/llvm-clang-x86_64-expensive-checks-debian/build/bin/FileCheck -check-prefix=GFX6 /b/1/llvm-clang-x86_64-expensive-checks-debian/llvm-project/llvm/test/CodeGen/AMDGPU/load-constant-i1.ll
+ /b/1/llvm-clang-x86_64-expensive-checks-debian/build/bin/llc -mtriple=amdgcn-- -verify-machineinstrs
+ /b/1/llvm-clang-x86_64-expensive-checks-debian/build/bin/FileCheck -check-prefix=GFX6 /b/1/llvm-clang-x86_64-expensive-checks-debian/llvm-project/llvm/test/CodeGen/AMDGPU/load-constant-i1.ll
RUN: at line 3: /b/1/llvm-clang-x86_64-expensive-checks-debian/build/bin/llc -mtriple=amdgcn-- -mcpu=tonga -verify-machineinstrs < /b/1/llvm-clang-x86_64-expensive-checks-debian/llvm-project/llvm/test/CodeGen/AMDGPU/load-constant-i1.ll | /b/1/llvm-clang-x86_64-expensive-checks-debian/build/bin/FileCheck -check-prefix=GFX8 /b/1/llvm-clang-x86_64-expensive-checks-debian/llvm-project/llvm/test/CodeGen/AMDGPU/load-constant-i1.ll
+ /b/1/llvm-clang-x86_64-expensive-checks-debian/build/bin/llc -mtriple=amdgcn-- -mcpu=tonga -verify-machineinstrs
+ /b/1/llvm-clang-x86_64-expensive-checks-debian/build/bin/FileCheck -check-prefix=GFX8 /b/1/llvm-clang-x86_64-expensive-checks-debian/llvm-project/llvm/test/CodeGen/AMDGPU/load-constant-i1.ll
/b/1/llvm-clang-x86_64-expensive-checks-debian/llvm-project/llvm/test/CodeGen/AMDGPU/load-constant-i1.ll:8972:14: error: GFX8-NEXT: is not on the line after the previous match
; GFX8-NEXT: s_waitcnt vmcnt(0)
             ^
<stdin>:4279:2: note: 'next' match was here
 s_waitcnt vmcnt(0)
 ^
<stdin>:4228:64: note: previous match ended here
 buffer_store_dword v12, off, s[88:91], 0 ; 4-byte Folded Spill
                                                               ^
<stdin>:4229:1: note: non-matching line after previous match is here
 buffer_store_dword v13, off, s[88:91], 0 offset:4 ; 4-byte Folded Spill
^

Input file: <stdin>
Check file: /b/1/llvm-clang-x86_64-expensive-checks-debian/llvm-project/llvm/test/CodeGen/AMDGPU/load-constant-i1.ll

-dump-input=help explains the following input dump.

Input was:
<<<<<<
           .
           .
           .
        4274:  v_mov_b32_e32 v2, s10 
        4275:  v_mov_b32_e32 v3, s11 
        4276:  v_mov_b32_e32 v9, s13 
        4277:  v_mov_b32_e32 v10, s14 
        4278:  v_mov_b32_e32 v11, s15 
        4279:  s_waitcnt vmcnt(0) 
next:8972      !~~~~~~~~~~~~~~~~~  error: match on wrong line
        4280:  flat_store_dwordx4 v[18:19], v[28:31] 
        4281:  flat_store_dwordx4 v[59:60], v[32:35] 
        4282:  flat_store_dwordx4 v[61:62], v[36:39] 
        4283:  flat_store_dwordx4 v[45:46], v[40:43] 
        4284:  flat_store_dwordx4 v[12:13], v[4:7] 
           .
           .
           .
>>>>>>
...

tmsri pushed a commit to tmsri/llvm-project that referenced this pull request Sep 19, 2024
Promote uniform binops, selects and setcc between 2 and 16 bits to 32
bits in DAGISel

Solves llvm#64591
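
To illustrate the arithmetic idea behind promoting a narrow operation to 32 bits, here is a minimal sketch in plain C++. It is not the DAGISel code from this patch: the function names are hypothetical, and it only shows that a 16-bit add gives the same result when done in 32 bits and truncated, while a logical shift right depends on how the value was extended.

```cpp
#include <cassert>
#include <cstdint>

// 16-bit add performed "natively": the result wraps modulo 2^16.
inline std::uint16_t add16(std::uint16_t a, std::uint16_t b) {
  return static_cast<std::uint16_t>(a + b);
}

// The same add done on 32-bit values and truncated afterwards.
// The low 16 bits of the sum do not depend on the high bits of the
// inputs, so the result matches add16.
inline std::uint16_t add16ViaI32(std::uint16_t a, std::uint16_t b) {
  std::uint32_t wideA = a; // zero-extend
  std::uint32_t wideB = b; // zero-extend
  return static_cast<std::uint16_t>(wideA + wideB); // truncate
}

// Logical shift right done on a 32-bit value. Here the high bits DO
// matter, because they are shifted into the low 16 bits.
inline std::uint16_t lshr16ViaI32(std::uint16_t x, unsigned amt,
                                  bool signExtend) {
  assert(amt < 16);
  std::uint32_t wide =
      signExtend ? static_cast<std::uint32_t>(static_cast<std::int32_t>(
                       static_cast<std::int16_t>(x)))
                 : static_cast<std::uint32_t>(x);
  return static_cast<std::uint16_t>(wide >> amt);
}
```

For `x = 0x8000` and a shift of 1, the zero-extended path gives `0x4000` (the correct 16-bit logical shift), while the sign-extended path gives `0xC000`. This is why the choice of extension has to be made per opcode when widening.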
tmsri pushed a commit to tmsri/llvm-project that referenced this pull request Sep 19, 2024

llvm-ci commented Sep 20, 2024

LLVM Buildbot has detected a new failure on builder llvm-x86_64-debian-dylib running on gribozavr4 while building llvm at step 7 "test-build-unified-tree-check-llvm".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/60/builds/8010

Here is the relevant piece of the build log for the reference
Step 7 (test-build-unified-tree-check-llvm) failure: test (failure)
******************** TEST 'LLVM :: CodeGen/AMDGPU/load-constant-i1.ll' FAILED ********************
Exit Code: 1

Command Output (stderr):
--
RUN: at line 2: /b/1/llvm-x86_64-debian-dylib/build/bin/llc -mtriple=amdgcn-- -verify-machineinstrs < /b/1/llvm-x86_64-debian-dylib/llvm-project/llvm/test/CodeGen/AMDGPU/load-constant-i1.ll | /b/1/llvm-x86_64-debian-dylib/build/bin/FileCheck -check-prefix=GFX6 /b/1/llvm-x86_64-debian-dylib/llvm-project/llvm/test/CodeGen/AMDGPU/load-constant-i1.ll
+ /b/1/llvm-x86_64-debian-dylib/build/bin/llc -mtriple=amdgcn-- -verify-machineinstrs
+ /b/1/llvm-x86_64-debian-dylib/build/bin/FileCheck -check-prefix=GFX6 /b/1/llvm-x86_64-debian-dylib/llvm-project/llvm/test/CodeGen/AMDGPU/load-constant-i1.ll
RUN: at line 3: /b/1/llvm-x86_64-debian-dylib/build/bin/llc -mtriple=amdgcn-- -mcpu=tonga -verify-machineinstrs < /b/1/llvm-x86_64-debian-dylib/llvm-project/llvm/test/CodeGen/AMDGPU/load-constant-i1.ll | /b/1/llvm-x86_64-debian-dylib/build/bin/FileCheck -check-prefix=GFX8 /b/1/llvm-x86_64-debian-dylib/llvm-project/llvm/test/CodeGen/AMDGPU/load-constant-i1.ll
+ /b/1/llvm-x86_64-debian-dylib/build/bin/FileCheck -check-prefix=GFX8 /b/1/llvm-x86_64-debian-dylib/llvm-project/llvm/test/CodeGen/AMDGPU/load-constant-i1.ll
+ /b/1/llvm-x86_64-debian-dylib/build/bin/llc -mtriple=amdgcn-- -mcpu=tonga -verify-machineinstrs
/b/1/llvm-x86_64-debian-dylib/llvm-project/llvm/test/CodeGen/AMDGPU/load-constant-i1.ll:8972:14: error: GFX8-NEXT: is not on the line after the previous match
; GFX8-NEXT: s_waitcnt vmcnt(0)
             ^
<stdin>:4279:2: note: 'next' match was here
 s_waitcnt vmcnt(0)
 ^
<stdin>:4228:64: note: previous match ended here
 buffer_store_dword v12, off, s[88:91], 0 ; 4-byte Folded Spill
                                                               ^
<stdin>:4229:1: note: non-matching line after previous match is here
 buffer_store_dword v13, off, s[88:91], 0 offset:4 ; 4-byte Folded Spill
^

Input file: <stdin>
Check file: /b/1/llvm-x86_64-debian-dylib/llvm-project/llvm/test/CodeGen/AMDGPU/load-constant-i1.ll

-dump-input=help explains the following input dump.

Input was:
<<<<<<
           .
           .
           .
        4274:  v_mov_b32_e32 v2, s10 
        4275:  v_mov_b32_e32 v3, s11 
        4276:  v_mov_b32_e32 v9, s13 
        4277:  v_mov_b32_e32 v10, s14 
        4278:  v_mov_b32_e32 v11, s15 
        4279:  s_waitcnt vmcnt(0) 
next:8972      !~~~~~~~~~~~~~~~~~  error: match on wrong line
        4280:  flat_store_dwordx4 v[18:19], v[28:31] 
        4281:  flat_store_dwordx4 v[59:60], v[32:35] 
        4282:  flat_store_dwordx4 v[61:62], v[36:39] 
        4283:  flat_store_dwordx4 v[45:46], v[40:43] 
        4284:  flat_store_dwordx4 v[12:13], v[4:7] 
           .
           .
           .
>>>>>>
...

@@ -18,189 +18,33 @@ declare hidden half @_Z4pownDhi(half, i32)
; --------------------------------------------------------------------

define half @test_pow_fast_f16(half %x, half %y) {
; CHECK-LABEL: test_pow_fast_f16:
; CHECK: ; %bb.0:

This file lost all the test checks

I fixed this in 528bcf3

Pierre-vh added a commit to Pierre-vh/llvm-project that referenced this pull request Sep 23, 2024
Comment on lines +6802 to +6803
if (Op.getOpcode() == ISD::SRA || Op.getOpcode() == ISD::SRL ||
Op.getOpcode() == ISD::SRA)

Suggested change
- if (Op.getOpcode() == ISD::SRA || Op.getOpcode() == ISD::SRL ||
-     Op.getOpcode() == ISD::SRA)
+ if (Op.getOpcode() == ISD::SHL || Op.getOpcode() == ISD::SRL ||
+     Op.getOpcode() == ISD::SRA)

@Pierre-vh ping - this looks like it was a simple typo?

Contributor Author

Oops sorry, I'll fix it right now
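
As a small illustration of the typo discussed in this thread, the sketch below compares the condition as originally written with the suggested one. The `Opcode` enum is a hypothetical stand-in for the few `ISD` opcodes involved, and the helper names are made up for this example; the `switch` variant is just one possible way to make this kind of duplicate harder to write, not something proposed in the review. It assumes C++14 or later.

```cpp
#include <cassert>

// Hypothetical stand-in for the ISD opcodes mentioned in the review.
enum class Opcode { ADD, SHL, SRL, SRA };

// Condition as originally written: SRA is tested twice, SHL never.
constexpr bool isShiftAsWritten(Opcode Opc) {
  return Opc == Opcode::SRA || Opc == Opcode::SRL || Opc == Opcode::SRA;
}

// Condition after the suggested change.
constexpr bool isShiftAsSuggested(Opcode Opc) {
  return Opc == Opcode::SHL || Opc == Opcode::SRL || Opc == Opcode::SRA;
}

// Alternative spelling: duplicate case labels do not compile, so
// repeating SRA here would be rejected by the compiler.
constexpr bool isShiftSwitch(Opcode Opc) {
  switch (Opc) {
  case Opcode::SHL:
  case Opcode::SRL:
  case Opcode::SRA:
    return true;
  default:
    return false;
  }
}

// The original condition silently misses SHL; the other two do not.
static_assert(!isShiftAsWritten(Opcode::SHL), "SHL is not matched");
static_assert(isShiftAsSuggested(Opcode::SHL), "SHL is matched");
static_assert(isShiftSwitch(Opcode::SHL), "SHL is matched");
```

The duplicated comparison still compiles and behaves correctly for `SRL` and `SRA`, so only inputs that use `SHL` expose the difference.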
