[GlobalISel] Fold G_ICMP if possible #86357


Merged
merged 1 commit into llvm:main from fold-icmp on Mar 29, 2024

Conversation

shiltian
Contributor

This patch tries to fold G_ICMP if possible.
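A minimal sketch of the kind of fold this enables, assuming `Builder` is a CSEMIRBuilder with a valid insertion point (illustration only, not from the patch):

    // Assumes Builder is a CSEMIRBuilder (llvm/CodeGen/GlobalISel/CSEMIRBuilder.h)
    // with its insertion point already set.
    LLT S32 = LLT::scalar(32);
    LLT S1 = LLT::scalar(1);
    auto LHS = Builder.buildConstant(S32, 1);
    auto RHS = Builder.buildConstant(S32, 2);
    // buildICmp goes through CSEMIRBuilder::buildInstr; with this patch the
    // compare of two G_CONSTANTs is folded, so Cmp defines a G_CONSTANT i1 1
    // instead of a G_ICMP.
    auto Cmp = Builder.buildICmp(CmpInst::ICMP_SLT, S1, LHS, RHS);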

@shiltian shiltian requested review from aemerson, jayfoad and arsenm March 22, 2024 23:22

✅ With the latest revision this PR passed the Python code formatter.


github-actions bot commented Mar 22, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

@shiltian shiltian force-pushed the fold-icmp branch 3 times, most recently from 4eb7a81 to 19c86d9 on March 25, 2024 at 21:08
; CHECK-NEXT: [[C:%[0-9]+]]:_(s32) = G_CONSTANT i32 0
; CHECK-NEXT: [[ICMP:%[0-9]+]]:_(s1) = G_ICMP intpred(slt), [[ADD]](s32), [[COPY]]
; CHECK-NEXT: [[ICMP1:%[0-9]+]]:_(s1) = G_ICMP intpred(slt), [[COPY1]](s32), [[C]]
; CHECK-NEXT: [[XOR:%[0-9]+]]:_(s1) = G_XOR [[ICMP1]], [[ICMP]]
; CHECK-NEXT: [[ZEXT:%[0-9]+]]:_(s32) = G_ZEXT [[XOR]](s1)
-    ; CHECK-NEXT: $vgpr0 = COPY [[ADD]](s32)
+    ; CHECK-NEXT: $vgpr0 = COPY [[COPY2]](s32)
Contributor Author

@shiltian shiltian Mar 25, 2024


I suppose we have a pass to eliminate the copy later in the pipeline?

@llvmbot
Member

llvmbot commented Mar 28, 2024

@llvm/pr-subscribers-backend-aarch64
@llvm/pr-subscribers-backend-amdgpu

@llvm/pr-subscribers-llvm-globalisel

Author: Shilei Tian (shiltian)

Changes

This patch tries to fold G_ICMP if possible.


Patch is 57.64 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/86357.diff

9 Files Affected:

  • (modified) llvm/include/llvm/CodeGen/GlobalISel/Utils.h (+4)
  • (modified) llvm/lib/CodeGen/GlobalISel/CSEMIRBuilder.cpp (+14)
  • (modified) llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp (+11-5)
  • (modified) llvm/lib/CodeGen/GlobalISel/Utils.cpp (+68)
  • (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-atomic-cmpxchg-with-success.mir (+10-5)
  • (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-saddo.mir (+18-13)
  • (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-saddsat.mir (+27-18)
  • (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-ssubo.mir (+18-13)
  • (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-ssubsat.mir (+27-18)
diff --git a/llvm/include/llvm/CodeGen/GlobalISel/Utils.h b/llvm/include/llvm/CodeGen/GlobalISel/Utils.h
index f8900f3434ccaa..fc4ebf795a334e 100644
--- a/llvm/include/llvm/CodeGen/GlobalISel/Utils.h
+++ b/llvm/include/llvm/CodeGen/GlobalISel/Utils.h
@@ -313,6 +313,10 @@ std::optional<APFloat> ConstantFoldIntToFloat(unsigned Opcode, LLT DstTy,
 std::optional<SmallVector<unsigned>>
 ConstantFoldCTLZ(Register Src, const MachineRegisterInfo &MRI);
 
+std::optional<SmallVector<APInt>>
+ConstantFoldICmp(unsigned Pred, const Register Op1, const Register Op2,
+                 const MachineRegisterInfo &MRI);
+
 /// Test if the given value is known to have exactly one bit set. This differs
 /// from computeKnownBits in that it doesn't necessarily determine which bit is
 /// set.
diff --git a/llvm/lib/CodeGen/GlobalISel/CSEMIRBuilder.cpp b/llvm/lib/CodeGen/GlobalISel/CSEMIRBuilder.cpp
index 1869e0d41a51f6..44e1be2a5e080d 100644
--- a/llvm/lib/CodeGen/GlobalISel/CSEMIRBuilder.cpp
+++ b/llvm/lib/CodeGen/GlobalISel/CSEMIRBuilder.cpp
@@ -174,6 +174,20 @@ MachineInstrBuilder CSEMIRBuilder::buildInstr(unsigned Opc,
   switch (Opc) {
   default:
     break;
+  case TargetOpcode::G_ICMP: {
+    assert(SrcOps.size() == 3 && "Invalid sources");
+    assert(DstOps.size() == 1 && "Invalid dsts");
+    LLT SrcTy = SrcOps[1].getLLTTy(*getMRI());
+
+    if (std::optional<SmallVector<APInt>> Cst =
+            ConstantFoldICmp(SrcOps[0].getPredicate(), SrcOps[1].getReg(),
+                             SrcOps[2].getReg(), *getMRI())) {
+      if (SrcTy.isVector())
+        return buildBuildVectorConstant(DstOps[0], *Cst);
+      return buildConstant(DstOps[0], Cst->front());
+    }
+    break;
+  }
   case TargetOpcode::G_ADD:
   case TargetOpcode::G_PTR_ADD:
   case TargetOpcode::G_AND:
diff --git a/llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp b/llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp
index 1b25da8833e4fb..852cffda82ddef 100644
--- a/llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp
+++ b/llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp
@@ -3741,9 +3741,11 @@ LegalizerHelper::lower(MachineInstr &MI, unsigned TypeIdx, LLT LowerHintTy) {
   }
   case TargetOpcode::G_ATOMIC_CMPXCHG_WITH_SUCCESS: {
     auto [OldValRes, SuccessRes, Addr, CmpVal, NewVal] = MI.getFirst5Regs();
-    MIRBuilder.buildAtomicCmpXchg(OldValRes, Addr, CmpVal, NewVal,
+    auto Tmp = MRI.createGenericVirtualRegister(MRI.getType(OldValRes));
+    MIRBuilder.buildAtomicCmpXchg(Tmp, Addr, CmpVal, NewVal,
                                   **MI.memoperands_begin());
-    MIRBuilder.buildICmp(CmpInst::ICMP_EQ, SuccessRes, OldValRes, CmpVal);
+    MIRBuilder.buildCopy(OldValRes, Tmp);
+    MIRBuilder.buildICmp(CmpInst::ICMP_EQ, SuccessRes, Tmp, CmpVal);
     MI.eraseFromParent();
     return Legalized;
   }
@@ -7622,10 +7624,14 @@ LegalizerHelper::lowerSADDO_SSUBO(MachineInstr &MI) {
   LLT Ty = Dst0Ty;
   LLT BoolTy = Dst1Ty;
 
+  auto Tmp = MRI.createGenericVirtualRegister(MRI.getType(Dst0));
+
   if (IsAdd)
-    MIRBuilder.buildAdd(Dst0, LHS, RHS);
+    MIRBuilder.buildAdd(Tmp, LHS, RHS);
   else
-    MIRBuilder.buildSub(Dst0, LHS, RHS);
+    MIRBuilder.buildSub(Tmp, LHS, RHS);
+
+  MIRBuilder.buildCopy(Dst0, Tmp);
 
   // TODO: If SADDSAT/SSUBSAT is legal, compare results to detect overflow.
 
@@ -7638,7 +7644,7 @@ LegalizerHelper::lowerSADDO_SSUBO(MachineInstr &MI) {
   // (LHS) if and only if the other operand (RHS) is (non-zero) positive,
   // otherwise there will be overflow.
   auto ResultLowerThanLHS =
-      MIRBuilder.buildICmp(CmpInst::ICMP_SLT, BoolTy, Dst0, LHS);
+      MIRBuilder.buildICmp(CmpInst::ICMP_SLT, BoolTy, Tmp, LHS);
   auto ConditionRHS = MIRBuilder.buildICmp(
       IsAdd ? CmpInst::ICMP_SLT : CmpInst::ICMP_SGT, BoolTy, RHS, Zero);
 
diff --git a/llvm/lib/CodeGen/GlobalISel/Utils.cpp b/llvm/lib/CodeGen/GlobalISel/Utils.cpp
index a9fa73b60a097f..ce3062270f8e6d 100644
--- a/llvm/lib/CodeGen/GlobalISel/Utils.cpp
+++ b/llvm/lib/CodeGen/GlobalISel/Utils.cpp
@@ -996,6 +996,74 @@ llvm::ConstantFoldCTLZ(Register Src, const MachineRegisterInfo &MRI) {
   return std::nullopt;
 }
 
+std::optional<SmallVector<APInt>>
+llvm::ConstantFoldICmp(unsigned Pred, const Register Op1, const Register Op2,
+                       const MachineRegisterInfo &MRI) {
+  LLT Ty = MRI.getType(Op1);
+  if (Ty != MRI.getType(Op2))
+    return std::nullopt;
+
+  auto TryFoldScalar = [&MRI, Pred](Register LHS,
+                                    Register RHS) -> std::optional<APInt> {
+    auto LHSCst = getIConstantVRegVal(LHS, MRI);
+    auto RHSCst = getIConstantVRegVal(RHS, MRI);
+    if (!LHSCst || !RHSCst)
+      return std::nullopt;
+
+    switch (Pred) {
+    case CmpInst::Predicate::ICMP_EQ:
+      return APInt(/*numBits=*/1, LHSCst->eq(*RHSCst));
+    case CmpInst::Predicate::ICMP_NE:
+      return APInt(/*numBits=*/1, LHSCst->ne(*RHSCst));
+    case CmpInst::Predicate::ICMP_UGT:
+      return APInt(/*numBits=*/1, LHSCst->ugt(*RHSCst));
+    case CmpInst::Predicate::ICMP_UGE:
+      return APInt(/*numBits=*/1, LHSCst->uge(*RHSCst));
+    case CmpInst::Predicate::ICMP_ULT:
+      return APInt(/*numBits=*/1, LHSCst->ult(*RHSCst));
+    case CmpInst::Predicate::ICMP_ULE:
+      return APInt(/*numBits=*/1, LHSCst->ule(*RHSCst));
+    case CmpInst::Predicate::ICMP_SGT:
+      return APInt(/*numBits=*/1, LHSCst->sgt(*RHSCst));
+    case CmpInst::Predicate::ICMP_SGE:
+      return APInt(/*numBits=*/1, LHSCst->sge(*RHSCst));
+    case CmpInst::Predicate::ICMP_SLT:
+      return APInt(/*numBits=*/1, LHSCst->slt(*RHSCst));
+    case CmpInst::Predicate::ICMP_SLE:
+      return APInt(/*numBits=*/1, LHSCst->sle(*RHSCst));
+    default:
+      return std::nullopt;
+    }
+  };
+
+  SmallVector<APInt> FoldedICmps;
+
+  if (Ty.isVector()) {
+    // Try to constant fold each element.
+    auto *BV1 = getOpcodeDef<GBuildVector>(Op1, MRI);
+    auto *BV2 = getOpcodeDef<GBuildVector>(Op2, MRI);
+    if (!BV1 || !BV2)
+      return std::nullopt;
+    assert(BV1->getNumSources() == BV2->getNumSources() && "Invalid vectors");
+    for (unsigned I = 0; I < BV1->getNumSources(); ++I) {
+      if (auto MaybeFold =
+              TryFoldScalar(BV1->getSourceReg(I), BV2->getSourceReg(I))) {
+        FoldedICmps.emplace_back(*MaybeFold);
+        continue;
+      }
+      return std::nullopt;
+    }
+    return FoldedICmps;
+  }
+
+  if (auto MaybeCst = TryFoldScalar(Op1, Op2)) {
+    FoldedICmps.emplace_back(*MaybeCst);
+    return FoldedICmps;
+  }
+
+  return std::nullopt;
+}
+
 bool llvm::isKnownToBeAPowerOfTwo(Register Reg, const MachineRegisterInfo &MRI,
                                   GISelKnownBits *KB) {
   std::optional<DefinitionAndSourceRegister> DefSrcReg =
diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-atomic-cmpxchg-with-success.mir b/llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-atomic-cmpxchg-with-success.mir
index e288d9d5ab3c09..171a8551ae69a9 100644
--- a/llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-atomic-cmpxchg-with-success.mir
+++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-atomic-cmpxchg-with-success.mir
@@ -15,8 +15,9 @@ body: |
     ; CHECK-NEXT: [[COPY2:%[0-9]+]]:_(s32) = COPY $vgpr3
     ; CHECK-NEXT: [[BUILD_VECTOR:%[0-9]+]]:_(<2 x s32>) = G_BUILD_VECTOR [[COPY2]](s32), [[COPY1]](s32)
     ; CHECK-NEXT: [[AMDGPU_ATOMIC_CMPXCHG:%[0-9]+]]:_(s32) = G_AMDGPU_ATOMIC_CMPXCHG [[COPY]](p1), [[BUILD_VECTOR]] :: (load store syncscope("agent-one-as") monotonic monotonic (s32), addrspace 1)
+    ; CHECK-NEXT: [[COPY3:%[0-9]+]]:_(s32) = COPY [[AMDGPU_ATOMIC_CMPXCHG]](s32)
     ; CHECK-NEXT: [[ICMP:%[0-9]+]]:_(s1) = G_ICMP intpred(eq), [[AMDGPU_ATOMIC_CMPXCHG]](s32), [[COPY1]]
-    ; CHECK-NEXT: S_ENDPGM 0, implicit [[AMDGPU_ATOMIC_CMPXCHG]](s32), implicit [[ICMP]](s1)
+    ; CHECK-NEXT: S_ENDPGM 0, implicit [[COPY3]](s32), implicit [[ICMP]](s1)
   %0:_(p1) = COPY $vgpr0_vgpr1
   %1:_(s32) = COPY $vgpr2
   %2:_(s32) = COPY $vgpr3
@@ -39,8 +40,9 @@ body: |
     ; CHECK-NEXT: [[COPY2:%[0-9]+]]:_(s32) = COPY $vgpr3
     ; CHECK-NEXT: [[BUILD_VECTOR:%[0-9]+]]:_(<2 x s32>) = G_BUILD_VECTOR [[COPY2]](s32), [[COPY1]](s32)
     ; CHECK-NEXT: [[AMDGPU_ATOMIC_CMPXCHG:%[0-9]+]]:_(s32) = G_AMDGPU_ATOMIC_CMPXCHG [[COPY]](p0), [[BUILD_VECTOR]] :: (load store syncscope("agent-one-as") monotonic monotonic (s32))
+    ; CHECK-NEXT: [[COPY3:%[0-9]+]]:_(s32) = COPY [[AMDGPU_ATOMIC_CMPXCHG]](s32)
     ; CHECK-NEXT: [[ICMP:%[0-9]+]]:_(s1) = G_ICMP intpred(eq), [[AMDGPU_ATOMIC_CMPXCHG]](s32), [[COPY1]]
-    ; CHECK-NEXT: S_ENDPGM 0, implicit [[AMDGPU_ATOMIC_CMPXCHG]](s32), implicit [[ICMP]](s1)
+    ; CHECK-NEXT: S_ENDPGM 0, implicit [[COPY3]](s32), implicit [[ICMP]](s1)
   %0:_(p0) = COPY $vgpr0_vgpr1
   %1:_(s32) = COPY $vgpr2
   %2:_(s32) = COPY $vgpr3
@@ -62,8 +64,9 @@ body: |
     ; CHECK-NEXT: [[COPY1:%[0-9]+]]:_(s32) = COPY $vgpr1
     ; CHECK-NEXT: [[COPY2:%[0-9]+]]:_(s32) = COPY $vgpr2
     ; CHECK-NEXT: [[ATOMIC_CMPXCHG:%[0-9]+]]:_(s32) = G_ATOMIC_CMPXCHG [[COPY]](p3), [[COPY1]], [[COPY2]] :: (load store syncscope("agent-one-as") monotonic monotonic (s32), addrspace 3)
+    ; CHECK-NEXT: [[COPY3:%[0-9]+]]:_(s32) = COPY [[ATOMIC_CMPXCHG]](s32)
     ; CHECK-NEXT: [[ICMP:%[0-9]+]]:_(s1) = G_ICMP intpred(eq), [[ATOMIC_CMPXCHG]](s32), [[COPY1]]
-    ; CHECK-NEXT: S_ENDPGM 0, implicit [[ATOMIC_CMPXCHG]](s32), implicit [[ICMP]](s1)
+    ; CHECK-NEXT: S_ENDPGM 0, implicit [[COPY3]](s32), implicit [[ICMP]](s1)
   %0:_(p3) = COPY $vgpr0
   %1:_(s32) = COPY $vgpr1
   %2:_(s32) = COPY $vgpr2
@@ -86,8 +89,9 @@ body: |
     ; CHECK-NEXT: [[COPY2:%[0-9]+]]:_(s64) = COPY $vgpr4_vgpr5
     ; CHECK-NEXT: [[BUILD_VECTOR:%[0-9]+]]:_(<2 x s64>) = G_BUILD_VECTOR [[COPY2]](s64), [[COPY1]](s64)
     ; CHECK-NEXT: [[AMDGPU_ATOMIC_CMPXCHG:%[0-9]+]]:_(s64) = G_AMDGPU_ATOMIC_CMPXCHG [[COPY]](p1), [[BUILD_VECTOR]] :: (load store syncscope("agent-one-as") monotonic monotonic (s64), addrspace 1)
+    ; CHECK-NEXT: [[COPY3:%[0-9]+]]:_(s64) = COPY [[AMDGPU_ATOMIC_CMPXCHG]](s64)
     ; CHECK-NEXT: [[ICMP:%[0-9]+]]:_(s1) = G_ICMP intpred(eq), [[AMDGPU_ATOMIC_CMPXCHG]](s64), [[COPY1]]
-    ; CHECK-NEXT: S_ENDPGM 0, implicit [[AMDGPU_ATOMIC_CMPXCHG]](s64), implicit [[ICMP]](s1)
+    ; CHECK-NEXT: S_ENDPGM 0, implicit [[COPY3]](s64), implicit [[ICMP]](s1)
   %0:_(p1) = COPY $vgpr0_vgpr1
   %1:_(s64) = COPY $vgpr2_vgpr3
   %2:_(s64) = COPY $vgpr4_vgpr5
@@ -109,8 +113,9 @@ body: |
     ; CHECK-NEXT: [[COPY1:%[0-9]+]]:_(s64) = COPY $vgpr1_vgpr2
     ; CHECK-NEXT: [[COPY2:%[0-9]+]]:_(s64) = COPY $vgpr3_vgpr4
     ; CHECK-NEXT: [[ATOMIC_CMPXCHG:%[0-9]+]]:_(s64) = G_ATOMIC_CMPXCHG [[COPY]](p3), [[COPY1]], [[COPY2]] :: (load store syncscope("agent-one-as") monotonic monotonic (s64), addrspace 3)
+    ; CHECK-NEXT: [[COPY3:%[0-9]+]]:_(s64) = COPY [[ATOMIC_CMPXCHG]](s64)
     ; CHECK-NEXT: [[ICMP:%[0-9]+]]:_(s1) = G_ICMP intpred(eq), [[ATOMIC_CMPXCHG]](s64), [[COPY1]]
-    ; CHECK-NEXT: S_ENDPGM 0, implicit [[ATOMIC_CMPXCHG]](s64), implicit [[ICMP]](s1)
+    ; CHECK-NEXT: S_ENDPGM 0, implicit [[COPY3]](s64), implicit [[ICMP]](s1)
   %0:_(p3) = COPY $vgpr0
   %1:_(s64) = COPY $vgpr1_vgpr2
   %2:_(s64) = COPY $vgpr3_vgpr4
diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-saddo.mir b/llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-saddo.mir
index dba20e128237cd..bd932e08f4a7a9 100644
--- a/llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-saddo.mir
+++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-saddo.mir
@@ -82,12 +82,13 @@ body: |
     ; CHECK-NEXT: [[COPY:%[0-9]+]]:_(s32) = COPY $vgpr0
     ; CHECK-NEXT: [[COPY1:%[0-9]+]]:_(s32) = COPY $vgpr1
     ; CHECK-NEXT: [[ADD:%[0-9]+]]:_(s32) = G_ADD [[COPY]], [[COPY1]]
+    ; CHECK-NEXT: [[COPY2:%[0-9]+]]:_(s32) = COPY [[ADD]](s32)
     ; CHECK-NEXT: [[C:%[0-9]+]]:_(s32) = G_CONSTANT i32 0
     ; CHECK-NEXT: [[ICMP:%[0-9]+]]:_(s1) = G_ICMP intpred(slt), [[ADD]](s32), [[COPY]]
     ; CHECK-NEXT: [[ICMP1:%[0-9]+]]:_(s1) = G_ICMP intpred(slt), [[COPY1]](s32), [[C]]
     ; CHECK-NEXT: [[XOR:%[0-9]+]]:_(s1) = G_XOR [[ICMP1]], [[ICMP]]
     ; CHECK-NEXT: [[ZEXT:%[0-9]+]]:_(s32) = G_ZEXT [[XOR]](s1)
-    ; CHECK-NEXT: $vgpr0 = COPY [[ADD]](s32)
+    ; CHECK-NEXT: $vgpr0 = COPY [[COPY2]](s32)
     ; CHECK-NEXT: $vgpr1 = COPY [[ZEXT]](s32)
     %0:_(s32) = COPY $vgpr0
     %1:_(s32) = COPY $vgpr1
@@ -113,12 +114,13 @@ body: |
     ; CHECK-NEXT: [[UADDO:%[0-9]+]]:_(s32), [[UADDO1:%[0-9]+]]:_(s1) = G_UADDO [[UV]], [[UV2]]
     ; CHECK-NEXT: [[UADDE:%[0-9]+]]:_(s32), [[UADDE1:%[0-9]+]]:_(s1) = G_UADDE [[UV1]], [[UV3]], [[UADDO1]]
     ; CHECK-NEXT: [[MV:%[0-9]+]]:_(s64) = G_MERGE_VALUES [[UADDO]](s32), [[UADDE]](s32)
+    ; CHECK-NEXT: [[COPY2:%[0-9]+]]:_(s64) = COPY [[MV]](s64)
     ; CHECK-NEXT: [[C:%[0-9]+]]:_(s64) = G_CONSTANT i64 0
     ; CHECK-NEXT: [[ICMP:%[0-9]+]]:_(s1) = G_ICMP intpred(slt), [[MV]](s64), [[COPY]]
     ; CHECK-NEXT: [[ICMP1:%[0-9]+]]:_(s1) = G_ICMP intpred(slt), [[COPY1]](s64), [[C]]
     ; CHECK-NEXT: [[XOR:%[0-9]+]]:_(s1) = G_XOR [[ICMP1]], [[ICMP]]
     ; CHECK-NEXT: [[ZEXT:%[0-9]+]]:_(s32) = G_ZEXT [[XOR]](s1)
-    ; CHECK-NEXT: $vgpr0_vgpr1 = COPY [[MV]](s64)
+    ; CHECK-NEXT: $vgpr0_vgpr1 = COPY [[COPY2]](s64)
     ; CHECK-NEXT: $vgpr2 = COPY [[ZEXT]](s32)
     %0:_(s64) = COPY $vgpr0_vgpr1
     %1:_(s64) = COPY $vgpr2_vgpr3
@@ -152,6 +154,7 @@ body: |
     ; CHECK-NEXT: [[SHL:%[0-9]+]]:_(s32) = G_SHL [[AND1]], [[C]](s32)
     ; CHECK-NEXT: [[OR:%[0-9]+]]:_(s32) = G_OR [[AND]], [[SHL]]
     ; CHECK-NEXT: [[BITCAST2:%[0-9]+]]:_(<2 x s16>) = G_BITCAST [[OR]](s32)
+    ; CHECK-NEXT: [[COPY2:%[0-9]+]]:_(<2 x s16>) = COPY [[BITCAST2]](<2 x s16>)
     ; CHECK-NEXT: [[BITCAST3:%[0-9]+]]:_(s32) = G_BITCAST [[COPY]](<2 x s16>)
     ; CHECK-NEXT: [[LSHR2:%[0-9]+]]:_(s32) = G_LSHR [[BITCAST3]], [[C]](s32)
     ; CHECK-NEXT: [[SEXT_INREG:%[0-9]+]]:_(s32) = G_SEXT_INREG [[ADD]], 16
@@ -164,8 +167,8 @@ body: |
     ; CHECK-NEXT: [[LSHR3:%[0-9]+]]:_(s32) = G_LSHR [[BITCAST4]], [[C]](s32)
     ; CHECK-NEXT: [[SEXT_INREG4:%[0-9]+]]:_(s32) = G_SEXT_INREG [[BITCAST4]], 16
     ; CHECK-NEXT: [[C2:%[0-9]+]]:_(s32) = G_CONSTANT i32 0
-    ; CHECK-NEXT: [[COPY2:%[0-9]+]]:_(s32) = COPY [[C2]](s32)
-    ; CHECK-NEXT: [[ICMP2:%[0-9]+]]:_(s1) = G_ICMP intpred(slt), [[SEXT_INREG4]](s32), [[COPY2]]
+    ; CHECK-NEXT: [[COPY3:%[0-9]+]]:_(s32) = COPY [[C2]](s32)
+    ; CHECK-NEXT: [[ICMP2:%[0-9]+]]:_(s1) = G_ICMP intpred(slt), [[SEXT_INREG4]](s32), [[COPY3]]
     ; CHECK-NEXT: [[SEXT_INREG5:%[0-9]+]]:_(s32) = G_SEXT_INREG [[LSHR3]], 16
     ; CHECK-NEXT: [[ICMP3:%[0-9]+]]:_(s1) = G_ICMP intpred(slt), [[SEXT_INREG5]](s32), [[C2]]
     ; CHECK-NEXT: [[XOR:%[0-9]+]]:_(s1) = G_XOR [[ICMP2]], [[ICMP]]
@@ -176,7 +179,7 @@ body: |
     ; CHECK-NEXT: [[AND2:%[0-9]+]]:_(s32) = G_AND [[ANYEXT]], [[C3]]
     ; CHECK-NEXT: [[AND3:%[0-9]+]]:_(s32) = G_AND [[ANYEXT1]], [[C3]]
     ; CHECK-NEXT: [[BUILD_VECTOR:%[0-9]+]]:_(<2 x s32>) = G_BUILD_VECTOR [[AND2]](s32), [[AND3]](s32)
-    ; CHECK-NEXT: $vgpr0 = COPY [[BITCAST2]](<2 x s16>)
+    ; CHECK-NEXT: $vgpr0 = COPY [[COPY2]](<2 x s16>)
     ; CHECK-NEXT: $vgpr1_vgpr2 = COPY [[BUILD_VECTOR]](<2 x s32>)
     %0:_(<2 x s16>) = COPY $vgpr0
     %1:_(<2 x s16>) = COPY $vgpr1
@@ -318,6 +321,7 @@ body: |
     ; CHECK-NEXT: [[OR1:%[0-9]+]]:_(s32) = G_OR [[AND2]], [[SHL1]]
     ; CHECK-NEXT: [[BITCAST5:%[0-9]+]]:_(<2 x s16>) = G_BITCAST [[OR1]](s32)
     ; CHECK-NEXT: [[CONCAT_VECTORS:%[0-9]+]]:_(<4 x s16>) = G_CONCAT_VECTORS [[BITCAST4]](<2 x s16>), [[BITCAST5]](<2 x s16>)
+    ; CHECK-NEXT: [[COPY2:%[0-9]+]]:_(<4 x s16>) = COPY [[CONCAT_VECTORS]](<4 x s16>)
     ; CHECK-NEXT: [[UV4:%[0-9]+]]:_(<2 x s16>), [[UV5:%[0-9]+]]:_(<2 x s16>) = G_UNMERGE_VALUES [[COPY]](<4 x s16>)
     ; CHECK-NEXT: [[BITCAST6:%[0-9]+]]:_(s32) = G_BITCAST [[UV4]](<2 x s16>)
     ; CHECK-NEXT: [[LSHR4:%[0-9]+]]:_(s32) = G_LSHR [[BITCAST6]], [[C]](s32)
@@ -342,14 +346,14 @@ body: |
     ; CHECK-NEXT: [[LSHR7:%[0-9]+]]:_(s32) = G_LSHR [[BITCAST9]], [[C]](s32)
     ; CHECK-NEXT: [[SEXT_INREG8:%[0-9]+]]:_(s32) = G_SEXT_INREG [[BITCAST8]], 16
     ; CHECK-NEXT: [[C2:%[0-9]+]]:_(s32) = G_CONSTANT i32 0
-    ; CHECK-NEXT: [[COPY2:%[0-9]+]]:_(s32) = COPY [[C2]](s32)
-    ; CHECK-NEXT: [[ICMP4:%[0-9]+]]:_(s1) = G_ICMP intpred(slt), [[SEXT_INREG8]](s32), [[COPY2]]
-    ; CHECK-NEXT: [[SEXT_INREG9:%[0-9]+]]:_(s32) = G_SEXT_INREG [[LSHR6]], 16
     ; CHECK-NEXT: [[COPY3:%[0-9]+]]:_(s32) = COPY [[C2]](s32)
-    ; CHECK-NEXT: [[ICMP5:%[0-9]+]]:_(s1) = G_ICMP intpred(slt), [[SEXT_INREG9]](s32), [[COPY3]]
-    ; CHECK-NEXT: [[SEXT_INREG10:%[0-9]+]]:_(s32) = G_SEXT_INREG [[BITCAST9]], 16
+    ; CHECK-NEXT: [[ICMP4:%[0-9]+]]:_(s1) = G_ICMP intpred(slt), [[SEXT_INREG8]](s32), [[COPY3]]
+    ; CHECK-NEXT: [[SEXT_INREG9:%[0-9]+]]:_(s32) = G_SEXT_INREG [[LSHR6]], 16
     ; CHECK-NEXT: [[COPY4:%[0-9]+]]:_(s32) = COPY [[C2]](s32)
-    ; CHECK-NEXT: [[ICMP6:%[0-9]+]]:_(s1) = G_ICMP intpred(slt), [[SEXT_INREG10]](s32), [[COPY4]]
+    ; CHECK-NEXT: [[ICMP5:%[0-9]+]]:_(s1) = G_ICMP intpred(slt), [[SEXT_INREG9]](s32), [[COPY4]]
+    ; CHECK-NEXT: [[SEXT_INREG10:%[0-9]+]]:_(s32) = G_SEXT_INREG [[BITCAST9]], 16
+    ; CHECK-NEXT: [[COPY5:%[0-9]+]]:_(s32) = COPY [[C2]](s32)
+    ; CHECK-NEXT: [[ICMP6:%[0-9]+]]:_(s1) = G_ICMP intpred(slt), [[SEXT_INREG10]](s32), [[COPY5]]
     ; CHECK-NEXT: [[SEXT_INREG11:%[0-9]+]]:_(s32) = G_SEXT_INREG [[LSHR7]], 16
     ; CHECK-NEXT: [[ICMP7:%[0-9]+]]:_(s1) = G_ICMP intpred(slt), [[SEXT_INREG11]](s32), [[C2]]
     ; CHECK-NEXT: [[XOR:%[0-9]+]]:_(s1) = G_XOR [[ICMP4]], [[ICMP]]
@@ -366,7 +370,7 @@ body: |
     ; CHECK-NEXT: [[AND6:%[0-9]+]]:_(s32) = G_AND [[ANYEXT2]], [[C3]]
     ; CHECK-NEXT: [[AND7:%[0-9]+]]:_(s32) = G_AND [[ANYEXT3]], [[C3]]
     ; CHECK-NEXT: [[BUILD_VECTOR:%[0-9]+]]:_(<4 x s32>) = G_BUILD_VECTOR [[AND4]](s32), [[AND5]](s32), [[AND6]](s32), [[AND7]](s32)
-    ; CHECK-NEXT: $vgpr0_vgpr1 = COPY [[CONCAT_VECTORS]](<4 x s16>)
+    ; CHECK-NEXT: $vgpr0_vgpr1 = COPY [[COPY2]](<4 x s16>)
     ; CHECK-NEXT: $vgpr2_vgpr3_vgpr4_vgpr5 = COPY [[BUILD_VECTOR]](<4 x s32>)
     %0:_(<4 x s16>) = COPY $vgpr0_vgpr1
     %1:_(<4 x s16>) = COPY $vgpr1_vgpr2
@@ -392,6 +396,7 @@ body: |
     ; CHECK-NEXT: [[ADD:%[0-9]+]]:_(s32) = G_ADD [[UV]], [[UV2]]
     ; CHECK-NEXT: [[ADD1:%[0-9]+]]:_(s32) = G_ADD [[UV1]], [[UV3]]
     ; CHECK-NEXT: [[BUILD_VECTOR:%[0-9]+]]:_(<2 x s32>) = G_BUILD_VECTOR [[ADD]](s32), [[ADD1]](s32)
+    ; CHECK-NEXT: [[COPY2:%[0-9]+]]:_(<2 x s32>) = COPY [[BUILD_VECTOR]](<2 x s32>)
     ; CHECK-NEXT: [[C:%[0-9]+]]:_(s32) = G_CONSTANT i32 0
     ; CHECK-NEXT: [[UV4:%[0-9]+]]:_(s32), [[UV5:%[0-9]+]]:_(s32) = G_UNMERGE_VALUES [[COPY]](<2 x s32>)
     ; CHECK-NEXT: [[ICMP:%[0-9]+]]:_(s1) = G_ICMP intpred(slt), [[ADD]](s32), [[UV4]]
@@ -407,7 +412,7 @@ body: |
     ; CHECK-NEXT: [[AND:%[0-9]+]]:_(s32) = G_AND [[ANYEXT]], [[C1]]
     ; CHECK-NEXT: [[AND1:%[0-9]+]]:_(s32) = G_AND [[ANYEXT1]], [[C1]]
     ; CHECK-NEXT: [[BUILD_VECTOR1:%[0-9]+]]:_(<2 x s32>) = G_BUILD_VECTOR [[AND]](s32), [[AND1]](s32)
-    ; CHECK-NEXT: $vgpr0_vgpr1 = COPY [[BUILD_VECTOR]](<2 x s32>)
+    ; CHECK-NEXT: $vgpr0_vgpr1 = COPY [[COPY2]](<2 x s32>)
     ; CHECK-NEXT: $vgpr2_vgpr3 = COPY [[BUILD_VECTOR1]](<2 x s32>)
     %0:_(<2 x s32>) = COPY $vgpr0_vgpr1
     %1:_(<2 x s32>) = COPY $vgpr2_vgpr3
diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-saddsat.mir b/llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-saddsat.mir
index 93d00714158be5..eabfcc447697a5 100644
--- a/llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-saddsat.mir
+++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-saddsat.mir
@@ -951,19 +951,20 @@ body: |
     ; GFX6-NEXT: [[UADDO:%[0-9]+]]:_(s32), [[UADDO1:%[0-9]+]]:_(s1) = G_UADDO [[UV]], [[UV2]]
     ; GFX6-NEXT: [[UADDE:%[0-9]+]]:_(s32), [[UADDE1:%[0-9]+]]:_(s1) = G_UADDE [[UV1]], [[UV3]], [[UADDO1]]
     ; GFX6-NEXT: [[MV:%[0-9]+]]:_(s64) = G_MERGE_VALUES [[UADDO]](s32), [[UADDE]](s32)
+    ; GFX6-NEXT: [[COPY2:%[0-9]+]]:_(s64) = COPY [[MV]](s64)
     ; GFX6-NEXT: [[C:%[0-9]+]]:_(s64) = G_CONSTANT i64 0
     ; GFX6-NEXT: [[ICMP:%[0-9]+]]:_(s1) = G_ICMP intpred(slt), [[MV]](s64), [[COPY]]
     ; GFX6-NEXT: [[ICMP1:%[0-9]+]]:_(s1) = G_ICMP intpred(slt), [[COPY1]](s64), [[C]]
     ; GFX6-NEXT: [[XOR:%[0-9]+]]:_(s1) = G_XOR [[ICMP1]], [[ICMP]]
     ; GFX6-NEXT: [[C1:%[0-9]+]]:_(s32) = G_CONSTANT i32 63
-    ; GFX6-NEXT: [[ASHR:%[0-9]+]]:_(s64) = G_ASHR [[MV]], [[C1]](s32)
+    ; GFX6-NEXT: [[ASHR:%[0-9]+]]:_(s64) = G_ASHR [[COPY2]], [[C1]](s32)
     ; GFX6-NEXT: [[C2:%[0-9]+]]:_(s64) = G_CONSTANT i64 -9223372036854775808
     ; GFX6-NEXT: [[UV...
[truncated]

@shiltian
Contributor Author

shiltian commented Mar 28, 2024

@aemerson @arsenm
I think we should really avoid using an already-defined register as the destination operand of other newly built instructions; doing so effectively breaks SSA. It happens to work today only because no constant folding kicks in between the two definitions of the same register, and people adding new constant folding in the future are likely to run into the same issue as here. In the combiner, most functions define a new vreg for the destination and then replace the original one with the new one. Depending on the original register, that replacement may be a G_COPY, which is what we are doing here. Strictly speaking, building that G_COPY also redefines an existing register, but I think it should be the only exception: no other new instructions should be built between that point and the erasure of the original instruction. Interestingly, erasing the original instruction first and then building the others does not always work either, because the builder complains that the insertion point is outside the basic block.
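For concreteness, a condensed sketch of that pattern, mirroring the lowerSADDO_SSUBO change in this patch (variable names as in LegalizerHelper):

    // Build into a fresh vreg so any later constant folding never sees a
    // second definition of Dst0.
    Register Tmp = MRI.createGenericVirtualRegister(MRI.getType(Dst0));
    MIRBuilder.buildAdd(Tmp, LHS, RHS);
    // Re-establish the original definition with a single COPY ...
    MIRBuilder.buildCopy(Dst0, Tmp);
    // ... and let the follow-up instructions read Tmp rather than Dst0.
    auto ResultLowerThanLHS =
        MIRBuilder.buildICmp(CmpInst::ICMP_SLT, BoolTy, Tmp, LHS);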

@aemerson
Contributor

@aemerson @arsenm I think we should really avoid using an already-defined register as the destination operand of other newly built instructions; doing so effectively breaks SSA. It happens to work today only because no constant folding kicks in between the two definitions of the same register, and people adding new constant folding in the future are likely to run into the same issue as here. In the combiner, most functions define a new vreg for the destination and then replace the original one with the new one. Depending on the original register, that replacement may be a G_COPY, which is what we are doing here. Strictly speaking, building that G_COPY also redefines an existing register, but I think it should be the only exception: no other new instructions should be built between that point and the erasure of the original instruction. Interestingly, erasing the original instruction first and then building the others does not always work either, because the builder complains that the insertion point is outside the basic block.

This sounds reasonable to me, but I don't think we should enforce it everywhere, because we do use already-defined registers in the instruction selector pass. In AArch64 we often use MachineIRBuilder's buildInstr() (the non-CSE version) to easily generate target-specific instructions.
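A hypothetical sketch of that selector-style usage (opcode and register names are placeholders, not from this patch; TII/TRI/RBI are the usual selector members):

    // Hypothetical: I is the generic instruction being selected, DstReg/Src0/Src1
    // are already-defined registers, and AArch64::ADDXrr is just an example opcode.
    MachineIRBuilder MIB(I); // non-CSE builder, inserting before I
    auto Add = MIB.buildInstr(AArch64::ADDXrr, {DstReg}, {Src0, Src1});
    constrainSelectedInstRegOperands(*Add, TII, TRI, RBI);
    I.eraseFromParent();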

@arsenm
Contributor

arsenm commented Mar 29, 2024

This sounds reasonable to me, but I don't think we should enforce it everywhere, because we do use already-defined registers in the instruction selector pass. In AArch64 we often use MachineIRBuilder's buildInstr() (the non-CSE version) to easily generate target-specific instructions.

I don't think the selector should be using MachineIRBuilder like this, and we shouldn't design the API around this usage. It isn't really saving you much over a raw BuildMI call.
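For comparison, the raw BuildMI form of the same hypothetical snippet from above:

    // Same placeholders as above; roughly equivalent without MachineIRBuilder.
    BuildMI(*I.getParent(), I, I.getDebugLoc(), TII.get(AArch64::ADDXrr), DstReg)
        .addReg(Src0)
        .addReg(Src1);
    I.eraseFromParent();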

assert(DstOps.size() == 1 && "Invalid dsts");
LLT SrcTy = SrcOps[1].getLLTTy(*getMRI());

if (std::optional<SmallVector<APInt>> Cst =
Contributor


I don't see the point of using APInt in this interface. You're discarding the result type used here and forcing it into a 1-bit APInt. Either preserve the original result type, or just return a bool?
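Sketches of the two alternatives being suggested here (neither is what the patch implements):

    // Alternative 1 (sketch): carry the destination's scalar width instead of
    // hard-coding 1-bit APInts.
    std::optional<SmallVector<APInt>>
    ConstantFoldICmp(unsigned Pred, const Register Op1, const Register Op2,
                     unsigned DstScalarSizeInBits, const MachineRegisterInfo &MRI);

    // Alternative 2 (sketch): return plain booleans and let the caller build
    // constants of whatever type it needs.
    std::optional<SmallVector<bool>>
    ConstantFoldICmp(unsigned Pred, const Register Op1, const Register Op2,
                     const MachineRegisterInfo &MRI);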

Contributor Author


There is a type check in MachineIRBuilder::buildConstant to make sure the bit widths are the same. The use of APInt is to unify the ConstantFoldICmp interface so it supports both vector and scalar values.

Contributor


We could add a non-APInt version of buildBuildVectorConstant
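Presumably something along these lines (hypothetical, not in tree):

    // Hypothetical overload next to the existing
    // buildBuildVectorConstant(const DstOp &, ArrayRef<APInt>).
    MachineInstrBuilder buildBuildVectorConstant(const DstOp &Res,
                                                 ArrayRef<int64_t> Ops);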

Contributor Author


In that case it would accept ArrayRef<int64_t>, I suppose, which still requires explicitly constructing that array, because there is no implicit cast from, say, ArrayRef<bool> to ArrayRef<int64_t>. That doesn't look much different from ArrayRef<APInt>.
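Illustration of that point: even with an ArrayRef<int64_t> overload, the CSEMIRBuilder caller would still spell out a conversion loop, e.g.:

    // Illustrative only: converting the APInt fold results for a hypothetical
    // ArrayRef<int64_t> overload; the 1-bit values zero-extend to 0 or 1.
    SmallVector<int64_t> Vals;
    for (const APInt &V : *Cst)
      Vals.push_back(V.getZExtValue());
    return buildBuildVectorConstant(DstOps[0], Vals); // hypothetical overload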

This patch tries to fold `G_ICMP` if possible.
@shiltian shiltian merged commit 3a106e5 into llvm:main Mar 29, 2024
@shiltian shiltian deleted the fold-icmp branch March 29, 2024 19:59