
[InstCombine] Canonicalize Bit Testing by Shifting to Sign Bit #101822


Closed
mskamp wants to merge 2 commits into llvm:main from mskamp:fix_86813_test_bit_canonicalization

Conversation

mskamp
Contributor

@mskamp mskamp commented Aug 3, 2024

Implement a new transformation that folds the bit-testing expression (icmp slt (shl V (sub (bw-1) B)) 0) to (icmp ne (and V (shl 1 B)) 0). Also fold the negated variant of the LHS.

Alive proof: https://alive2.llvm.org/ce/z/5ic_qe

Fixes #86813.

@mskamp mskamp requested a review from nikic as a code owner August 3, 2024 13:16
@llvmbot
Member

llvmbot commented Aug 3, 2024

@llvm/pr-subscribers-llvm-transforms

Author: None (mskamp)

Changes

Implement two new transformations that fold the following common ways to test if the B-th bit is set in an integer V:

  • (icmp ne (and (lshr V B) 1) 0) --> (icmp ne (and V (shl 1 B)) 0) for constant V. This rule already existed for non-constant V and constants other than 1; the restriction to non-constant V was added in commit c3b2111 to fix an infinite loop. Avoid that infinite loop by allowing constant V only if the shift instruction is an lshr and the constant is 1 (see the IR sketch after this list).
  • (icmp slt (shl V (sub (bw-1) B)) 0) --> (icmp ne (and V (shl 1 B)) 0)

Also fold negated variants of the LHS.

This transformation necessitates adapting existing tests in icmp-and-shift.ll and load-cmp.ll. One test in icmp-and-shift.ll, which previously was a negative test, now gets folded. Rename it to indicate that it is a positive test.

Alive proofs:

Fixes #86813.


Full diff: https://github.com/llvm/llvm-project/pull/101822.diff

3 Files Affected:

  • (modified) llvm/lib/Transforms/InstCombine/InstCombineCompares.cpp (+19-5)
  • (modified) llvm/test/Transforms/InstCombine/icmp-and-shift.ll (+324-5)
  • (modified) llvm/test/Transforms/InstCombine/load-cmp.ll (+4-4)
diff --git a/llvm/lib/Transforms/InstCombine/InstCombineCompares.cpp b/llvm/lib/Transforms/InstCombine/InstCombineCompares.cpp
index 3b6df2760ecc2..ce3b95de828bd 100644
--- a/llvm/lib/Transforms/InstCombine/InstCombineCompares.cpp
+++ b/llvm/lib/Transforms/InstCombine/InstCombineCompares.cpp
@@ -1725,7 +1725,8 @@ Instruction *InstCombinerImpl::foldICmpAndShift(ICmpInst &Cmp,
   // preferable because it allows the C2 << Y expression to be hoisted out of a
   // loop if Y is invariant and X is not.
   if (Shift->hasOneUse() && C1.isZero() && Cmp.isEquality() &&
-      !Shift->isArithmeticShift() && !isa<Constant>(Shift->getOperand(0))) {
+      !Shift->isArithmeticShift() &&
+      ((!IsShl && C2.isOne()) || !isa<Constant>(Shift->getOperand(0)))) {
     // Compute C2 << Y.
     Value *NewShift =
         IsShl ? Builder.CreateLShr(And->getOperand(1), Shift->getOperand(1))
@@ -2304,19 +2305,32 @@ Instruction *InstCombinerImpl::foldICmpShlConstant(ICmpInst &Cmp,
     if (C.isZero() || (Pred == ICmpInst::ICMP_SGT ? C.isAllOnes() : C.isOne()))
       return new ICmpInst(Pred, Shl->getOperand(0), Cmp.getOperand(1));
 
+  unsigned TypeBits = C.getBitWidth();
+  Value *X = Shl->getOperand(0);
+  Type *ShType = Shl->getType();
+
+  // (icmp slt (shl X, (sub bw, Y)), 0)  --> (icmp ne (and X, (shl 1, Y)), 0)
+  // (icmp sgt (shl X, (sub bw, Y)), -1) --> (icmp eq (and X, (shl 1, Y)), 0)
+  if (Value * Y;
+      Shl->hasOneUse() &&
+      (Pred == ICmpInst::ICMP_SLT || Pred == ICmpInst::ICMP_SGT) &&
+      (Pred == ICmpInst::ICMP_SLT ? C.isZero() : C.isAllOnes()) &&
+      match(Shl->getOperand(1),
+            m_OneUse(m_Sub(m_SpecificInt(TypeBits - 1), m_Value(Y)))))
+    return new ICmpInst(
+        Pred == ICmpInst::ICMP_SLT ? ICmpInst::ICMP_NE : ICmpInst::ICMP_EQ,
+        Builder.CreateAnd(X, Builder.CreateShl(ConstantInt::get(ShType, 1), Y)),
+        ConstantInt::get(ShType, 0));
+
   const APInt *ShiftAmt;
   if (!match(Shl->getOperand(1), m_APInt(ShiftAmt)))
     return foldICmpShlOne(Cmp, Shl, C);
 
   // Check that the shift amount is in range. If not, don't perform undefined
   // shifts. When the shift is visited, it will be simplified.
-  unsigned TypeBits = C.getBitWidth();
   if (ShiftAmt->uge(TypeBits))
     return nullptr;
 
-  Value *X = Shl->getOperand(0);
-  Type *ShType = Shl->getType();
-
   // NSW guarantees that we are only shifting out sign bits from the high bits,
   // so we can ASHR the compare constant without needing a mask and eliminate
   // the shift.
diff --git a/llvm/test/Transforms/InstCombine/icmp-and-shift.ll b/llvm/test/Transforms/InstCombine/icmp-and-shift.ll
index 08d23e84c3960..a88c5b0f421ce 100644
--- a/llvm/test/Transforms/InstCombine/icmp-and-shift.ll
+++ b/llvm/test/Transforms/InstCombine/icmp-and-shift.ll
@@ -404,11 +404,10 @@ define <2 x i32> @icmp_ne_and_pow2_lshr_pow2_vec(<2 x i32> %0) {
   ret <2 x i32> %conv
 }
 
-define i32 @icmp_eq_and1_lshr_pow2_negative1(i32 %0) {
-; CHECK-LABEL: @icmp_eq_and1_lshr_pow2_negative1(
-; CHECK-NEXT:    [[LSHR:%.*]] = lshr i32 7, [[TMP0:%.*]]
-; CHECK-NEXT:    [[AND:%.*]] = and i32 [[LSHR]], 1
-; CHECK-NEXT:    [[CONV:%.*]] = xor i32 [[AND]], 1
+define i32 @icmp_eq_and1_lshr_pow2_minus_one(i32 %0) {
+; CHECK-LABEL: @icmp_eq_and1_lshr_pow2_minus_one(
+; CHECK-NEXT:    [[CMP:%.*]] = icmp ugt i32 [[TMP0:%.*]], 2
+; CHECK-NEXT:    [[CONV:%.*]] = zext i1 [[CMP]] to i32
 ; CHECK-NEXT:    ret i32 [[CONV]]
 ;
   %lshr = lshr i32 7, %0
@@ -606,3 +605,323 @@ define i1 @fold_ne_rhs_fail_shift_not_1s(i8 %x, i8 %yy) {
   %r = icmp ne i8 %and, 0
   ret i1 %r
 }
+
+define i1 @test_shr_and_1_ne_0(i32 %a, i32 %b) {
+; CHECK-LABEL: @test_shr_and_1_ne_0(
+; CHECK-NEXT:    [[TMP1:%.*]] = shl nuw i32 1, [[B:%.*]]
+; CHECK-NEXT:    [[TMP2:%.*]] = and i32 [[TMP1]], [[A:%.*]]
+; CHECK-NEXT:    [[CMP:%.*]] = icmp ne i32 [[TMP2]], 0
+; CHECK-NEXT:    ret i1 [[CMP]]
+;
+  %shr = lshr i32 %a, %b
+  %and = and i32 %shr, 1
+  %cmp = icmp ne i32 %and, 0
+  ret i1 %cmp
+}
+
+define i1 @test_const_shr_and_1_ne_0(i32 %b) {
+; CHECK-LABEL: @test_const_shr_and_1_ne_0(
+; CHECK-NEXT:    [[TMP1:%.*]] = shl nuw i32 1, [[B:%.*]]
+; CHECK-NEXT:    [[AND:%.*]] = and i32 [[TMP1]], 42
+; CHECK-NEXT:    [[CMP:%.*]] = icmp ne i32 [[AND]], 0
+; CHECK-NEXT:    ret i1 [[CMP]]
+;
+  %shr = lshr i32 42, %b
+  %and = and i32 %shr, 1
+  %cmp = icmp ne i32 %and, 0
+  ret i1 %cmp
+}
+
+define i1 @test_not_const_shr_and_1_ne_0(i32 %b) {
+; CHECK-LABEL: @test_not_const_shr_and_1_ne_0(
+; CHECK-NEXT:    [[TMP1:%.*]] = shl nuw i32 1, [[B:%.*]]
+; CHECK-NEXT:    [[AND:%.*]] = and i32 [[TMP1]], 42
+; CHECK-NEXT:    [[CMP_NOT:%.*]] = icmp eq i32 [[AND]], 0
+; CHECK-NEXT:    ret i1 [[CMP_NOT]]
+;
+  %shr = lshr i32 42, %b
+  %and = and i32 %shr, 1
+  %cmp = icmp eq i32 %and, 0
+  ret i1 %cmp
+}
+
+define i1 @test_const_shr_exact_and_1_ne_0(i32 %b) {
+; CHECK-LABEL: @test_const_shr_exact_and_1_ne_0(
+; CHECK-NEXT:    [[TMP1:%.*]] = shl nuw i32 1, [[B:%.*]]
+; CHECK-NEXT:    [[AND:%.*]] = and i32 [[TMP1]], 42
+; CHECK-NEXT:    [[CMP:%.*]] = icmp ne i32 [[AND]], 0
+; CHECK-NEXT:    ret i1 [[CMP]]
+;
+  %shr = lshr exact i32 42, %b
+  %and = and i32 %shr, 1
+  %cmp = icmp ne i32 %and, 0
+  ret i1 %cmp
+}
+
+define i1 @test_const_shr_and_2_ne_0_negative(i32 %b) {
+; CHECK-LABEL: @test_const_shr_and_2_ne_0_negative(
+; CHECK-NEXT:    [[SHR:%.*]] = lshr i32 42, [[B:%.*]]
+; CHECK-NEXT:    [[AND:%.*]] = and i32 [[SHR]], 2
+; CHECK-NEXT:    [[CMP:%.*]] = icmp eq i32 [[AND]], 0
+; CHECK-NEXT:    ret i1 [[CMP]]
+;
+  %shr = lshr i32 42, %b
+  %and = and i32 %shr, 2
+  %cmp = icmp eq i32 %and, 0
+  ret i1 %cmp
+}
+
+define <8 x i1> @test_const_shr_and_1_ne_0_v8i8_splat_negative(<8 x i8> %b) {
+; CHECK-LABEL: @test_const_shr_and_1_ne_0_v8i8_splat_negative(
+; CHECK-NEXT:    [[SHR:%.*]] = lshr <8 x i8> <i8 42, i8 42, i8 42, i8 42, i8 42, i8 42, i8 42, i8 42>, [[B:%.*]]
+; CHECK-NEXT:    [[CMP:%.*]] = trunc <8 x i8> [[SHR]] to <8 x i1>
+; CHECK-NEXT:    ret <8 x i1> [[CMP]]
+;
+  %shr = lshr <8 x i8> <i8 42, i8 42, i8 42, i8 42, i8 42, i8 42, i8 42, i8 42>, %b
+  %and = and <8 x i8> %shr, <i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1>
+  %cmp = icmp ne <8 x i8> %and, zeroinitializer
+  ret <8 x i1> %cmp
+}
+
+define <8 x i1> @test_const_shr_and_1_ne_0_v8i8_nonsplat_negative(<8 x i8> %b) {
+; CHECK-LABEL: @test_const_shr_and_1_ne_0_v8i8_nonsplat_negative(
+; CHECK-NEXT:    [[SHR:%.*]] = lshr <8 x i8> <i8 42, i8 43, i8 44, i8 45, i8 46, i8 47, i8 48, i8 49>, [[B:%.*]]
+; CHECK-NEXT:    [[CMP:%.*]] = trunc <8 x i8> [[SHR]] to <8 x i1>
+; CHECK-NEXT:    ret <8 x i1> [[CMP]]
+;
+  %shr = lshr <8 x i8> <i8 42, i8 43, i8 44, i8 45, i8 46, i8 47, i8 48, i8 49>, %b
+  %and = and <8 x i8> %shr, <i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1>
+  %cmp = icmp ne <8 x i8> %and, zeroinitializer
+  ret <8 x i1> %cmp
+}
+
+define i1 @test_const_shr_and_1_ne_0_i1_negative(i1 %b) {
+; CHECK-LABEL: @test_const_shr_and_1_ne_0_i1_negative(
+; CHECK-NEXT:    ret i1 true
+;
+  %shr = lshr i1 1, %b
+  %and = and i1 %shr, 1
+  %cmp = icmp ne i1 %and, 0
+  ret i1 %cmp
+}
+
+define i1 @test_const_shr_and_1_ne_0_multi_use_lshr_negative(i32 %b) {
+; CHECK-LABEL: @test_const_shr_and_1_ne_0_multi_use_lshr_negative(
+; CHECK-NEXT:    [[SHR:%.*]] = lshr i32 42, [[B:%.*]]
+; CHECK-NEXT:    [[AND:%.*]] = and i32 [[SHR]], 1
+; CHECK-NEXT:    [[CMP1:%.*]] = icmp ne i32 [[AND]], 0
+; CHECK-NEXT:    [[CMP2:%.*]] = icmp eq i32 [[SHR]], [[B]]
+; CHECK-NEXT:    [[RET:%.*]] = and i1 [[CMP1]], [[CMP2]]
+; CHECK-NEXT:    ret i1 [[RET]]
+;
+  %shr = lshr i32 42, %b
+  %and = and i32 %shr, 1
+  %cmp1 = icmp ne i32 %and, 0
+  %cmp2 = icmp eq i32 %b, %shr
+  %ret = and i1 %cmp1, %cmp2
+  ret i1 %ret
+}
+
+define i1 @test_const_shr_and_1_ne_0_multi_use_and_negative(i32 %b) {
+; CHECK-LABEL: @test_const_shr_and_1_ne_0_multi_use_and_negative(
+; CHECK-NEXT:    [[SHR:%.*]] = lshr i32 42, [[B:%.*]]
+; CHECK-NEXT:    [[AND:%.*]] = and i32 [[SHR]], 1
+; CHECK-NEXT:    [[CMP1:%.*]] = icmp ne i32 [[AND]], 0
+; CHECK-NEXT:    [[CMP2:%.*]] = icmp eq i32 [[AND]], [[B]]
+; CHECK-NEXT:    [[RET:%.*]] = and i1 [[CMP1]], [[CMP2]]
+; CHECK-NEXT:    ret i1 [[RET]]
+;
+  %shr = lshr i32 42, %b
+  %and = and i32 %shr, 1
+  %cmp1 = icmp ne i32 %and, 0
+  %cmp2 = icmp eq i32 %b, %and
+  %ret = and i1 %cmp1, %cmp2
+  ret i1 %ret
+}
+
+define i1 @test_shl_sub_bw_minus_1_slt_0(i32 %a, i32 %b) {
+; CHECK-LABEL: @test_shl_sub_bw_minus_1_slt_0(
+; CHECK-NEXT:    [[TMP1:%.*]] = shl nuw i32 1, [[B:%.*]]
+; CHECK-NEXT:    [[TMP2:%.*]] = and i32 [[TMP1]], [[A:%.*]]
+; CHECK-NEXT:    [[CMP:%.*]] = icmp ne i32 [[TMP2]], 0
+; CHECK-NEXT:    ret i1 [[CMP]]
+;
+  %sub = sub i32 31, %b
+  %shl = shl i32 %a, %sub
+  %cmp = icmp slt i32 %shl, 0
+  ret i1 %cmp
+}
+
+define i1 @test_const_shl_sub_bw_minus_1_slt_0(i32 %b) {
+; CHECK-LABEL: @test_const_shl_sub_bw_minus_1_slt_0(
+; CHECK-NEXT:    [[TMP1:%.*]] = shl nuw i32 1, [[B:%.*]]
+; CHECK-NEXT:    [[TMP2:%.*]] = and i32 [[TMP1]], 42
+; CHECK-NEXT:    [[CMP:%.*]] = icmp ne i32 [[TMP2]], 0
+; CHECK-NEXT:    ret i1 [[CMP]]
+;
+  %sub = sub i32 31, %b
+  %shl = shl i32 42, %sub
+  %cmp = icmp slt i32 %shl, 0
+  ret i1 %cmp
+}
+
+define i1 @test_not_shl_sub_bw_minus_1_slt_0(i32 %a, i32 %b) {
+; CHECK-LABEL: @test_not_shl_sub_bw_minus_1_slt_0(
+; CHECK-NEXT:    [[TMP1:%.*]] = shl nuw i32 1, [[B:%.*]]
+; CHECK-NEXT:    [[TMP2:%.*]] = and i32 [[TMP1]], [[A:%.*]]
+; CHECK-NEXT:    [[CMP:%.*]] = icmp eq i32 [[TMP2]], 0
+; CHECK-NEXT:    ret i1 [[CMP]]
+;
+  %sub = sub i32 31, %b
+  %shl = shl i32 %a, %sub
+  %cmp = icmp sge i32 %shl, 0
+  ret i1 %cmp
+}
+
+define i1 @test_shl_nuw_sub_bw_minus_1_slt_0(i32 %a, i32 %b) {
+; CHECK-LABEL: @test_shl_nuw_sub_bw_minus_1_slt_0(
+; CHECK-NEXT:    [[TMP1:%.*]] = shl nuw i32 1, [[B:%.*]]
+; CHECK-NEXT:    [[TMP2:%.*]] = and i32 [[TMP1]], [[A:%.*]]
+; CHECK-NEXT:    [[CMP:%.*]] = icmp ne i32 [[TMP2]], 0
+; CHECK-NEXT:    ret i1 [[CMP]]
+;
+  %sub = sub i32 31, %b
+  %shl = shl nuw i32 %a, %sub
+  %cmp = icmp slt i32 %shl, 0
+  ret i1 %cmp
+}
+
+define i1 @test_not_const_shl_sub_bw_minus_1_slt_0(i32 %b) {
+; CHECK-LABEL: @test_not_const_shl_sub_bw_minus_1_slt_0(
+; CHECK-NEXT:    [[TMP1:%.*]] = shl nuw i32 1, [[B:%.*]]
+; CHECK-NEXT:    [[TMP2:%.*]] = and i32 [[TMP1]], 42
+; CHECK-NEXT:    [[CMP:%.*]] = icmp eq i32 [[TMP2]], 0
+; CHECK-NEXT:    ret i1 [[CMP]]
+;
+  %sub = sub i32 31, %b
+  %shl = shl i32 42, %sub
+  %cmp = icmp sge i32 %shl, 0
+  ret i1 %cmp
+}
+
+define <8 x i1> @test_shl_sub_bw_minus_1_slt_0_v8i8(<8 x i8> %a, <8 x i8> %b) {
+; CHECK-LABEL: @test_shl_sub_bw_minus_1_slt_0_v8i8(
+; CHECK-NEXT:    [[TMP1:%.*]] = shl nuw <8 x i8> <i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1>, [[B:%.*]]
+; CHECK-NEXT:    [[TMP2:%.*]] = and <8 x i8> [[TMP1]], [[A:%.*]]
+; CHECK-NEXT:    [[CMP:%.*]] = icmp ne <8 x i8> [[TMP2]], zeroinitializer
+; CHECK-NEXT:    ret <8 x i1> [[CMP]]
+;
+  %sub = sub <8 x i8> <i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7>, %b
+  %shl = shl <8 x i8> %a, %sub
+  %cmp = icmp slt <8 x i8> %shl, zeroinitializer
+  ret <8 x i1> %cmp
+}
+
+define <8 x i1> @test_const_shl_sub_bw_minus_1_slt_0_v8i8_splat(<8 x i8> %b) {
+; CHECK-LABEL: @test_const_shl_sub_bw_minus_1_slt_0_v8i8_splat(
+; CHECK-NEXT:    [[TMP1:%.*]] = shl nuw <8 x i8> <i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1>, [[B:%.*]]
+; CHECK-NEXT:    [[TMP2:%.*]] = and <8 x i8> [[TMP1]], <i8 42, i8 42, i8 42, i8 42, i8 42, i8 42, i8 42, i8 42>
+; CHECK-NEXT:    [[CMP:%.*]] = icmp ne <8 x i8> [[TMP2]], zeroinitializer
+; CHECK-NEXT:    ret <8 x i1> [[CMP]]
+;
+  %sub = sub <8 x i8> <i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7>, %b
+  %shl = shl <8 x i8> <i8 42, i8 42, i8 42, i8 42, i8 42, i8 42, i8 42, i8 42>, %sub
+  %cmp = icmp slt <8 x i8> %shl, zeroinitializer
+  ret <8 x i1> %cmp
+}
+
+define <8 x i1> @test_const_shl_sub_bw_minus_1_slt_0_v8i8_splat_poison_1(<8 x i8> %b) {
+; CHECK-LABEL: @test_const_shl_sub_bw_minus_1_slt_0_v8i8_splat_poison_1(
+; CHECK-NEXT:    [[SUB:%.*]] = sub <8 x i8> <i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 poison>, [[B:%.*]]
+; CHECK-NEXT:    [[SHL:%.*]] = shl <8 x i8> <i8 42, i8 42, i8 42, i8 42, i8 42, i8 42, i8 42, i8 42>, [[SUB]]
+; CHECK-NEXT:    [[CMP:%.*]] = icmp slt <8 x i8> [[SHL]], zeroinitializer
+; CHECK-NEXT:    ret <8 x i1> [[CMP]]
+;
+  %sub = sub <8 x i8> <i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 poison>, %b
+  %shl = shl <8 x i8> <i8 42, i8 42, i8 42, i8 42, i8 42, i8 42, i8 42, i8 42>, %sub
+  %cmp = icmp slt <8 x i8> %shl, zeroinitializer
+  ret <8 x i1> %cmp
+}
+
+define <8 x i1> @test_const_shl_sub_bw_minus_1_slt_0_v8i8_splat_poison_2(<8 x i8> %b) {
+; CHECK-LABEL: @test_const_shl_sub_bw_minus_1_slt_0_v8i8_splat_poison_2(
+; CHECK-NEXT:    [[TMP1:%.*]] = shl nuw <8 x i8> <i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1>, [[B:%.*]]
+; CHECK-NEXT:    [[TMP2:%.*]] = and <8 x i8> [[TMP1]], <i8 42, i8 42, i8 42, i8 42, i8 42, i8 42, i8 42, i8 poison>
+; CHECK-NEXT:    [[CMP:%.*]] = icmp ne <8 x i8> [[TMP2]], zeroinitializer
+; CHECK-NEXT:    ret <8 x i1> [[CMP]]
+;
+  %sub = sub <8 x i8> <i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7>, %b
+  %shl = shl <8 x i8> <i8 42, i8 42, i8 42, i8 42, i8 42, i8 42, i8 42, i8 poison>, %sub
+  %cmp = icmp slt <8 x i8> %shl, zeroinitializer
+  ret <8 x i1> %cmp
+}
+
+define <8 x i1> @test_const_shl_sub_bw_minus_1_slt_0_v8i8_nonsplat(<8 x i8> %b) {
+; CHECK-LABEL: @test_const_shl_sub_bw_minus_1_slt_0_v8i8_nonsplat(
+; CHECK-NEXT:    [[TMP1:%.*]] = shl nuw <8 x i8> <i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1>, [[B:%.*]]
+; CHECK-NEXT:    [[TMP2:%.*]] = and <8 x i8> [[TMP1]], <i8 42, i8 43, i8 44, i8 45, i8 46, i8 47, i8 48, i8 49>
+; CHECK-NEXT:    [[CMP:%.*]] = icmp ne <8 x i8> [[TMP2]], zeroinitializer
+; CHECK-NEXT:    ret <8 x i1> [[CMP]]
+;
+  %sub = sub <8 x i8> <i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7>, %b
+  %shl = shl <8 x i8> <i8 42, i8 43, i8 44, i8 45, i8 46, i8 47, i8 48, i8 49>, %sub
+  %cmp = icmp slt <8 x i8> %shl, zeroinitializer
+  ret <8 x i1> %cmp
+}
+
+define i1 @test_shl_sub_non_bw_minus_1_slt_0_negative(i32 %a, i32 %b) {
+; CHECK-LABEL: @test_shl_sub_non_bw_minus_1_slt_0_negative(
+; CHECK-NEXT:    [[SUB:%.*]] = sub i32 32, [[B:%.*]]
+; CHECK-NEXT:    [[SHL:%.*]] = shl i32 [[A:%.*]], [[SUB]]
+; CHECK-NEXT:    [[CMP:%.*]] = icmp slt i32 [[SHL]], 0
+; CHECK-NEXT:    ret i1 [[CMP]]
+;
+  %sub = sub i32 32, %b
+  %shl = shl i32 %a, %sub
+  %cmp = icmp slt i32 %shl, 0
+  ret i1 %cmp
+}
+
+define i1 @test_shl_sub_bw_minus_1_slt_0_i1_negative(i1 %a, i1 %b) {
+; CHECK-LABEL: @test_shl_sub_bw_minus_1_slt_0_i1_negative(
+; CHECK-NEXT:    ret i1 [[A:%.*]]
+;
+  %sub = sub i1 0, %b
+  %shl = shl i1 %a, %sub
+  %cmp = icmp slt i1 %shl, 0
+  ret i1 %cmp
+}
+
+define i1 @test_shl_sub_bw_minus_1_slt_0_multi_use_sub_negative(i32 %a, i32 %b) {
+; CHECK-LABEL: @test_shl_sub_bw_minus_1_slt_0_multi_use_sub_negative(
+; CHECK-NEXT:    [[SUB:%.*]] = sub i32 31, [[B:%.*]]
+; CHECK-NEXT:    [[SHL:%.*]] = shl i32 [[A:%.*]], [[SUB]]
+; CHECK-NEXT:    [[CMP1:%.*]] = icmp slt i32 [[SHL]], 0
+; CHECK-NEXT:    [[CMP2:%.*]] = icmp sgt i32 [[SUB]], [[B]]
+; CHECK-NEXT:    [[RET:%.*]] = or i1 [[CMP1]], [[CMP2]]
+; CHECK-NEXT:    ret i1 [[RET]]
+;
+  %sub = sub i32 31, %b
+  %shl = shl i32 %a, %sub
+  %cmp1 = icmp slt i32 %shl, 0
+  %cmp2 = icmp slt i32 %b, %sub
+  %ret = or i1 %cmp1, %cmp2
+  ret i1 %ret
+}
+
+define i1 @test_shl_sub_bw_minus_1_slt_0_multi_use_shl_negative(i32 %a, i32 %b) {
+; CHECK-LABEL: @test_shl_sub_bw_minus_1_slt_0_multi_use_shl_negative(
+; CHECK-NEXT:    [[SUB:%.*]] = sub i32 31, [[B:%.*]]
+; CHECK-NEXT:    [[SHL:%.*]] = shl i32 [[A:%.*]], [[SUB]]
+; CHECK-NEXT:    [[CMP1:%.*]] = icmp slt i32 [[SHL]], 0
+; CHECK-NEXT:    [[CMP2:%.*]] = icmp eq i32 [[SHL]], [[B]]
+; CHECK-NEXT:    [[RET:%.*]] = and i1 [[CMP1]], [[CMP2]]
+; CHECK-NEXT:    ret i1 [[RET]]
+;
+  %sub = sub i32 31, %b
+  %shl = shl i32 %a, %sub
+  %cmp1 = icmp slt i32 %shl, 0
+  %cmp2 = icmp eq i32 %b, %shl
+  %ret = and i1 %cmp1, %cmp2
+  ret i1 %ret
+}
diff --git a/llvm/test/Transforms/InstCombine/load-cmp.ll b/llvm/test/Transforms/InstCombine/load-cmp.ll
index b956de29e0b8d..8e39fe33cded8 100644
--- a/llvm/test/Transforms/InstCombine/load-cmp.ll
+++ b/llvm/test/Transforms/InstCombine/load-cmp.ll
@@ -109,8 +109,8 @@ define i1 @test3(i32 %X) {
 
 define i1 @test4(i32 %X) {
 ; CHECK-LABEL: @test4(
-; CHECK-NEXT:    [[TMP1:%.*]] = lshr i32 933, [[X:%.*]]
-; CHECK-NEXT:    [[TMP2:%.*]] = and i32 [[TMP1]], 1
+; CHECK-NEXT:    [[TMP1:%.*]] = shl nuw i32 1, [[X:%.*]]
+; CHECK-NEXT:    [[TMP2:%.*]] = and i32 [[TMP1]], 933
 ; CHECK-NEXT:    [[R:%.*]] = icmp ne i32 [[TMP2]], 0
 ; CHECK-NEXT:    ret i1 [[R]]
 ;
@@ -123,8 +123,8 @@ define i1 @test4(i32 %X) {
 define i1 @test4_i16(i16 %X) {
 ; CHECK-LABEL: @test4_i16(
 ; CHECK-NEXT:    [[TMP1:%.*]] = zext nneg i16 [[X:%.*]] to i32
-; CHECK-NEXT:    [[TMP2:%.*]] = lshr i32 933, [[TMP1]]
-; CHECK-NEXT:    [[TMP3:%.*]] = and i32 [[TMP2]], 1
+; CHECK-NEXT:    [[TMP2:%.*]] = shl nuw i32 1, [[TMP1]]
+; CHECK-NEXT:    [[TMP3:%.*]] = and i32 [[TMP2]], 933
 ; CHECK-NEXT:    [[R:%.*]] = icmp ne i32 [[TMP3]], 0
 ; CHECK-NEXT:    ret i1 [[R]]
 ;

@dtcxzyw dtcxzyw requested review from dtcxzyw and goldsteinn August 3, 2024 15:17
@mskamp mskamp force-pushed the fix_86813_test_bit_canonicalization branch from ba7461f to 6325173 on August 3, 2024 17:38
@mskamp mskamp changed the title from "[InstCombine] Canonicalize Bit Testing" to "[InstCombine] Canonicalize Bit Testing by Shifting to Sign Bit" Aug 3, 2024
dtcxzyw added a commit to dtcxzyw/llvm-opt-benchmark that referenced this pull request Aug 6, 2024
@mskamp mskamp force-pushed the fix_86813_test_bit_canonicalization branch from 6325173 to 32d94c8 on August 7, 2024 16:59
@mskamp mskamp force-pushed the fix_86813_test_bit_canonicalization branch from 32d94c8 to f7751bc on August 11, 2024 06:59
mskamp added 2 commits August 20, 2024 15:50
Implement a new transformation that folds the bit-testing expression
(icmp slt (shl V (sub (bw-1) B)) 0) to (icmp ne (and V (shl 1 B)) 0).
Also fold the negated variant of the LHS.

Alive proof: https://alive2.llvm.org/ce/z/5ic_qe

Relates to issue llvm#86813.
@mskamp mskamp force-pushed the fix_86813_test_bit_canonicalization branch from f7751bc to bee6b50 on August 20, 2024 17:55
@dtcxzyw
Member

dtcxzyw commented Aug 20, 2024

This pattern does not seem to exist in real-world code :(
We already fold (X << C) < 0 -> (X u>> (BW - C - 1)) != 0: https://godbolt.org/z/KnjsKT579

@mskamp
Contributor Author

mskamp commented Aug 21, 2024

> This pattern does not seem to exist in real-world code :( We already fold (X << C) < 0 -> (X u>> (BW - C - 1)) != 0: https://godbolt.org/z/KnjsKT579

Thank you for pointing this out. Then, the transformation is probably not worth the trouble.

@mskamp mskamp closed this Aug 21, 2024
Successfully merging this pull request may close these issues.

Optimization on "test if bit N is set" pattern ((C >> x) & 1)