[InstCombine] Canonicalize Bit Testing by Shifting to Bit 0 #101838


Merged

Conversation

mskamp
Contributor

@mskamp mskamp commented Aug 3, 2024

Implement a new transformation that folds the bit-testing expression (icmp ne (and (lshr V B) 1) 0) to (icmp ne (and V (shl 1 B)) 0) for constant V. This rule already existed for non-constant V and constants other than 1; the restriction to non-constant V was added in commit c3b2111 to fix an infinite loop. Avoid the infinite loop by allowing constant V only if the shift instruction is an lshr and the constant is 1. Also fold the negated variant of the LHS.

This transformation necessitates an adaptation of existing tests in icmp-and-shift.ll and load-cmp.ll. One test in icmp-and-shift.ll, which previously was a negative test, now gets folded. Rename it to indicate that it is a positive test.

Alive proof: https://alive2.llvm.org/ce/z/vcJJTx

Relates to issue #86813.
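
As a concrete illustration, here is the before/after shape of the fold on one of the new test cases added below (`test_const_shr_and_1_ne_0` in `icmp-and-shift.ll`); the function names are only for exposition:

```llvm
; Before: the constant 42 is shifted down so the tested bit lands in bit 0.
define i1 @before(i32 %b) {
  %shr = lshr i32 42, %b
  %and = and i32 %shr, 1
  %cmp = icmp ne i32 %and, 0
  ret i1 %cmp
}

; After: a single-bit mask (1 << %b) is tested against the constant directly,
; matching the CHECK lines of the new test.
define i1 @after(i32 %b) {
  %mask = shl nuw i32 1, %b
  %and = and i32 %mask, 42
  %cmp = icmp ne i32 %and, 0
  ret i1 %cmp
}
```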

mskamp added 2 commits August 3, 2024 19:09
Implement a new transformation that folds the bit-testing expression
(icmp ne (and (lshr V B) 1) 0) to (icmp ne (and V (shl 1 B)) 0) for
constant V. This rule already existed for non-constant V and constants
other than 1; the restriction to non-constant V was added in
commit c3b2111 to fix an infinite loop.
Avoid the infinite loop by allowing constant V only if the shift
instruction is an lshr and the constant is 1. Also fold the negated
variant of the LHS.

This transformation necessitates an adaptation of existing tests in
`icmp-and-shift.ll` and `load-cmp.ll`. One test in `icmp-and-shift.ll`,
which previously was a negative test, now gets folded. Rename it to
indicate that it is a positive test.

Alive proof: https://alive2.llvm.org/ce/z/vcJJTx

Relates to issue llvm#86813.
@mskamp mskamp requested a review from nikic as a code owner August 3, 2024 17:38
@llvmbot
Member

llvmbot commented Aug 3, 2024

@llvm/pr-subscribers-llvm-transforms

Author: None (mskamp)

Changes

Full diff: https://github.com/llvm/llvm-project/pull/101838.diff

3 Files Affected:

  • (modified) llvm/lib/Transforms/InstCombine/InstCombineCompares.cpp (+2-1)
  • (modified) llvm/test/Transforms/InstCombine/icmp-and-shift.ll (+137-5)
  • (modified) llvm/test/Transforms/InstCombine/load-cmp.ll (+4-4)
diff --git a/llvm/lib/Transforms/InstCombine/InstCombineCompares.cpp b/llvm/lib/Transforms/InstCombine/InstCombineCompares.cpp
index 3b6df2760ecc2..66809dbf0cbd5 100644
--- a/llvm/lib/Transforms/InstCombine/InstCombineCompares.cpp
+++ b/llvm/lib/Transforms/InstCombine/InstCombineCompares.cpp
@@ -1725,7 +1725,8 @@ Instruction *InstCombinerImpl::foldICmpAndShift(ICmpInst &Cmp,
   // preferable because it allows the C2 << Y expression to be hoisted out of a
   // loop if Y is invariant and X is not.
   if (Shift->hasOneUse() && C1.isZero() && Cmp.isEquality() &&
-      !Shift->isArithmeticShift() && !isa<Constant>(Shift->getOperand(0))) {
+      !Shift->isArithmeticShift() &&
+      ((!IsShl && C2.isOne()) || !isa<Constant>(Shift->getOperand(0)))) {
     // Compute C2 << Y.
     Value *NewShift =
         IsShl ? Builder.CreateLShr(And->getOperand(1), Shift->getOperand(1))
diff --git a/llvm/test/Transforms/InstCombine/icmp-and-shift.ll b/llvm/test/Transforms/InstCombine/icmp-and-shift.ll
index 08d23e84c3960..65b9749ffc1bf 100644
--- a/llvm/test/Transforms/InstCombine/icmp-and-shift.ll
+++ b/llvm/test/Transforms/InstCombine/icmp-and-shift.ll
@@ -404,11 +404,10 @@ define <2 x i32> @icmp_ne_and_pow2_lshr_pow2_vec(<2 x i32> %0) {
   ret <2 x i32> %conv
 }
 
-define i32 @icmp_eq_and1_lshr_pow2_negative1(i32 %0) {
-; CHECK-LABEL: @icmp_eq_and1_lshr_pow2_negative1(
-; CHECK-NEXT:    [[LSHR:%.*]] = lshr i32 7, [[TMP0:%.*]]
-; CHECK-NEXT:    [[AND:%.*]] = and i32 [[LSHR]], 1
-; CHECK-NEXT:    [[CONV:%.*]] = xor i32 [[AND]], 1
+define i32 @icmp_eq_and1_lshr_pow2_minus_one(i32 %0) {
+; CHECK-LABEL: @icmp_eq_and1_lshr_pow2_minus_one(
+; CHECK-NEXT:    [[CMP:%.*]] = icmp ugt i32 [[TMP0:%.*]], 2
+; CHECK-NEXT:    [[CONV:%.*]] = zext i1 [[CMP]] to i32
 ; CHECK-NEXT:    ret i32 [[CONV]]
 ;
   %lshr = lshr i32 7, %0
@@ -606,3 +605,136 @@ define i1 @fold_ne_rhs_fail_shift_not_1s(i8 %x, i8 %yy) {
   %r = icmp ne i8 %and, 0
   ret i1 %r
 }
+
+define i1 @test_shr_and_1_ne_0(i32 %a, i32 %b) {
+; CHECK-LABEL: @test_shr_and_1_ne_0(
+; CHECK-NEXT:    [[TMP1:%.*]] = shl nuw i32 1, [[B:%.*]]
+; CHECK-NEXT:    [[TMP2:%.*]] = and i32 [[TMP1]], [[A:%.*]]
+; CHECK-NEXT:    [[CMP:%.*]] = icmp ne i32 [[TMP2]], 0
+; CHECK-NEXT:    ret i1 [[CMP]]
+;
+  %shr = lshr i32 %a, %b
+  %and = and i32 %shr, 1
+  %cmp = icmp ne i32 %and, 0
+  ret i1 %cmp
+}
+
+define i1 @test_const_shr_and_1_ne_0(i32 %b) {
+; CHECK-LABEL: @test_const_shr_and_1_ne_0(
+; CHECK-NEXT:    [[TMP1:%.*]] = shl nuw i32 1, [[B:%.*]]
+; CHECK-NEXT:    [[AND:%.*]] = and i32 [[TMP1]], 42
+; CHECK-NEXT:    [[CMP:%.*]] = icmp ne i32 [[AND]], 0
+; CHECK-NEXT:    ret i1 [[CMP]]
+;
+  %shr = lshr i32 42, %b
+  %and = and i32 %shr, 1
+  %cmp = icmp ne i32 %and, 0
+  ret i1 %cmp
+}
+
+define i1 @test_not_const_shr_and_1_ne_0(i32 %b) {
+; CHECK-LABEL: @test_not_const_shr_and_1_ne_0(
+; CHECK-NEXT:    [[TMP1:%.*]] = shl nuw i32 1, [[B:%.*]]
+; CHECK-NEXT:    [[AND:%.*]] = and i32 [[TMP1]], 42
+; CHECK-NEXT:    [[CMP_NOT:%.*]] = icmp eq i32 [[AND]], 0
+; CHECK-NEXT:    ret i1 [[CMP_NOT]]
+;
+  %shr = lshr i32 42, %b
+  %and = and i32 %shr, 1
+  %cmp = icmp eq i32 %and, 0
+  ret i1 %cmp
+}
+
+define i1 @test_const_shr_exact_and_1_ne_0(i32 %b) {
+; CHECK-LABEL: @test_const_shr_exact_and_1_ne_0(
+; CHECK-NEXT:    [[TMP1:%.*]] = shl nuw i32 1, [[B:%.*]]
+; CHECK-NEXT:    [[AND:%.*]] = and i32 [[TMP1]], 42
+; CHECK-NEXT:    [[CMP:%.*]] = icmp ne i32 [[AND]], 0
+; CHECK-NEXT:    ret i1 [[CMP]]
+;
+  %shr = lshr exact i32 42, %b
+  %and = and i32 %shr, 1
+  %cmp = icmp ne i32 %and, 0
+  ret i1 %cmp
+}
+
+define i1 @test_const_shr_and_2_ne_0_negative(i32 %b) {
+; CHECK-LABEL: @test_const_shr_and_2_ne_0_negative(
+; CHECK-NEXT:    [[SHR:%.*]] = lshr i32 42, [[B:%.*]]
+; CHECK-NEXT:    [[AND:%.*]] = and i32 [[SHR]], 2
+; CHECK-NEXT:    [[CMP:%.*]] = icmp eq i32 [[AND]], 0
+; CHECK-NEXT:    ret i1 [[CMP]]
+;
+  %shr = lshr i32 42, %b
+  %and = and i32 %shr, 2
+  %cmp = icmp eq i32 %and, 0
+  ret i1 %cmp
+}
+
+define <8 x i1> @test_const_shr_and_1_ne_0_v8i8_splat_negative(<8 x i8> %b) {
+; CHECK-LABEL: @test_const_shr_and_1_ne_0_v8i8_splat_negative(
+; CHECK-NEXT:    [[SHR:%.*]] = lshr <8 x i8> <i8 42, i8 42, i8 42, i8 42, i8 42, i8 42, i8 42, i8 42>, [[B:%.*]]
+; CHECK-NEXT:    [[CMP:%.*]] = trunc <8 x i8> [[SHR]] to <8 x i1>
+; CHECK-NEXT:    ret <8 x i1> [[CMP]]
+;
+  %shr = lshr <8 x i8> <i8 42, i8 42, i8 42, i8 42, i8 42, i8 42, i8 42, i8 42>, %b
+  %and = and <8 x i8> %shr, <i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1>
+  %cmp = icmp ne <8 x i8> %and, zeroinitializer
+  ret <8 x i1> %cmp
+}
+
+define <8 x i1> @test_const_shr_and_1_ne_0_v8i8_nonsplat_negative(<8 x i8> %b) {
+; CHECK-LABEL: @test_const_shr_and_1_ne_0_v8i8_nonsplat_negative(
+; CHECK-NEXT:    [[SHR:%.*]] = lshr <8 x i8> <i8 42, i8 43, i8 44, i8 45, i8 46, i8 47, i8 48, i8 49>, [[B:%.*]]
+; CHECK-NEXT:    [[CMP:%.*]] = trunc <8 x i8> [[SHR]] to <8 x i1>
+; CHECK-NEXT:    ret <8 x i1> [[CMP]]
+;
+  %shr = lshr <8 x i8> <i8 42, i8 43, i8 44, i8 45, i8 46, i8 47, i8 48, i8 49>, %b
+  %and = and <8 x i8> %shr, <i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1>
+  %cmp = icmp ne <8 x i8> %and, zeroinitializer
+  ret <8 x i1> %cmp
+}
+
+define i1 @test_const_shr_and_1_ne_0_i1_negative(i1 %b) {
+; CHECK-LABEL: @test_const_shr_and_1_ne_0_i1_negative(
+; CHECK-NEXT:    ret i1 true
+;
+  %shr = lshr i1 1, %b
+  %and = and i1 %shr, 1
+  %cmp = icmp ne i1 %and, 0
+  ret i1 %cmp
+}
+
+define i1 @test_const_shr_and_1_ne_0_multi_use_lshr_negative(i32 %b) {
+; CHECK-LABEL: @test_const_shr_and_1_ne_0_multi_use_lshr_negative(
+; CHECK-NEXT:    [[SHR:%.*]] = lshr i32 42, [[B:%.*]]
+; CHECK-NEXT:    [[AND:%.*]] = and i32 [[SHR]], 1
+; CHECK-NEXT:    [[CMP1:%.*]] = icmp ne i32 [[AND]], 0
+; CHECK-NEXT:    [[CMP2:%.*]] = icmp eq i32 [[SHR]], [[B]]
+; CHECK-NEXT:    [[RET:%.*]] = and i1 [[CMP1]], [[CMP2]]
+; CHECK-NEXT:    ret i1 [[RET]]
+;
+  %shr = lshr i32 42, %b
+  %and = and i32 %shr, 1
+  %cmp1 = icmp ne i32 %and, 0
+  %cmp2 = icmp eq i32 %b, %shr
+  %ret = and i1 %cmp1, %cmp2
+  ret i1 %ret
+}
+
+define i1 @test_const_shr_and_1_ne_0_multi_use_and_negative(i32 %b) {
+; CHECK-LABEL: @test_const_shr_and_1_ne_0_multi_use_and_negative(
+; CHECK-NEXT:    [[SHR:%.*]] = lshr i32 42, [[B:%.*]]
+; CHECK-NEXT:    [[AND:%.*]] = and i32 [[SHR]], 1
+; CHECK-NEXT:    [[CMP1:%.*]] = icmp ne i32 [[AND]], 0
+; CHECK-NEXT:    [[CMP2:%.*]] = icmp eq i32 [[AND]], [[B]]
+; CHECK-NEXT:    [[RET:%.*]] = and i1 [[CMP1]], [[CMP2]]
+; CHECK-NEXT:    ret i1 [[RET]]
+;
+  %shr = lshr i32 42, %b
+  %and = and i32 %shr, 1
+  %cmp1 = icmp ne i32 %and, 0
+  %cmp2 = icmp eq i32 %b, %and
+  %ret = and i1 %cmp1, %cmp2
+  ret i1 %ret
+}
diff --git a/llvm/test/Transforms/InstCombine/load-cmp.ll b/llvm/test/Transforms/InstCombine/load-cmp.ll
index b956de29e0b8d..8e39fe33cded8 100644
--- a/llvm/test/Transforms/InstCombine/load-cmp.ll
+++ b/llvm/test/Transforms/InstCombine/load-cmp.ll
@@ -109,8 +109,8 @@ define i1 @test3(i32 %X) {
 
 define i1 @test4(i32 %X) {
 ; CHECK-LABEL: @test4(
-; CHECK-NEXT:    [[TMP1:%.*]] = lshr i32 933, [[X:%.*]]
-; CHECK-NEXT:    [[TMP2:%.*]] = and i32 [[TMP1]], 1
+; CHECK-NEXT:    [[TMP1:%.*]] = shl nuw i32 1, [[X:%.*]]
+; CHECK-NEXT:    [[TMP2:%.*]] = and i32 [[TMP1]], 933
 ; CHECK-NEXT:    [[R:%.*]] = icmp ne i32 [[TMP2]], 0
 ; CHECK-NEXT:    ret i1 [[R]]
 ;
@@ -123,8 +123,8 @@ define i1 @test4(i32 %X) {
 define i1 @test4_i16(i16 %X) {
 ; CHECK-LABEL: @test4_i16(
 ; CHECK-NEXT:    [[TMP1:%.*]] = zext nneg i16 [[X:%.*]] to i32
-; CHECK-NEXT:    [[TMP2:%.*]] = lshr i32 933, [[TMP1]]
-; CHECK-NEXT:    [[TMP3:%.*]] = and i32 [[TMP2]], 1
+; CHECK-NEXT:    [[TMP2:%.*]] = shl nuw i32 1, [[TMP1]]
+; CHECK-NEXT:    [[TMP3:%.*]] = and i32 [[TMP2]], 933
 ; CHECK-NEXT:    [[R:%.*]] = icmp ne i32 [[TMP3]], 0
 ; CHECK-NEXT:    ret i1 [[R]]
 ;

Member

@dtcxzyw dtcxzyw left a comment


LGTM. Thank you!

@goldsteinn
Contributor

LGTM

Contributor

@nikic nikic left a comment


LGTM, but it looks like there is a regression in this case: https://llvm.godbolt.org/z/8dGhf9Wz1

@mskamp
Contributor Author

mskamp commented Aug 4, 2024

Good point. I think we could adapt the transformation in InstCombineCasts.cpp:988 to also support smaller bit widths (and insert a trunc in this case). I could work on such a change if this is fine.

@nikic nikic merged commit 533190a into llvm:main Aug 4, 2024
9 checks passed
@nikic
Contributor

nikic commented Aug 4, 2024

> Good point. I think we could adapt the transformation in InstCombineCasts.cpp:988 to also support smaller bit widths (and insert a trunc in this case). I could work on such a change if this is fine.

Sounds good to me!

dtcxzyw pushed a commit that referenced this pull request Aug 21, 2024
Previously, (zext (icmp ne (and X, (1 << ShAmt)), 0)) was only folded
if the bit widths of X and the result were equal. Use a trunc or zext
instruction to also support other bit widths.
    
This is a follow-up to commit 533190a,
which introduced a regression: (zext (icmp ne (and (lshr X ShAmt) 1) 0))
is no longer folded to (zext/trunc (and (lshr X ShAmt) 1)) since
that commit introduced the fold of (icmp ne (and (lshr X ShAmt) 1) 0) to
(icmp ne (and X (1 << ShAmt)) 0). The change introduced by this commit
restores this fold.
    
Alive proof: https://alive2.llvm.org/ce/z/MFkNXs
    
Relates to issue #86813 and pull request #101838.
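
A minimal sketch of the kind of pattern the follow-up addresses, assuming a result type narrower than X (the types, names, and constants here are illustrative, not taken from the PR):

```llvm
; Bit test whose zext result (i16) is narrower than %x (i32). Per the commit
; message above, this shape previously folded to (zext/trunc (and (lshr X
; ShAmt) 1)) and stopped folding once the icmp was canonicalized to
; (icmp ne (and X (1 << ShAmt)) 0); the follow-up lets the zext fold insert
; a trunc or zext so mismatched bit widths are handled again.
define i16 @bit_test_zext(i32 %x, i32 %c) {
  %shr = lshr i32 %x, %c
  %and = and i32 %shr, 1
  %cmp = icmp ne i32 %and, 0
  %r = zext i1 %cmp to i16
  ret i16 %r
}
```
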
cjdb pushed a commit to cjdb/llvm-project that referenced this pull request Aug 23, 2024
Previously, (zext (icmp ne (and X, (1 << ShAmt)), 0)) was only folded
if the bit widths of X and the result were equal. Use a trunc or zext
instruction to also support other bit widths.
    
This is a follow-up to commit 533190a,
which introduced a regression: (zext (icmp ne (and (lshr X ShAmt) 1) 0))
is no longer folded to (zext/trunc (and (lshr X ShAmt) 1)) since
that commit introduced the fold of (icmp ne (and (lshr X ShAmt) 1) 0) to
(icmp ne (and X (1 << ShAmt)) 0). The change introduced by this commit
restores this fold.
    
Alive proof: https://alive2.llvm.org/ce/z/MFkNXs
    
Relates to issue llvm#86813 and pull request llvm#101838.