Skip to content

[InstSimplify] Fold converted urem to 0 if there's no overlapping bits #71528

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Nov 20, 2023

Conversation

huntergr-arm
Copy link
Collaborator

When folding urem instructions we can end up not recognizing that
the output will always be 0 due to Value*s being different, despite
generating the same data (in this case, 2 different calls to vscale).

This patch recognizes the (x << N) & (add (x << M), -1) pattern that
instcombine replaces urem with after the two vscale calls have been
reduced to one via CSE, then replaces with 0 when x is a non-zero
power of 2 and N >= M.

@llvmbot
Copy link
Member

llvmbot commented Nov 7, 2023

@llvm/pr-subscribers-llvm-analysis

@llvm/pr-subscribers-llvm-transforms

Author: Graham Hunter (huntergr-arm)

Changes

When folding urem instructions we can end up not recognizing that
the output will always be 0 due to Value*s being different, despite
generating the same data (in this case, 2 different calls to vscale).

This patch recognizes the (x << N) & (add (x << M), -1) pattern that
instcombine replaces urem with after the two vscale calls have been
reduced to one via CSE, then replaces with 0 when x is a non-zero
power of 2 and N >= M.


Full diff: https://github.com/llvm/llvm-project/pull/71528.diff

2 Files Affected:

  • (modified) llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp (+10)
  • (added) llvm/test/Transforms/InstCombine/po2-shift-add-and-to-zero.ll (+52)
diff --git a/llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp b/llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
index 46af9bf5eed003a..da38f8039dbc3ca 100644
--- a/llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
+++ b/llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
@@ -2662,6 +2662,16 @@ Instruction *InstCombinerImpl::visitAnd(BinaryOperator &I) {
   if (sinkNotIntoOtherHandOfLogicalOp(I))
     return &I;
 
+  // (x << N) & (add (x << M), -1) --> 0, where x is known to be a non-zero
+  // power of 2 and M <= N.
+  const APInt *Shift1, *Shift2;
+  if (match(&I, m_c_And(m_OneUse(m_Shl(m_Value(X), m_APInt(Shift1))),
+                        m_OneUse(m_Add(m_Shl(m_Value(Y), m_APInt(Shift2)),
+                                       m_AllOnes())))) &&
+      X == Y && isKnownToBeAPowerOfTwo(X, /*OrZero*/ false, 0, &I) &&
+      Shift1->uge(*Shift2))
+    return replaceInstUsesWith(I, Constant::getNullValue(I.getType()));
+
   // An and recurrence w/loop invariant step is equivelent to (and start, step)
   PHINode *PN = nullptr;
   Value *Start = nullptr, *Step = nullptr;
diff --git a/llvm/test/Transforms/InstCombine/po2-shift-add-and-to-zero.ll b/llvm/test/Transforms/InstCombine/po2-shift-add-and-to-zero.ll
new file mode 100644
index 000000000000000..4979e7a01972299
--- /dev/null
+++ b/llvm/test/Transforms/InstCombine/po2-shift-add-and-to-zero.ll
@@ -0,0 +1,52 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 2
+; RUN: opt -mtriple unknown -passes=instcombine -S < %s | FileCheck %s
+
+;; The and X, (add Y, -1) pattern is from an earlier instcombine pass which
+;; converted
+
+;; define dso_local i64 @f1() local_unnamed_addr #0 {
+;; entry:
+;;   %0 = call i64 @llvm.aarch64.sve.cntb(i32 31)
+;;   %1 = call i64 @llvm.aarch64.sve.cnth(i32 31)
+;;   %rem = urem i64 %0, %1
+;;   ret i64 %rem
+;; }
+
+;; into
+
+;; define dso_local i64 @f1() local_unnamed_addr #0 {
+;; entry:
+;;   %0 = call i64 @llvm.vscale.i64()
+;;   %1 = shl nuw nsw i64 %0, 4
+;;   %2 = call i64 @llvm.vscale.i64()
+;;   %3 = shl nuw nsw i64 %2, 3
+;;   %4 = add nsw i64 %3, -1
+;;   %rem = and i64 %1, %4
+;;   ret i64 %rem
+;; }
+
+;; InstCombine would have folded the original to returning 0 if the vscale
+;; calls were the same Value*, but since there's two of them it doesn't
+;; work and we convert the urem to add/and. CSE then gets rid of the extra
+;; vscale, leaving us with a new pattern to match. This only works because
+;; vscale is known to be a nonzero power of 2 (assuming there's a defined
+;; range for it).
+
+define dso_local i64 @f1() local_unnamed_addr #0 {
+; CHECK-LABEL: define dso_local i64 @f1
+; CHECK-SAME: () local_unnamed_addr #[[ATTR0:[0-9]+]] {
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:    ret i64 0
+;
+entry:
+  %0 = call i64 @llvm.vscale.i64()
+  %1 = shl nuw nsw i64 %0, 4
+  %2 = shl nuw nsw i64 %0, 3
+  %3 = add nsw i64 %2, -1
+  %rem = and i64 %1, %3
+  ret i64 %rem
+}
+
+declare i64 @llvm.vscale.i64()
+
+attributes #0 = { vscale_range(1,16) }

@dtcxzyw
Copy link
Member

dtcxzyw commented Nov 7, 2023

Alive2: https://alive2.llvm.org/ce/z/PdD_x2

@goldsteinn
Copy link
Contributor

Alive2: https://alive2.llvm.org/ce/z/PdD_x2

Works for pow2 or zero: https://alive2.llvm.org/ce/z/UeesRf

@llvmbot llvmbot added the llvm:analysis Includes value tracking, cost tables and constant folding label Nov 8, 2023
Copy link
Member

@dtcxzyw dtcxzyw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a test for X is a power of 2 or zero.

define i64 @test_pow2_or_zero(i64 %arg) {
  %neg = sub i64 0, %arg
  %x = and i64 %neg, %arg
  %shl1 = shl i64 %x, 4
  %shl2 = shl i64 %x, 3
  %mask = add i64 %shl2, -1
  %rem = and i64 %shl1, %mask
  ret i64 %rem
}

Alive2: https://alive2.llvm.org/ce/z/_e3zLi
Wonder if we need some negative tests here.

@huntergr-arm
Copy link
Collaborator Author

Precommitted extra tests in 34f83e8 (including a negative test where vscale doesn't have a defined range)

Copy link
Member

@dtcxzyw dtcxzyw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Waiting for additional approval from other reviewers.

@nikic nikic changed the title [InstCombine] Fold converted urem to 0 if there's no overlapping bits [InstSimplify] Fold converted urem to 0 if there's no overlapping bits Nov 11, 2023
@nikic
Copy link
Contributor

nikic commented Nov 15, 2023

Could you please rebase over ebb8ffd and move the fold into the simplifyAndCommutative() function, so it does not have to be repeated twice?

When folding urem instructions we can end up not recognizing that
the output will always be 0 due to Value*s being different, despite
generating the same data (in this case, 2 different calls to vscale).

This patch recognizes the (x << N) & (add (x << M), -1) pattern that
instcombine replaces urem with after the two vscale calls have been
reduced to one via CSE, then replaces with 0 when x is a non-zero
power of 2 and N >= M.
Copy link
Contributor

@nikic nikic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@huntergr-arm huntergr-arm merged commit 4028dd2 into llvm:main Nov 20, 2023
@huntergr-arm huntergr-arm deleted the vscale-combines branch November 20, 2023 10:27
sr-tream pushed a commit to sr-tream/llvm-project that referenced this pull request Nov 20, 2023
llvm#71528)

When folding urem instructions we can end up not recognizing that
the output will always be 0 due to Value*s being different, despite
generating the same data (in this case, 2 different calls to vscale).

This patch recognizes the (x << N) & (add (x << M), -1) pattern that
instcombine replaces urem with after the two vscale calls have been
reduced to one via CSE, then replaces with 0 when x is a power of 2
and N >= M.
zahiraam pushed a commit to zahiraam/llvm-project that referenced this pull request Nov 20, 2023
llvm#71528)

When folding urem instructions we can end up not recognizing that
the output will always be 0 due to Value*s being different, despite
generating the same data (in this case, 2 different calls to vscale).

This patch recognizes the (x << N) & (add (x << M), -1) pattern that
instcombine replaces urem with after the two vscale calls have been
reduced to one via CSE, then replaces with 0 when x is a power of 2
and N >= M.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
llvm:analysis Includes value tracking, cost tables and constant folding llvm:transforms
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants