[InstSimplify] Fold converted urem to 0 if there's no overlapping bits #71528

huntergr-arm · 2023-11-07T12:15:10Z

When folding urem instructions, we can fail to recognize that the
result will always be 0 because the operands are different Value*s,
even though they produce the same data (in this case, two separate
calls to vscale).

This patch recognizes the (x << N) & (add (x << M), -1) pattern that
instcombine replaces the urem with. Once CSE has reduced the two vscale
calls to one, the pattern is replaced with 0 when x is a non-zero
power of 2 and N >= M.
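
To make the arithmetic behind the fold concrete, here is a minimal brute-force sketch in standalone C++. It does not use any LLVM API; the names `shlAndMask` and `countCounterexamples` are hypothetical, and 32-bit wrapping integers are assumed instead of i64 to keep the search small. The idea is that for x = 2^k, the mask (x << M) - 1 only covers bits below k + M, while x << N sets at most bit k + N, which is at or above that boundary when N >= M.

```cpp
#include <cassert>
#include <cstdint>

// Computes (x << n) & ((x << m) - 1) with wrapping 32-bit arithmetic.
// Shift amounts must be smaller than the bit width.
static uint32_t shlAndMask(uint32_t x, unsigned n, unsigned m) {
  assert(n < 32 && m < 32 && "shift amounts must be in range");
  return (x << n) & ((x << m) - 1u);
}

// Counts inputs for which the expression is non-zero, over every 32-bit
// power of two x and every in-range shift pair with n >= m.
static unsigned countCounterexamples() {
  unsigned bad = 0;
  for (unsigned k = 0; k < 32; ++k) {
    const uint32_t x = uint32_t{1} << k;
    for (unsigned m = 0; m < 32; ++m)
      for (unsigned n = m; n < 32; ++n)
        if (shlAndMask(x, n, m) != 0)
          ++bad;
  }
  return bad;
}
```

Under these assumptions `countCounterexamples()` should report no counterexamples, while an input that breaks a precondition, such as `shlAndMask(4, 3, 4)` with N < M or `shlAndMask(3, 1, 0)` with a non-power-of-two x, gives a non-zero result.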

llvmbot · 2023-11-07T12:15:39Z

@llvm/pr-subscribers-llvm-analysis

@llvm/pr-subscribers-llvm-transforms

Author: Graham Hunter (huntergr-arm)

Full diff: https://github.com/llvm/llvm-project/pull/71528.diff

2 Files Affected:

(modified) llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp (+10)
(added) llvm/test/Transforms/InstCombine/po2-shift-add-and-to-zero.ll (+52)

diff --git a/llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp b/llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
index 46af9bf5eed003a..da38f8039dbc3ca 100644
--- a/llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
+++ b/llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
@@ -2662,6 +2662,16 @@ Instruction *InstCombinerImpl::visitAnd(BinaryOperator &I) {
   if (sinkNotIntoOtherHandOfLogicalOp(I))
     return &I;
 
+  // (x << N) & (add (x << M), -1) --> 0, where x is known to be a non-zero
+  // power of 2 and M <= N.
+  const APInt *Shift1, *Shift2;
+  if (match(&I, m_c_And(m_OneUse(m_Shl(m_Value(X), m_APInt(Shift1))),
+                        m_OneUse(m_Add(m_Shl(m_Value(Y), m_APInt(Shift2)),
+                                       m_AllOnes())))) &&
+      X == Y && isKnownToBeAPowerOfTwo(X, /*OrZero*/ false, 0, &I) &&
+      Shift1->uge(*Shift2))
+    return replaceInstUsesWith(I, Constant::getNullValue(I.getType()));
+
   // An and recurrence w/loop invariant step is equivelent to (and start, step)
   PHINode *PN = nullptr;
   Value *Start = nullptr, *Step = nullptr;
diff --git a/llvm/test/Transforms/InstCombine/po2-shift-add-and-to-zero.ll b/llvm/test/Transforms/InstCombine/po2-shift-add-and-to-zero.ll
new file mode 100644
index 000000000000000..4979e7a01972299
--- /dev/null
+++ b/llvm/test/Transforms/InstCombine/po2-shift-add-and-to-zero.ll
@@ -0,0 +1,52 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 2
+; RUN: opt -mtriple unknown -passes=instcombine -S < %s | FileCheck %s
+
+;; The and X, (add Y, -1) pattern is from an earlier instcombine pass which
+;; converted
+
+;; define dso_local i64 @f1() local_unnamed_addr #0 {
+;; entry:
+;;   %0 = call i64 @llvm.aarch64.sve.cntb(i32 31)
+;;   %1 = call i64 @llvm.aarch64.sve.cnth(i32 31)
+;;   %rem = urem i64 %0, %1
+;;   ret i64 %rem
+;; }
+
+;; into
+
+;; define dso_local i64 @f1() local_unnamed_addr #0 {
+;; entry:
+;;   %0 = call i64 @llvm.vscale.i64()
+;;   %1 = shl nuw nsw i64 %0, 4
+;;   %2 = call i64 @llvm.vscale.i64()
+;;   %3 = shl nuw nsw i64 %2, 3
+;;   %4 = add nsw i64 %3, -1
+;;   %rem = and i64 %1, %4
+;;   ret i64 %rem
+;; }
+
+;; InstCombine would have folded the original to returning 0 if the vscale
+;; calls were the same Value*, but since there's two of them it doesn't
+;; work and we convert the urem to add/and. CSE then gets rid of the extra
+;; vscale, leaving us with a new pattern to match. This only works because
+;; vscale is known to be a nonzero power of 2 (assuming there's a defined
+;; range for it).
+
+define dso_local i64 @f1() local_unnamed_addr #0 {
+; CHECK-LABEL: define dso_local i64 @f1
+; CHECK-SAME: () local_unnamed_addr #[[ATTR0:[0-9]+]] {
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:    ret i64 0
+;
+entry:
+  %0 = call i64 @llvm.vscale.i64()
+  %1 = shl nuw nsw i64 %0, 4
+  %2 = shl nuw nsw i64 %0, 3
+  %3 = add nsw i64 %2, -1
+  %rem = and i64 %1, %3
+  ret i64 %rem
+}
+
+declare i64 @llvm.vscale.i64()
+
+attributes #0 = { vscale_range(1,16) }

dtcxzyw · 2023-11-07T13:02:45Z

Alive2: https://alive2.llvm.org/ce/z/PdD_x2

goldsteinn · 2023-11-07T17:52:40Z

Alive2: https://alive2.llvm.org/ce/z/PdD_x2

Works for pow2 or zero: https://alive2.llvm.org/ce/z/UeesRf

dtcxzyw

Please add a test where X is a power of 2 or zero.

define i64 @test_pow2_or_zero(i64 %arg) {
  %neg = sub i64 0, %arg
  %x = and i64 %neg, %arg
  %shl1 = shl i64 %x, 4
  %shl2 = shl i64 %x, 3
  %mask = add i64 %shl2, -1
  %rem = and i64 %shl1, %mask
  ret i64 %rem
}

Alive2: https://alive2.llvm.org/ce/z/_e3zLi
I wonder whether we need some negative tests here.
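
As a rough illustration of why the suggested test covers the power-of-2-or-zero case, the sketch below mirrors the IR of `@test_pow2_or_zero` in standalone C++ with wrapping 64-bit integers. The helper names are hypothetical and nothing here calls into LLVM; it only shows that `arg & (0 - arg)` isolates the lowest set bit (a power of two, or zero when `arg` is zero) and that the shift-and-mask expression then evaluates to 0.

```cpp
#include <cstdint>

// arg & (0 - arg) keeps only the lowest set bit of arg, so the result is a
// power of two, or zero when arg is zero.
static uint64_t isolateLowestSetBit(uint64_t arg) {
  return (uint64_t{0} - arg) & arg;
}

// Mirrors the arithmetic of @test_pow2_or_zero with wrapping 64-bit integers.
static uint64_t testPow2OrZero(uint64_t arg) {
  const uint64_t x = isolateLowestSetBit(arg);
  const uint64_t shl1 = x << 4;
  const uint64_t shl2 = x << 3;
  const uint64_t mask = shl2 - 1;
  return shl1 & mask;
}

// Hypothetical helper: true if every arg in [0, limit) yields 0.
static bool allZeroBelow(uint64_t limit) {
  for (uint64_t arg = 0; arg < limit; ++arg)
    if (testPow2OrZero(arg) != 0)
      return false;
  return true;
}
```

The zero case works because `0 << 4` is 0, so the `and` is 0 regardless of the mask. This is only a numeric spot check; the Alive2 links in this thread are the actual verification.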

huntergr-arm · 2023-11-08T17:05:37Z

Precommitted extra tests in 34f83e8 (including a negative test where vscale doesn't have a defined range).

dtcxzyw

LGTM. Waiting for additional approval from other reviewers.

nikic · 2023-11-15T15:37:47Z

Could you please rebase over ebb8ffd and move the fold into the simplifyAndCommutative() function, so it does not have to be repeated twice?
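
The general idea behind this request can be sketched with a toy model: write the match for one operand order only, and let a commutative wrapper try both orders. The code below is not LLVM's `simplifyAndCommutative()` and uses no LLVM types; `Operand`, `Kind`, `foldAndOneOrder`, and `foldAnd` are all hypothetical names, and x is simply assumed to be a known power of two.

```cpp
#include <optional>

// Toy stand-in for an IR operand; not an LLVM type.
enum class Kind { Shl, ShlMinusOne, Other };

struct Operand {
  Kind kind;       // Shl is (x << shift); ShlMinusOne is (x << shift) - 1
  int base;        // hypothetical id of the shifted value x
  unsigned shift;  // constant shift amount
};

// Matches only one order: shl on the left, mask on the right.
// Assumes x is already known to be a power of two.
static std::optional<int> foldAndOneOrder(const Operand &lhs,
                                          const Operand &rhs) {
  if (lhs.kind == Kind::Shl && rhs.kind == Kind::ShlMinusOne &&
      lhs.base == rhs.base && lhs.shift >= rhs.shift)
    return 0;
  return std::nullopt;
}

// Tries both operand orders so the match is written only once.
static std::optional<int> foldAnd(const Operand &a, const Operand &b) {
  if (auto result = foldAndOneOrder(a, b))
    return result;
  return foldAndOneOrder(b, a);
}
```

With this shape, a pattern added to the one-order helper automatically applies to both `and A, B` and `and B, A`.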

nikic

LGTM

llvm#71528) When folding urem instructions we can end up not recognizing that the output will always be 0 due to Value*s being different, despite generating the same data (in this case, 2 different calls to vscale). This patch recognizes the (x << N) & (add (x << M), -1) pattern that instcombine replaces urem with after the two vscale calls have been reduced to one via CSE, then replaces with 0 when x is a power of 2 and N >= M.

huntergr-arm requested review from sdesmalen-arm and mgabka November 7, 2023 12:15

huntergr-arm requested a review from nikic as a code owner November 7, 2023 12:15

llvmbot added the llvm:transforms label Nov 7, 2023

dtcxzyw requested changes Nov 7, 2023

goldsteinn reviewed Nov 7, 2023

goldsteinn reviewed Nov 7, 2023

huntergr-arm force-pushed the vscale-combines branch from 758c28a to b7650a1 Compare November 8, 2023 14:17

llvmbot added the llvm:analysis Includes value tracking, cost tables and constant folding label Nov 8, 2023

dtcxzyw requested changes Nov 8, 2023

huntergr-arm force-pushed the vscale-combines branch from b7650a1 to ddb6adb Compare November 8, 2023 17:29

dtcxzyw approved these changes Nov 11, 2023

nikic reviewed Nov 11, 2023

nikic changed the title ~~[InstCombine] Fold converted urem to 0 if there's no overlapping bits~~ [InstSimplify] Fold converted urem to 0 if there's no overlapping bits Nov 11, 2023

huntergr-arm force-pushed the vscale-combines branch from 371c5de to 14831e7 Compare November 16, 2023 16:33

nikic approved these changes Nov 16, 2023

huntergr-arm merged commit 4028dd2 into llvm:main Nov 20, 2023

huntergr-arm deleted the vscale-combines branch November 20, 2023 10:27

huntergr-arm mentioned this pull request Nov 21, 2023

[AArch64] [SVE] ratio of svcntb() to svcnth() #61505

Closed
