Skip to content

[InstCombine] fold unsigned predicates on srem result #122520

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Jan 18, 2025

Conversation

jacobly0
Copy link
Contributor

@jacobly0 jacobly0 commented Jan 10, 2025

This allows optimization of more signed floor implementations when the divisor is a known power of two to an arithmetic shift.

Proof for the implemented optimizations:
https://alive2.llvm.org/ce/z/j6C-Nz

Proof for the test cases:
https://alive2.llvm.org/ce/z/M_PBjw

@jacobly0 jacobly0 requested a review from nikic as a code owner January 10, 2025 20:05
@llvmbot llvmbot added llvm:instcombine Covers the InstCombine, InstSimplify and AggressiveInstCombine passes llvm:transforms labels Jan 10, 2025
@llvmbot
Copy link
Member

llvmbot commented Jan 10, 2025

@llvm/pr-subscribers-llvm-transforms

Author: Jacob Young (jacobly0)

Changes

This allows optimization of more signed floor implementations when the divisor is a known power of two to an arithmetic shift.

Proof for the implemented optimizations:
https://alive2.llvm.org/ce/z/UtdPgv

Proof for the test cases:
https://alive2.llvm.org/ce/z/M_PBjw


Full diff: https://github.com/llvm/llvm-project/pull/122520.diff

2 Files Affected:

  • (modified) llvm/lib/Transforms/InstCombine/InstCombineCompares.cpp (+18-4)
  • (modified) llvm/test/Transforms/InstCombine/add.ll (+26)
diff --git a/llvm/lib/Transforms/InstCombine/InstCombineCompares.cpp b/llvm/lib/Transforms/InstCombine/InstCombineCompares.cpp
index 2e457257599493..16cd4e847a3ad4 100644
--- a/llvm/lib/Transforms/InstCombine/InstCombineCompares.cpp
+++ b/llvm/lib/Transforms/InstCombine/InstCombineCompares.cpp
@@ -2679,7 +2679,8 @@ Instruction *InstCombinerImpl::foldICmpSRemConstant(ICmpInst &Cmp,
   // (X % pow2C) sgt/slt 0
   const ICmpInst::Predicate Pred = Cmp.getPredicate();
   if (Pred != ICmpInst::ICMP_SGT && Pred != ICmpInst::ICMP_SLT &&
-      Pred != ICmpInst::ICMP_EQ && Pred != ICmpInst::ICMP_NE)
+      Pred != ICmpInst::ICMP_EQ && Pred != ICmpInst::ICMP_NE &&
+      Pred != ICmpInst::ICMP_UGT && Pred != ICmpInst::ICMP_ULT)
     return nullptr;
 
   // TODO: The one-use check is standard because we do not typically want to
@@ -2692,23 +2693,36 @@ Instruction *InstCombinerImpl::foldICmpSRemConstant(ICmpInst &Cmp,
   if (!match(SRem->getOperand(1), m_Power2(DivisorC)))
     return nullptr;
 
+  Type *Ty = SRem->getType();
+  APInt SignMask = APInt::getSignMask(Ty->getScalarSizeInBits());
+  // Setting the lsb instead of adding one properly handles i1.
+  APInt SignMaskOrOne = SignMask | 1;
+
   // For cmp_sgt/cmp_slt only zero valued C is handled.
   // For cmp_eq/cmp_ne only positive valued C is handled.
+  // For cmp_ugt only signed min/max valued C is handled.
+  // For cmp_ult only signed min | 0/1 valued C is handled.
   if (((Pred == ICmpInst::ICMP_SGT || Pred == ICmpInst::ICMP_SLT) &&
        !C.isZero()) ||
       ((Pred == ICmpInst::ICMP_EQ || Pred == ICmpInst::ICMP_NE) &&
-       !C.isStrictlyPositive()))
+       !C.isStrictlyPositive()) ||
+      (Pred == ICmpInst::ICMP_UGT && !C.isMinSignedValue() &&
+       !C.isMaxSignedValue()) ||
+      (Pred == ICmpInst::ICMP_ULT && !C.isMinSignedValue() &&
+       C != SignMaskOrOne))
     return nullptr;
 
   // Mask off the sign bit and the modulo bits (low-bits).
-  Type *Ty = SRem->getType();
-  APInt SignMask = APInt::getSignMask(Ty->getScalarSizeInBits());
   Constant *MaskC = ConstantInt::get(Ty, SignMask | (*DivisorC - 1));
   Value *And = Builder.CreateAnd(SRem->getOperand(0), MaskC);
 
   if (Pred == ICmpInst::ICMP_EQ || Pred == ICmpInst::ICMP_NE)
     return new ICmpInst(Pred, And, ConstantInt::get(Ty, C));
 
+  if (Pred == ICmpInst::ICMP_ULT)
+    return new ICmpInst(ICmpInst::ICMP_ULT, And,
+                        ConstantInt::get(Ty, SignMaskOrOne));
+
   // For 'is positive?' check that the sign-bit is clear and at least 1 masked
   // bit is set. Example:
   // (i8 X % 32) s> 0 --> (X & 159) s> 0
diff --git a/llvm/test/Transforms/InstCombine/add.ll b/llvm/test/Transforms/InstCombine/add.ll
index 222f87fa3a5f18..495f99824652d6 100644
--- a/llvm/test/Transforms/InstCombine/add.ll
+++ b/llvm/test/Transforms/InstCombine/add.ll
@@ -3018,6 +3018,32 @@ define i32 @floor_sdiv_wrong_op(i32 %x, i32 %y) {
   ret i32 %r
 }
 
+define i32 @floor_sdiv_using_srem_by_8(i32 %x) {
+; CHECK-LABEL: @floor_sdiv_using_srem_by_8(
+; CHECK-NEXT:    [[F:%.*]] = ashr i32 [[X:%.*]], 3
+; CHECK-NEXT:    ret i32 [[F]]
+;
+  %d = sdiv i32 %x, 8
+  %r = srem i32 %x, 8
+  %i = icmp ugt i32 %r, -2147483648
+  %s = sext i1 %i to i32
+  %f = add i32 %d, %s
+  ret i32 %f
+}
+
+define i32 @floor_sdiv_using_srem_by_2(i32 %x) {
+; CHECK-LABEL: @floor_sdiv_using_srem_by_2(
+; CHECK-NEXT:    [[F:%.*]] = ashr i32 [[X:%.*]], 1
+; CHECK-NEXT:    ret i32 [[F]]
+;
+  %d = sdiv i32 %x, 2
+  %r = srem i32 %x, 2
+  %i = icmp ugt i32 %r, -2147483648
+  %s = sext i1 %i to i32
+  %f = add i32 %d, %s
+  ret i32 %f
+}
+
 ; (X s>> (BW - 1)) + (zext (X s> 0)) --> (X s>> (BW - 1)) | (zext (X != 0))
 
 define i8 @signum_i8_i8(i8 %x) {

Copy link

⚠️ We detected that you are using a GitHub private e-mail address to contribute to the repo.
Please turn off Keep my email addresses private setting in your account.
See LLVM Discourse for more information.

Copy link
Member

@dtcxzyw dtcxzyw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this optimization is too specific.
For your motivating case, we can simply convert icmp ugt (srem i32 %x, 8), -2147483648 into icmp slt (srem i32 %x, 8), 0. Then InstCombine will convert the remain expression into ashr X, 3: https://godbolt.org/z/Y7K5hd7c3

Generalized proof: https://alive2.llvm.org/ce/z/gST2A5

@jacobly0 jacobly0 force-pushed the floor-sdiv branch 2 times, most recently from f98ced2 to ccc9c7d Compare January 11, 2025 12:21
@jacobly0 jacobly0 requested a review from dtcxzyw January 11, 2025 12:22
@jacobly0
Copy link
Contributor Author

jacobly0 commented Jan 11, 2025

I think this optimization is too specific. For your motivating case, we can simply convert icmp ugt (srem i32 %x, 8), -2147483648 into icmp slt (srem i32 %x, 8), 0. Then InstCombine will convert the remain expression into ashr X, 3: https://godbolt.org/z/Y7K5hd7c3

Generalized proof: https://alive2.llvm.org/ce/z/gST2A5

Thanks for the hint, it should now be generalized to any comparison constant outside of the srem result range.

Copy link
Member

@dtcxzyw dtcxzyw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add some negative tests.

@jacobly0 jacobly0 force-pushed the floor-sdiv branch 2 times, most recently from 9b77759 to e15c88e Compare January 12, 2025 12:47
Copy link
Member

@dtcxzyw dtcxzyw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thank you!
Please wait for additional approval from other reviewers :)

@dtcxzyw dtcxzyw requested a review from goldsteinn January 12, 2025 12:55
This allows optimization of more signed floor implementations when the
divisor is a known power of two to an arithmetic shift.

Proof for the implemented optimizations:
https://alive2.llvm.org/ce/z/j6C-Nz

Proof for the test cases:
https://alive2.llvm.org/ce/z/M_PBjw
@goldsteinn
Copy link
Contributor

LGTM

jacobly0 added a commit to jacobly0/llvm-project that referenced this pull request Jan 14, 2025
This is a generalization of llvm#122520 to other instructions with
constrained result ranges.
jacobly0 added a commit to jacobly0/llvm-project that referenced this pull request Jan 15, 2025
This is a generalization of llvm#122520 to other instructions with
constrained result ranges.
@andrewrk andrewrk merged commit bb59eb8 into llvm:main Jan 18, 2025
8 checks passed
@jacobly0 jacobly0 deleted the floor-sdiv branch January 18, 2025 22:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
llvm:instcombine Covers the InstCombine, InstSimplify and AggressiveInstCombine passes llvm:transforms
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants