-
Notifications
You must be signed in to change notification settings - Fork 14.3k
[InstCombine] fold unsigned predicates on srem result #122520
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@llvm/pr-subscribers-llvm-transforms Author: Jacob Young (jacobly0) ChangesThis allows optimization of more signed floor implementations when the divisor is a known power of two to an arithmetic shift. Proof for the implemented optimizations: Proof for the test cases: Full diff: https://github.com/llvm/llvm-project/pull/122520.diff 2 Files Affected:
diff --git a/llvm/lib/Transforms/InstCombine/InstCombineCompares.cpp b/llvm/lib/Transforms/InstCombine/InstCombineCompares.cpp
index 2e457257599493..16cd4e847a3ad4 100644
--- a/llvm/lib/Transforms/InstCombine/InstCombineCompares.cpp
+++ b/llvm/lib/Transforms/InstCombine/InstCombineCompares.cpp
@@ -2679,7 +2679,8 @@ Instruction *InstCombinerImpl::foldICmpSRemConstant(ICmpInst &Cmp,
// (X % pow2C) sgt/slt 0
const ICmpInst::Predicate Pred = Cmp.getPredicate();
if (Pred != ICmpInst::ICMP_SGT && Pred != ICmpInst::ICMP_SLT &&
- Pred != ICmpInst::ICMP_EQ && Pred != ICmpInst::ICMP_NE)
+ Pred != ICmpInst::ICMP_EQ && Pred != ICmpInst::ICMP_NE &&
+ Pred != ICmpInst::ICMP_UGT && Pred != ICmpInst::ICMP_ULT)
return nullptr;
// TODO: The one-use check is standard because we do not typically want to
@@ -2692,23 +2693,36 @@ Instruction *InstCombinerImpl::foldICmpSRemConstant(ICmpInst &Cmp,
if (!match(SRem->getOperand(1), m_Power2(DivisorC)))
return nullptr;
+ Type *Ty = SRem->getType();
+ APInt SignMask = APInt::getSignMask(Ty->getScalarSizeInBits());
+ // Setting the lsb instead of adding one properly handles i1.
+ APInt SignMaskOrOne = SignMask | 1;
+
// For cmp_sgt/cmp_slt only zero valued C is handled.
// For cmp_eq/cmp_ne only positive valued C is handled.
+ // For cmp_ugt only signed min/max valued C is handled.
+ // For cmp_ult only signed min | 0/1 valued C is handled.
if (((Pred == ICmpInst::ICMP_SGT || Pred == ICmpInst::ICMP_SLT) &&
!C.isZero()) ||
((Pred == ICmpInst::ICMP_EQ || Pred == ICmpInst::ICMP_NE) &&
- !C.isStrictlyPositive()))
+ !C.isStrictlyPositive()) ||
+ (Pred == ICmpInst::ICMP_UGT && !C.isMinSignedValue() &&
+ !C.isMaxSignedValue()) ||
+ (Pred == ICmpInst::ICMP_ULT && !C.isMinSignedValue() &&
+ C != SignMaskOrOne))
return nullptr;
// Mask off the sign bit and the modulo bits (low-bits).
- Type *Ty = SRem->getType();
- APInt SignMask = APInt::getSignMask(Ty->getScalarSizeInBits());
Constant *MaskC = ConstantInt::get(Ty, SignMask | (*DivisorC - 1));
Value *And = Builder.CreateAnd(SRem->getOperand(0), MaskC);
if (Pred == ICmpInst::ICMP_EQ || Pred == ICmpInst::ICMP_NE)
return new ICmpInst(Pred, And, ConstantInt::get(Ty, C));
+ if (Pred == ICmpInst::ICMP_ULT)
+ return new ICmpInst(ICmpInst::ICMP_ULT, And,
+ ConstantInt::get(Ty, SignMaskOrOne));
+
// For 'is positive?' check that the sign-bit is clear and at least 1 masked
// bit is set. Example:
// (i8 X % 32) s> 0 --> (X & 159) s> 0
diff --git a/llvm/test/Transforms/InstCombine/add.ll b/llvm/test/Transforms/InstCombine/add.ll
index 222f87fa3a5f18..495f99824652d6 100644
--- a/llvm/test/Transforms/InstCombine/add.ll
+++ b/llvm/test/Transforms/InstCombine/add.ll
@@ -3018,6 +3018,32 @@ define i32 @floor_sdiv_wrong_op(i32 %x, i32 %y) {
ret i32 %r
}
+define i32 @floor_sdiv_using_srem_by_8(i32 %x) {
+; CHECK-LABEL: @floor_sdiv_using_srem_by_8(
+; CHECK-NEXT: [[F:%.*]] = ashr i32 [[X:%.*]], 3
+; CHECK-NEXT: ret i32 [[F]]
+;
+ %d = sdiv i32 %x, 8
+ %r = srem i32 %x, 8
+ %i = icmp ugt i32 %r, -2147483648
+ %s = sext i1 %i to i32
+ %f = add i32 %d, %s
+ ret i32 %f
+}
+
+define i32 @floor_sdiv_using_srem_by_2(i32 %x) {
+; CHECK-LABEL: @floor_sdiv_using_srem_by_2(
+; CHECK-NEXT: [[F:%.*]] = ashr i32 [[X:%.*]], 1
+; CHECK-NEXT: ret i32 [[F]]
+;
+ %d = sdiv i32 %x, 2
+ %r = srem i32 %x, 2
+ %i = icmp ugt i32 %r, -2147483648
+ %s = sext i1 %i to i32
+ %f = add i32 %d, %s
+ ret i32 %f
+}
+
; (X s>> (BW - 1)) + (zext (X s> 0)) --> (X s>> (BW - 1)) | (zext (X != 0))
define i8 @signum_i8_i8(i8 %x) {
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this optimization is too specific.
For your motivating case, we can simply convert icmp ugt (srem i32 %x, 8), -2147483648
into icmp slt (srem i32 %x, 8), 0
. Then InstCombine will convert the remain expression into ashr X, 3
: https://godbolt.org/z/Y7K5hd7c3
Generalized proof: https://alive2.llvm.org/ce/z/gST2A5
f98ced2
to
ccc9c7d
Compare
Thanks for the hint, it should now be generalized to any comparison constant outside of the srem result range. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add some negative tests.
9b77759
to
e15c88e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Thank you!
Please wait for additional approval from other reviewers :)
This allows optimization of more signed floor implementations when the divisor is a known power of two to an arithmetic shift. Proof for the implemented optimizations: https://alive2.llvm.org/ce/z/j6C-Nz Proof for the test cases: https://alive2.llvm.org/ce/z/M_PBjw
LGTM |
This is a generalization of llvm#122520 to other instructions with constrained result ranges.
This is a generalization of llvm#122520 to other instructions with constrained result ranges.
This allows optimization of more signed floor implementations when the divisor is a known power of two to an arithmetic shift.
Proof for the implemented optimizations:
https://alive2.llvm.org/ce/z/j6C-Nz
Proof for the test cases:
https://alive2.llvm.org/ce/z/M_PBjw