-
Notifications
You must be signed in to change notification settings - Fork 14.3k
[InstCombine] Fold expression using basic properties of floor and ceiling function #107107
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@llvm/pr-subscribers-llvm-transforms @llvm/pr-subscribers-llvm-analysis Author: None (c8ef) Changesalive2: https://alive2.llvm.org/ce/z/giSGaj Full diff: https://github.com/llvm/llvm-project/pull/107107.diff 2 Files Affected:
diff --git a/llvm/lib/Analysis/InstructionSimplify.cpp b/llvm/lib/Analysis/InstructionSimplify.cpp
index 32a9f1ab34fb3f..e67a2875733416 100644
--- a/llvm/lib/Analysis/InstructionSimplify.cpp
+++ b/llvm/lib/Analysis/InstructionSimplify.cpp
@@ -4130,12 +4130,9 @@ static Value *simplifyFCmpInst(unsigned Predicate, Value *LHS, Value *RHS,
//
// This catches the 2 variable input case, constants are handled below as a
// class-like compare.
+ KnownFPClass LHSClass = computeKnownFPClass(LHS, fcAllFlags, /*Depth=*/0, Q);
+ KnownFPClass RHSClass = computeKnownFPClass(RHS, fcAllFlags, /*Depth=*/0, Q);
if (Pred == FCmpInst::FCMP_ORD || Pred == FCmpInst::FCMP_UNO) {
- KnownFPClass RHSClass =
- computeKnownFPClass(RHS, fcAllFlags, /*Depth=*/0, Q);
- KnownFPClass LHSClass =
- computeKnownFPClass(LHS, fcAllFlags, /*Depth=*/0, Q);
-
if (FMF.noNaNs() ||
(RHSClass.isKnownNeverNaN() && LHSClass.isKnownNeverNaN()))
return ConstantInt::get(RetTy, Pred == FCmpInst::FCMP_ORD);
@@ -4143,6 +4140,37 @@ static Value *simplifyFCmpInst(unsigned Predicate, Value *LHS, Value *RHS,
if (RHSClass.isKnownAlwaysNaN() || LHSClass.isKnownAlwaysNaN())
return ConstantInt::get(RetTy, Pred == CmpInst::FCMP_UNO);
}
+ // floor(x) <= x --> true; x <= ceil(x) --> true
+ if (LHSClass.isKnownNeverNaN() &&
+ match(LHS, m_Intrinsic<Intrinsic::floor>(m_Specific(RHS))) ||
+ RHSClass.isKnownNeverNaN() &&
+ match(RHS, m_Intrinsic<Intrinsic::ceil>(m_Specific(LHS)))) {
+ switch (Pred) {
+ case FCmpInst::FCMP_OLE:
+ case FCmpInst::FCMP_ULE:
+ return getTrue(RetTy);
+ case FCmpInst::FCMP_OGT:
+ case FCmpInst::FCMP_UGT:
+ return getFalse(RetTy);
+ default:
+ break;
+ }
+ }
+ if (RHSClass.isKnownNeverNaN() &&
+ match(RHS, m_Intrinsic<Intrinsic::floor>(m_Specific(LHS))) ||
+ LHSClass.isKnownNeverNaN() &&
+ match(LHS, m_Intrinsic<Intrinsic::ceil>(m_Specific(RHS)))) {
+ switch (Pred) {
+ case FCmpInst::FCMP_OGE:
+ case FCmpInst::FCMP_UGE:
+ return getTrue(RetTy);
+ case FCmpInst::FCMP_OLT:
+ case FCmpInst::FCMP_ULT:
+ return getFalse(RetTy);
+ default:
+ break;
+ }
+ }
const APFloat *C = nullptr;
match(RHS, m_APFloatAllowPoison(C));
diff --git a/llvm/test/Transforms/InstSimplify/fp-floor-ceil.ll b/llvm/test/Transforms/InstSimplify/fp-floor-ceil.ll
new file mode 100644
index 00000000000000..fe78656b014ae0
--- /dev/null
+++ b/llvm/test/Transforms/InstSimplify/fp-floor-ceil.ll
@@ -0,0 +1,114 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
+; RUN: opt < %s -passes=instsimplify -S | FileCheck %s
+
+define i1 @x_floor_ole(float %0) {
+; CHECK-LABEL: @x_floor_ole(
+; CHECK-NEXT: ret i1 true
+;
+ %2 = call nnan float @llvm.floor.f32(float %0)
+ %3 = fcmp ole float %2, %0
+ ret i1 %3
+}
+
+define i1 @x_floor_ule(float %0) {
+; CHECK-LABEL: @x_floor_ule(
+; CHECK-NEXT: ret i1 true
+;
+ %2 = call nnan float @llvm.floor.f32(float %0)
+ %3 = fcmp ule float %2, %0
+ ret i1 %3
+}
+
+define i1 @x_floor_ogt(float %0) {
+; CHECK-LABEL: @x_floor_ogt(
+; CHECK-NEXT: ret i1 false
+;
+ %2 = call nnan float @llvm.floor.f32(float %0)
+ %3 = fcmp ogt float %2, %0
+ ret i1 %3
+}
+
+define i1 @x_floor_ugt(float %0) {
+; CHECK-LABEL: @x_floor_ugt(
+; CHECK-NEXT: ret i1 false
+;
+ %2 = call nnan float @llvm.floor.f32(float %0)
+ %3 = fcmp ugt float %2, %0
+ ret i1 %3
+}
+
+define i1 @x_floor_ueq(float %0) {
+; CHECK-LABEL: @x_floor_ueq(
+; CHECK-NEXT: [[TMP2:%.*]] = call nnan float @llvm.floor.f32(float [[TMP0:%.*]])
+; CHECK-NEXT: [[TMP3:%.*]] = fcmp ueq float [[TMP2]], [[TMP0]]
+; CHECK-NEXT: ret i1 [[TMP3]]
+;
+ %2 = call nnan float @llvm.floor.f32(float %0)
+ %3 = fcmp ueq float %2, %0
+ ret i1 %3
+}
+
+define <2 x i1> @x_floor_ugt_vec(<2 x float> %0) {
+; CHECK-LABEL: @x_floor_ugt_vec(
+; CHECK-NEXT: ret <2 x i1> zeroinitializer
+;
+ %2 = call nnan <2 x float> @llvm.floor.v2f32(<2 x float> %0)
+ %3 = fcmp ugt <2 x float> %2, %0
+ ret <2 x i1> %3
+}
+
+define i1 @x_ceil_ole(float %0) {
+; CHECK-LABEL: @x_ceil_ole(
+; CHECK-NEXT: ret i1 false
+;
+ %2 = call nnan float @llvm.ceil.f32(float %0)
+ %3 = fcmp olt float %2, %0
+ ret i1 %3
+}
+
+define i1 @x_ceil_ule(float %0) {
+; CHECK-LABEL: @x_ceil_ule(
+; CHECK-NEXT: ret i1 false
+;
+ %2 = call nnan float @llvm.ceil.f32(float %0)
+ %3 = fcmp ult float %2, %0
+ ret i1 %3
+}
+
+define i1 @x_ceil_ogt(float %0) {
+; CHECK-LABEL: @x_ceil_ogt(
+; CHECK-NEXT: ret i1 true
+;
+ %2 = call nnan float @llvm.ceil.f32(float %0)
+ %3 = fcmp oge float %2, %0
+ ret i1 %3
+}
+
+define i1 @x_ceil_ugt(float %0) {
+; CHECK-LABEL: @x_ceil_ugt(
+; CHECK-NEXT: ret i1 true
+;
+ %2 = call nnan float @llvm.ceil.f32(float %0)
+ %3 = fcmp uge float %2, %0
+ ret i1 %3
+}
+
+define i1 @x_ceil_ueq(float %0) {
+; CHECK-LABEL: @x_ceil_ueq(
+; CHECK-NEXT: [[TMP2:%.*]] = call nnan float @llvm.ceil.f32(float [[TMP0:%.*]])
+; CHECK-NEXT: [[TMP3:%.*]] = fcmp ueq float [[TMP2]], [[TMP0]]
+; CHECK-NEXT: ret i1 [[TMP3]]
+;
+ %2 = call nnan float @llvm.ceil.f32(float %0)
+ %3 = fcmp ueq float %2, %0
+ ret i1 %3
+}
+
+define <2 x i1> @x_ceil_ugt_vec(<2 x float> %0) {
+; CHECK-LABEL: @x_ceil_ugt_vec(
+; CHECK-NEXT: ret <2 x i1> <i1 true, i1 true>
+;
+ %2 = call nnan <2 x float> @llvm.ceil.f32(<2 x float> %0)
+ %3 = fcmp uge <2 x float> %2, %0
+ ret <2 x i1> %3
+}
|
KnownFPClass LHSClass = computeKnownFPClass(LHS, fcAllFlags, /*Depth=*/0, Q); | ||
KnownFPClass RHSClass = computeKnownFPClass(RHS, fcAllFlags, /*Depth=*/0, Q); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These queries can be expensive and you should try to defer them as late as possible (and prefer to early exit on the RHS query before the LHS, as the RHS is canonically cheaper)
if (FMF.noNaNs() || | ||
(RHSClass.isKnownNeverNaN() && LHSClass.isKnownNeverNaN())) | ||
return ConstantInt::get(RetTy, Pred == FCmpInst::FCMP_ORD); | ||
|
||
if (RHSClass.isKnownAlwaysNaN() || LHSClass.isKnownAlwaysNaN()) | ||
return ConstantInt::get(RetTy, Pred == CmpInst::FCMP_UNO); | ||
} | ||
// floor(x) <= x --> true; x <= ceil(x) --> true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Separate lines for each of these comments
if (FMF.noNaNs() || | ||
(RHSClass.isKnownNeverNaN() && LHSClass.isKnownNeverNaN())) | ||
return ConstantInt::get(RetTy, Pred == FCmpInst::FCMP_ORD); | ||
|
||
if (RHSClass.isKnownAlwaysNaN() || LHSClass.isKnownAlwaysNaN()) | ||
return ConstantInt::get(RetTy, Pred == CmpInst::FCMP_UNO); | ||
} | ||
// floor(x) <= x --> true; x <= ceil(x) --> true | ||
if (LHSClass.isKnownNeverNaN() && | ||
match(LHS, m_Intrinsic<Intrinsic::floor>(m_Specific(RHS))) || |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Match the intrinsic before the expensive class query
break; | ||
} | ||
} | ||
if (RHSClass.isKnownNeverNaN() && |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You wouldn't need the known never nan checks if you can replace with an fcmp ord/uno, but then you have to create a new instruction. Maybe it would be better to just handle this in instcombine?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I understand your idea correctly, it seems that we might only need the following transformation. The existing code in InstSimplify should then be able to simplify these to either true or false?
define i1 @src_x_floor_ole(float %0) {
%2 = call float @llvm.floor.f32(float %0)
%3 = fcmp ole float %2, %0
ret i1 %3
}
define i1 @tgt_x_floor_ole(float %0) {
%2 = call float @llvm.floor.f32(float %0)
%3 = fcmp ord float %2, %0
ret i1 %3
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You don't need the floor at all: https://alive2.llvm.org/ce/z/E4erNR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it, I'll move the patch to InstCombineCompares.cpp tonight.
; CHECK-LABEL: @x_floor_ole( | ||
; CHECK-NEXT: ret i1 true | ||
; | ||
%2 = call nnan float @llvm.floor.f32(float %0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use named values in tests.
Also test a case where the known-never-nan comes from the source value, not the instruction flag (e.g. nofpclass(nan) on the incoming function argument)
Hi @dtcxzyw, could you help me verify if the transformation in the patch has any real-world use cases? |
This pattern is simple so I think it is ok to handle this regardless of its real-world usefulness. |
if (Pred == FCmpInst::FCMP_OLE || | ||
Pred == FCmpInst::FCMP_ULE && |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do the pred checks first?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you mean check pred before matching operand?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes
isKnownNeverNaN(LHS, 0, | ||
CI.getSimplifyQuery().getWithInstruction(&I))) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You do not need to call isKnownNeverNaN. The only reason it was useful in instsimplify was you could not insert the new fcmp required to preserve the nan behavior
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But without nnan
, the following transformation seems to be invalid?
define i1 @src_x_floor_ole(float %0) {
%2 = call float @llvm.floor.f32(float %0)
%3 = fcmp ule float %2, %0
ret i1 %3
}
define i1 @tgt_x_floor_ole(float %0) {
%3 = fcmp ord float %0, 0.0
ret i1 %3
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It needs some adjustment, but you can avoid it: https://alive2.llvm.org/ce/z/YZfsRb
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are right👍
;fcmp ole floor(x), x => fcmp ord x, 0
;fcmp ogt floor(x), x => false
;fcmp oge x, floor(x) => fcmp ord x, 0
;fcmp olt x, floor(x) => false
;fcmp ole x, ceil(x) => fcmp ord x, 0
;fcmp ogt x, ceil(x) => false
;fcmp oge ceil(x), x => fcmp ord x, 0
;fcmp olt ceil(x), x => false
;fcmp ule floor(x), x => fcmp ule float 0xFFF0000000000000, x
;fcmp ugt floor(x), x => fcmp ugt float 0xFFF0000000000000, x
;fcmp uge x, floor(x) => fcmp uge float x, 0xFFF0000000000000
;fcmp ult x, floor(x) => fcmp ult float x, 0xFFF0000000000000
;fcmp ule x, ceil(x) => fcmp ule float x, 0x7FF0000000000000
;fcmp ugt x, ceil(x) => fcmp ugt float x, 0x7FF0000000000000
;fcmp uge ceil(x), x => fcmp uge float 0x7FF0000000000000, x
;fcmp ult ceil(x), x => fcmp ult float 0x7FF0000000000000, x
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please update the alive2 link in your PR description.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. The culmination of all 16 transformations results in a timeout. By disabling either the first half or the second half, you can observe the expected outcome.
bool floor_x = match(LHS, m_Intrinsic<Intrinsic::floor>(m_Specific(RHS))); | ||
bool x_floor = match(RHS, m_Intrinsic<Intrinsic::floor>(m_Specific(LHS))); | ||
bool ceil_x = match(LHS, m_Intrinsic<Intrinsic::ceil>(m_Specific(RHS))); | ||
bool x_ceil = match(RHS, m_Intrinsic<Intrinsic::ceil>(m_Specific(LHS))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would be cleaner if you swapped operands so floor/ceil is always on the left, instead of always checking both variants.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps I can check whether x_floor
or x_ceil
has occurred. If so, I will swap the LHS and the RHS and obtain the swapped predicate.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
%ceil = call float @llvm.ceil.f32(float %x) | ||
%ret = fcmp oge float %x, %ceil | ||
ret i1 %ret | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Negative tests for the other compare types, like oeq/one/une/ord/uno
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. Other types of negative comparison tests have been added.
; CHECK-NEXT: ret i1 [[RET]] | ||
; | ||
%ceil = call float @llvm.ceil.f32(float %x) | ||
%ret = fcmp ninf oge float %ceil, %x |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the only flag preserving test, but it only covers oge. Should have one for each of these handled cases (i.e. have a test with a flag with ole, oge, ule, ugt, uge, and ult)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. All positive transformations resulting in a fcmp
now have the corresponding fmf flags.
if (x_floor || x_ceil) { | ||
std::swap(LHS, RHS); | ||
Pred = I.getSwappedPredicate(); | ||
(x_floor ? floor_x : ceil_x) = true; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of pre-matching all the cases, you could match the preferred direction and then only try the other match if they both failed. You can also then directly assign floor_x / ceil_x from the match function when commuting.
Also variable names should be camel case
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. Code style issues have also been addressed.
"", &I); | ||
break; | ||
case FCmpInst::FCMP_UGT: | ||
// fcmp ugt floor(x), x => fcmp ugt -inf, x |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// fcmp ugt floor(x), x => fcmp ugt -inf, x | |
// fcmp ugt floor(x), x => fcmp uno x, 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
llvm-project/llvm/test/Transforms/InstCombine/fp-floor-ceil.ll
Lines 32 to 40 in b0a37fb
define i1 @floor_x_ugt(float %x) { | |
; CHECK-LABEL: @floor_x_ugt( | |
; CHECK-NEXT: [[RET:%.*]] = fcmp ninf uno float [[X:%.*]], 0.000000e+00 | |
; CHECK-NEXT: ret i1 [[RET]] | |
; | |
%floor = call float @llvm.floor.f32(float %x) | |
%ret = fcmp ninf ugt float %floor, %x | |
ret i1 %ret | |
} |
From the test case above, we can see that the instructions will ultimately be converted to 'fcmp uno'. However, I believe we should keep this comment unchanged to align with the actual transformation in this exact function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mean you should modify both the comment and the transform below.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. The alive2 test was also updated.
Co-authored-by: Yingwei Zheng <[email protected]>
Co-authored-by: Yingwei Zheng <[email protected]>
✅ With the latest revision this PR passed the C/C++ code formatter. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Thank you!
Please wait for additional approval from other reviewers :)
This seems irrelevant to my patch. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks fine to me, but I didn't review in detail.
std::swap(LHS, RHS); | ||
Pred = I.getSwappedPredicate(); | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW, I think it's usually more elegant to do this by calling foldFCmpWithFloorAndCeil(Pred, Op0, Op1)
and then foldFCmpWithFloorAndCeil(getSwappedPredicate(), Op1, Op0)
, so the method itself only has to deal with one variant. But the way you did it is fine as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a great idea. I'll make the changes you suggested.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe we still need to include the fcmp
instruction in the function to preserve the fmf
flags. The current implementation should be sufficient.
@arsenm, could you please take another look? |
"", &I); | ||
break; | ||
default: | ||
break; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This isn't handling the cases that use infinities?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The infinite case will eventually be transformed into constants, uno
, or ord
. Therefore, in earlier commits, we directly transform it into the final result.
See also: #107107 (comment)
alive2:
https://alive2.llvm.org/ce/z/Ag3Ki7https://alive2.llvm.org/ce/z/ywP5t2related: #76438
This patch adds the following foldings:
floor(x) <= x --> true
andx <= ceil(x) --> true
. We leverage the properties of these math functions and ensure there is no floating point input ofnan
.