-
Notifications
You must be signed in to change notification settings - Fork 14.3k
[SimplifyCFG] Treat umul + extract pattern as cheap single instruction. #124933
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@llvm/pr-subscribers-llvm-transforms Author: Gábor Spaits (spaits) ChangesThis pr was created to fix #115683 . In the comments @dtcxzyw has suggested, to treat the the pattern I was also thinking, maybe I should create a mechanism to add more cheap patterns like this one, and in the feature we just provide the pattern and don't have to create a add a new What do you think? Is this solution correct? Or should I try to do this in a different way? Full diff: https://github.com/llvm/llvm-project/pull/124933.diff 2 Files Affected:
diff --git a/llvm/lib/Transforms/Utils/SimplifyCFG.cpp b/llvm/lib/Transforms/Utils/SimplifyCFG.cpp
index 12dd49da279b9c..5197db70285cec 100644
--- a/llvm/lib/Transforms/Utils/SimplifyCFG.cpp
+++ b/llvm/lib/Transforms/Utils/SimplifyCFG.cpp
@@ -3290,9 +3290,11 @@ bool SimplifyCFGOpt::speculativelyExecuteBB(BranchInst *BI,
bool HoistLoadsStores = HoistLoadsStoresWithCondFaulting &&
Options.HoistLoadsStoresWithCondFaulting;
SmallVector<Instruction *, 2> SpeculatedConditionalLoadsStores;
+ InstructionCost BlockCostSoFar = 0;
Value *SpeculatedStoreValue = nullptr;
StoreInst *SpeculatedStore = nullptr;
EphemeralValueTracker EphTracker;
+ bool PatternFound = false;
for (Instruction &I : reverse(drop_end(*ThenBB))) {
// Skip debug info.
if (isa<DbgInfoIntrinsic>(I)) {
@@ -3329,9 +3331,6 @@ bool SimplifyCFGOpt::speculativelyExecuteBB(BranchInst *BI,
else
++SpeculatedInstructions;
- if (SpeculatedInstructions > 1)
- return false;
-
// Don't hoist the instruction if it's unsafe or expensive.
if (!IsSafeCheapLoadStore &&
!isSafeToSpeculativelyExecute(&I, BI, Options.AC) &&
@@ -3339,10 +3338,23 @@ bool SimplifyCFGOpt::speculativelyExecuteBB(BranchInst *BI,
(SpeculatedStoreValue =
isSafeToSpeculateStore(&I, BB, ThenBB, EndBB))))
return false;
- if (!IsSafeCheapLoadStore && !SpeculatedStoreValue &&
- computeSpeculationCost(&I, TTI) >
+
+ if (match(&I,
+ m_ExtractValue<1>(m_OneUse(
+ m_Intrinsic<Intrinsic::umul_with_overflow>(m_Value())))) &&
+ ThenBB->size() <= 3) {
+ PatternFound = true;
+ }
+
+ BlockCostSoFar += computeSpeculationCost(&I, TTI);
+ if (! PatternFound && !IsSafeCheapLoadStore && !SpeculatedStoreValue &&
+ BlockCostSoFar >
PHINodeFoldingThreshold * TargetTransformInfo::TCC_Basic)
return false;
+ // If we don't find any pattern, that must be cheap, then only speculatively
+ // execute a single instruction (not counting the terminator).
+ if (!PatternFound && SpeculatedInstructions > 1)
+ return false;
// Store the store speculation candidate.
if (!SpeculatedStore && SpeculatedStoreValue)
diff --git a/llvm/test/Transforms/SimplifyCFG/umul-extract-pattern.ll b/llvm/test/Transforms/SimplifyCFG/umul-extract-pattern.ll
new file mode 100644
index 00000000000000..72610218c314ec
--- /dev/null
+++ b/llvm/test/Transforms/SimplifyCFG/umul-extract-pattern.ll
@@ -0,0 +1,27 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
+; RUN: opt -passes=simplifycfg -simplifycfg-require-and-preserve-domtree=1 -S < %s | FileCheck %s
+
+define dso_local signext range(i16 0, 2) i16 @func2(i64 noundef %x, i64 noundef %y) local_unnamed_addr #0 {
+; CHECK-LABEL: @func2(
+; CHECK-NEXT: entry:
+; CHECK-NEXT: [[CMP_NOT:%.*]] = icmp eq i64 [[Y:%.*]], 0
+; CHECK-NEXT: [[MUL:%.*]] = tail call { i64, i1 } @llvm.umul.with.overflow.i64(i64 [[Y]], i64 [[X:%.*]])
+; CHECK-NEXT: [[MUL_OV:%.*]] = extractvalue { i64, i1 } [[MUL]], 1
+; CHECK-NEXT: [[SPEC_SELECT:%.*]] = select i1 [[CMP_NOT]], i1 false, i1 [[MUL_OV]]
+; CHECK-NEXT: [[CONV:%.*]] = zext i1 [[SPEC_SELECT]] to i16
+; CHECK-NEXT: ret i16 [[CONV]]
+;
+entry:
+ %cmp.not = icmp eq i64 %y, 0
+ br i1 %cmp.not, label %land.end, label %land.rhs
+
+land.rhs: ; preds = %entry
+ %mul = tail call { i64, i1 } @llvm.umul.with.overflow.i64(i64 %y, i64 %x)
+ %mul.ov = extractvalue { i64, i1 } %mul, 1
+ br label %land.end
+
+land.end: ; preds = %land.rhs, %entry
+ %0 = phi i1 [ false, %entry ], [ %mul.ov, %land.rhs ]
+ %conv = zext i1 %0 to i16
+ ret i16 %conv
+}
|
✅ With the latest revision this PR passed the C/C++ code formatter. |
Also related to: |
// If we don't find any pattern, that must be cheap, then only speculatively | ||
// execute a single instruction (not counting the terminator). | ||
if (!PatternFound && SpeculatedInstructions > 1) | ||
return false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it may be easier to just add a MaxSpeculatedInstructionsToHoist
variable and increment it if you find an overflow intrinsic whose only user is an extract.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Err, I was thinking something a bit different, particularly I don't really see the reason for accumulating BlockCostSoFar
for the extract
and mul.overflow
if the expressed intent is to treat it as a single instruction.
What about something like:
bool PartialInst = false;
...
// Not count load/store into cost if target supports conditional faulting
// b/c it's cheap to speculate it.
if (IsSafeCheapLoadStore)
SpeculatedConditionalLoadsStores.push_back(&I);
else if (match(&I, m_ExtractValue<1>(
m_OneUse(m_WithOverflowInst(m_Value())))) &&
I.getOperand(0)->getParent() == ThenBB)
// We are treating (extract (mul/sub/add.overflow), 1) as a single
// instruction
PartialInst = true;
else
++SpeculatedInstructions;
...
if (!IsSafeCheapLoadStore && !SpeculatedStoreValue && !PartialInst &&
computeSpeculationCost(&I, TTI) >
PHINodeFoldingThreshold * TargetTransformInfo::TCC_Basic)
return false;
Although I'm not really sure if the PartialInst
part is necessary at all in fact.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have removed the accumulating part of the PR. It was not needed. I have just added it because it looked more generic to me(#124933 (comment) later we would want to hoist more then one instruction but we would still care about the cost then we would need it, but not now). I know it is a generally bad practice to include code in a commit, that is not related to the main goal of the commit. Sorry for that.
If we don't have thePartialInst
variable then the hoisting would not be done, because of the high cost.
On rv64, the value of PHINodeFoldingThreshold * TargetTransformInfo::TCC_Basic
is 2 and the cost of the pattern is 5. So one of the instructions cost must be 3 and that would be over the threshold.
I was also thinking of extending the pattern. Maybe we should check, if the condition is truly a zero checking for one of the operand of the overflow operator. This hoist is probably beneficial in that case only, because later on the redundant check can be dismissed.
What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also MaxSpeculatedInstructionsToHoist
is still needed, because the match
only finds the last instruction of the pattern, the others are still checked and therefore they increase the instruction count.
@goldsteinn Thank you very much for your review. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Thank you! Can you adjust the PR description to make it ready for merge?
Please wait for additional approval from other reviewers :)
@dtcxzyw @nikic Thank you very much for your review. |
6dade29
to
838e617
Compare
@nikic Sorry for the slow response. I finally had some some time.
Maybe I should look more into that function and try to move this optimization there? |
Gentle ping @nikic . It turns out, that the cost of %mul = tail call { i64, i1 } @llvm.umul.with.overflow.i64(i64 %add, i64 %x) is 1, when there is no target specified. When using the riscv64 target then this value is 5.
|
Co-authored-by: Antonio Frighetto <[email protected]>
Co-authored-by: Antonio Frighetto <[email protected]>
Co-authored-by: Antonio Frighetto <[email protected]>
Co-authored-by: Antonio Frighetto <[email protected]>
@antoniofrighetto Thank you very much for your review. |
@spaits Looks like the CI is reporting a failure in PhaseOrdering/X86/unsigned-multiply-overflow-check.ll test, which should require an update, could you look at it please? |
@antoniofrighetto I have fixed the failing test case. It seems like we have a regression, compared to some earlier state of the PR, that has been lost to squashing. Now we get back the original state of |
Yeah, this started regressing since 838e617 (and unfortunately previous 6dade290797ae1852c8c1f816a3c856b1493dfdd seems not to be available anymore). |
Okay I found out, that when running this command: bin/opt -passes=instcombine,simplifycfg -simplifycfg-require-and-preserve-domtree=1 -S testfile.ll The input BasicBlock ( bb2: ; preds = %bb
%mul = call { i64, i1 } @llvm.umul.with.overflow.i64(i64 %arg, i64 %arg1)
%mul.ov = extractvalue { i64, i1 } %mul, 1
%0 = xor i1 %mul.ov, true
br label %bb5 That |
So, the only reason it has worked before was the omission of the checking how many instructions will be hoisted. |
I think I will try to move this optimization to to |
Perhaps we could include WithOverflowInst *UnusedWO;
if (match(&I, m_CombineOr(m_ExtractValue<1>(m_OneUse(m_WithOverflowInst(UnusedWO))),
m_Not(m_ExtractValue<1>(m_OneUse(m_WithOverflowInst(UnusedWO)))))))
Found = true; But TBH, I believe the current code should be fine (regardless of the test). |
I have created another solution for this problem. It is in #128021. I think that is a better one. Please check that too. |
Fix issue #115683 .
Overflow arithmetic instruction plus extract value are usually generated when a division is being replaced, but the zero check may still be there. In that case hoist these two instructions out of this basic block, and let later optimizations take care of the unnecessary zero checks.
An example: