[SimplifyCFG] Treat umul + extract pattern as cheap single instruction. #124933

spaits · 2025-01-29T15:23:24Z

Fix issue #115683 .

Overflow arithmetic instruction plus extract value are usually generated when a division is being replaced, but the zero check may still be there. In that case hoist these two instructions out of this basic block, and let later optimizations take care of the unnecessary zero checks.

An example:

define i1 @src(i8 %x, i8 %y) zeroext {
entry:
  %cmp.not = icmp eq i8 %y, 0
  br i1 %cmp.not, label %land.end, label %land.rhs

land.rhs:
  %mul = umul_overflow i8 %y, %x
  %mul.ov = extractvalue {i8, i1} %mul, 1
  br label %land.end

land.end:
  %#0 = phi i1 [ 0, %entry ], [ %mul.ov, %land.rhs ]
  ret i1 %#0
}
=>
define i1 @tgt(i8 %x, i8 %y) zeroext {
entry:
  %fx = freeze i8 %x
  %mul = umul_overflow i8 %y, %fx
  %mul.ov = extractvalue {i8, i1} %mul, 1
  ret i1 %mul.ov
}

llvmbot · 2025-01-29T15:24:04Z

@llvm/pr-subscribers-llvm-transforms

Author: Gábor Spaits (spaits)

Changes

This pr was created to fix #115683 .

In the comments @dtcxzyw has suggested, to treat the the pattern extract oneuse(op.with.overflow),1 as one instruction.
If we want the speculativelyExecuteBB function to speculatively execute this block, then we must do that, since it only speculatively executes a block, if that has at most two instructions.

I was also thinking, maybe I should create a mechanism to add more cheap patterns like this one, and in the feature we just provide the pattern and don't have to create a add a new if (pattern && len(thenBlock) == 1 + (len(pattern))) then break to the speculativelyExecuteBB function.

What do you think? Is this solution correct? Or should I try to do this in a different way?
Any input is welcomed.

Full diff: https://github.com/llvm/llvm-project/pull/124933.diff

2 Files Affected:

(modified) llvm/lib/Transforms/Utils/SimplifyCFG.cpp (+17-5)
(added) llvm/test/Transforms/SimplifyCFG/umul-extract-pattern.ll (+27)

diff --git a/llvm/lib/Transforms/Utils/SimplifyCFG.cpp b/llvm/lib/Transforms/Utils/SimplifyCFG.cpp
index 12dd49da279b9c..5197db70285cec 100644
--- a/llvm/lib/Transforms/Utils/SimplifyCFG.cpp
+++ b/llvm/lib/Transforms/Utils/SimplifyCFG.cpp
@@ -3290,9 +3290,11 @@ bool SimplifyCFGOpt::speculativelyExecuteBB(BranchInst *BI,
   bool HoistLoadsStores = HoistLoadsStoresWithCondFaulting &&
                           Options.HoistLoadsStoresWithCondFaulting;
   SmallVector<Instruction *, 2> SpeculatedConditionalLoadsStores;
+  InstructionCost BlockCostSoFar = 0;
   Value *SpeculatedStoreValue = nullptr;
   StoreInst *SpeculatedStore = nullptr;
   EphemeralValueTracker EphTracker;
+  bool PatternFound = false;
   for (Instruction &I : reverse(drop_end(*ThenBB))) {
     // Skip debug info.
     if (isa<DbgInfoIntrinsic>(I)) {
@@ -3329,9 +3331,6 @@ bool SimplifyCFGOpt::speculativelyExecuteBB(BranchInst *BI,
     else
       ++SpeculatedInstructions;
 
-    if (SpeculatedInstructions > 1)
-      return false;
-
     // Don't hoist the instruction if it's unsafe or expensive.
     if (!IsSafeCheapLoadStore &&
         !isSafeToSpeculativelyExecute(&I, BI, Options.AC) &&
@@ -3339,10 +3338,23 @@ bool SimplifyCFGOpt::speculativelyExecuteBB(BranchInst *BI,
           (SpeculatedStoreValue =
                isSafeToSpeculateStore(&I, BB, ThenBB, EndBB))))
       return false;
-    if (!IsSafeCheapLoadStore && !SpeculatedStoreValue &&
-        computeSpeculationCost(&I, TTI) >
+
+    if (match(&I,
+              m_ExtractValue<1>(m_OneUse(
+                  m_Intrinsic<Intrinsic::umul_with_overflow>(m_Value())))) &&
+        ThenBB->size() <= 3) {
+      PatternFound = true;
+    }
+
+    BlockCostSoFar += computeSpeculationCost(&I, TTI);
+    if (! PatternFound && !IsSafeCheapLoadStore && !SpeculatedStoreValue &&
+        BlockCostSoFar >
             PHINodeFoldingThreshold * TargetTransformInfo::TCC_Basic)
       return false;
+    // If we don't find any pattern, that must be cheap, then only speculatively
+    // execute a single instruction (not counting the terminator).
+    if (!PatternFound && SpeculatedInstructions > 1)
+      return false;
 
     // Store the store speculation candidate.
     if (!SpeculatedStore && SpeculatedStoreValue)
diff --git a/llvm/test/Transforms/SimplifyCFG/umul-extract-pattern.ll b/llvm/test/Transforms/SimplifyCFG/umul-extract-pattern.ll
new file mode 100644
index 00000000000000..72610218c314ec
--- /dev/null
+++ b/llvm/test/Transforms/SimplifyCFG/umul-extract-pattern.ll
@@ -0,0 +1,27 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
+; RUN: opt -passes=simplifycfg -simplifycfg-require-and-preserve-domtree=1 -S < %s | FileCheck %s
+
+define dso_local signext range(i16 0, 2) i16 @func2(i64 noundef %x, i64 noundef %y) local_unnamed_addr #0 {
+; CHECK-LABEL: @func2(
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:    [[CMP_NOT:%.*]] = icmp eq i64 [[Y:%.*]], 0
+; CHECK-NEXT:    [[MUL:%.*]] = tail call { i64, i1 } @llvm.umul.with.overflow.i64(i64 [[Y]], i64 [[X:%.*]])
+; CHECK-NEXT:    [[MUL_OV:%.*]] = extractvalue { i64, i1 } [[MUL]], 1
+; CHECK-NEXT:    [[SPEC_SELECT:%.*]] = select i1 [[CMP_NOT]], i1 false, i1 [[MUL_OV]]
+; CHECK-NEXT:    [[CONV:%.*]] = zext i1 [[SPEC_SELECT]] to i16
+; CHECK-NEXT:    ret i16 [[CONV]]
+;
+entry:
+  %cmp.not = icmp eq i64 %y, 0
+  br i1 %cmp.not, label %land.end, label %land.rhs
+
+land.rhs:                                         ; preds = %entry
+  %mul = tail call { i64, i1 } @llvm.umul.with.overflow.i64(i64 %y, i64 %x)
+  %mul.ov = extractvalue { i64, i1 } %mul, 1
+  br label %land.end
+
+land.end:                                         ; preds = %land.rhs, %entry
+  %0 = phi i1 [ false, %entry ], [ %mul.ov, %land.rhs ]
+  %conv = zext i1 %0 to i16
+  ret i16 %conv
+}

llvm/lib/Transforms/Utils/SimplifyCFG.cpp

github-actions · 2025-01-29T15:27:01Z

✅ With the latest revision this PR passed the C/C++ code formatter.

spaits · 2025-01-30T13:07:16Z

Also related to:
https://reviews.llvm.org/D65143
And fixes the problem pointed out in:
6b248fc

llvm/lib/Transforms/Utils/SimplifyCFG.cpp

goldsteinn · 2025-01-31T02:42:51Z

llvm/lib/Transforms/Utils/SimplifyCFG.cpp

+    // If we don't find any pattern, that must be cheap, then only speculatively
+    // execute a single instruction (not counting the terminator).
+    if (!PatternFound && SpeculatedInstructions > 1)
+      return false;


I think it may be easier to just add a MaxSpeculatedInstructionsToHoist variable and increment it if you find an overflow intrinsic whose only user is an extract.

Err, I was thinking something a bit different, particularly I don't really see the reason for accumulating BlockCostSoFar for the extract and mul.overflow if the expressed intent is to treat it as a single instruction.

What about something like:

bool PartialInst = false; ... // Not count load/store into cost if target supports conditional faulting // b/c it's cheap to speculate it. if (IsSafeCheapLoadStore) SpeculatedConditionalLoadsStores.push_back(&I); else if (match(&I, m_ExtractValue<1>( m_OneUse(m_WithOverflowInst(m_Value())))) && I.getOperand(0)->getParent() == ThenBB) // We are treating (extract (mul/sub/add.overflow), 1) as a single // instruction PartialInst = true; else ++SpeculatedInstructions; ... if (!IsSafeCheapLoadStore && !SpeculatedStoreValue && !PartialInst && computeSpeculationCost(&I, TTI) > PHINodeFoldingThreshold * TargetTransformInfo::TCC_Basic) return false;

Although I'm not really sure if the PartialInst part is necessary at all in fact.

I have removed the accumulating part of the PR. It was not needed. I have just added it because it looked more generic to me(#124933 (comment) later we would want to hoist more then one instruction but we would still care about the cost then we would need it, but not now). I know it is a generally bad practice to include code in a commit, that is not related to the main goal of the commit. Sorry for that.

If we don't have thePartialInst variable then the hoisting would not be done, because of the high cost.
On rv64, the value of PHINodeFoldingThreshold * TargetTransformInfo::TCC_Basic is 2 and the cost of the pattern is 5. So one of the instructions cost must be 3 and that would be over the threshold.

I was also thinking of extending the pattern. Maybe we should check, if the condition is truly a zero checking for one of the operand of the overflow operator. This hoist is probably beneficial in that case only, because later on the redundant check can be dismissed.
What do you think?

Also MaxSpeculatedInstructionsToHoist is still needed, because the match only finds the last instruction of the pattern, the others are still checked and therefore they increase the instruction count.

spaits · 2025-01-31T21:14:05Z

@goldsteinn Thank you very much for your review.
I have added the MaxSpeculatedInstructionsToHoist variable as you have suggested.
I kept the PatternFound. It is used to decide whether we should take in account the calculated cost of the basic block. I could use the MaxSpeculatedInstructionsToHoist variable to do this now: if the value of this is one, then there is no cheap patter, else there is. But, what if we would want to hoist more instructions than one, even if there is no patter? For that reason I keep an own variable for patterns.

dtcxzyw

LGTM. Thank you! Can you adjust the PR description to make it ready for merge?
Please wait for additional approval from other reviewers :)

llvm/lib/Transforms/Utils/SimplifyCFG.cpp

spaits · 2025-02-02T21:36:53Z

@dtcxzyw @nikic Thank you very much for your review.
I have made some changes on the PR description. Is it proper for a commit message?
I have also fixes the pattern length and the instruction counting issue.
Also added a test with some more add instructions in the basic block, to see that we are not hoisting, if there are more instructions.
The tests still need a deeper look. I will do more work on them soon. When is use %add2 in the %99 = phi i32 ... the hoisting will happen. This is interesting, because when I do debugging I see the return false, when the instruction count hits 3. Maybe these instructions are being hoisted by another part of the SimplifyCFG pass?

llvm/test/Transforms/SimplifyCFG/umul-extract-pattern.ll

spaits · 2025-02-04T15:45:06Z

@nikic Sorry for the slow response. I finally had some some time.
The problem with my test was that, I forgot to use the target triple = "riscv64-unknown-unknown-elf" attribute for the file.
When this is not there (or any other target) then the hoisting happens without the patch.
The function doing that is the following (If I understand everything correctly):

llvm-project/llvm/lib/Transforms/Utils/SimplifyCFG.cpp

Line 8448 in 838e617

if (foldTwoEntryPHINode(PN, TTI, DTU, Options.AC, DL,

Maybe I should look more into that function and try to move this optimization there?

spaits · 2025-02-12T11:14:08Z

Gentle ping @nikic .
I have checked foldTwoEntryPHINode and dominatesMergePoint.

It turns out, that the cost of

 %mul = tail call { i64, i1 } @llvm.umul.with.overflow.i64(i64 %add, i64 %x)

is 1, when there is no target specified. When using the riscv64 target then this value is 5.

dominatesMergePoint works recursively so I think it would be more complex, to "change" the cost of the pattern there. I would personally keep things the way they are in the current version of the patch, but if you would want this to move elsewhere the I am going to do that.

llvm/lib/Transforms/Utils/SimplifyCFG.cpp

Co-authored-by: Antonio Frighetto <[email protected]>

spaits · 2025-02-12T22:37:27Z

@antoniofrighetto Thank you very much for your review.
I think I have addressed all the reviews. Sorry for the amount of typos.
In the case of your last comment about MaxSpeculatedInstructionsToHoist I think we should wait for other reviewers opinions too.

llvm/lib/Transforms/Utils/SimplifyCFG.cpp

antoniofrighetto · 2025-02-20T09:56:38Z

@spaits Looks like the CI is reporting a failure in PhaseOrdering/X86/unsigned-multiply-overflow-check.ll test, which should require an update, could you look at it please?

spaits · 2025-02-20T12:17:25Z

@antoniofrighetto I have fixed the failing test case. It seems like we have a regression, compared to some earlier state of the PR, that has been lost to squashing. Now we get back the original state of llvm/test/Transforms/PhaseOrdering/X86/unsigned-multiply-overflow-check.ll. I will check why it no longer fixes that X86 test case.

antoniofrighetto · 2025-02-20T12:41:28Z

@antoniofrighetto I have fixed the failing test case. It seems like we have a regression, compared to some earlier state of the PR, that has been lost to squashing. Now we get back the original state of llvm/test/Transforms/PhaseOrdering/X86/unsigned-multiply-overflow-check.ll. I will check why it no longer fixes that X86 test case.

Yeah, this started regressing since 838e617 (and unfortunately previous 6dade290797ae1852c8c1f816a3c856b1493dfdd seems not to be available anymore).

spaits · 2025-02-20T12:46:54Z

Okay I found out, that when running this command:

bin/opt -passes=instcombine,simplifycfg -simplifycfg-require-and-preserve-domtree=1 -S testfile.ll

The input BasicBlock (ThenBB) for spewill look like this:

bb2:                                              ; preds = %bb
  %mul = call { i64, i1 } @llvm.umul.with.overflow.i64(i64 %arg, i64 %arg1)
  %mul.ov = extractvalue { i64, i1 } %mul, 1
  %0 = xor i1 %mul.ov, true
  br label %bb5

That xor is added into the basic block by InstCombinePass.

spaits · 2025-02-20T12:48:13Z

So, the only reason it has worked before was the omission of the checking how many instructions will be hoisted.

spaits · 2025-02-20T13:11:41Z

I think I will try to move this optimization to to foldTwoEntryPHINode. That function is calling the dominatesMergePoint function. In that I will match to the pattern and set a low cost value for this pattern.

antoniofrighetto · 2025-02-20T13:54:34Z

Perhaps we could include !extract oneuse(op.with.overflow),1 in the pattern, have something like (and adjust the limit):

WithOverflowInst *UnusedWO;
if (match(&I, m_CombineOr(m_ExtractValue<1>(m_OneUse(m_WithOverflowInst(UnusedWO))),
                          m_Not(m_ExtractValue<1>(m_OneUse(m_WithOverflowInst(UnusedWO)))))))
  Found = true;

But TBH, I believe the current code should be fine (regardless of the test).

spaits · 2025-02-20T15:58:06Z

I have created another solution for this problem. It is in #128021. I think that is a better one. Please check that too.

spaits requested review from nikic, RKSimon and dtcxzyw January 29, 2025 15:23

llvmbot added the llvm:transforms label Jan 29, 2025

spaits commented Jan 29, 2025

View reviewed changes

llvm/lib/Transforms/Utils/SimplifyCFG.cpp Outdated Show resolved Hide resolved

spaits changed the title ~~[Transform] Treat umul + extract pattern as cheap single instruction.~~ [SimplifyCFG] Treat umul + extract pattern as cheap single instruction. Jan 29, 2025

spaits added the llvm:optimizations label Jan 30, 2025

goldsteinn reviewed Jan 31, 2025

View reviewed changes

llvm/lib/Transforms/Utils/SimplifyCFG.cpp Outdated Show resolved Hide resolved

goldsteinn reviewed Jan 31, 2025

View reviewed changes

This was referenced Feb 2, 2025

Task submission dtcxzyw/llvm-opt-benchmark#1312

Open

pre-commit: PR124933 dtcxzyw/llvm-opt-benchmark#2065

Closed

dtcxzyw approved these changes Feb 2, 2025

View reviewed changes

nikic reviewed Feb 2, 2025

View reviewed changes

llvm/lib/Transforms/Utils/SimplifyCFG.cpp Outdated Show resolved Hide resolved

nikic reviewed Feb 2, 2025

View reviewed changes

llvm/test/Transforms/SimplifyCFG/umul-extract-pattern.ll Outdated Show resolved Hide resolved

llvm/test/Transforms/SimplifyCFG/umul-extract-pattern.ll Outdated Show resolved Hide resolved

spaits commented Feb 2, 2025

View reviewed changes

llvm/test/Transforms/SimplifyCFG/umul-extract-pattern.ll Show resolved Hide resolved

[Transform] Treat umul + extract pattern as cheap single instruction.

838e617

spaits force-pushed the llvm-issue-115683 branch from 6dade29 to 838e617 Compare February 4, 2025 15:38

antoniofrighetto reviewed Feb 12, 2025

View reviewed changes

spaits and others added 5 commits February 12, 2025 12:47

Fix typo

31dcf40

Co-authored-by: Antonio Frighetto <[email protected]>

Fix typo

134ad7b

Co-authored-by: Antonio Frighetto <[email protected]>

Update llvm/lib/Transforms/Utils/SimplifyCFG.cpp

3fb1a07

Co-authored-by: Antonio Frighetto <[email protected]>

Update llvm/lib/Transforms/Utils/SimplifyCFG.cpp

5634169

Co-authored-by: Antonio Frighetto <[email protected]>

Proper spacing for fields

3ed3573

antoniofrighetto reviewed Feb 14, 2025

View reviewed changes

llvm/lib/Transforms/Utils/SimplifyCFG.cpp Outdated Show resolved Hide resolved

Get rid of MaxSpeculatedInstructionsToHoist

63a3125

Fix failing test case

4d014d5

spaits mentioned this pull request Mar 12, 2025

[SimplifyCFG] Treat extract oneuse(op.with.overflow),1 pattern as a single instruction #128021

Merged

spaits closed this Mar 14, 2025

[SimplifyCFG] Treat umul + extract pattern as cheap single instruction. #124933

[SimplifyCFG] Treat umul + extract pattern as cheap single instruction. #124933

Uh oh!

Conversation

spaits commented Jan 29, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

llvmbot commented Jan 29, 2025

Uh oh!

Uh oh!

github-actions bot commented Jan 29, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

spaits commented Jan 30, 2025

Uh oh!

Uh oh!

goldsteinn Jan 31, 2025

Choose a reason for hiding this comment

Uh oh!

goldsteinn Jan 31, 2025

Choose a reason for hiding this comment

Uh oh!

spaits Feb 1, 2025

Choose a reason for hiding this comment

Uh oh!

spaits Feb 1, 2025

Choose a reason for hiding this comment

Uh oh!

spaits commented Jan 31, 2025

Uh oh!

dtcxzyw left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

spaits commented Feb 2, 2025

Uh oh!

Uh oh!

Uh oh!

Uh oh!

spaits commented Feb 4, 2025

Uh oh!

spaits commented Feb 12, 2025

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

spaits commented Feb 12, 2025

Uh oh!

Uh oh!

antoniofrighetto commented Feb 20, 2025

Uh oh!

spaits commented Feb 20, 2025

Uh oh!

antoniofrighetto commented Feb 20, 2025

Uh oh!

spaits commented Feb 20, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

spaits commented Feb 20, 2025

Uh oh!

spaits commented Feb 20, 2025

Uh oh!

antoniofrighetto commented Feb 20, 2025

Uh oh!

spaits commented Feb 20, 2025

Uh oh!

Uh oh!

spaits commented Jan 29, 2025 •

edited

Loading

github-actions bot commented Jan 29, 2025 •

edited

Loading

spaits commented Feb 20, 2025 •

edited

Loading