Skip to content

[SimplifyCFG] Treat umul + extract pattern as cheap single instruction. #124933

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 8 commits into from
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
31 changes: 25 additions & 6 deletions llvm/lib/Transforms/Utils/SimplifyCFG.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -3286,7 +3286,17 @@ bool SimplifyCFGOpt::speculativelyExecuteBB(BranchInst *BI,

SmallVector<Instruction *, 4> SpeculatedDbgIntrinsics;

// The number of already examined instructions. Debug instructions don't
// count.
unsigned SpeculatedInstructions = 0;

// In case we have found a cheap pattern, we don't want to do cost checking
// anymore. We are sure we want to hoist the pattern. To know, that we are
// only hoisting the cheap pattern only and not other expensive instructions
// too, we have the `MaxSpeculatedInstructionsToHoist` variable to track that
// the basic block truly only contains that pattern.
bool PartialInst = false;

bool HoistLoadsStores = HoistLoadsStoresWithCondFaulting &&
Options.HoistLoadsStoresWithCondFaulting;
SmallVector<Instruction *, 2> SpeculatedConditionalLoadsStores;
Expand Down Expand Up @@ -3316,34 +3326,43 @@ bool SimplifyCFGOpt::speculativelyExecuteBB(BranchInst *BI,
if (EphTracker.track(&I))
continue;

// Only speculatively execute a single instruction (not counting the
// terminator) for now.
bool IsSafeCheapLoadStore = HoistLoadsStores &&
isSafeCheapLoadStore(&I, TTI) &&
SpeculatedConditionalLoadsStores.size() <
HoistLoadsStoresWithCondFaultingThreshold;

// Overflow arithmetic instruction plus extract value are usually generated
// when a division is being replaced, but the zero check may still be there.
// In that case hoist these two instructions out of this basic block, and
// let later optimizations take care of the unnecessary zero checks.
WithOverflowInst *OverflowI;
if (match(&I, m_ExtractValue<1>(m_OneUse(m_WithOverflowInst(OverflowI)))))
PartialInst = true;
// Not count load/store into cost if target supports conditional faulting
// b/c it's cheap to speculate it.
if (IsSafeCheapLoadStore)
SpeculatedConditionalLoadsStores.push_back(&I);
else
++SpeculatedInstructions;

if (SpeculatedInstructions > 1)
return false;

// Don't hoist the instruction if it's unsafe or expensive.
if (!IsSafeCheapLoadStore &&
!isSafeToSpeculativelyExecute(&I, BI, Options.AC) &&
!(HoistCondStores && !SpeculatedStoreValue &&
(SpeculatedStoreValue =
isSafeToSpeculateStore(&I, BB, ThenBB, EndBB))))
return false;
if (!IsSafeCheapLoadStore && !SpeculatedStoreValue &&

if (!PartialInst && !IsSafeCheapLoadStore && !SpeculatedStoreValue &&
computeSpeculationCost(&I, TTI) >
PHINodeFoldingThreshold * TargetTransformInfo::TCC_Basic)
return false;

// The number of instructions to be speculatively executed is limited.
// This limit is dependent on the found patterns.
if (SpeculatedInstructions > (PartialInst ? 2 : 1))
return false;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it may be easier to just add a MaxSpeculatedInstructionsToHoist variable and increment it if you find an overflow intrinsic whose only user is an extract.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Err, I was thinking something a bit different, particularly I don't really see the reason for accumulating BlockCostSoFar for the extract and mul.overflow if the expressed intent is to treat it as a single instruction.

What about something like:

    bool PartialInst = false;
    ...
    // Not count load/store into cost if target supports conditional faulting
    // b/c it's cheap to speculate it.
    if (IsSafeCheapLoadStore)
      SpeculatedConditionalLoadsStores.push_back(&I);
    else if (match(&I, m_ExtractValue<1>(
                           m_OneUse(m_WithOverflowInst(m_Value())))) &&
             I.getOperand(0)->getParent() == ThenBB)
      // We are treating (extract (mul/sub/add.overflow), 1) as a single
      // instruction
      PartialInst = true;
    else
      ++SpeculatedInstructions;
     ...
    if (!IsSafeCheapLoadStore && !SpeculatedStoreValue && !PartialInst &&
        computeSpeculationCost(&I, TTI) >
            PHINodeFoldingThreshold * TargetTransformInfo::TCC_Basic)
      return false;

Although I'm not really sure if the PartialInst part is necessary at all in fact.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have removed the accumulating part of the PR. It was not needed. I have just added it because it looked more generic to me(#124933 (comment) later we would want to hoist more then one instruction but we would still care about the cost then we would need it, but not now). I know it is a generally bad practice to include code in a commit, that is not related to the main goal of the commit. Sorry for that.

If we don't have thePartialInst variable then the hoisting would not be done, because of the high cost.
On rv64, the value of PHINodeFoldingThreshold * TargetTransformInfo::TCC_Basic is 2 and the cost of the pattern is 5. So one of the instructions cost must be 3 and that would be over the threshold.

I was also thinking of extending the pattern. Maybe we should check, if the condition is truly a zero checking for one of the operand of the overflow operator. This hoist is probably beneficial in that case only, because later on the redundant check can be dismissed.
What do you think?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also MaxSpeculatedInstructionsToHoist is still needed, because the match only finds the last instruction of the pattern, the others are still checked and therefore they increase the instruction count.


// Store the store speculation candidate.
if (!SpeculatedStore && SpeculatedStoreValue)
SpeculatedStore = cast<StoreInst>(&I);
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -45,26 +45,17 @@ define i1 @will_not_overflow(i64 %arg, i64 %arg1) {
; INSTCOMBINESIMPLIFYCFGONLY-LABEL: @will_not_overflow(
; INSTCOMBINESIMPLIFYCFGONLY-NEXT: bb:
; INSTCOMBINESIMPLIFYCFGONLY-NEXT: [[T0:%.*]] = icmp eq i64 [[ARG:%.*]], 0
; INSTCOMBINESIMPLIFYCFGONLY-NEXT: br i1 [[T0]], label [[BB5:%.*]], label [[BB2:%.*]]
; INSTCOMBINESIMPLIFYCFGONLY: bb2:
; INSTCOMBINESIMPLIFYCFGONLY-NEXT: [[MUL:%.*]] = call { i64, i1 } @llvm.umul.with.overflow.i64(i64 [[ARG]], i64 [[ARG1:%.*]])
; INSTCOMBINESIMPLIFYCFGONLY-NEXT: [[MUL_OV:%.*]] = extractvalue { i64, i1 } [[MUL]], 1
; INSTCOMBINESIMPLIFYCFGONLY-NEXT: br label [[BB5]]
; INSTCOMBINESIMPLIFYCFGONLY: bb5:
; INSTCOMBINESIMPLIFYCFGONLY-NEXT: [[T6:%.*]] = phi i1 [ false, [[BB:%.*]] ], [ [[MUL_OV]], [[BB2]] ]
; INSTCOMBINESIMPLIFYCFGONLY-NEXT: [[T6:%.*]] = select i1 [[T0]], i1 false, i1 [[MUL_OV]]
; INSTCOMBINESIMPLIFYCFGONLY-NEXT: ret i1 [[T6]]
;
; INSTCOMBINESIMPLIFYCFGINSTCOMBINE-LABEL: @will_not_overflow(
; INSTCOMBINESIMPLIFYCFGINSTCOMBINE-NEXT: bb:
; INSTCOMBINESIMPLIFYCFGINSTCOMBINE-NEXT: [[T0:%.*]] = icmp eq i64 [[ARG:%.*]], 0
; INSTCOMBINESIMPLIFYCFGINSTCOMBINE-NEXT: br i1 [[T0]], label [[BB5:%.*]], label [[BB2:%.*]]
; INSTCOMBINESIMPLIFYCFGINSTCOMBINE: bb2:
; INSTCOMBINESIMPLIFYCFGINSTCOMBINE-NEXT: [[MUL:%.*]] = call { i64, i1 } @llvm.umul.with.overflow.i64(i64 [[ARG]], i64 [[ARG1:%.*]])
; INSTCOMBINESIMPLIFYCFGINSTCOMBINE-NEXT: [[ARG1:%.*]] = freeze i64 [[ARG2:%.*]]
; INSTCOMBINESIMPLIFYCFGINSTCOMBINE-NEXT: [[MUL:%.*]] = call { i64, i1 } @llvm.umul.with.overflow.i64(i64 [[ARG:%.*]], i64 [[ARG1]])
; INSTCOMBINESIMPLIFYCFGINSTCOMBINE-NEXT: [[MUL_OV:%.*]] = extractvalue { i64, i1 } [[MUL]], 1
; INSTCOMBINESIMPLIFYCFGINSTCOMBINE-NEXT: br label [[BB5]]
; INSTCOMBINESIMPLIFYCFGINSTCOMBINE: bb5:
; INSTCOMBINESIMPLIFYCFGINSTCOMBINE-NEXT: [[T6:%.*]] = phi i1 [ false, [[BB:%.*]] ], [ [[MUL_OV]], [[BB2]] ]
; INSTCOMBINESIMPLIFYCFGINSTCOMBINE-NEXT: ret i1 [[T6]]
; INSTCOMBINESIMPLIFYCFGINSTCOMBINE-NEXT: ret i1 [[MUL_OV]]
;
bb:
%t0 = icmp eq i64 %arg, 0
Expand Down
59 changes: 59 additions & 0 deletions llvm/test/Transforms/SimplifyCFG/umul-extract-pattern.ll
Original file line number Diff line number Diff line change
@@ -0,0 +1,59 @@
; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
; RUN: opt -passes=simplifycfg -simplifycfg-require-and-preserve-domtree=1 -S < %s | FileCheck %s
target triple = "riscv64-unknown-unknown-elf"

define i16 @func2(i64 %x, i64 %y) {
; CHECK-LABEL: @func2(
; CHECK-NEXT: entry:
; CHECK-NEXT: [[CMP_NOT:%.*]] = icmp eq i64 [[Y:%.*]], 0
; CHECK-NEXT: [[MUL:%.*]] = tail call { i64, i1 } @llvm.umul.with.overflow.i64(i64 [[Y]], i64 [[X:%.*]])
; CHECK-NEXT: [[MUL_OV:%.*]] = extractvalue { i64, i1 } [[MUL]], 1
; CHECK-NEXT: [[SPEC_SELECT:%.*]] = select i1 [[CMP_NOT]], i1 false, i1 [[MUL_OV]]
; CHECK-NEXT: [[CONV:%.*]] = zext i1 [[SPEC_SELECT]] to i16
; CHECK-NEXT: ret i16 [[CONV]]
;
entry:
%cmp.not = icmp eq i64 %y, 0
br i1 %cmp.not, label %land.end, label %land.rhs

land.rhs: ; preds = %entry
%mul = tail call { i64, i1 } @llvm.umul.with.overflow.i64(i64 %y, i64 %x)
%mul.ov = extractvalue { i64, i1 } %mul, 1
br label %land.end

land.end: ; preds = %land.rhs, %entry
%0 = phi i1 [ false, %entry ], [ %mul.ov, %land.rhs ]
%conv = zext i1 %0 to i16
ret i16 %conv
}

define i16 @noHoist(i64 %x, i64 %y) {
; CHECK-LABEL: @noHoist(
; CHECK-NEXT: entry:
; CHECK-NEXT: [[CMP_NOT:%.*]] = icmp eq i64 [[Y:%.*]], 0
; CHECK-NEXT: br i1 [[CMP_NOT]], label [[LAND_END:%.*]], label [[LAND_RHS:%.*]]
; CHECK: land.rhs:
; CHECK-NEXT: [[ADD2:%.*]] = add nsw i64 [[Y]], [[X:%.*]]
; CHECK-NEXT: [[MUL:%.*]] = tail call { i64, i1 } @llvm.umul.with.overflow.i64(i64 [[ADD2]], i64 [[X]])
; CHECK-NEXT: [[MUL_OV:%.*]] = extractvalue { i64, i1 } [[MUL]], 1
; CHECK-NEXT: br label [[LAND_END]]
; CHECK: land.end:
; CHECK-NEXT: [[SPEC_SELECT:%.*]] = phi i1 [ false, [[ENTRY:%.*]] ], [ [[MUL_OV]], [[LAND_RHS]] ]
; CHECK-NEXT: [[CONV:%.*]] = zext i1 [[SPEC_SELECT]] to i16
; CHECK-NEXT: ret i16 [[CONV]]
;
entry:
%cmp.not = icmp eq i64 %y, 0
br i1 %cmp.not, label %land.end, label %land.rhs

land.rhs: ; preds = %entry
%add = add nsw i64 %y, %x
%mul = tail call { i64, i1 } @llvm.umul.with.overflow.i64(i64 %add, i64 %x)
%mul.ov = extractvalue { i64, i1 } %mul, 1
br label %land.end

land.end: ; preds = %land.rhs, %entry
%0 = phi i1 [ false, %entry ], [ %mul.ov, %land.rhs ]
%conv = zext i1 %0 to i16
ret i16 %conv
}