-
Notifications
You must be signed in to change notification settings - Fork 14.3k
[SimplifyCFG][NFC] Improve compile time for TryToSimplifyUncondBranchFromEmptyBlock optimization. #110715
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@llvm/pr-subscribers-llvm-transforms Author: Amara Emerson (aemerson) ChangesIn some pathological cases this optimization can spend an unreasonable amount of time. rdar://137063034 Full diff: https://github.com/llvm/llvm-project/pull/110715.diff 2 Files Affected:
diff --git a/llvm/lib/Transforms/Utils/Local.cpp b/llvm/lib/Transforms/Utils/Local.cpp
index 7659fc69196151..3e00a63e5f2cf9 100644
--- a/llvm/lib/Transforms/Utils/Local.cpp
+++ b/llvm/lib/Transforms/Utils/Local.cpp
@@ -122,6 +122,11 @@ static cl::opt<unsigned> MaxPhiEntriesIncreaseAfterRemovingEmptyBlock(
// bitreverse idioms.
static const unsigned BitPartRecursionMaxDepth = 48;
+static cl::opt<unsigned> MaxNumPredsThreshold(
+ "simplifycfg-uncondbr-max-num-preds-threshold", cl::init(128), cl::Hidden,
+ cl::desc("The maximum number of predecessors a block can have to simplify "
+ "uncond-brs from empty blocks."));
+
//===----------------------------------------------------------------------===//
// Local constant propagation.
//
@@ -1165,6 +1170,11 @@ bool llvm::TryToSimplifyUncondBranchFromEmptyBlock(BasicBlock *BB,
if (BB == Succ)
return false;
+ // Some pathological IR can have extremely high numbers of predecessor blocks.
+ // Bail out early if so.
+ if (Succ->hasNPredecessorsOrMore(MaxNumPredsThreshold))
+ return false;
+
SmallPtrSet<BasicBlock *, 16> BBPreds(pred_begin(BB), pred_end(BB));
SmallPtrSet<BasicBlock *, 16> SuccPreds(pred_begin(Succ), pred_end(Succ));
diff --git a/llvm/test/Transforms/SimplifyCFG/branch-common-pred-threshold.ll b/llvm/test/Transforms/SimplifyCFG/branch-common-pred-threshold.ll
new file mode 100644
index 00000000000000..3f1c93762e34eb
--- /dev/null
+++ b/llvm/test/Transforms/SimplifyCFG/branch-common-pred-threshold.ll
@@ -0,0 +1,44 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
+; RUN: opt < %s -passes=simplifycfg -simplifycfg-uncondbr-max-num-preds-threshold=3 -S | FileCheck %s
+
+; This test checks that we don't optimize the edge from Pred -> BB to Pred -> Succ when
+; the number of predecessors of Succ is greater than the threshold.
+
+define i8 @succ_has_3_preds(i8 noundef %arg, i1 %c1, i1 %c2) {
+; CHECK-LABEL: @succ_has_3_preds(
+; CHECK-NEXT: Pred:
+; CHECK-NEXT: call void @dummy()
+; CHECK-NEXT: br i1 [[C1:%.*]], label [[COMMONPRED:%.*]], label [[EXTRA_BB:%.*]]
+; CHECK: extra_bb:
+; CHECK-NEXT: br label [[SUCC:%.*]]
+; CHECK: CommonPred:
+; CHECK-NEXT: call void @dummy()
+; CHECK-NEXT: br i1 [[C2:%.*]], label [[SUCC]], label [[BB:%.*]]
+; CHECK: BB:
+; CHECK-NEXT: [[PHI1:%.*]] = phi i8 [ 1, [[COMMONPRED]] ]
+; CHECK-NEXT: br label [[SUCC]]
+; CHECK: Succ:
+; CHECK-NEXT: [[PHI2:%.*]] = phi i8 [ [[PHI1]], [[BB]] ], [ 4, [[COMMONPRED]] ], [ 0, [[EXTRA_BB]] ]
+; CHECK-NEXT: ret i8 [[PHI2]]
+;
+Pred:
+call void @dummy()
+ br i1 %c1, label %CommonPred, label %extra_bb
+
+extra_bb:
+ br i1 %c1, label %CommonPred, label %Succ
+
+CommonPred:
+call void @dummy()
+ br i1 %c2, label %Succ, label %BB
+
+BB:
+ %phi1 = phi i8 [1, %CommonPred]
+ br label %Succ
+
+Succ:
+ %phi2 = phi i8 [ %phi1, %BB ], [ 4, %CommonPred ], [0, %extra_bb]
+ ret i8 %phi2
+}
+
+declare void @dummy()
|
Could you please explain in more detail where the compile-time issue comes from? Just many predecessors is not a problem by itself, something must be causing quadratic complexity if this is an issue. 128 is really not particularly high in this context, that's something you'll easily see with a large switch statement. llvm-opt-benchmark results indicate that this does improve compile-time in some cases, but it also makes it much worse in others:
|
Reproduced locally.
After:
|
…FromEmptyBlock optimization. In some pathological cases this optimization can spend an unreasonable amount of time populating the set for predecessors of the successor block. This change sinks some of that initializing to the point where it's actually necessary so we can take advantage of the existing early-exits. rdar://137063034
0637e43
to
03d9ec6
Compare
Thanks for the data, I didn't know that benchmark suite existed, very cool! Our pathological case is quite dumb, it's not particularly any quadratic behavior I think (unless I'm mistaken). The issue was introduced by fc6bdb8 which added an unconditional initialization of the successor pointer set. In our test case that pointer set ends up being filled hundreds of thousands of times with 50k elements each time. In any case, I realized that this initial fix was the wrong thing to do anyway because we should only be trying to fix the new regression introduced by that change, the optimization before fc6bdb8 was fine by itself. I've changed the fix to instead try to sink the initialization of that set as late as possible and to re-use the predecessor walk done by |
ping @nikic does the new approach seem reasonable? |
llvm/lib/Transforms/Utils/Local.cpp
Outdated
@@ -1042,7 +1043,8 @@ CanRedirectPredsOfEmptyBBToSucc(BasicBlock *BB, BasicBlock *Succ, | |||
|
|||
// Get the single common predecessor of both BB and Succ. Return false | |||
// when there are more than one common predecessors. | |||
for (BasicBlock *SuccPred : SuccPreds) { | |||
for (BasicBlock *SuccPred : predecessors(Succ)) { | |||
SuccPredsOut.insert(SuccPred); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Which part is important to short-circuit in your case? Is it the checks above or the check in this loop?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it was the checks in this loop.
llvm/lib/Transforms/Utils/Local.cpp
Outdated
@@ -1182,6 +1184,10 @@ bool llvm::TryToSimplifyUncondBranchFromEmptyBlock(BasicBlock *BB, | |||
if ((!BBKillable && !BBPhisMergeable) || introduceTooManyPhiEntries(BB, Succ)) | |||
return false; | |||
|
|||
// SuccPreds may not have been fully filled by CanRedirectPredsOfEmptyBBToSucc | |||
// so finish it here. | |||
SuccPreds.insert(pred_begin(Succ), pred_end(Succ)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given that you iterate over all predecessors again here anyway, I think it would be cleaner to just only construct SuccPreds here and not try to partially cache it in CanRedirectPredsOfEmptyBBToSucc. It doesn't seem like we're actually saying any work, but it makes the API contract rather confusing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, I think you're right.
89a64fa
to
7a4c521
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
llvm/lib/Transforms/Utils/Local.cpp
Outdated
|
||
if ((!BBKillable && !BBPhisMergeable) || introduceTooManyPhiEntries(BB, Succ)) | ||
return false; | ||
|
||
SmallPtrSet<BasicBlock *, 16> SuccPreds(pred_begin(Succ), pred_end(Succ)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd move this further down to where it's actually needed now.
In some pathological cases this optimization can spend an unreasonable amount of time populating the set for predecessors of the successor block. This change sinks some of that initializing to the point where it's actually necessary so we can take advantage of the existing early-exits.
rdar://137063034