-
Notifications
You must be signed in to change notification settings - Fork 14.3k
[CodeGen] Add an option to skip extTSP BB placement for huge functions. #99310
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
The extTSP-based basic block layout algorithm improves the performance of the generated code, but unfortunately it has a super-linear time complexity. This leads to extremely long compilation times for certain kinds of autogenerated code. This patch adds an -mllvm flag to restrict extTSP only to functions smaller than a specified threshold. While commit bcdc047 added a knob to to limit the maximum chain size, it's still possible that for certain huge functions the number of chains is very large, leading to a quadratic behaviour in ExtTSPImpl::mergeChainPairs.
@llvm/pr-subscribers-backend-x86 Author: Krzysztof Pszeniczny (amharc) ChangesThe extTSP-based basic block layout algorithm improves the performance of the generated code, but unfortunately it has a super-linear time complexity. This leads to extremely long compilation times for certain relatively rare kinds of autogenerated code. This patch adds an Full diff: https://github.com/llvm/llvm-project/pull/99310.diff 2 Files Affected:
diff --git a/llvm/lib/CodeGen/MachineBlockPlacement.cpp b/llvm/lib/CodeGen/MachineBlockPlacement.cpp
index 4c864ca15ccc5..2f8e6adc43103 100644
--- a/llvm/lib/CodeGen/MachineBlockPlacement.cpp
+++ b/llvm/lib/CodeGen/MachineBlockPlacement.cpp
@@ -213,6 +213,12 @@ static cl::opt<bool> RenumberBlocksBeforeView(
"into a dot graph. Only used when a function is being printed."),
cl::init(false), cl::Hidden);
+static cl::opt<unsigned> ExtTspBlockPlacementMaxBlocks(
+ "ext-tsp-block-placement-max-blocks",
+ cl::desc("Maximum number of basic blocks in a function to run ext-TSP "
+ "block placement."),
+ cl::init(UINT_MAX), cl::Hidden);
+
namespace llvm {
extern cl::opt<bool> EnableExtTspBlockPlacement;
extern cl::opt<bool> ApplyExtTspWithoutProfile;
@@ -3511,7 +3517,8 @@ bool MachineBlockPlacement::runOnMachineFunction(MachineFunction &MF) {
// Apply a post-processing optimizing block placement.
if (MF.size() >= 3 && EnableExtTspBlockPlacement &&
- (ApplyExtTspWithoutProfile || MF.getFunction().hasProfileData())) {
+ (ApplyExtTspWithoutProfile || MF.getFunction().hasProfileData()) &&
+ MF.size() <= ExtTspBlockPlacementMaxBlocks) {
// Find a new placement and modify the layout of the blocks in the function.
applyExtTsp();
diff --git a/llvm/test/CodeGen/X86/code_placement_ext_tsp_large.ll b/llvm/test/CodeGen/X86/code_placement_ext_tsp_large.ll
index 842aced4884f7..ac172d32c6d8b 100644
--- a/llvm/test/CodeGen/X86/code_placement_ext_tsp_large.ll
+++ b/llvm/test/CodeGen/X86/code_placement_ext_tsp_large.ll
@@ -2,6 +2,7 @@
; RUN: llc -mcpu=corei7 -mtriple=x86_64-linux -enable-ext-tsp-block-placement=1 -ext-tsp-chain-split-threshold=128 -debug-only=block-placement < %s 2>&1 | FileCheck %s
; RUN: llc -mcpu=corei7 -mtriple=x86_64-linux -enable-ext-tsp-block-placement=1 -ext-tsp-chain-split-threshold=1 -debug-only=block-placement < %s 2>&1 | FileCheck %s -check-prefix=CHECK2
; RUN: llc -mcpu=corei7 -mtriple=x86_64-linux -enable-ext-tsp-block-placement=0 -debug-only=block-placement < %s 2>&1 | FileCheck %s -check-prefix=CHECK3
+; RUN: llc -mcpu=corei7 -mtriple=x86_64-linux -enable-ext-tsp-block-placement=1 -ext-tsp-block-placement-max-blocks=8 -debug-only=block-placement < %s 2>&1 | FileCheck %s -check-prefix=CHECK4
@yydebug = dso_local global i32 0, align 4
@@ -110,6 +111,20 @@ define void @func_large() !prof !0 {
; CHECK3: b7
; CHECK3: b8
; CHECK3: b9
+;
+; An expected output with function size larger than the threshold -- the layout is not modified:
+;
+; CHECK4-LABEL: func_large:
+; CHECK4: b0
+; CHECK4: b1
+; CHECK4: b2
+; CHECK4: b3
+; CHECK4: b4
+; CHECK4: b5
+; CHECK4: b6
+; CHECK4: b7
+; CHECK4: b8
+; CHECK4: b9
b0:
%0 = load i32, ptr @yydebug, align 4
|
Is it possible by any chance to play with the data leading to excessive runtimes? We could try to make the implementation faster instead of disabling it |
I'll try to come up with something, but unfortunately huge IR is often nontrivial to strip of confidential information. Usually such huge functions, even if they have a surprisingly large profile coverage, don't account for a huge amount of cores from the whole datacenter perspective. So it's actually likely that spending time on investigating how to apply ext-TSP for such abnormally large functions with a surprising number of hot chains might not be worth the savings. That said, transformations that are significantly super-linear (e.g. quadratic) will almost certainly run into issues when running on huge functions. Adding a cut-off flag to avoid significantly increasing the compilation time seems like a reasonable action to me: this PR doesn't modify the default behaviour of the compiler (note: the default value of this flag effectively means "no limit"), it just allows imposing a cut-off. |
I think eventually, we should improve the algorithm to use a priority-queue for the best merge candidate (thereby O(1) for finding the best merge and O(lg n) for updates), but until then a knob is best we can have. So this LGTM. |
…s. (#99310) Summary: The extTSP-based basic block layout algorithm improves the performance of the generated code, but unfortunately it has a super-linear time complexity. This leads to extremely long compilation times for certain relatively rare kinds of autogenerated code. This patch adds an `-mllvm` flag to optionally restrict extTSP only to functions smaller than a specified threshold. While commit bcdc047 added a knob to to limit the maximum chain size, it's still possible that for certain huge functions the number of chains is very large, leading to a quadratic behaviour in ExtTSPImpl::mergeChainPairs. Test Plan: Reviewers: Subscribers: Tasks: Tags: Differential Revision: https://phabricator.intern.facebook.com/D60250584
The extTSP-based basic block layout algorithm improves the performance of the generated code, but unfortunately it has a super-linear time complexity. This leads to extremely long compilation times for certain relatively rare kinds of autogenerated code.
This patch adds an
-mllvm
flag to optionally restrict extTSP only to functions smaller than a specified threshold. While commitbcdc047 added a knob to to limit the maximum chain size, it's still possible that for certain huge functions the number of chains is very large, leading to a quadratic behaviour in ExtTSPImpl::mergeChainPairs.