Skip to content

[CodeGen] Add an option to skip extTSP BB placement for huge functions. #99310

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Jul 24, 2024

Conversation

amharc
Copy link
Contributor

@amharc amharc commented Jul 17, 2024

The extTSP-based basic block layout algorithm improves the performance of the generated code, but unfortunately it has a super-linear time complexity. This leads to extremely long compilation times for certain relatively rare kinds of autogenerated code.

This patch adds an -mllvm flag to optionally restrict extTSP only to functions smaller than a specified threshold. While commit
bcdc047 added a knob to to limit the maximum chain size, it's still possible that for certain huge functions the number of chains is very large, leading to a quadratic behaviour in ExtTSPImpl::mergeChainPairs.

The extTSP-based basic block layout algorithm improves the performance
of the generated code, but unfortunately it has a super-linear time
complexity. This leads to extremely long compilation times for certain
kinds of autogenerated code.

This patch adds an -mllvm flag to restrict extTSP only to functions
smaller than a specified threshold. While commit
bcdc047 added a knob to to limit the
maximum chain size, it's still possible that for certain huge functions
the number of chains is very large, leading to a quadratic behaviour
in ExtTSPImpl::mergeChainPairs.
@llvmbot
Copy link
Member

llvmbot commented Jul 17, 2024

@llvm/pr-subscribers-backend-x86

Author: Krzysztof Pszeniczny (amharc)

Changes

The extTSP-based basic block layout algorithm improves the performance of the generated code, but unfortunately it has a super-linear time complexity. This leads to extremely long compilation times for certain relatively rare kinds of autogenerated code.

This patch adds an -mllvm flag to optionally restrict extTSP only to functions smaller than a specified threshold. While commit
bcdc047 added a knob to to limit the maximum chain size, it's still possible that for certain huge functions the number of chains is very large, leading to a quadratic behaviour in ExtTSPImpl::mergeChainPairs.


Full diff: https://github.com/llvm/llvm-project/pull/99310.diff

2 Files Affected:

  • (modified) llvm/lib/CodeGen/MachineBlockPlacement.cpp (+8-1)
  • (modified) llvm/test/CodeGen/X86/code_placement_ext_tsp_large.ll (+15)
diff --git a/llvm/lib/CodeGen/MachineBlockPlacement.cpp b/llvm/lib/CodeGen/MachineBlockPlacement.cpp
index 4c864ca15ccc5..2f8e6adc43103 100644
--- a/llvm/lib/CodeGen/MachineBlockPlacement.cpp
+++ b/llvm/lib/CodeGen/MachineBlockPlacement.cpp
@@ -213,6 +213,12 @@ static cl::opt<bool> RenumberBlocksBeforeView(
         "into a dot graph. Only used when a function is being printed."),
     cl::init(false), cl::Hidden);
 
+static cl::opt<unsigned> ExtTspBlockPlacementMaxBlocks(
+    "ext-tsp-block-placement-max-blocks",
+    cl::desc("Maximum number of basic blocks in a function to run ext-TSP "
+             "block placement."),
+    cl::init(UINT_MAX), cl::Hidden);
+
 namespace llvm {
 extern cl::opt<bool> EnableExtTspBlockPlacement;
 extern cl::opt<bool> ApplyExtTspWithoutProfile;
@@ -3511,7 +3517,8 @@ bool MachineBlockPlacement::runOnMachineFunction(MachineFunction &MF) {
 
   // Apply a post-processing optimizing block placement.
   if (MF.size() >= 3 && EnableExtTspBlockPlacement &&
-      (ApplyExtTspWithoutProfile || MF.getFunction().hasProfileData())) {
+      (ApplyExtTspWithoutProfile || MF.getFunction().hasProfileData()) &&
+      MF.size() <= ExtTspBlockPlacementMaxBlocks) {
     // Find a new placement and modify the layout of the blocks in the function.
     applyExtTsp();
 
diff --git a/llvm/test/CodeGen/X86/code_placement_ext_tsp_large.ll b/llvm/test/CodeGen/X86/code_placement_ext_tsp_large.ll
index 842aced4884f7..ac172d32c6d8b 100644
--- a/llvm/test/CodeGen/X86/code_placement_ext_tsp_large.ll
+++ b/llvm/test/CodeGen/X86/code_placement_ext_tsp_large.ll
@@ -2,6 +2,7 @@
 ; RUN: llc -mcpu=corei7 -mtriple=x86_64-linux -enable-ext-tsp-block-placement=1 -ext-tsp-chain-split-threshold=128 -debug-only=block-placement < %s 2>&1 | FileCheck %s
 ; RUN: llc -mcpu=corei7 -mtriple=x86_64-linux -enable-ext-tsp-block-placement=1 -ext-tsp-chain-split-threshold=1 -debug-only=block-placement < %s 2>&1 | FileCheck %s -check-prefix=CHECK2
 ; RUN: llc -mcpu=corei7 -mtriple=x86_64-linux -enable-ext-tsp-block-placement=0 -debug-only=block-placement < %s 2>&1 | FileCheck %s -check-prefix=CHECK3
+; RUN: llc -mcpu=corei7 -mtriple=x86_64-linux -enable-ext-tsp-block-placement=1 -ext-tsp-block-placement-max-blocks=8 -debug-only=block-placement < %s 2>&1 | FileCheck %s -check-prefix=CHECK4
 
 @yydebug = dso_local global i32 0, align 4
 
@@ -110,6 +111,20 @@ define void @func_large() !prof !0 {
 ; CHECK3: b7
 ; CHECK3: b8
 ; CHECK3: b9
+;
+; An expected output with function size larger than the threshold -- the layout is not modified:
+;
+; CHECK4-LABEL: func_large:
+; CHECK4: b0
+; CHECK4: b1
+; CHECK4: b2
+; CHECK4: b3
+; CHECK4: b4
+; CHECK4: b5
+; CHECK4: b6
+; CHECK4: b7
+; CHECK4: b8
+; CHECK4: b9
 
 b0:
   %0 = load i32, ptr @yydebug, align 4

@spupyrev
Copy link
Contributor

Is it possible by any chance to play with the data leading to excessive runtimes? We could try to make the implementation faster instead of disabling it

@amharc
Copy link
Contributor Author

amharc commented Jul 17, 2024

Is it possible by any chance to play with the data leading to excessive runtimes? We could try to make the implementation faster instead of disabling it

I'll try to come up with something, but unfortunately huge IR is often nontrivial to strip of confidential information.

Usually such huge functions, even if they have a surprisingly large profile coverage, don't account for a huge amount of cores from the whole datacenter perspective. So it's actually likely that spending time on investigating how to apply ext-TSP for such abnormally large functions with a surprising number of hot chains might not be worth the savings.

That said, transformations that are significantly super-linear (e.g. quadratic) will almost certainly run into issues when running on huge functions. Adding a cut-off flag to avoid significantly increasing the compilation time seems like a reasonable action to me: this PR doesn't modify the default behaviour of the compiler (note: the default value of this flag effectively means "no limit"), it just allows imposing a cut-off.

@rlavaee
Copy link
Contributor

rlavaee commented Jul 22, 2024

I think eventually, we should improve the algorithm to use a priority-queue for the best merge candidate (thereby O(1) for finding the best merge and O(lg n) for updates), but until then a knob is best we can have. So this LGTM.

@amharc amharc merged commit 5bae81b into llvm:main Jul 24, 2024
9 checks passed
yuxuanchen1997 pushed a commit that referenced this pull request Jul 25, 2024
…s. (#99310)

Summary:
The extTSP-based basic block layout algorithm improves the performance
of the generated code, but unfortunately it has a super-linear time
complexity. This leads to extremely long compilation times for certain
relatively rare kinds of autogenerated code.

This patch adds an `-mllvm` flag to optionally restrict extTSP only to
functions smaller than a specified threshold. While commit
bcdc047 added a knob to to limit the
maximum chain size, it's still possible that for certain huge functions
the number of chains is very large, leading to a quadratic behaviour in
ExtTSPImpl::mergeChainPairs.

Test Plan: 

Reviewers: 

Subscribers: 

Tasks: 

Tags: 


Differential Revision: https://phabricator.intern.facebook.com/D60250584
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants