-
Notifications
You must be signed in to change notification settings - Fork 14.3k
[BOLT] Improve profile quality reporting #130810
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@llvm/pr-subscribers-bolt Author: ShatianWang (ShatianWang) ChangesFix a format issue in profile quality reporting introduced in 127954. Previously, when the set of functions or basic blocks for a specific profile quality score is empty, then nothing will be printed for this specific score following the "BOLT-INFO: profile quality metrics..." line. For small test binaries, it is often the case that the eligible sets are empty for all three scores. As a result, the next "BOLT-INFO" will not start a new line. This PR fixes the issue by reporting 0 profile quality gap for each quality metric if the corresponding eligible set is empty. Added test Full diff: https://github.com/llvm/llvm-project/pull/130810.diff 2 Files Affected:
diff --git a/bolt/lib/Passes/ProfileQualityStats.cpp b/bolt/lib/Passes/ProfileQualityStats.cpp
index 332c78da8a1e3..61d67b4f9068e 100644
--- a/bolt/lib/Passes/ProfileQualityStats.cpp
+++ b/bolt/lib/Passes/ProfileQualityStats.cpp
@@ -157,8 +157,10 @@ void printCFGContinuityStats(raw_ostream &OS,
FractionECUnreachables.push_back(FractionECUnreachable);
}
- if (FractionECUnreachables.empty())
+ if (FractionECUnreachables.empty()) {
+ OS << "function CFG discontinuity 0.00%; ";
return;
+ }
llvm::sort(FractionECUnreachables);
const int Rank = int(FractionECUnreachables.size() *
@@ -251,8 +253,10 @@ void printCallGraphFlowConservationStats(
}
}
- if (CallGraphGaps.empty())
+ if (CallGraphGaps.empty()) {
+ OS << "call graph flow conservation gap 0.00%; ";
return;
+ }
llvm::sort(CallGraphGaps);
const int Rank =
@@ -340,8 +344,11 @@ void printCFGFlowConservationStats(raw_ostream &OS,
}
}
- if (CFGGapsWeightedAvg.empty())
+ if (CFGGapsWeightedAvg.empty()) {
+ OS << "CFG flow conservation gap 0.00% (weighted) 0.00% (worst)\n";
return;
+ }
+
llvm::sort(CFGGapsWeightedAvg);
const int RankWA = int(CFGGapsWeightedAvg.size() *
opts::PercentileForProfileQualityCheck / 100);
diff --git a/bolt/test/X86/profile-quality-reporting-small-binary.s b/bolt/test/X86/profile-quality-reporting-small-binary.s
new file mode 100644
index 0000000000000..603d5c3218bc3
--- /dev/null
+++ b/bolt/test/X86/profile-quality-reporting-small-binary.s
@@ -0,0 +1,35 @@
+## Test that BOLT-INFO is correctly formatted after profile quality reporting for
+## a small binary.
+
+# RUN: llvm-mc --filetype=obj --triple x86_64-unknown-unknown %s -o %t.o
+# RUN: link_fdata %s %t.o %t.fdata
+# RUN: llvm-strip --strip-unneeded %t.o
+# RUN: %clang %cflags %t.o -o %t.exe -Wl,-q
+# RUN: llvm-bolt %t.exe -o %t.bolt --data=%t.fdata \
+# RUN: 2>&1 | FileCheck %s
+
+# CHECK: BOLT-INFO: profile quality metrics for the hottest 2 functions (reporting top 5% values): function CFG discontinuity 0.00%; call graph flow conservation gap 0.00%; CFG flow conservation gap 0.00% (weighted) 0.00% (worst)
+# CHECK-NEXT: BOLT-INFO:
+
+ .text
+ .globl func
+ .type func, @function
+func:
+ pushq %rbp
+ ret
+LLfunc_end:
+ .size func, LLfunc_end-func
+
+
+ .globl main
+ .type main, @function
+main:
+ pushq %rbp
+ movq %rsp, %rbp
+LLmain_func:
+ call func
+# FDATA: 1 main #LLmain_func# 1 func 0 0 500
+ movl $4, %edi
+ retq
+.Lmain_end:
+ .size main, .Lmain_end-main
|
@@ -157,8 +157,10 @@ void printCFGContinuityStats(raw_ostream &OS, | |||
FractionECUnreachables.push_back(FractionECUnreachable); | |||
} | |||
|
|||
if (FractionECUnreachables.empty()) | |||
if (FractionECUnreachables.empty()) { | |||
OS << "function CFG discontinuity 0.00%; "; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The issue is due to single-BB functions, right? Can we just assume the discontinuity is 0 for them, instead of skipping in L94-95? And apply a similar logic to other metrics.
conservation score calculation
✅ With the latest revision this PR passed the C/C++ code formatter. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, looks good overall. Please address a few nits inline, and add a space between [BOLT] and the rest of the title. I would also split out changes related to reporting and exceptions, as recommended in https://llvm.org/docs/DeveloperPolicy.html#incremental-development.
@@ -187,8 +188,10 @@ void printCallGraphFlowConservationStats( | |||
std::vector<double> CallGraphGaps; | |||
|
|||
for (const BinaryFunction *Function : Functions) { | |||
if (Function->size() <= 1 || !Function->isSimple()) | |||
if (Function->size() <= 1 || !Function->isSimple()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe we should now allow non-simple functions participate in call graph flow conservation gap computation, as they might be important from the layout perspective, and we strive to make sure call graph profile is attached to/from them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They do currently participate in the call graph flow conservation gap, because the function calls made by non-simple functions are used to construct CallGraphIncomingFlows
in function computeFlowMappings
.
We are skipping them here because I don't think we have a way to accurately get the net entry block CFG outflow for all non-simple functions.
continue; | ||
} | ||
|
||
uint64_t EntryInflow = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This chunk is just changing the indent, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes
bool HasPosECLP = false; | ||
for (const BinaryBasicBlock *LP : BB.landing_pads()) { | ||
if (LP->getKnownExecutionCount() > 0) { | ||
HasPosECLP = true; | ||
break; | ||
} | ||
} | ||
if (HasPosECLP) | ||
continue; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
bool HasPosECLP = false; | |
for (const BinaryBasicBlock *LP : BB.landing_pads()) { | |
if (LP->getKnownExecutionCount() > 0) { | |
HasPosECLP = true; | |
break; | |
} | |
} | |
if (HasPosECLP) | |
continue; | |
auto isPosEC = std::bind(&BinaryBasicBlock::getKnownExecutionCount, std::placeholders::_1); | |
if (llvm::any_of(BB.landing_pads(), isPosEC)) | |
continue; |
} | ||
// We only consider functions with at least MinLPECSum counts in landing | ||
// pads to avoid false positives due to sampling noise | ||
const uint16_t MinLPECSum = 50; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Move out to the header or the beginning of this file as a constant?
@@ -365,6 +396,74 @@ void printCFGFlowConservationStats(raw_ostream &OS, | |||
} | |||
} | |||
|
|||
void printExceptionHandlingStats(const BinaryContext &BC, raw_ostream &OS, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm trying to understand what's collected by this function: is it the part of EC that's attributed to LP blocks and invokes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are two things that are collected for each binary function
- Sum of execution counts on landing pads / sum of execution counts on all basic blocks (including landing pads)
- Sum of execution counts on landing pads / sum of call counts for all call instructions in the function that have a landing pad
auto isPosEC = std::bind(&BinaryBasicBlock::getKnownExecutionCount, | ||
std::placeholders::_1); | ||
if (llvm::any_of(BB.landing_pads(), isPosEC)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maksim told me about this trick:
auto isPosEC = std::bind(&BinaryBasicBlock::getKnownExecutionCount, | |
std::placeholders::_1); | |
if (llvm::any_of(BB.landing_pads(), isPosEC)) | |
if (llvm::any_of(BB.landing_pads(), | |
std::mem_fn(&BinaryBasicBlock::getKnownExecutionCount)) |
if (!BC.MIB->isCall(Inst)) | ||
continue; | ||
if (BC.MIB->isInvoke(Inst)) { | ||
const std::optional<MCPlus::MCLandingPad> EHInfo = | ||
BC.MIB->getEHInfo(Inst); | ||
if (EHInfo->first) | ||
InvokeECSum += BBEC; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Invoke must be a call:
if (!BC.MIB->isCall(Inst)) | |
continue; | |
if (BC.MIB->isInvoke(Inst)) { | |
const std::optional<MCPlus::MCLandingPad> EHInfo = | |
BC.MIB->getEHInfo(Inst); | |
if (EHInfo->first) | |
InvokeECSum += BBEC; | |
} | |
if (!BC.MIB->isInvoke(Inst)) | |
continue; | |
const std::optional<MCPlus::MCLandingPad> EHInfo = | |
BC.MIB->getEHInfo(Inst); | |
if (EHInfo->first) | |
InvokeECSum += BBEC; |
Improve profile quality reporting by 1) fixing a format issue for small binaries, 2) adding new stats for exception handling usage, 3) excluding selected blocks when computing the CFG flow conservation score. More specifically for 3), we are excluding blocks that satisfy at least one of the following characteristics: a) is a landing pad, b) has at least one landing pad with non-zero execution counts, c) ends with a recursive call. The reason for a) and b) is because the thrower --> landing pad edges are not explicitly represented in the CFG. The reason for c) is because the call-continuation fallthrough edge count is not important in case of recursive calls. Modified test `bolt/test/X86/profile-quality-reporting.test`. Added test `bolt/test/X86/profile-quality-reporting-small-binary.s`.
Improve profile quality reporting by 1) fixing a format issue for small binaries, 2) adding new stats for exception handling usage, 3) excluding selected blocks when computing the CFG flow conservation score. More specifically for 3), we are excluding blocks that satisfy at least one of the following characteristics: a) is a landing pad, b) has at least one landing pad with non-zero execution counts, c) ends with a recursive call. The reason for a) and b) is because the thrower --> landing pad edges are not explicitly represented in the CFG. The reason for c) is because the call-continuation fallthrough edge count is not important in case of recursive calls. Modified test `bolt/test/X86/profile-quality-reporting.test`. Added test `bolt/test/X86/profile-quality-reporting-small-binary.s`.
Improve profile quality reporting by 1) fixing a format issue for small binaries, 2) adding new stats for exception handling usage, 3) excluding selected blocks when computing the CFG flow conservation score. More specifically for 3), we are excluding blocks that satisfy at least one of the following characteristics: a) is a landing pad, b) has at least one landing pad with non-zero execution counts, c) ends with a recursive call. The reason for a) and b) is because the thrower --> landing pad edges are not explicitly represented in the CFG. The reason for c) is because the call-continuation fallthrough edge count is not important in case of recursive calls. Modified test `bolt/test/X86/profile-quality-reporting.test`. Added test `bolt/test/X86/profile-quality-reporting-small-binary.s`.
Improve profile quality reporting by 1) fixing a format issue for small binaries, 2) adding new stats for exception handling usage, 3) excluding selected blocks when computing the CFG flow conservation score.
More specifically for 3), we are excluding blocks that satisfy at least one of the following characteristics: a) is a landing pad, b) has at least one landing pad with non-zero execution counts, c) ends with a recursive call. The reason for a) and b) is because the thrower --> landing pad edges are not explicitly represented in the CFG. The reason for c) is because the call-continuation fallthrough edge count is not important in case of recursive calls.
Modified test
bolt/test/X86/profile-quality-reporting.test
.Added test
bolt/test/X86/profile-quality-reporting-small-binary.s
.