-
Notifications
You must be signed in to change notification settings - Fork 14.3k
[BOLT] Profile quality stats -- CFG discontinuity #109683
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@llvm/pr-subscribers-bolt Author: ShatianWang (ShatianWang) ChangesAdd the profile quality check for CFG discontinuity. By default, the option print-continuity-stats is true, and a single line of BOLT-INFO will be printed that contains a single number to summarize the profile CFG discontinuity (closer to 0 the better). If more detailed stats are needed, print-bucketed-stats can be added to the BOLT invocation. Full diff: https://github.com/llvm/llvm-project/pull/109683.diff 5 Files Affected:
diff --git a/bolt/include/bolt/Passes/ContinuityStats.h b/bolt/include/bolt/Passes/ContinuityStats.h
new file mode 100644
index 00000000000000..b9edc09cbf55cc
--- /dev/null
+++ b/bolt/include/bolt/Passes/ContinuityStats.h
@@ -0,0 +1,40 @@
+//===- bolt/Passes/ContinuityStats.h - function cfg continuity analysis ---*- C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+//
+// Conduct function CFG continuity analysis.
+//
+//===----------------------------------------------------------------------===//
+
+#ifndef BOLT_PASSES_CONTINUITYSTATS_H
+#define BOLT_PASSES_CONTINUITYSTATS_H
+
+#include <vector>
+#include "bolt/Passes/BinaryPasses.h"
+
+namespace llvm {
+
+class raw_ostream;
+
+namespace bolt {
+class BinaryContext;
+
+/// Compute and report to the user the function CFG continuity quality
+class PrintContinuityStats : public BinaryFunctionPass {
+public:
+ explicit PrintContinuityStats(const cl::opt<bool> &PrintPass)
+ : BinaryFunctionPass(PrintPass) {}
+
+ const char *getName() const override { return "continuity-stats"; }
+ bool shouldPrint(const BinaryFunction &) const override { return false; }
+ Error runOnFunctions(BinaryContext &BC) override;
+};
+
+} // namespace bolt
+} // namespace llvm
+
+#endif // BOLT_PASSES_CONTINUITYSTATS_H
diff --git a/bolt/lib/Passes/CMakeLists.txt b/bolt/lib/Passes/CMakeLists.txt
index 407d8b03f73977..1c1273b3d2420d 100644
--- a/bolt/lib/Passes/CMakeLists.txt
+++ b/bolt/lib/Passes/CMakeLists.txt
@@ -26,6 +26,7 @@ add_llvm_library(LLVMBOLTPasses
PatchEntries.cpp
PettisAndHansen.cpp
PLTCall.cpp
+ ContinuityStats.cpp
RegAnalysis.cpp
RegReAssign.cpp
ReorderAlgorithm.cpp
diff --git a/bolt/lib/Passes/ContinuityStats.cpp b/bolt/lib/Passes/ContinuityStats.cpp
new file mode 100644
index 00000000000000..d2ce97949c1c12
--- /dev/null
+++ b/bolt/lib/Passes/ContinuityStats.cpp
@@ -0,0 +1,270 @@
+//===- bolt/Passes/ContinuityStats.cpp - function cfg continuity analysis ---*- C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+//
+// Conduct function CFG continuity analysis.
+//
+//===----------------------------------------------------------------------===//
+
+#include "bolt/Passes/ContinuityStats.h"
+#include "bolt/Core/BinaryBasicBlock.h"
+#include "bolt/Core/BinaryFunction.h"
+#include "bolt/Utils/CommandLineOpts.h"
+#include "llvm/Support/CommandLine.h"
+#include <queue>
+#include <unordered_map>
+#include <unordered_set>
+
+#define DEBUG_TYPE "bolt-opts"
+
+using namespace llvm;
+using namespace bolt;
+
+namespace opts {
+extern cl::opt<unsigned> Verbosity;
+cl::opt<unsigned> NumTopFunctions(
+ "num-top-functions",
+ cl::desc(
+ "number of hottest functions to print aggregated CFG discontinuity stats of."),
+ cl::init(1000), cl::ZeroOrMore, cl::Hidden, cl::cat(BoltOptCategory));
+cl::opt<bool> PrintBucketedStats(
+ "print-bucketed-stats",
+ cl::desc("print CFG discontinuity stats for the top functions divided into buckets "
+ "based on their execution counts."),
+ cl::Hidden, cl::cat(BoltCategory));
+cl::opt<unsigned>
+ NumFunctionsPerBucket("num-functions-per-bucket",
+ cl::desc("maximum number of functions per bucket."),
+ cl::init(500), cl::ZeroOrMore, cl::Hidden,
+ cl::cat(BoltOptCategory));
+cl::opt<unsigned>
+ MinNumFunctions("min-num-functions",
+ cl::desc("minimum number of hot functions in the binary to "
+ "trigger profile CFG continuity check."),
+ cl::init(5), cl::ZeroOrMore, cl::Hidden,
+ cl::cat(BoltOptCategory));
+} // namespace opts
+
+namespace {
+using FunctionListType = std::vector<const BinaryFunction *>;
+using function_iterator = FunctionListType::iterator;
+
+template <typename T>
+void printDistribution(raw_ostream &OS, std::vector<T> &values,
+ bool Fraction = false) {
+ if (values.empty())
+ return;
+ // Sort values from largest to smallest and print the MAX, TOP 1%, 5%, 10%,
+ // 20%, 50%, 80%, MIN. If Fraction is true, then values are printed as
+ // fractions instead of integers.
+ std::sort(values.begin(), values.end());
+
+ auto printLine = [&](std::string Text, double Percent) {
+ int Rank = int(values.size() * (1.0 - Percent / 100));
+ if (Percent == 0)
+ Rank = values.size() - 1;
+ if (Fraction)
+ OS << " " << Text << std::string(9 - Text.length(), ' ') << ": "
+ << format("%.2lf%%", values[Rank] * 100) << "\n";
+ else
+ OS << " " << Text << std::string(9 - Text.length(), ' ') << ": "
+ << values[Rank] << "\n";
+ };
+
+ printLine("MAX", 0);
+ int percentages[] = {1, 5, 10, 20, 50, 80};
+ for (size_t i = 0; i < sizeof(percentages) / sizeof(percentages[0]); ++i) {
+ printLine("TOP " + std::to_string(percentages[i]) + "%", percentages[i]);
+ }
+ printLine("MIN", 100);
+}
+
+void printCFGContinuityStats(raw_ostream &OS,
+ iterator_range<function_iterator> &Functions,
+ bool Verbose=false) {
+ // Given a perfect profile, every positive-execution-count BB should be
+ // connected to an entry of the function through a positive-execution-count
+ // directed path in the control flow graph.
+ std::vector<size_t> NumUnreachables;
+ std::vector<size_t> SumECUnreachables;
+ std::vector<double> FractionECUnreachables;
+
+ for (auto it = Functions.begin(); it != Functions.end(); ++it) {
+ const BinaryFunction *Function = *it;
+ if (Function->size() <= 1)
+ continue;
+
+ // Compute the sum of all BB execution counts (ECs).
+ size_t NumPosECBBs = 0;
+ size_t SumAllBBEC = 0;
+ for (const BinaryBasicBlock &BB : *Function) {
+ size_t BBEC = BB.getKnownExecutionCount();
+ NumPosECBBs += BBEC > 0 ? 1 : 0;
+ SumAllBBEC += BBEC;
+ }
+
+ // Perform BFS on subgraph of CFG induced by positive weight edges.
+ // Compute the number of BBs reachable from the entry(s) of the function and
+ // the sum of their execution counts (ECs).
+ std::unordered_map<unsigned, const BinaryBasicBlock *> IndexToBB;
+ std::unordered_set<unsigned> Visited;
+ std::queue<unsigned> Queue;
+ for (const BinaryBasicBlock &BB : *Function) {
+ // Make sure BB.getIndex() is not already in IndexToBB.
+ assert(IndexToBB.find(BB.getIndex()) == IndexToBB.end());
+ IndexToBB[BB.getIndex()] = &BB;
+ if (BB.isEntryPoint() && BB.getKnownExecutionCount() > 0) {
+ Queue.push(BB.getIndex());
+ Visited.insert(BB.getIndex());
+ }
+ }
+ while (!Queue.empty()) {
+ unsigned BBIndex = Queue.front();
+ const BinaryBasicBlock *BB = IndexToBB[BBIndex];
+ Queue.pop();
+ auto SuccBIIter = BB->branch_info_begin();
+ for (BinaryBasicBlock *Succ : BB->successors()) {
+ uint64_t Count = SuccBIIter->Count;
+ if (Count == BinaryBasicBlock::COUNT_NO_PROFILE || Count == 0) {
+ ++SuccBIIter;
+ continue;
+ }
+ if (!Visited.insert(Succ->getIndex()).second) {
+ ++SuccBIIter;
+ continue;
+ }
+ Queue.push(Succ->getIndex());
+ ++SuccBIIter;
+ }
+ }
+
+ size_t NumReachableBBs = Visited.size();
+
+ // Loop through Visited, and sum the corresponding BBs' execution counts (ECs).
+ size_t SumReachableBBEC = 0;
+ for (unsigned BBIndex : Visited) {
+ const BinaryBasicBlock *BB = IndexToBB[BBIndex];
+ SumReachableBBEC += BB->getKnownExecutionCount();
+ }
+
+ size_t NumPosECBBsUnreachableFromEntry = NumPosECBBs - NumReachableBBs;
+ size_t SumUnreachableBBEC = SumAllBBEC - SumReachableBBEC;
+ double FractionECUnreachable = (double)SumUnreachableBBEC / SumAllBBEC;
+
+ if (opts::Verbosity >= 2 && FractionECUnreachable > 0.1 &&
+ SumUnreachableBBEC > 50) {
+ OS << "Non-trivial CFG discontinuity observed in function "
+ << Function->getPrintName() << "\n";
+ LLVM_DEBUG(Function->dump());
+ }
+
+ NumUnreachables.push_back(NumPosECBBsUnreachableFromEntry);
+ SumECUnreachables.push_back(SumUnreachableBBEC);
+ FractionECUnreachables.push_back(FractionECUnreachable);
+ }
+
+ if (!Verbose) {
+ if (FractionECUnreachables.empty()) {
+ OS << "no functions have more than 1 basic block and hence no CFG discontinuity.\n";
+ return;
+ }
+ std::sort(FractionECUnreachables.begin(), FractionECUnreachables.end());
+ int Rank = int(FractionECUnreachables.size() * 0.95);
+ OS << format("the TOP 5%% function CFG discontinuity is %.2lf%%", FractionECUnreachables[Rank] * 100) << "\n";
+ return;
+ }
+
+ OS << format("Focus on %zu (%.2lf%%) of considered functions that have at "
+ "least 2 basic blocks\n",
+ SumECUnreachables.size(),
+ 100.0 * (double)SumECUnreachables.size() /
+ (std::distance(Functions.begin(), Functions.end())));
+
+ if (!NumUnreachables.empty()) {
+ OS << "- Distribution of NUM(unreachable POS BBs) among all focal "
+ "functions\n";
+ printDistribution(OS, NumUnreachables);
+ }
+ if (!SumECUnreachables.empty()) {
+ OS << "- Distribution of SUM(unreachable POS BBs) among all focal "
+ "functions\n";
+ printDistribution(OS, SumECUnreachables);
+ }
+ if (!FractionECUnreachables.empty()) {
+ OS << "- Distribution of [(SUM(unreachable POS BBs) / SUM(all "
+ "POS BBs))] among all focal functions\n";
+ printDistribution(OS, FractionECUnreachables, /*Fraction=*/true);
+ }
+}
+
+void printAll(BinaryContext &BC,
+ size_t NumFunctionsPerBucket,
+ size_t NumTopFunctions) {
+
+ // Create a list of functions with valid profiles.
+ FunctionListType ValidFunctions;
+ for (const auto &BFI : BC.getBinaryFunctions()) {
+ const BinaryFunction *Function = &BFI.second;
+ if (Function->empty() || !Function->hasValidProfile() ||
+ !Function->isSimple())
+ continue;
+ ValidFunctions.push_back(Function);
+ }
+
+ // Sort the list of functions by execution counts (reverse).
+ llvm::sort(ValidFunctions,
+ [&](const BinaryFunction *A, const BinaryFunction *B) {
+ return A->getKnownExecutionCount() > B->getKnownExecutionCount();
+ });
+
+ size_t RealNumTopFunctions = std::min(NumTopFunctions, ValidFunctions.size());
+ if (RealNumTopFunctions <= opts::MinNumFunctions)
+ return;
+ BC.outs() << format("BOLT-INFO: among the hottest %zu functions ", RealNumTopFunctions);
+ iterator_range<function_iterator> Functions(
+ ValidFunctions.begin(), ValidFunctions.begin() + RealNumTopFunctions);
+ printCFGContinuityStats(BC.outs(), Functions, /*Verbose=*/false);
+
+ // Print more detailed bucketed stats if requested.
+ if (opts::PrintBucketedStats) {
+ size_t PerBucketSize = std::min(NumFunctionsPerBucket, ValidFunctions.size());
+ if (PerBucketSize == 0)
+ return;
+ size_t NumBuckets = RealNumTopFunctions / PerBucketSize +
+ (RealNumTopFunctions % PerBucketSize != 0);
+ BC.outs() << format("Detailed stats for %zu buckets, each with at most %zu functions:\n",
+ NumBuckets, PerBucketSize);
+ BC.outs() << "For each considered function, identify positive execution-count basic blocks\n"
+ << "(abbr. POS BBs) that are *unreachable* from the function entry through a\n"
+ << "positive-execution-count path.\n";
+
+ // For each bucket, print the CFG continuity stats of the functions in the bucket.
+ for (size_t BucketIndex = 0; BucketIndex < NumBuckets; ++BucketIndex) {
+ const size_t StartIndex = BucketIndex * PerBucketSize;
+ size_t EndIndex = std::min(StartIndex + PerBucketSize, NumTopFunctions);
+ EndIndex = std::min(EndIndex, ValidFunctions.size());
+ iterator_range<function_iterator> Functions(
+ ValidFunctions.begin() + StartIndex, ValidFunctions.begin() + EndIndex);
+ const size_t MaxFunctionExecutionCount =
+ ValidFunctions[StartIndex]->getKnownExecutionCount();
+ const size_t MinFunctionExecutionCount =
+ ValidFunctions[EndIndex - 1]->getKnownExecutionCount();
+ BC.outs() << format("----------------\n| Bucket %zu: "
+ "|\n----------------\nExecution counts of the %zu functions in the bucket: "
+ "%zu-%zu\n",
+ BucketIndex + 1, EndIndex - StartIndex,
+ MinFunctionExecutionCount, MaxFunctionExecutionCount);
+ printCFGContinuityStats(BC.outs(), Functions, /*Verbose=*/true);
+ }
+ }
+}
+} // namespace
+
+Error PrintContinuityStats::runOnFunctions(BinaryContext &BC) {
+ printAll(BC, opts::NumFunctionsPerBucket, opts::NumTopFunctions);
+ return Error::success();
+}
diff --git a/bolt/lib/Rewrite/BinaryPassManager.cpp b/bolt/lib/Rewrite/BinaryPassManager.cpp
index 5dfef0b71cc79f..aad4d15cf2869d 100644
--- a/bolt/lib/Rewrite/BinaryPassManager.cpp
+++ b/bolt/lib/Rewrite/BinaryPassManager.cpp
@@ -11,6 +11,7 @@
#include "bolt/Passes/Aligner.h"
#include "bolt/Passes/AllocCombiner.h"
#include "bolt/Passes/AsmDump.h"
+#include "bolt/Passes/ContinuityStats.h"
#include "bolt/Passes/CMOVConversion.h"
#include "bolt/Passes/FixRISCVCallsPass.h"
#include "bolt/Passes/FixRelaxationPass.h"
@@ -154,6 +155,11 @@ static cl::opt<bool>
cl::desc("print profile quality/bias analysis"),
cl::cat(BoltCategory));
+static cl::opt<bool>
+ PrintContinuityStats("print-continuity-stats",
+ cl::desc("print profile function CFG continuity stats"),
+ cl::init(true), cl::cat(BoltCategory));
+
static cl::opt<bool>
PrintRegReAssign("print-regreassign",
cl::desc("print functions after regreassign pass"),
@@ -373,6 +379,9 @@ Error BinaryFunctionPassManager::runAllPasses(BinaryContext &BC) {
if (opts::PrintProfileStats)
Manager.registerPass(std::make_unique<PrintProfileStats>(NeverPrint));
+ if (opts::PrintContinuityStats)
+ Manager.registerPass(std::make_unique<PrintContinuityStats>(NeverPrint));
+
Manager.registerPass(std::make_unique<ValidateInternalCalls>(NeverPrint));
Manager.registerPass(std::make_unique<ValidateMemRefs>(NeverPrint));
diff --git a/bolt/test/X86/cfg-discontinuity-reporting.test b/bolt/test/X86/cfg-discontinuity-reporting.test
new file mode 100644
index 00000000000000..563d55c3013ff8
--- /dev/null
+++ b/bolt/test/X86/cfg-discontinuity-reporting.test
@@ -0,0 +1,5 @@
+## Check profile discontinuity reporting
+RUN: yaml2obj %p/Inputs/blarge_new.yaml &> %t.exe
+RUN: llvm-bolt %t.exe -o %t.out --pa -p %p/Inputs/blarge_new.preagg.txt \
+RUN: --print-continuity-stats --min-num-functions=1 | FileCheck %s
+CHECK: among the hottest 5 functions the TOP 5% function CFG discontinuity is 100.00%
|
✅ With the latest revision this PR passed the C/C++ code formatter. |
static cl::opt<bool> | ||
PrintContinuityStats("print-continuity-stats", | ||
cl::desc("print profile function CFG continuity stats"), | ||
cl::init(true), cl::cat(BoltCategory)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd rather leave the pass off by default, in line with other profile analyses/reports available.
bolt/lib/Passes/ContinuityStats.cpp
Outdated
if (Function->empty() || !Function->hasValidProfile() || | ||
!Function->isSimple()) | ||
continue; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Align with shouldOptimize
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Let's wait for @maksfb review.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you please add a brief description in the summary of what profile continuity/discontinuity means.
@@ -0,0 +1,42 @@ | |||
//===- bolt/Passes/ContinuityStats.h - function cfg continuity analysis ---*- |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: fix line wrap
bolt/lib/Passes/ContinuityStats.cpp
Outdated
@@ -0,0 +1,288 @@ | |||
//===- bolt/Passes/ContinuityStats.cpp - function cfg continuity analysis ---*- |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto
bolt/lib/Passes/ContinuityStats.cpp
Outdated
} | ||
std::sort(FractionECUnreachables.begin(), FractionECUnreachables.end()); | ||
int Rank = int(FractionECUnreachables.size() * 0.95); | ||
OS << format("the TOP 5%% function CFG discontinuity is %.2lf%%", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason to capitalize "TOP" here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not really. Changed to "top" in the newest commit.
bolt/lib/Passes/ContinuityStats.cpp
Outdated
cl::opt<unsigned> | ||
NumTopFunctions("num-top-functions", | ||
cl::desc("number of hottest functions to print aggregated " | ||
"CFG discontinuity stats of."), | ||
cl::init(1000), cl::ZeroOrMore, cl::Hidden, | ||
cl::cat(BoltOptCategory)); | ||
cl::opt<bool> | ||
PrintBucketedStats("print-bucketed-stats", | ||
cl::desc("print CFG discontinuity stats for the top " | ||
"functions divided into buckets " | ||
"based on their execution counts."), | ||
cl::Hidden, cl::cat(BoltCategory)); | ||
cl::opt<unsigned> | ||
NumFunctionsPerBucket("num-functions-per-bucket", | ||
cl::desc("maximum number of functions per bucket."), | ||
cl::init(500), cl::ZeroOrMore, cl::Hidden, | ||
cl::cat(BoltOptCategory)); | ||
cl::opt<unsigned> | ||
MinNumFunctions("min-num-functions", | ||
cl::desc("minimum number of hot functions in the binary to " | ||
"trigger profile CFG continuity check."), | ||
cl::init(5), cl::ZeroOrMore, cl::Hidden, | ||
cl::cat(BoltOptCategory)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a lot of pass-specific options that are sharing the global option namespace. Option names like "--num-top-functions" and "--min-num-functions" are ambiguous when taken out of the continuity context.
Rather then make these options pass-specific, I suggest we print the data they control under "-v=1". WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I removed all but "-num-top-functions" and changed its name to "-num-functions-for-continuity-check" to make it more descriptive. As suggested, whether or not to print bucketed stats is now controlled by -v
.
static cl::opt<bool> PrintContinuityStats( | ||
"print-continuity-stats", | ||
cl::desc("print profile function CFG continuity stats"), | ||
cl::cat(BoltCategory)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the pass has low overhead and prints a single line by default, I suggest we enable it by default and drop the option. Make sure it doesn't print anything when there is no profile available.
22b6a21
to
2da1163
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you please add const
qualifier where applicable?
bolt/lib/Passes/ContinuityStats.cpp
Outdated
if (!Verbose) { | ||
std::sort(FractionECUnreachables.begin(), FractionECUnreachables.end()); | ||
int Rank = int(FractionECUnreachables.size() * 0.95); | ||
OS << format("BOLT-INFO: among the hottest %zu functions ", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For consistency, we should print this line under -v
too.
& add const qualifiers when applicable
#109683 identified an issue with pre-aggregated profile where a call to continuation fallthrough edge count is missing (profile discontinuity). This issue only affects pre-aggregated profile but not perf data since LBR stack has the necessary information to determine if the trace (fall- through) starts at call continuation, whereas pre-aggregated fallthrough lacks this information. The solution is to look at branch records in pre-aggregated profiles that correspond to returns and assign counts to call to continuation fallthrough: - BranchFrom is in another function or DSO, - BranchTo may be a call continuation site: - not an entry point/landing pad. Note that we can't directly check if BranchFrom corresponds to a return instruction if it's in external DSO. Keep call continuation handling for perf data (`getFallthroughsInTrace`) [1] as-is due to marginally better performance. The difference is that return-converted call to continuation fallthrough is slightly more frequent than other fallthroughs since the former only requires one LBR address while the latter need two that belong to the profiled binary. Hence return-converted fallthroughs have larger "weight" which affects code layout. [1] `DataAggregator::getFallthroughsInTrace` https://github.com/llvm/llvm-project/blob/fea18afeed39fe4435d67eee1834f0f34b23013d/bolt/lib/Profile/DataAggregator.cpp#L906-L915 Test Plan: added callcont-fallthru.s Reviewers: maksfb, ayermolo, ShatianWang, dcci Reviewed By: maksfb, ShatianWang Pull Request: #109486
llvm#109683 identified an issue with pre-aggregated profile where a call to continuation fallthrough edge count is missing (profile discontinuity). This issue only affects pre-aggregated profile but not perf data since LBR stack has the necessary information to determine if the trace (fall- through) starts at call continuation, whereas pre-aggregated fallthrough lacks this information. The solution is to look at branch records in pre-aggregated profiles that correspond to returns and assign counts to call to continuation fallthrough: - BranchFrom is in another function or DSO, - BranchTo may be a call continuation site: - not an entry point/landing pad. Note that we can't directly check if BranchFrom corresponds to a return instruction if it's in external DSO. Keep call continuation handling for perf data (`getFallthroughsInTrace`) [1] as-is due to marginally better performance. The difference is that return-converted call to continuation fallthrough is slightly more frequent than other fallthroughs since the former only requires one LBR address while the latter need two that belong to the profiled binary. Hence return-converted fallthroughs have larger "weight" which affects code layout. [1] `DataAggregator::getFallthroughsInTrace` https://github.com/llvm/llvm-project/blob/fea18afeed39fe4435d67eee1834f0f34b23013d/bolt/lib/Profile/DataAggregator.cpp#L906-L915 Test Plan: added callcont-fallthru.s Reviewers: maksfb, ayermolo, ShatianWang, dcci Reviewed By: maksfb, ShatianWang Pull Request: llvm#109486
Add two additional profile quality stats for CG (call graph) and CFG (control flow graph) flow conservations besides the CFG discontinuity stats introduced in #109683. The two new stats quantify how different "in-flow" is from "out-flow" in the following cases where they should be equal. The smaller the reported stats, the better the flow conservations are. CG flow conservation: for each function that is not a program entry, the number of times the function is called according to CG ("in-flow") should be equal to the number of times the transition from an entry basic block of the function to another basic block within the function is recorded ("out-flow"). CFG flow conservation: for each basic block that is not a function entry or exit, the number of times the transition into this basic block from another basic block within the function is recorded ("in-flow") should be equal to the number of times the transition from this basic block to another basic block within the function is recorded ("out-flow"). Use `-v=1` for more detailed bucketed stats, and use `-v=2` to dump functions / basic blocks with bad flow conservations.
Add two additional profile quality stats for CG (call graph) and CFG (control flow graph) flow conservations besides the CFG discontinuity stats introduced in llvm#109683. The two new stats quantify how different "in-flow" is from "out-flow" in the following cases where they should be equal. The smaller the reported stats, the better the flow conservations are. CG flow conservation: for each function that is not a program entry, the number of times the function is called according to CG ("in-flow") should be equal to the number of times the transition from an entry basic block of the function to another basic block within the function is recorded ("out-flow"). CFG flow conservation: for each basic block that is not a function entry or exit, the number of times the transition into this basic block from another basic block within the function is recorded ("in-flow") should be equal to the number of times the transition from this basic block to another basic block within the function is recorded ("out-flow"). Use `-v=1` for more detailed bucketed stats, and use `-v=2` to dump functions / basic blocks with bad flow conservations.
In a perfect profile, each positive-execution-count block in the function’s CFG should be reachable from a positive-execution-count function entry block through a positive-execution-count path. This new pass checks how well the BOLT input profile satisfies this “CFG continuity” property.
More specifically, for each of the hottest 1000 functions, the pass calculates the function’s fraction of basic block execution counts that is “unreachable”. It then reports the 95th percentile of the distribution of the 1000 unreachable fractions in a single BOLT-INFO line. The smaller the reported value is, the better the BOLT profile satisfies the CFG continuity property.
The default value of 1000 above can be changed via the hidden BOLT option
-num-functions-for-continuity-check=[N]
. If more detailed stats are needed,-v=1
can be added to the BOLT invocation: the hottest N functions will be grouped into 5 equally-sized buckets, from the hottest to the coldest; for each bucket, various summary statistics of the distribution of the fractions and the raw unreachable execution counts will be reported.