[BOLT] Add profile density computation #101094
Created using spr 1.3.4
@llvm/pr-subscribers-bolt
Author: Amir Ayupov (aaupov)
Changes
Full diff: https://github.com/llvm/llvm-project/pull/101094.diff
2 Files Affected:
diff --git a/bolt/lib/Passes/BinaryPasses.cpp b/bolt/lib/Passes/BinaryPasses.cpp
index fa95ad7324ac1..23009bf74e077 100644
--- a/bolt/lib/Passes/BinaryPasses.cpp
+++ b/bolt/lib/Passes/BinaryPasses.cpp
@@ -223,6 +223,22 @@ static cl::opt<unsigned> TopCalledLimit(
"functions section"),
cl::init(100), cl::Hidden, cl::cat(BoltCategory));
+// Profile density options, synced with llvm-profgen/ProfileGenerator.cpp
+static cl::opt<bool> ShowDensity("show-density", cl::init(false),
+ cl::desc("show profile density details"),
+ cl::Optional);
+
+static cl::opt<int> ProfileDensityCutOffHot(
+ "profile-density-cutoff-hot", cl::init(990000),
+ cl::desc("Total samples cutoff for functions used to calculate "
+ "profile density."));
+
+static cl::opt<double> ProfileDensityThreshold(
+ "profile-density-threshold", cl::init(50),
+ cl::desc("If the profile density is below the given threshold, it "
+ "will be suggested to increase the sampling rate."),
+ cl::Optional);
+
} // namespace opts
namespace llvm {
@@ -1383,6 +1399,7 @@ Error PrintProgramStats::runOnFunctions(BinaryContext &BC) {
uint64_t StaleSampleCount = 0;
uint64_t InferredSampleCount = 0;
std::vector<const BinaryFunction *> ProfiledFunctions;
+ std::vector<std::pair<double, uint64_t>> FuncDensityList;
const char *StaleFuncsHeader = "BOLT-INFO: Functions with stale profile:\n";
for (auto &BFI : BC.getBinaryFunctions()) {
const BinaryFunction &Function = BFI.second;
@@ -1441,6 +1458,18 @@ Error PrintProgramStats::runOnFunctions(BinaryContext &BC) {
StaleSampleCount += SampleCount;
++NumAllStaleFunctions;
}
+
+ if (opts::ShowDensity) {
+ uint64_t Instructions = Function.getInputInstructionCount();
+ // In case of BOLT split functions registered in BAT, samples are
+ // automatically attributed to the main fragment. Add instructions from
+ // all fragments.
+ if (IsHotParentOfBOLTSplitFunction)
+ for (const BinaryFunction *Fragment : Function.getFragments())
+ Instructions += Fragment->getInputInstructionCount();
+ double Density = (double)1.0 * SampleCount / Instructions;
+ FuncDensityList.emplace_back(Density, SampleCount);
+ }
}
BC.NumProfiledFuncs = ProfiledFunctions.size();
BC.NumStaleProfileFuncs = NumStaleProfileFunctions;
@@ -1684,6 +1713,50 @@ Error PrintProgramStats::runOnFunctions(BinaryContext &BC) {
BC.outs() << ". Use -print-unknown to see the list.";
BC.outs() << '\n';
}
+
+ if (opts::ShowDensity) {
+ double Density = 0.0;
+ // Sorted by the density in descending order.
+ llvm::stable_sort(FuncDensityList,
+ [&](const std::pair<double, uint64_t> &A,
+ const std::pair<double, uint64_t> &B) {
+ if (A.first != B.first)
+ return A.first > B.first;
+ return A.second < B.second;
+ });
+
+ uint64_t AccumulatedSamples = 0;
+ uint32_t I = 0;
+ assert(opts::ProfileDensityCutOffHot <= 1000000 &&
+ "The cutoff value is greater than 1000000(100%)");
+ while (AccumulatedSamples <
+ TotalSampleCount *
+ static_cast<float>(opts::ProfileDensityCutOffHot) /
+ 1000000 &&
+ I < FuncDensityList.size()) {
+ AccumulatedSamples += FuncDensityList[I].second;
+ Density = FuncDensityList[I].first;
+ I++;
+ }
+ if (Density == 0.0) {
+ BC.errs() << "BOLT-WARNING: the output profile is empty or the "
+ "--profile-density-cutoff-hot option is "
+ "set too low. Please check your command.\n";
+ } else if (Density < opts::ProfileDensityThreshold) {
+ BC.errs()
+ << "BOLT-WARNING: BOLT is estimated to optimize better with "
+ << format("%.1f", opts::ProfileDensityThreshold / Density)
+ << "x more samples. Please consider increasing sampling rate or "
+ "profiling for longer duration to get more samples.\n";
+ }
+
+ BC.outs() << "BOLT-INFO: Functions with density >= "
+ << format("%.1f", Density) << " account for "
+ << format("%.2f",
+ static_cast<double>(opts::ProfileDensityCutOffHot) /
+ 10000)
+ << "% total sample counts.\n";
+ }
return Error::success();
}
diff --git a/bolt/test/X86/pre-aggregated-perf.test b/bolt/test/X86/pre-aggregated-perf.test
index 90252f9ff68da..fc6f332d53dfb 100644
--- a/bolt/test/X86/pre-aggregated-perf.test
+++ b/bolt/test/X86/pre-aggregated-perf.test
@@ -11,6 +11,8 @@ REQUIRES: system-linux
RUN: yaml2obj %p/Inputs/blarge.yaml &> %t.exe
RUN: perf2bolt %t.exe -o %t --pa -p %p/Inputs/pre-aggregated.txt -w %t.new \
+RUN: --show-density --profile-density-threshold=9 \
+RUN: --profile-density-cutoff-hot=970000 \
RUN: --profile-use-dfs | FileCheck %s
RUN: llvm-bolt %t.exe -data %t -o %t.null | FileCheck %s
@@ -18,6 +20,7 @@ RUN: llvm-bolt %t.exe -data %t.new -o %t.null | FileCheck %s
RUN: llvm-bolt %t.exe -p %p/Inputs/pre-aggregated.txt --pa -o %t.null | FileCheck %s
CHECK: BOLT-INFO: 4 out of 7 functions in the binary (57.1%) have non-empty execution profile
+CHECK: BOLT-INFO: Functions with density >= 9.4 account for 97.00% total sample counts.
RUN: cat %t | sort | FileCheck %s -check-prefix=PERF2BOLT
RUN: cat %t.new | FileCheck %s -check-prefix=NEWFORMAT
LGTM, thanks!
lgtm, thanks.
Align DataAggregator (Linux perf and pre-aggregated profile reader) to DataReader (fdata profile reader) behavior: set BF->RawBranchCount which is used in profile density computation (llvm#101094).
Reviewers: ayermolo, maksfb, dcci, rafaelauler, WenleiHe
Reviewed By: WenleiHe
Pull Request: llvm#101093
Reuse the definition of profile density from llvm-profgen (llvm#92144):
- the density is computed in perf2bolt using raw samples (perf.data or pre-aggregated data),
- function density is the ratio of dynamically executed function bytes to the static function size in bytes,
- profile density:
  - functions are sorted by density in decreasing order, accumulating their respective sample counts,
  - profile density is the smallest density covering 99% of total sample count.
In other words, BOLT binary profile density is the minimum amount of profile information per function (excluding functions in tail 1% sample count) which is sufficient to optimize the binary well.
The density threshold of 60 was determined through experiments with large binaries by reducing the sample count and checking resulting profile density and performance. The threshold is conservative.
perf2bolt would print the warning if the density is below the threshold and suggest to increase the sampling duration and/or frequency to reach a given density, e.g.:
```
BOLT-WARNING: BOLT is estimated to optimize better with 2.8x more samples.
```
Test Plan: updated pre-aggregated-perf.test
Reviewers: maksfb, wlei-llvm, rafaelauler, ayermolo, dcci, WenleiHe
Reviewed By: WenleiHe, wlei-llvm
Pull Request: llvm#101094
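For readers who want to try the cutoff logic outside of BOLT, the following is a minimal standalone sketch of the sort-and-accumulate step described above. It is not the code from this PR: `computeProfileDensity` and `suggestedSampleMultiplier` are hypothetical names, the per-function (density, sample count) pairs are assumed to be computed already, the total sample count is assumed to be the sum over that list, and `std::stable_sort` stands in for `llvm::stable_sort`.

```cpp
#include <algorithm>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical helper type, not part of BOLT: one (density, sample count)
// entry per function, mirroring FuncDensityList in the diff.
using FuncDensity = std::pair<double, uint64_t>;

// Returns the density of the last function needed to cover CutoffPPM / 1e6
// of all samples when functions are visited from densest to sparsest.
// Returns 0.0 when the list is empty or carries no samples.
double computeProfileDensity(std::vector<FuncDensity> Funcs,
                             uint64_t CutoffPPM = 990000) {
  // Assumption: the total is the sum of the per-function sample counts.
  uint64_t Total = 0;
  for (const FuncDensity &F : Funcs)
    Total += F.second;

  // Densest functions first; equal densities are ordered by fewer samples.
  std::stable_sort(Funcs.begin(), Funcs.end(),
                   [](const FuncDensity &A, const FuncDensity &B) {
                     if (A.first != B.first)
                       return A.first > B.first;
                     return A.second < B.second;
                   });

  const double Target = static_cast<double>(Total) * CutoffPPM / 1e6;
  uint64_t Accumulated = 0;
  double Density = 0.0;
  for (const FuncDensity &F : Funcs) {
    if (Accumulated >= Target)
      break;
    Accumulated += F.second;
    Density = F.first;
  }
  return Density;
}

// Factor by which the sample count would have to grow to reach Threshold,
// as in the "N.Nx more samples" warning. Returns 0.0 for an empty profile.
double suggestedSampleMultiplier(double Density, double Threshold) {
  return Density > 0.0 ? Threshold / Density : 0.0;
}
```

With three functions holding 900, 90, and 10 samples at densities 100, 20, and 2, a cutoff of 990000 stops after the second function, so the profile density is 20; against a threshold of 50 the suggested multiplier would be 2.5.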