[BOLT] Add profile density computation #101094

Merged
merged 17 commits into main from users/aaupov/spr/bolt-add-profile-density-computation
Oct 25, 2024

Conversation

aaupov
Contributor

@aaupov aaupov commented Jul 29, 2024

Reuse the definition of profile density from llvm-profgen (#92144):

  • the density is computed in perf2bolt using raw samples (perf.data or
    pre-aggregated data),
  • function density is the ratio of dynamically executed function bytes
    to the static function size in bytes,
  • profile density:
    • functions are sorted by density in decreasing order, accumulating
      their respective sample counts,
    • profile density is the smallest density covering 99% of total sample
      count.

In other words, BOLT binary profile density is the minimum amount of
profile information per function (excluding functions in the tail 1% of
the sample count) that is sufficient to optimize the binary well.
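
The definition above can be sketched in a few lines of C++. This is a minimal illustration, not the code from this PR: `FuncProfile`, `functionDensity`, and `computeProfileDensity` are hypothetical names, and the cutoff is passed as a fraction (0.99) rather than through a command-line option.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical record for this sketch: one profiled function.
struct FuncProfile {
  double Density;       // per-function density
  uint64_t SampleCount; // raw samples attributed to the function
};

// Function density as described: dynamically executed bytes divided by the
// static function size in bytes (0.0 for an empty function).
double functionDensity(uint64_t ExecutedBytes, uint64_t SizeBytes) {
  return SizeBytes ? static_cast<double>(ExecutedBytes) / SizeBytes : 0.0;
}

// Returns the smallest density among the densest functions that together
// cover `Cutoff` of all samples; returns 0.0 for an empty profile.
double computeProfileDensity(std::vector<FuncProfile> Funcs,
                             double Cutoff = 0.99) {
  assert(Cutoff > 0.0 && Cutoff <= 1.0 && "cutoff must be in (0, 1]");
  uint64_t Total = 0;
  for (const FuncProfile &F : Funcs)
    Total += F.SampleCount;

  // Densest functions first.
  std::stable_sort(Funcs.begin(), Funcs.end(),
                   [](const FuncProfile &A, const FuncProfile &B) {
                     return A.Density > B.Density;
                   });

  double Density = 0.0;
  uint64_t Accumulated = 0;
  for (const FuncProfile &F : Funcs) {
    if (Accumulated >= Total * Cutoff)
      break; // the requested share of samples is already covered
    Accumulated += F.SampleCount;
    Density = F.Density;
  }
  return Density;
}
```

For example, given three functions with densities 100, 50, and 1 that hold 900, 95, and 5 samples, the 99% cutoff is reached after the second function, so the profile density is 50.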

The density threshold of 60 was determined through experiments with
large binaries by reducing the sample count and checking the resulting
profile density and performance. The threshold is conservative.

perf2bolt prints a warning if the density is below the threshold and
suggests increasing the sampling duration and/or frequency to reach
the given density, e.g.:

BOLT-WARNING: BOLT is estimated to optimize better with 2.8x more samples.
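
The factor in the warning is the ratio of the threshold to the measured profile density, as in the diff in this PR. A minimal sketch, assuming a hypothetical helper `densityWarning` and an illustrative density of 21.4 against the threshold of 60 quoted above:

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical helper for illustration only. Returns the warning text, or an
// empty string when the density already meets the threshold.
std::string densityWarning(double Density, double Threshold = 60.0) {
  assert(Density > 0.0 && "density must be positive");
  if (Density >= Threshold)
    return "";
  char Buf[128];
  // The suggested multiplier is how far the density falls short of the
  // threshold.
  std::snprintf(Buf, sizeof(Buf),
                "BOLT-WARNING: BOLT is estimated to optimize better with "
                "%.1fx more samples.",
                Threshold / Density);
  return Buf;
}
```

`densityWarning(21.4)` yields the 2.8x message shown above, since 60 / 21.4 is roughly 2.8. A density at or above the threshold yields an empty string.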

Test Plan: updated pre-aggregated-perf.test

aaupov added 2 commits July 29, 2024 15:25
Created using spr 1.3.4
@llvmbot
Member

llvmbot commented Jul 29, 2024

@llvm/pr-subscribers-bolt

Author: Amir Ayupov (aaupov)

Changes

Full diff: https://github.com/llvm/llvm-project/pull/101094.diff

2 Files Affected:

  • (modified) bolt/lib/Passes/BinaryPasses.cpp (+73)
  • (modified) bolt/test/X86/pre-aggregated-perf.test (+3)
diff --git a/bolt/lib/Passes/BinaryPasses.cpp b/bolt/lib/Passes/BinaryPasses.cpp
index fa95ad7324ac1..23009bf74e077 100644
--- a/bolt/lib/Passes/BinaryPasses.cpp
+++ b/bolt/lib/Passes/BinaryPasses.cpp
@@ -223,6 +223,22 @@ static cl::opt<unsigned> TopCalledLimit(
              "functions section"),
     cl::init(100), cl::Hidden, cl::cat(BoltCategory));
 
+// Profile density options, synced with llvm-profgen/ProfileGenerator.cpp
+static cl::opt<bool> ShowDensity("show-density", cl::init(false),
+                                 cl::desc("show profile density details"),
+                                 cl::Optional);
+
+static cl::opt<int> ProfileDensityCutOffHot(
+    "profile-density-cutoff-hot", cl::init(990000),
+    cl::desc("Total samples cutoff for functions used to calculate "
+             "profile density."));
+
+static cl::opt<double> ProfileDensityThreshold(
+    "profile-density-threshold", cl::init(50),
+    cl::desc("If the profile density is below the given threshold, it "
+             "will be suggested to increase the sampling rate."),
+    cl::Optional);
+
 } // namespace opts
 
 namespace llvm {
@@ -1383,6 +1399,7 @@ Error PrintProgramStats::runOnFunctions(BinaryContext &BC) {
   uint64_t StaleSampleCount = 0;
   uint64_t InferredSampleCount = 0;
   std::vector<const BinaryFunction *> ProfiledFunctions;
+  std::vector<std::pair<double, uint64_t>> FuncDensityList;
   const char *StaleFuncsHeader = "BOLT-INFO: Functions with stale profile:\n";
   for (auto &BFI : BC.getBinaryFunctions()) {
     const BinaryFunction &Function = BFI.second;
@@ -1441,6 +1458,18 @@ Error PrintProgramStats::runOnFunctions(BinaryContext &BC) {
       StaleSampleCount += SampleCount;
       ++NumAllStaleFunctions;
     }
+
+    if (opts::ShowDensity) {
+      uint64_t Instructions = Function.getInputInstructionCount();
+      // In case of BOLT split functions registered in BAT, samples are
+      // automatically attributed to the main fragment. Add instructions from
+      // all fragments.
+      if (IsHotParentOfBOLTSplitFunction)
+        for (const BinaryFunction *Fragment : Function.getFragments())
+          Instructions += Fragment->getInputInstructionCount();
+      double Density = (double)1.0 * SampleCount / Instructions;
+      FuncDensityList.emplace_back(Density, SampleCount);
+    }
   }
   BC.NumProfiledFuncs = ProfiledFunctions.size();
   BC.NumStaleProfileFuncs = NumStaleProfileFunctions;
@@ -1684,6 +1713,50 @@ Error PrintProgramStats::runOnFunctions(BinaryContext &BC) {
       BC.outs() << ". Use -print-unknown to see the list.";
     BC.outs() << '\n';
   }
+
+  if (opts::ShowDensity) {
+    double Density = 0.0;
+    // Sorted by the density in descending order.
+    llvm::stable_sort(FuncDensityList,
+                      [&](const std::pair<double, uint64_t> &A,
+                          const std::pair<double, uint64_t> &B) {
+                        if (A.first != B.first)
+                          return A.first > B.first;
+                        return A.second < B.second;
+                      });
+
+    uint64_t AccumulatedSamples = 0;
+    uint32_t I = 0;
+    assert(opts::ProfileDensityCutOffHot <= 1000000 &&
+           "The cutoff value is greater than 1000000(100%)");
+    while (AccumulatedSamples <
+               TotalSampleCount *
+                   static_cast<float>(opts::ProfileDensityCutOffHot) /
+                   1000000 &&
+           I < FuncDensityList.size()) {
+      AccumulatedSamples += FuncDensityList[I].second;
+      Density = FuncDensityList[I].first;
+      I++;
+    }
+    if (Density == 0.0) {
+      BC.errs() << "BOLT-WARNING: the output profile is empty or the "
+                   "--profile-density-cutoff-hot option is "
+                   "set too low. Please check your command.\n";
+    } else if (Density < opts::ProfileDensityThreshold) {
+      BC.errs()
+          << "BOLT-WARNING: BOLT is estimated to optimize better with "
+          << format("%.1f", opts::ProfileDensityThreshold / Density)
+          << "x more samples. Please consider increasing sampling rate or "
+             "profiling for longer duration to get more samples.\n";
+    }
+
+    BC.outs() << "BOLT-INFO: Functions with density >= "
+              << format("%.1f", Density) << " account for "
+              << format("%.2f",
+                        static_cast<double>(opts::ProfileDensityCutOffHot) /
+                            10000)
+              << "% total sample counts.\n";
+  }
   return Error::success();
 }
 
diff --git a/bolt/test/X86/pre-aggregated-perf.test b/bolt/test/X86/pre-aggregated-perf.test
index 90252f9ff68da..fc6f332d53dfb 100644
--- a/bolt/test/X86/pre-aggregated-perf.test
+++ b/bolt/test/X86/pre-aggregated-perf.test
@@ -11,6 +11,8 @@ REQUIRES: system-linux
 
 RUN: yaml2obj %p/Inputs/blarge.yaml &> %t.exe
 RUN: perf2bolt %t.exe -o %t --pa -p %p/Inputs/pre-aggregated.txt -w %t.new \
+RUN:   --show-density --profile-density-threshold=9 \
+RUN:   --profile-density-cutoff-hot=970000 \
 RUN:   --profile-use-dfs | FileCheck %s
 
 RUN: llvm-bolt %t.exe -data %t -o %t.null | FileCheck %s
@@ -18,6 +20,7 @@ RUN: llvm-bolt %t.exe -data %t.new -o %t.null | FileCheck %s
 RUN: llvm-bolt %t.exe -p %p/Inputs/pre-aggregated.txt --pa -o %t.null | FileCheck %s
 
 CHECK: BOLT-INFO: 4 out of 7 functions in the binary (57.1%) have non-empty execution profile
+CHECK: BOLT-INFO: Functions with density >= 9.4 account for 97.00% total sample counts.
 
 RUN: cat %t | sort | FileCheck %s -check-prefix=PERF2BOLT
 RUN: cat %t.new | FileCheck %s -check-prefix=NEWFORMAT
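
The cutoff option in the test above is expressed in parts per million, so `--profile-density-cutoff-hot=970000` corresponds to the 97.00% in the checked output line. A small sketch of that conversion and formatting, using a hypothetical helper `formatDensityInfo` that is not part of BOLT:

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical helper for illustration only: formats the info line from a
// profile density and a cutoff given in parts per million.
std::string formatDensityInfo(double Density, int CutoffPpm) {
  assert(CutoffPpm >= 0 && CutoffPpm <= 1000000 &&
         "cutoff must not exceed 1000000 (100%)");
  char Buf[160];
  // 1,000,000 ppm is 100%, so dividing by 10,000 gives the percentage.
  std::snprintf(Buf, sizeof(Buf),
                "BOLT-INFO: Functions with density >= %.1f account for "
                "%.2f%% total sample counts.",
                Density, CutoffPpm / 10000.0);
  return Buf;
}
```

With the values from the test, `formatDensityInfo(9.4, 970000)` reproduces the line matched by the new `CHECK` directive.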

aaupov added 3 commits August 9, 2024 11:16
aaupov added 7 commits August 12, 2024 14:46
Contributor

@wlei-llvm wlei-llvm left a comment

LGTM, thanks!

Member

@WenleiHe WenleiHe left a comment

lgtm, thanks.

aaupov added a commit that referenced this pull request Oct 25, 2024
Align DataAggregator (Linux perf and pre-aggregated profile reader) to
DataReader (fdata profile reader) behavior: set BF->RawBranchCount which
is used in profile density computation (#101094).

Reviewers: ayermolo, maksfb, dcci, rafaelauler, WenleiHe

Reviewed By: WenleiHe

Pull Request: #101093
@aaupov aaupov changed the base branch from users/aaupov/spr/main.bolt-add-profile-density-computation to main October 25, 2024 01:29
@aaupov aaupov merged commit 6ee5ff9 into main Oct 25, 2024
6 of 9 checks passed
@aaupov aaupov deleted the users/aaupov/spr/bolt-add-profile-density-computation branch October 25, 2024 01:31
NoumanAmir657 pushed a commit to NoumanAmir657/llvm-project that referenced this pull request Nov 4, 2024
Align DataAggregator (Linux perf and pre-aggregated profile reader) to
DataReader (fdata profile reader) behavior: set BF->RawBranchCount which
is used in profile density computation (llvm#101094).

Reviewers: ayermolo, maksfb, dcci, rafaelauler, WenleiHe

Reviewed By: WenleiHe

Pull Request: llvm#101093
NoumanAmir657 pushed a commit to NoumanAmir657/llvm-project that referenced this pull request Nov 4, 2024
Reuse the definition of profile density from llvm-profgen (llvm#92144):
- the density is computed in perf2bolt using raw samples (perf.data or
  pre-aggregated data),
- function density is the ratio of dynamically executed function bytes
  to the static function size in bytes,
- profile density:
  - functions are sorted by density in decreasing order, accumulating
    their respective sample counts,
  - profile density is the smallest density covering 99% of total sample
    count.

In other words, BOLT binary profile density is the minimum amount of
profile information per function (excluding functions in the tail 1% of
the sample count) that is sufficient to optimize the binary well.

The density threshold of 60 was determined through experiments with
large binaries by reducing the sample count and checking the resulting
profile density and performance. The threshold is conservative.

perf2bolt prints a warning if the density is below the threshold and
suggests increasing the sampling duration and/or frequency to reach
the given density, e.g.:
```
BOLT-WARNING: BOLT is estimated to optimize better with 2.8x more samples.
```

Test Plan: updated pre-aggregated-perf.test

Reviewers: maksfb, wlei-llvm, rafaelauler, ayermolo, dcci, WenleiHe

Reviewed By: WenleiHe, wlei-llvm

Pull Request: llvm#101094