[BOLT][heatmap] Compute section utilization and partition score #139193

Merged

aaupov
Contributor

@aaupov aaupov commented May 9, 2025

Heatmap groups samples into buckets of configurable size (set with the
--block-size flag; the default is 64 bytes, the X86 cache line size).
Each bucket is mapped to its containing section; a bucket that covers
multiple sections is attributed to the first overlapping section.
Buckets not mapped to any section are reported as unmapped.
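
As a rough illustration of the attribution rule above, the following standalone C++ sketch walks address-sorted buckets against address-sorted sections. `Section`, `Stats`, and `attributeBuckets` are hypothetical names chosen for this example, not the types used in the patch, and the overlap test is a plain half-open interval check that assumes sections are sorted by address and do not overlap.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical section descriptor covering the half-open range [Begin, End).
struct Section {
  std::string Name;
  uint64_t Begin;
  uint64_t End;
};

// Hypothetical per-section counters: samples and non-empty buckets.
struct Stats {
  uint64_t Samples = 0;
  uint64_t Buckets = 0;
};

// BucketCounts maps a bucket index (address / BucketSize) to its sample count.
// Each bucket goes to the first (lowest-address) section it overlaps, or to
// the special "[unmapped]" entry when it overlaps none.
std::map<std::string, Stats>
attributeBuckets(const std::map<uint64_t, uint64_t> &BucketCounts,
                 const std::vector<Section> &Sections, uint64_t BucketSize) {
  assert(BucketSize > 0 && "bucket size must be non-zero");
  std::map<std::string, Stats> Result;
  std::size_t Idx = 0;
  for (const auto &KV : BucketCounts) {
    const uint64_t Address = KV.first * BucketSize;
    // Skip sections that end at or before the start of this bucket.
    while (Idx < Sections.size() && Address >= Sections[Idx].End)
      ++Idx;
    const bool Overlaps =
        Idx < Sections.size() && Address + BucketSize > Sections[Idx].Begin;
    const std::string Key =
        Overlaps ? Sections[Idx].Name : std::string("[unmapped]");
    Stats &S = Result[Key];
    S.Samples += KV.second;
    ++S.Buckets;
  }
  return Result;
}
```

With 64-byte buckets, a bucket starting at 0x401000 spans both .init (0x401000 to 0x40101b) and the start of .plt (0x401020), so its samples are counted under .init only.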

Heatmap reports section hotness, which is the percentage of all samples
attributed to the section.

Define section utilization as the percentage of the section's buckets
that have non-zero samples, relative to the total number of buckets
covering the section.
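
A minimal sketch of the utilization arithmetic, assuming half-open section ranges [Begin, End). `numBuckets` follows the same formula as `getNumBuckets` in the diff; `utilizationPct` is a hypothetical helper name used only for this example.

```cpp
#include <cassert>
#include <cstdint>

// Number of BucketSize-byte buckets touched by the half-open range
// [Begin, End): round End up to a bucket boundary, round Begin down.
uint64_t numBuckets(uint64_t Begin, uint64_t End, uint64_t BucketSize) {
  assert(BucketSize > 0 && Begin <= End);
  return End / BucketSize + (End % BucketSize != 0) - Begin / BucketSize;
}

// Utilization in percent: buckets with non-zero samples over all buckets
// covering the section. Returns 0 for an empty section.
double utilizationPct(uint64_t ActiveBuckets, uint64_t Begin, uint64_t End,
                      uint64_t BucketSize) {
  const uint64_t Total = numBuckets(Begin, End, BucketSize);
  return Total ? 100.0 * ActiveBuckets / Total : 0.0;
}
```

For the .plt range in the updated test (0x401020 to 0x4010b0) and 64-byte buckets, `numBuckets` gives 3, so two exercised buckets would yield about 66.67%, in line with the 66.6667 value in heatmap-preagg.test. The count of two active buckets is inferred from that percentage rather than stated in the PR.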

Also define the section partition score as the product of section
hotness (where the total excludes unmapped buckets) and mapped
utilization. The score ranges from 0 to 1 (higher is better).
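
The definition above can be sketched as follows. This is one reading of the description, not the code from the patch: `partitionScore` and its parameter names are hypothetical, and "mapped utilization" is taken to mean the fraction of the section's buckets that have non-zero samples.

```cpp
#include <cassert>
#include <cstdint>

// Partition score in [0, 1]: (section samples / mapped samples) multiplied
// by (active buckets / buckets covering the section). Mapped samples are
// all samples minus those that fell outside any section.
double partitionScore(uint64_t SectionSamples, uint64_t TotalSamples,
                      uint64_t UnmappedSamples, uint64_t ActiveBuckets,
                      uint64_t SectionBuckets) {
  assert(UnmappedSamples <= TotalSamples && ActiveBuckets <= SectionBuckets);
  const uint64_t MappedSamples = TotalSamples - UnmappedSamples;
  if (MappedSamples == 0 || SectionBuckets == 0)
    return 0.0;
  const double MappedHotness =
      static_cast<double>(SectionSamples) / MappedSamples;
  const double Utilization =
      static_cast<double>(ActiveBuckets) / SectionBuckets;
  return MappedHotness * Utilization;
}
```

Under this reading, a section that receives every mapped sample and has every bucket exercised scores exactly 1, while a section holding half of the mapped samples with half of its buckets exercised scores 0.25.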

The new metrics are intended for use with a production profile
collected from a BOLT-optimized binary. In this case the partition score
of .text (hot text if function splitting is enabled) reflects how
representative the optimization profile is and the quality of hot-cold
splitting. A partition score of 1 means that all samples fall into hot
text and all buckets (cache lines) in hot text are exercised, which is
equivalent to perfect hot-cold splitting.

Test Plan: updated heatmap-preagg.test

aaupov added 2 commits May 8, 2025 18:45
Created using spr 1.3.4
@llvmbot
Member

llvmbot commented May 9, 2025

@llvm/pr-subscribers-bolt

Author: Amir Ayupov (aaupov)

Changes

Heatmap collects samples grouped by buckets. The size is configurable
via --block-size, with 64 bytes as the default (X86 cache line size).

Define section utilization as the number of buckets mapped to the
section with non-zero samples divided by the total number of buckets
covering the section.

Note that for buckets that cross section boundaries, we will attribute
the utilization to the first overlapping section.

Test Plan: updated heatmap-preagg.test


Full diff: https://github.com/llvm/llvm-project/pull/139193.diff

4 Files Affected:

  • (modified) bolt/include/bolt/Profile/Heatmap.h (+20-2)
  • (modified) bolt/lib/Profile/DataAggregator.cpp (+4-2)
  • (modified) bolt/lib/Profile/Heatmap.cpp (+46-21)
  • (modified) bolt/test/X86/heatmap-preagg.test (+11-9)
diff --git a/bolt/include/bolt/Profile/Heatmap.h b/bolt/include/bolt/Profile/Heatmap.h
index fc1e2cd30011e..c7b3d45fa5cc2 100644
--- a/bolt/include/bolt/Profile/Heatmap.h
+++ b/bolt/include/bolt/Profile/Heatmap.h
@@ -9,6 +9,7 @@
 #ifndef BOLT_PROFILE_HEATMAP_H
 #define BOLT_PROFILE_HEATMAP_H
 
+#include "llvm/ADT/StringMap.h"
 #include "llvm/ADT/StringRef.h"
 #include <cstdint>
 #include <map>
@@ -45,6 +46,10 @@ class Heatmap {
   /// Map section names to their address range.
   const std::vector<SectionNameAndRange> TextSections;
 
+  uint64_t getNumBuckets(uint64_t Begin, uint64_t End) const {
+    return End / BucketSize + !!(End % BucketSize) - Begin / BucketSize;
+  };
+
 public:
   explicit Heatmap(uint64_t BucketSize = 4096, uint64_t MinAddress = 0,
                    uint64_t MaxAddress = std::numeric_limits<uint64_t>::max(),
@@ -77,9 +82,22 @@ class Heatmap {
 
   void printCDF(raw_ostream &OS) const;
 
-  void printSectionHotness(StringRef Filename) const;
+  /// Struct describing individual section hotness.
+  struct SectionStats {
+    uint64_t Samples{0};
+    uint64_t Buckets{0};
+  };
+
+  /// Mapping from section name to associated \p SectionStats. Special entries:
+  /// - [total] for total stats,
+  /// - [unmapped] for samples outside any section, if non-zero.
+  using SectionStatsMap = StringMap<SectionStats>;
+
+  SectionStatsMap computeSectionStats() const;
+
+  void printSectionHotness(const SectionStatsMap &, StringRef Filename) const;
 
-  void printSectionHotness(raw_ostream &OS) const;
+  void printSectionHotness(const SectionStatsMap &, raw_ostream &OS) const;
 
   size_t size() const { return Map.size(); }
 };
diff --git a/bolt/lib/Profile/DataAggregator.cpp b/bolt/lib/Profile/DataAggregator.cpp
index a5ac87ee781b2..11850fab28bb8 100644
--- a/bolt/lib/Profile/DataAggregator.cpp
+++ b/bolt/lib/Profile/DataAggregator.cpp
@@ -1357,10 +1357,12 @@ std::error_code DataAggregator::printLBRHeatMap() {
     HM.printCDF(opts::OutputFilename);
   else
     HM.printCDF(opts::OutputFilename + ".csv");
+  Heatmap::SectionStatsMap Stats = HM.computeSectionStats();
   if (opts::OutputFilename == "-")
-    HM.printSectionHotness(opts::OutputFilename);
+    HM.printSectionHotness(Stats, opts::OutputFilename);
   else
-    HM.printSectionHotness(opts::OutputFilename + "-section-hotness.csv");
+    HM.printSectionHotness(Stats,
+                           opts::OutputFilename + "-section-hotness.csv");
 
   return std::error_code();
 }
diff --git a/bolt/lib/Profile/Heatmap.cpp b/bolt/lib/Profile/Heatmap.cpp
index c7821b3a1a15a..d3ff74f664046 100644
--- a/bolt/lib/Profile/Heatmap.cpp
+++ b/bolt/lib/Profile/Heatmap.cpp
@@ -284,23 +284,24 @@ void Heatmap::printCDF(raw_ostream &OS) const {
   Counts.clear();
 }
 
-void Heatmap::printSectionHotness(StringRef FileName) const {
+void Heatmap::printSectionHotness(const Heatmap::SectionStatsMap &Stats,
+                                  StringRef FileName) const {
   std::error_code EC;
   raw_fd_ostream OS(FileName, EC, sys::fs::OpenFlags::OF_None);
   if (EC) {
     errs() << "error opening output file: " << EC.message() << '\n';
     exit(1);
   }
-  printSectionHotness(OS);
+  printSectionHotness(Stats, OS);
 }
 
-void Heatmap::printSectionHotness(raw_ostream &OS) const {
+StringMap<Heatmap::SectionStats> Heatmap::computeSectionStats() const {
   uint64_t NumTotalCounts = 0;
-  StringMap<uint64_t> SectionHotness;
+  StringMap<SectionStats> Stat;
   unsigned TextSectionIndex = 0;
 
   if (TextSections.empty())
-    return;
+    return Stat;
 
   uint64_t UnmappedHotness = 0;
   auto RecordUnmappedBucket = [&](uint64_t Address, uint64_t Frequency) {
@@ -312,37 +313,61 @@ void Heatmap::printSectionHotness(raw_ostream &OS) const {
     UnmappedHotness += Frequency;
   };
 
-  for (const std::pair<const uint64_t, uint64_t> &KV : Map) {
-    NumTotalCounts += KV.second;
+  for (const auto [Bucket, Count] : Map) {
+    NumTotalCounts += Count;
     // We map an address bucket to the first section (lowest address)
     // overlapping with that bucket.
-    auto Address = KV.first * BucketSize;
+    auto Address = Bucket * BucketSize;
     while (TextSectionIndex < TextSections.size() &&
            Address >= TextSections[TextSectionIndex].EndAddress)
       TextSectionIndex++;
     if (TextSectionIndex >= TextSections.size() ||
         Address + BucketSize < TextSections[TextSectionIndex].BeginAddress) {
-      RecordUnmappedBucket(Address, KV.second);
+      RecordUnmappedBucket(Address, Count);
       continue;
     }
-    SectionHotness[TextSections[TextSectionIndex].Name] += KV.second;
+    SectionStats &SecStats = Stat[TextSections[TextSectionIndex].Name];
+    ++SecStats.Buckets;
+    SecStats.Samples += Count;
   }
+  Stat["[total]"] = SectionStats{NumTotalCounts, Map.size()};
+  if (UnmappedHotness)
+    Stat["[unmapped]"] = SectionStats{UnmappedHotness, 0};
+
+  return Stat;
+}
 
+void Heatmap::printSectionHotness(const StringMap<SectionStats> &Stats,
+                                  raw_ostream &OS) const {
+  if (TextSections.empty())
+    return;
+
+  auto TotalIt = Stats.find("[total]");
+  assert(TotalIt != Stats.end() && "Malformed SectionStatsMap");
+  const uint64_t NumTotalCounts = TotalIt->second.Samples;
   assert(NumTotalCounts > 0 &&
          "total number of heatmap buckets should be greater than 0");
 
-  OS << "Section Name, Begin Address, End Address, Percentage Hotness\n";
-  for (auto &TextSection : TextSections) {
-    OS << TextSection.Name << ", 0x"
-       << Twine::utohexstr(TextSection.BeginAddress) << ", 0x"
-       << Twine::utohexstr(TextSection.EndAddress) << ", "
-       << format("%.4f",
-                 100.0 * SectionHotness[TextSection.Name] / NumTotalCounts)
-       << "\n";
+  OS << "Section Name, Begin Address, End Address, Percentage Hotness, "
+     << "Utilization Pct\n";
+  for (const auto [Name, Begin, End] : TextSections) {
+    uint64_t Samples = 0;
+    uint64_t Buckets = 0;
+    auto SectionIt = Stats.find(Name);
+    if (SectionIt != Stats.end()) {
+      Samples = SectionIt->second.Samples;
+      Buckets = SectionIt->second.Buckets;
+    }
+    const float RelHotness = 100. * Samples / NumTotalCounts;
+    const float BucketUtilization = 100. * Buckets / getNumBuckets(Begin, End);
+    OS << formatv("{0}, {1:x}, {2:x}, {3:f4}, {4:f4}\n", Name, Begin, End,
+                  RelHotness, BucketUtilization);
   }
-  if (UnmappedHotness > 0)
-    OS << "[unmapped], 0x0, 0x0, "
-       << format("%.4f", 100.0 * UnmappedHotness / NumTotalCounts) << "\n";
+  auto UnmappedIt = Stats.find("[unmapped]");
+  if (UnmappedIt == Stats.end())
+    return;
+  const float UnmappedPct = 100. * UnmappedIt->second.Samples / NumTotalCounts;
+  OS << formatv("[unmapped], 0x0, 0x0, {0:f4}, 0\n", UnmappedPct);
 }
 } // namespace bolt
 } // namespace llvm
diff --git a/bolt/test/X86/heatmap-preagg.test b/bolt/test/X86/heatmap-preagg.test
index 00d4d521b1adf..660d37fd03cbe 100644
--- a/bolt/test/X86/heatmap-preagg.test
+++ b/bolt/test/X86/heatmap-preagg.test
@@ -17,17 +17,19 @@ RUN: FileCheck %s --check-prefix CHECK-SEC-HOT-BAT --input-file %t2-section-hotn
 CHECK-HEATMAP: PERF2BOLT: read 81 aggregated LBR entries
 CHECK-HEATMAP: HEATMAP: invalid traces: 1
 
-CHECK-SEC-HOT:      .init, 0x401000, 0x40101b, 16.8545
-CHECK-SEC-HOT-NEXT: .plt, 0x401020, 0x4010b0, 4.7583
-CHECK-SEC-HOT-NEXT: .text, 0x4010b0, 0x401c25, 78.3872
-CHECK-SEC-HOT-NEXT: .fini, 0x401c28, 0x401c35, 0.0000
+CHECK-SEC-HOT: Section Name, Begin Address, End Address, Percentage Hotness, Utilization Pct
+CHECK-SEC-HOT-NEXT: .init, 0x401000, 0x40101b, 16.8545, 100.0000
+CHECK-SEC-HOT-NEXT: .plt, 0x401020, 0x4010b0, 4.7583, 66.6667
+CHECK-SEC-HOT-NEXT: .text, 0x4010b0, 0x401c25, 78.3872, 85.1064
+CHECK-SEC-HOT-NEXT: .fini, 0x401c28, 0x401c35, 0.0000, 0.0000
 
 CHECK-HEATMAP-BAT: PERF2BOLT: read 79 aggregated LBR entries
 CHECK-HEATMAP-BAT: HEATMAP: invalid traces: 2
 
-CHECK-SEC-HOT-BAT:      .init, 0x401000, 0x40101b, 17.2888
-CHECK-SEC-HOT-BAT-NEXT: .plt, 0x401020, 0x4010b0, 5.6132
+CHECK-SEC-HOT-BAT: Section Name, Begin Address, End Address, Percentage Hotness, Utilization Pct
+CHECK-SEC-HOT-BAT-NEXT: .init, 0x401000, 0x40101b, 17.2888, 100.0000
+CHECK-SEC-HOT-BAT-NEXT: .plt, 0x401020, 0x4010b0, 5.6132, 66.6667
 CHECK-SEC-HOT-BAT-NEXT: .bolt.org.text, 0x4010b0, 0x401c25, 38.3385
-CHECK-SEC-HOT-BAT-NEXT: .fini, 0x401c28, 0x401c35, 0.0000
-CHECK-SEC-HOT-BAT-NEXT: .text, 0x800000, 0x8002cc, 38.7595
-CHECK-SEC-HOT-BAT-NEXT: .text.cold, 0x800300, 0x800415, 0.0000
+CHECK-SEC-HOT-BAT-NEXT: .fini, 0x401c28, 0x401c35, 0.0000, 0.0000
+CHECK-SEC-HOT-BAT-NEXT: .text, 0x800000, 0x8002cc, 38.7595, 91.6667
+CHECK-SEC-HOT-BAT-NEXT: .text.cold, 0x800300, 0x800415, 0.0000, 0.0000

aaupov added 5 commits May 9, 2025 12:43
Created using spr 1.3.4

[skip ci]
Created using spr 1.3.4
Created using spr 1.3.4
Created using spr 1.3.4

[skip ci]
Created using spr 1.3.4
@aaupov aaupov changed the title [BOLT] Compute section utilization in heatmap [BOLT][heatmap] Compute section utilization and partition score May 9, 2025
aaupov added 2 commits May 10, 2025 19:07
Created using spr 1.3.4

[skip ci]
Created using spr 1.3.4
Contributor

@maksfb maksfb left a comment

The code LGTM.

For practical applications, we sometimes use 3-way function splitting, which breaks the code into more than two partitions. In such a case, the most interesting metrics and score should be attached to the non-cold partitions. With "hot text" enabled, the address range of such a partition can be identified by [__hot_start, __hot_end).

@aaupov
Contributor Author

aaupov commented May 13, 2025

> The code LGTM.
>
> For practical applications, we sometimes use 3-way function splitting resulting in code being broken into more than two partitions. In such case, the most interesting metrics and score should be attached to non-cold partitions. With "hot text" enabled, the corresponding address range of such partition could be identified by [__hot_start, __hot_end).

I checked one such binary with .warm and .cold sections. We report them separately, e.g.

.text, 0xa200000, 0xb279c25, 81.8506, 72.3839, 0.7174
.text.warm, 0xb279c40, 0xb3777f1, 0.4027, 90.5474, 0.0044
.text.cold, 0xb377800, 0x18b84ca0, 0.3315, 1.6892, 0.0001

Do you mean we should bundle .text with .text.warm for reporting purposes?
To me, it makes more sense to report them separately to see their sample percentages. The compromise would be to add a synthetic entry based on hot_start/hot_end symbols.
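
One way to sanity-check rows like the ones above is to back out the mapped hotness from the last two columns. The sketch below assumes the final three columns are hotness (percent of all samples), utilization (percent), and partition score, which matches the PR summary but is not spelled out in this comment; `impliedMappedHotness` is a hypothetical helper.

```cpp
#include <cassert>

// Given a reported partition score (0..1) and utilization (in percent),
// recover the hotness fraction that must have been multiplied in.
double impliedMappedHotness(double PartitionScore, double UtilizationPct) {
  assert(UtilizationPct > 0.0);
  return PartitionScore / (UtilizationPct / 100.0);
}
```

For the .text row, 0.7174 divided by 0.723839 is roughly 0.991, noticeably higher than the 81.85% hotness measured against all samples. That gap would be consistent with the score using a total that excludes unmapped samples, as the summary describes.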

@maksfb
Contributor

maksfb commented May 13, 2025

> Do you mean we should bundle .text with .text.warm for reporting purposes?

Yes, as the goal is to achieve the perfect score of "1" for that combo partition.

@aaupov
Contributor Author

aaupov commented May 13, 2025

> > Do you mean we should bundle .text with .text.warm for reporting purposes?
>
> Yes, as the goal is to achieve the perfect score of "1" for that combo partition.

Let me add that as a follow-up (a synthetic hot text section between the symbols).
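
A hedged sketch of what such a synthetic entry could aggregate, assuming per-section sample and bucket counts are already available. This is not the follow-up's implementation, which is not shown in this thread; `SectionRow`, `HotTextStats`, and `combineHotText` are hypothetical names, and the sketch simply sums the sections that lie entirely inside [HotStart, HotEnd).

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical per-section record: range [Begin, End) plus its counters.
struct SectionRow {
  std::string Name;
  uint64_t Begin;
  uint64_t End;
  uint64_t Samples;
  uint64_t Buckets;
};

// Hypothetical combined counters for the synthetic hot text entry.
struct HotTextStats {
  uint64_t Samples = 0;
  uint64_t Buckets = 0;
};

// Sum samples and non-empty buckets over every section fully contained in
// the half-open hot text range [HotStart, HotEnd).
HotTextStats combineHotText(const std::vector<SectionRow> &Rows,
                            uint64_t HotStart, uint64_t HotEnd) {
  HotTextStats Combined;
  for (const SectionRow &Row : Rows) {
    if (Row.Begin >= HotStart && Row.End <= HotEnd) {
      Combined.Samples += Row.Samples;
      Combined.Buckets += Row.Buckets;
    }
  }
  return Combined;
}
```

The combined counters could then be fed into the same hotness and utilization formulas as any real section, with the bucket total taken over the whole [HotStart, HotEnd) range.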

aaupov added 2 commits May 13, 2025 13:19
Created using spr 1.3.4

[skip ci]
Created using spr 1.3.4
@aaupov aaupov changed the base branch from users/aaupov/spr/main.bolt-compute-section-utilization-in-heatmap to main May 13, 2025 20:19
@aaupov aaupov merged commit 7f4febd into main May 13, 2025
9 of 15 checks passed
@aaupov aaupov deleted the users/aaupov/spr/bolt-compute-section-utilization-in-heatmap branch May 13, 2025 20:20
aaupov added a commit that referenced this pull request May 14, 2025
In heatmap mode, report samples and utilization of the section(s)
between hot text markers `[__hot_start, __hot_end)`.

The intended use is with multi-way splitting where there are several
sections that contain "hot" code (e.g. `.text.warm` with CDSplit).

Addresses the comment on #139193

#139193 (review)

Test Plan: updated heatmap-preagg.test
llvm-sync bot pushed a commit to arm/arm-toolchain that referenced this pull request May 14, 2025