Skip to content

[BOLT] Improve profile quality reporting #130810

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 11 commits into from
Apr 22, 2025
Merged

Conversation

ShatianWang
Copy link
Contributor

@ShatianWang ShatianWang commented Mar 11, 2025

Improve profile quality reporting by 1) fixing a format issue for small binaries, 2) adding new stats for exception handling usage, 3) excluding selected blocks when computing the CFG flow conservation score.

More specifically for 3), we are excluding blocks that satisfy at least one of the following characteristics: a) is a landing pad, b) has at least one landing pad with non-zero execution counts, c) ends with a recursive call. The reason for a) and b) is because the thrower --> landing pad edges are not explicitly represented in the CFG. The reason for c) is because the call-continuation fallthrough edge count is not important in case of recursive calls.

Modified test bolt/test/X86/profile-quality-reporting.test.
Added test bolt/test/X86/profile-quality-reporting-small-binary.s.

@ShatianWang ShatianWang changed the title [BOLT]Report 0 profile quality gaps if no eligible functions [BOLT]Fix profile quality scores reporting for small binaries Mar 11, 2025
@ShatianWang ShatianWang changed the title [BOLT]Fix profile quality scores reporting for small binaries [BOLT]Fix profile quality reporting for small binaries Mar 11, 2025
@ShatianWang ShatianWang marked this pull request as ready for review March 12, 2025 01:08
@llvmbot llvmbot added the BOLT label Mar 12, 2025
@llvmbot
Copy link
Member

llvmbot commented Mar 12, 2025

@llvm/pr-subscribers-bolt

Author: ShatianWang (ShatianWang)

Changes

Fix a format issue in profile quality reporting introduced in 127954.
The profile quality scores are computed on a subset of eligible functions and basic blocks. For instance, if a function contains a single block, then it is not eligible for the CF flow conservation analysis, meaning that the function will not affect the reported CFG flow conservation gap.

Previously, when the set of functions or basic blocks for a specific profile quality score is empty, then nothing will be printed for this specific score following the "BOLT-INFO: profile quality metrics..." line. For small test binaries, it is often the case that the eligible sets are empty for all three scores. As a result, the next "BOLT-INFO" will not start a new line. This PR fixes the issue by reporting 0 profile quality gap for each quality metric if the corresponding eligible set is empty.

Added test bolt/test/X86/profile-quality-reporting-small-binary.s.


Full diff: https://github.com/llvm/llvm-project/pull/130810.diff

2 Files Affected:

  • (modified) bolt/lib/Passes/ProfileQualityStats.cpp (+10-3)
  • (added) bolt/test/X86/profile-quality-reporting-small-binary.s (+35)
diff --git a/bolt/lib/Passes/ProfileQualityStats.cpp b/bolt/lib/Passes/ProfileQualityStats.cpp
index 332c78da8a1e3..61d67b4f9068e 100644
--- a/bolt/lib/Passes/ProfileQualityStats.cpp
+++ b/bolt/lib/Passes/ProfileQualityStats.cpp
@@ -157,8 +157,10 @@ void printCFGContinuityStats(raw_ostream &OS,
     FractionECUnreachables.push_back(FractionECUnreachable);
   }
 
-  if (FractionECUnreachables.empty())
+  if (FractionECUnreachables.empty()) {
+    OS << "function CFG discontinuity 0.00%; ";
     return;
+  }
 
   llvm::sort(FractionECUnreachables);
   const int Rank = int(FractionECUnreachables.size() *
@@ -251,8 +253,10 @@ void printCallGraphFlowConservationStats(
     }
   }
 
-  if (CallGraphGaps.empty())
+  if (CallGraphGaps.empty()) {
+    OS << "call graph flow conservation gap 0.00%; ";
     return;
+  }
 
   llvm::sort(CallGraphGaps);
   const int Rank =
@@ -340,8 +344,11 @@ void printCFGFlowConservationStats(raw_ostream &OS,
     }
   }
 
-  if (CFGGapsWeightedAvg.empty())
+  if (CFGGapsWeightedAvg.empty()) {
+    OS << "CFG flow conservation gap 0.00% (weighted) 0.00% (worst)\n";
     return;
+  }
+
   llvm::sort(CFGGapsWeightedAvg);
   const int RankWA = int(CFGGapsWeightedAvg.size() *
                          opts::PercentileForProfileQualityCheck / 100);
diff --git a/bolt/test/X86/profile-quality-reporting-small-binary.s b/bolt/test/X86/profile-quality-reporting-small-binary.s
new file mode 100644
index 0000000000000..603d5c3218bc3
--- /dev/null
+++ b/bolt/test/X86/profile-quality-reporting-small-binary.s
@@ -0,0 +1,35 @@
+## Test that BOLT-INFO is correctly formatted after profile quality reporting for
+## a small binary.
+
+# RUN: llvm-mc --filetype=obj --triple x86_64-unknown-unknown %s -o %t.o
+# RUN: link_fdata %s %t.o %t.fdata
+# RUN: llvm-strip --strip-unneeded %t.o
+# RUN: %clang %cflags %t.o -o %t.exe -Wl,-q
+# RUN: llvm-bolt %t.exe -o %t.bolt --data=%t.fdata \
+# RUN:     2>&1 | FileCheck %s
+
+# CHECK: BOLT-INFO: profile quality metrics for the hottest 2 functions (reporting top 5% values): function CFG discontinuity 0.00%; call graph flow conservation gap 0.00%; CFG flow conservation gap 0.00% (weighted) 0.00% (worst)
+# CHECK-NEXT: BOLT-INFO:
+
+        .text
+        .globl  func
+        .type   func, @function
+func:
+        pushq   %rbp
+        ret
+LLfunc_end:
+        .size   func, LLfunc_end-func
+
+
+        .globl  main
+        .type   main, @function
+main:
+        pushq   %rbp
+        movq    %rsp, %rbp
+LLmain_func:
+        call    func
+# FDATA: 1 main #LLmain_func# 1 func 0 0 500
+        movl    $4, %edi
+        retq
+.Lmain_end:
+        .size   main, .Lmain_end-main

@@ -157,8 +157,10 @@ void printCFGContinuityStats(raw_ostream &OS,
FractionECUnreachables.push_back(FractionECUnreachable);
}

if (FractionECUnreachables.empty())
if (FractionECUnreachables.empty()) {
OS << "function CFG discontinuity 0.00%; ";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The issue is due to single-BB functions, right? Can we just assume the discontinuity is 0 for them, instead of skipping in L94-95? And apply a similar logic to other metrics.

@ShatianWang ShatianWang changed the title [BOLT]Fix profile quality reporting for small binaries [BOLT]Improve profile quality reporting Apr 3, 2025
Copy link

github-actions bot commented Apr 3, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

Copy link
Contributor

@aaupov aaupov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, looks good overall. Please address a few nits inline, and add a space between [BOLT] and the rest of the title. I would also split out changes related to reporting and exceptions, as recommended in https://llvm.org/docs/DeveloperPolicy.html#incremental-development.

@@ -187,8 +188,10 @@ void printCallGraphFlowConservationStats(
std::vector<double> CallGraphGaps;

for (const BinaryFunction *Function : Functions) {
if (Function->size() <= 1 || !Function->isSimple())
if (Function->size() <= 1 || !Function->isSimple()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe we should now allow non-simple functions participate in call graph flow conservation gap computation, as they might be important from the layout perspective, and we strive to make sure call graph profile is attached to/from them.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They do currently participate in the call graph flow conservation gap, because the function calls made by non-simple functions are used to construct CallGraphIncomingFlows in function computeFlowMappings.

We are skipping them here because I don't think we have a way to accurately get the net entry block CFG outflow for all non-simple functions.

continue;
}

uint64_t EntryInflow = 0;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This chunk is just changing the indent, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes

Comment on lines 314 to 322
bool HasPosECLP = false;
for (const BinaryBasicBlock *LP : BB.landing_pads()) {
if (LP->getKnownExecutionCount() > 0) {
HasPosECLP = true;
break;
}
}
if (HasPosECLP)
continue;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
bool HasPosECLP = false;
for (const BinaryBasicBlock *LP : BB.landing_pads()) {
if (LP->getKnownExecutionCount() > 0) {
HasPosECLP = true;
break;
}
}
if (HasPosECLP)
continue;
auto isPosEC = std::bind(&BinaryBasicBlock::getKnownExecutionCount, std::placeholders::_1);
if (llvm::any_of(BB.landing_pads(), isPosEC))
continue;

}
// We only consider functions with at least MinLPECSum counts in landing
// pads to avoid false positives due to sampling noise
const uint16_t MinLPECSum = 50;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Move out to the header or the beginning of this file as a constant?

@@ -365,6 +396,74 @@ void printCFGFlowConservationStats(raw_ostream &OS,
}
}

void printExceptionHandlingStats(const BinaryContext &BC, raw_ostream &OS,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm trying to understand what's collected by this function: is it the part of EC that's attributed to LP blocks and invokes?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are two things that are collected for each binary function

  1. Sum of execution counts on landing pads / sum of execution counts on all basic blocks (including landing pads)
  2. Sum of execution counts on landing pads / sum of call counts for all call instructions in the function that have a landing pad

@ShatianWang ShatianWang changed the title [BOLT]Improve profile quality reporting [BOLT] Improve profile quality reporting Apr 17, 2025
Comment on lines 321 to 323
auto isPosEC = std::bind(&BinaryBasicBlock::getKnownExecutionCount,
std::placeholders::_1);
if (llvm::any_of(BB.landing_pads(), isPosEC))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maksim told me about this trick:

Suggested change
auto isPosEC = std::bind(&BinaryBasicBlock::getKnownExecutionCount,
std::placeholders::_1);
if (llvm::any_of(BB.landing_pads(), isPosEC))
if (llvm::any_of(BB.landing_pads(),
std::mem_fn(&BinaryBasicBlock::getKnownExecutionCount))

Comment on lines 414 to 421
if (!BC.MIB->isCall(Inst))
continue;
if (BC.MIB->isInvoke(Inst)) {
const std::optional<MCPlus::MCLandingPad> EHInfo =
BC.MIB->getEHInfo(Inst);
if (EHInfo->first)
InvokeECSum += BBEC;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Invoke must be a call:

Suggested change
if (!BC.MIB->isCall(Inst))
continue;
if (BC.MIB->isInvoke(Inst)) {
const std::optional<MCPlus::MCLandingPad> EHInfo =
BC.MIB->getEHInfo(Inst);
if (EHInfo->first)
InvokeECSum += BBEC;
}
if (!BC.MIB->isInvoke(Inst))
continue;
const std::optional<MCPlus::MCLandingPad> EHInfo =
BC.MIB->getEHInfo(Inst);
if (EHInfo->first)
InvokeECSum += BBEC;

@ShatianWang ShatianWang merged commit ce2b3ce into llvm:main Apr 22, 2025
10 checks passed
@ShatianWang ShatianWang deleted the ModifyFormat branch April 22, 2025 19:48
IanWood1 pushed a commit to IanWood1/llvm-project that referenced this pull request May 6, 2025
Improve profile quality reporting by 1) fixing a format issue for small
binaries, 2) adding new stats for exception handling usage, 3) excluding
selected blocks when computing the CFG flow conservation score.

More specifically for 3), we are excluding blocks that satisfy at least
one of the following characteristics: a) is a landing pad, b) has at
least one landing pad with non-zero execution counts, c) ends with a
recursive call. The reason for a) and b) is because the thrower -->
landing pad edges are not explicitly represented in the CFG. The reason
for c) is because the call-continuation fallthrough edge count is not
important in case of recursive calls.

Modified test `bolt/test/X86/profile-quality-reporting.test`.
Added test `bolt/test/X86/profile-quality-reporting-small-binary.s`.
IanWood1 pushed a commit to IanWood1/llvm-project that referenced this pull request May 6, 2025
Improve profile quality reporting by 1) fixing a format issue for small
binaries, 2) adding new stats for exception handling usage, 3) excluding
selected blocks when computing the CFG flow conservation score.

More specifically for 3), we are excluding blocks that satisfy at least
one of the following characteristics: a) is a landing pad, b) has at
least one landing pad with non-zero execution counts, c) ends with a
recursive call. The reason for a) and b) is because the thrower -->
landing pad edges are not explicitly represented in the CFG. The reason
for c) is because the call-continuation fallthrough edge count is not
important in case of recursive calls.

Modified test `bolt/test/X86/profile-quality-reporting.test`.
Added test `bolt/test/X86/profile-quality-reporting-small-binary.s`.
IanWood1 pushed a commit to IanWood1/llvm-project that referenced this pull request May 6, 2025
Improve profile quality reporting by 1) fixing a format issue for small
binaries, 2) adding new stats for exception handling usage, 3) excluding
selected blocks when computing the CFG flow conservation score.

More specifically for 3), we are excluding blocks that satisfy at least
one of the following characteristics: a) is a landing pad, b) has at
least one landing pad with non-zero execution counts, c) ends with a
recursive call. The reason for a) and b) is because the thrower -->
landing pad edges are not explicitly represented in the CFG. The reason
for c) is because the call-continuation fallthrough edge count is not
important in case of recursive calls.

Modified test `bolt/test/X86/profile-quality-reporting.test`.
Added test `bolt/test/X86/profile-quality-reporting-small-binary.s`.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants