Skip to content

[BOLT] Profile quality stats -- CFG discontinuity #109683

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 7 commits into from
Oct 8, 2024

Conversation

ShatianWang
Copy link
Contributor

@ShatianWang ShatianWang commented Sep 23, 2024

In a perfect profile, each positive-execution-count block in the function’s CFG should be reachable from a positive-execution-count function entry block through a positive-execution-count path. This new pass checks how well the BOLT input profile satisfies this “CFG continuity” property.

More specifically, for each of the hottest 1000 functions, the pass calculates the function’s fraction of basic block execution counts that is “unreachable”. It then reports the 95th percentile of the distribution of the 1000 unreachable fractions in a single BOLT-INFO line. The smaller the reported value is, the better the BOLT profile satisfies the CFG continuity property.

The default value of 1000 above can be changed via the hidden BOLT option -num-functions-for-continuity-check=[N]. If more detailed stats are needed, -v=1 can be added to the BOLT invocation: the hottest N functions will be grouped into 5 equally-sized buckets, from the hottest to the coldest; for each bucket, various summary statistics of the distribution of the fractions and the raw unreachable execution counts will be reported.

@ShatianWang ShatianWang marked this pull request as ready for review September 23, 2024 16:19
@llvmbot llvmbot added the BOLT label Sep 23, 2024
@llvmbot
Copy link
Member

llvmbot commented Sep 23, 2024

@llvm/pr-subscribers-bolt

Author: ShatianWang (ShatianWang)

Changes

Add the profile quality check for CFG discontinuity. By default, the option print-continuity-stats is true, and a single line of BOLT-INFO will be printed that contains a single number to summarize the profile CFG discontinuity (closer to 0 the better). If more detailed stats are needed, print-bucketed-stats can be added to the BOLT invocation.


Full diff: https://github.com/llvm/llvm-project/pull/109683.diff

5 Files Affected:

  • (added) bolt/include/bolt/Passes/ContinuityStats.h (+40)
  • (modified) bolt/lib/Passes/CMakeLists.txt (+1)
  • (added) bolt/lib/Passes/ContinuityStats.cpp (+270)
  • (modified) bolt/lib/Rewrite/BinaryPassManager.cpp (+9)
  • (added) bolt/test/X86/cfg-discontinuity-reporting.test (+5)
diff --git a/bolt/include/bolt/Passes/ContinuityStats.h b/bolt/include/bolt/Passes/ContinuityStats.h
new file mode 100644
index 00000000000000..b9edc09cbf55cc
--- /dev/null
+++ b/bolt/include/bolt/Passes/ContinuityStats.h
@@ -0,0 +1,40 @@
+//===- bolt/Passes/ContinuityStats.h - function cfg continuity analysis ---*- C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+//
+// Conduct function CFG continuity analysis.
+//
+//===----------------------------------------------------------------------===//
+
+#ifndef BOLT_PASSES_CONTINUITYSTATS_H
+#define BOLT_PASSES_CONTINUITYSTATS_H
+
+#include <vector>
+#include "bolt/Passes/BinaryPasses.h"
+
+namespace llvm {
+
+class raw_ostream;
+
+namespace bolt {
+class BinaryContext;
+
+/// Compute and report to the user the function CFG continuity quality
+class PrintContinuityStats : public BinaryFunctionPass {
+public:
+  explicit PrintContinuityStats(const cl::opt<bool> &PrintPass)
+      : BinaryFunctionPass(PrintPass) {}
+
+  const char *getName() const override { return "continuity-stats"; }
+  bool shouldPrint(const BinaryFunction &) const override { return false; }
+  Error runOnFunctions(BinaryContext &BC) override;
+};
+
+} // namespace bolt
+} // namespace llvm
+
+#endif // BOLT_PASSES_CONTINUITYSTATS_H
diff --git a/bolt/lib/Passes/CMakeLists.txt b/bolt/lib/Passes/CMakeLists.txt
index 407d8b03f73977..1c1273b3d2420d 100644
--- a/bolt/lib/Passes/CMakeLists.txt
+++ b/bolt/lib/Passes/CMakeLists.txt
@@ -26,6 +26,7 @@ add_llvm_library(LLVMBOLTPasses
   PatchEntries.cpp
   PettisAndHansen.cpp
   PLTCall.cpp
+  ContinuityStats.cpp
   RegAnalysis.cpp
   RegReAssign.cpp
   ReorderAlgorithm.cpp
diff --git a/bolt/lib/Passes/ContinuityStats.cpp b/bolt/lib/Passes/ContinuityStats.cpp
new file mode 100644
index 00000000000000..d2ce97949c1c12
--- /dev/null
+++ b/bolt/lib/Passes/ContinuityStats.cpp
@@ -0,0 +1,270 @@
+//===- bolt/Passes/ContinuityStats.cpp - function cfg continuity analysis ---*- C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+//
+// Conduct function CFG continuity analysis.
+//
+//===----------------------------------------------------------------------===//
+
+#include "bolt/Passes/ContinuityStats.h"
+#include "bolt/Core/BinaryBasicBlock.h"
+#include "bolt/Core/BinaryFunction.h"
+#include "bolt/Utils/CommandLineOpts.h"
+#include "llvm/Support/CommandLine.h"
+#include <queue>
+#include <unordered_map>
+#include <unordered_set>
+
+#define DEBUG_TYPE "bolt-opts"
+
+using namespace llvm;
+using namespace bolt;
+
+namespace opts {
+extern cl::opt<unsigned> Verbosity;
+cl::opt<unsigned> NumTopFunctions(
+    "num-top-functions",
+    cl::desc(
+        "number of hottest functions to print aggregated CFG discontinuity stats of."),
+    cl::init(1000), cl::ZeroOrMore, cl::Hidden, cl::cat(BoltOptCategory));
+cl::opt<bool> PrintBucketedStats(
+    "print-bucketed-stats",
+    cl::desc("print CFG discontinuity stats for the top functions divided into buckets "
+             "based on their execution counts."),
+    cl::Hidden, cl::cat(BoltCategory));
+cl::opt<unsigned>
+    NumFunctionsPerBucket("num-functions-per-bucket",
+                          cl::desc("maximum number of functions per bucket."),
+                          cl::init(500), cl::ZeroOrMore, cl::Hidden,
+                          cl::cat(BoltOptCategory));
+cl::opt<unsigned>
+    MinNumFunctions("min-num-functions",
+                          cl::desc("minimum number of hot functions in the binary to "
+                          "trigger profile CFG continuity check."),
+                          cl::init(5), cl::ZeroOrMore, cl::Hidden,
+                          cl::cat(BoltOptCategory));
+} // namespace opts
+
+namespace {
+using FunctionListType = std::vector<const BinaryFunction *>;
+using function_iterator = FunctionListType::iterator;
+
+template <typename T>
+void printDistribution(raw_ostream &OS, std::vector<T> &values,
+                       bool Fraction = false) {
+  if (values.empty())
+    return;
+  // Sort values from largest to smallest and print the MAX, TOP 1%, 5%, 10%,
+  // 20%, 50%, 80%, MIN. If Fraction is true, then values are printed as
+  // fractions instead of integers.
+  std::sort(values.begin(), values.end());
+
+  auto printLine = [&](std::string Text, double Percent) {
+    int Rank = int(values.size() * (1.0 - Percent / 100));
+    if (Percent == 0)
+      Rank = values.size() - 1;
+    if (Fraction)
+      OS << "  " << Text << std::string(9 - Text.length(), ' ') << ": "
+         << format("%.2lf%%", values[Rank] * 100) << "\n";
+    else
+      OS << "  " << Text << std::string(9 - Text.length(), ' ') << ": "
+         << values[Rank] << "\n";
+  };
+
+  printLine("MAX", 0);
+  int percentages[] = {1, 5, 10, 20, 50, 80};
+  for (size_t i = 0; i < sizeof(percentages) / sizeof(percentages[0]); ++i) {
+    printLine("TOP " + std::to_string(percentages[i]) + "%", percentages[i]);
+  }
+  printLine("MIN", 100);
+}
+
+void printCFGContinuityStats(raw_ostream &OS,
+                               iterator_range<function_iterator> &Functions, 
+                               bool Verbose=false) {
+  // Given a perfect profile, every positive-execution-count BB should be
+  // connected to an entry of the function through a positive-execution-count
+  // directed path in the control flow graph.
+  std::vector<size_t> NumUnreachables;
+  std::vector<size_t> SumECUnreachables;
+  std::vector<double> FractionECUnreachables;
+
+  for (auto it = Functions.begin(); it != Functions.end(); ++it) {
+    const BinaryFunction *Function = *it;
+    if (Function->size() <= 1)
+      continue;
+
+    // Compute the sum of all BB execution counts (ECs).
+    size_t NumPosECBBs = 0;
+    size_t SumAllBBEC = 0;
+    for (const BinaryBasicBlock &BB : *Function) {
+      size_t BBEC = BB.getKnownExecutionCount();
+      NumPosECBBs += BBEC > 0 ? 1 : 0;
+      SumAllBBEC += BBEC;
+    }
+
+    // Perform BFS on subgraph of CFG induced by positive weight edges.
+    // Compute the number of BBs reachable from the entry(s) of the function and
+    // the sum of their execution counts (ECs).
+    std::unordered_map<unsigned, const BinaryBasicBlock *> IndexToBB;
+    std::unordered_set<unsigned> Visited;
+    std::queue<unsigned> Queue;
+    for (const BinaryBasicBlock &BB : *Function) {
+      // Make sure BB.getIndex() is not already in IndexToBB.
+      assert(IndexToBB.find(BB.getIndex()) == IndexToBB.end());
+      IndexToBB[BB.getIndex()] = &BB;
+      if (BB.isEntryPoint() && BB.getKnownExecutionCount() > 0) {
+        Queue.push(BB.getIndex());
+        Visited.insert(BB.getIndex());
+      }
+    }
+    while (!Queue.empty()) {
+      unsigned BBIndex = Queue.front();
+      const BinaryBasicBlock *BB = IndexToBB[BBIndex];
+      Queue.pop();
+      auto SuccBIIter = BB->branch_info_begin();
+      for (BinaryBasicBlock *Succ : BB->successors()) {
+        uint64_t Count = SuccBIIter->Count;
+        if (Count == BinaryBasicBlock::COUNT_NO_PROFILE || Count == 0) {
+          ++SuccBIIter;
+          continue;
+        }
+        if (!Visited.insert(Succ->getIndex()).second) {
+          ++SuccBIIter;
+          continue;
+        }
+        Queue.push(Succ->getIndex());
+        ++SuccBIIter;
+      }
+    }
+
+    size_t NumReachableBBs = Visited.size();
+
+    // Loop through Visited, and sum the corresponding BBs' execution counts (ECs).
+    size_t SumReachableBBEC = 0;
+    for (unsigned BBIndex : Visited) {
+      const BinaryBasicBlock *BB = IndexToBB[BBIndex];
+      SumReachableBBEC += BB->getKnownExecutionCount();
+    }
+
+    size_t NumPosECBBsUnreachableFromEntry = NumPosECBBs - NumReachableBBs;
+    size_t SumUnreachableBBEC = SumAllBBEC - SumReachableBBEC;
+    double FractionECUnreachable = (double)SumUnreachableBBEC / SumAllBBEC;
+
+    if (opts::Verbosity >= 2 && FractionECUnreachable > 0.1 &&
+        SumUnreachableBBEC > 50) {
+      OS << "Non-trivial CFG discontinuity observed in function "
+         << Function->getPrintName() << "\n";
+      LLVM_DEBUG(Function->dump());
+    }
+
+    NumUnreachables.push_back(NumPosECBBsUnreachableFromEntry);
+    SumECUnreachables.push_back(SumUnreachableBBEC);
+    FractionECUnreachables.push_back(FractionECUnreachable);
+  }
+
+  if (!Verbose) {
+    if (FractionECUnreachables.empty()) {
+      OS << "no functions have more than 1 basic block and hence no CFG discontinuity.\n";
+      return;
+    }
+    std::sort(FractionECUnreachables.begin(), FractionECUnreachables.end());
+    int Rank = int(FractionECUnreachables.size() * 0.95);
+    OS << format("the TOP 5%% function CFG discontinuity is %.2lf%%", FractionECUnreachables[Rank] * 100) << "\n";
+    return;
+  }
+  
+  OS << format("Focus on %zu (%.2lf%%) of considered functions that have at "
+               "least 2 basic blocks\n",
+               SumECUnreachables.size(),
+               100.0 * (double)SumECUnreachables.size() /
+                   (std::distance(Functions.begin(), Functions.end())));
+
+  if (!NumUnreachables.empty()) {
+    OS << "- Distribution of NUM(unreachable POS BBs) among all focal "
+          "functions\n";
+    printDistribution(OS, NumUnreachables);
+  }
+  if (!SumECUnreachables.empty()) {
+    OS << "- Distribution of SUM(unreachable POS BBs) among all focal "
+          "functions\n";
+    printDistribution(OS, SumECUnreachables);
+  }
+  if (!FractionECUnreachables.empty()) {
+    OS << "- Distribution of [(SUM(unreachable POS BBs) / SUM(all "
+          "POS BBs))] among all focal functions\n";
+    printDistribution(OS, FractionECUnreachables, /*Fraction=*/true);
+  }
+}
+
+void printAll(BinaryContext &BC,
+              size_t NumFunctionsPerBucket,
+              size_t NumTopFunctions) {
+
+  // Create a list of functions with valid profiles.
+  FunctionListType ValidFunctions;
+  for (const auto &BFI : BC.getBinaryFunctions()) {
+    const BinaryFunction *Function = &BFI.second;
+    if (Function->empty() || !Function->hasValidProfile() ||
+        !Function->isSimple())
+      continue;
+    ValidFunctions.push_back(Function);
+  }
+
+  // Sort the list of functions by execution counts (reverse).
+  llvm::sort(ValidFunctions,
+             [&](const BinaryFunction *A, const BinaryFunction *B) {
+               return A->getKnownExecutionCount() > B->getKnownExecutionCount();
+             });
+
+  size_t RealNumTopFunctions = std::min(NumTopFunctions, ValidFunctions.size());
+  if (RealNumTopFunctions <= opts::MinNumFunctions)
+    return;
+  BC.outs() << format("BOLT-INFO: among the hottest %zu functions ", RealNumTopFunctions);
+  iterator_range<function_iterator> Functions(
+      ValidFunctions.begin(), ValidFunctions.begin() + RealNumTopFunctions);
+  printCFGContinuityStats(BC.outs(), Functions, /*Verbose=*/false);
+
+  // Print more detailed bucketed stats if requested.
+  if (opts::PrintBucketedStats) {
+    size_t PerBucketSize = std::min(NumFunctionsPerBucket, ValidFunctions.size());
+    if (PerBucketSize == 0)
+      return;
+    size_t NumBuckets = RealNumTopFunctions / PerBucketSize +
+                        (RealNumTopFunctions % PerBucketSize != 0);
+    BC.outs() << format("Detailed stats for %zu buckets, each with at most %zu functions:\n",
+                NumBuckets, PerBucketSize);
+    BC.outs() << "For each considered function, identify positive execution-count basic blocks\n"
+       << "(abbr. POS BBs) that are *unreachable* from the function entry through a\n"
+       << "positive-execution-count path.\n";
+
+    // For each bucket, print the CFG continuity stats of the functions in the bucket.
+    for (size_t BucketIndex = 0; BucketIndex < NumBuckets; ++BucketIndex) {
+      const size_t StartIndex = BucketIndex * PerBucketSize;
+      size_t EndIndex = std::min(StartIndex + PerBucketSize, NumTopFunctions);
+      EndIndex = std::min(EndIndex, ValidFunctions.size());
+      iterator_range<function_iterator> Functions(
+          ValidFunctions.begin() + StartIndex, ValidFunctions.begin() + EndIndex);
+      const size_t MaxFunctionExecutionCount =
+          ValidFunctions[StartIndex]->getKnownExecutionCount();
+      const size_t MinFunctionExecutionCount =
+          ValidFunctions[EndIndex - 1]->getKnownExecutionCount();
+      BC.outs() << format("----------------\n|   Bucket %zu:  "
+                  "|\n----------------\nExecution counts of the %zu functions in the bucket: "
+                  "%zu-%zu\n",
+                  BucketIndex + 1, EndIndex - StartIndex,
+                  MinFunctionExecutionCount, MaxFunctionExecutionCount);
+      printCFGContinuityStats(BC.outs(), Functions, /*Verbose=*/true);
+    }
+  }
+}
+} // namespace
+
+Error PrintContinuityStats::runOnFunctions(BinaryContext &BC) {
+  printAll(BC, opts::NumFunctionsPerBucket, opts::NumTopFunctions);
+  return Error::success();
+}
diff --git a/bolt/lib/Rewrite/BinaryPassManager.cpp b/bolt/lib/Rewrite/BinaryPassManager.cpp
index 5dfef0b71cc79f..aad4d15cf2869d 100644
--- a/bolt/lib/Rewrite/BinaryPassManager.cpp
+++ b/bolt/lib/Rewrite/BinaryPassManager.cpp
@@ -11,6 +11,7 @@
 #include "bolt/Passes/Aligner.h"
 #include "bolt/Passes/AllocCombiner.h"
 #include "bolt/Passes/AsmDump.h"
+#include "bolt/Passes/ContinuityStats.h"
 #include "bolt/Passes/CMOVConversion.h"
 #include "bolt/Passes/FixRISCVCallsPass.h"
 #include "bolt/Passes/FixRelaxationPass.h"
@@ -154,6 +155,11 @@ static cl::opt<bool>
                       cl::desc("print profile quality/bias analysis"),
                       cl::cat(BoltCategory));
 
+static cl::opt<bool>
+    PrintContinuityStats("print-continuity-stats",
+                      cl::desc("print profile function CFG continuity stats"),
+                      cl::init(true), cl::cat(BoltCategory));
+
 static cl::opt<bool>
     PrintRegReAssign("print-regreassign",
                      cl::desc("print functions after regreassign pass"),
@@ -373,6 +379,9 @@ Error BinaryFunctionPassManager::runAllPasses(BinaryContext &BC) {
   if (opts::PrintProfileStats)
     Manager.registerPass(std::make_unique<PrintProfileStats>(NeverPrint));
 
+  if (opts::PrintContinuityStats)
+    Manager.registerPass(std::make_unique<PrintContinuityStats>(NeverPrint));
+
   Manager.registerPass(std::make_unique<ValidateInternalCalls>(NeverPrint));
 
   Manager.registerPass(std::make_unique<ValidateMemRefs>(NeverPrint));
diff --git a/bolt/test/X86/cfg-discontinuity-reporting.test b/bolt/test/X86/cfg-discontinuity-reporting.test
new file mode 100644
index 00000000000000..563d55c3013ff8
--- /dev/null
+++ b/bolt/test/X86/cfg-discontinuity-reporting.test
@@ -0,0 +1,5 @@
+## Check profile discontinuity reporting
+RUN: yaml2obj %p/Inputs/blarge_new.yaml &> %t.exe
+RUN: llvm-bolt %t.exe -o %t.out --pa -p %p/Inputs/blarge_new.preagg.txt \
+RUN:   --print-continuity-stats --min-num-functions=1 | FileCheck %s
+CHECK: among the hottest 5 functions the TOP 5% function CFG discontinuity is 100.00%

Copy link

github-actions bot commented Sep 23, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

static cl::opt<bool>
PrintContinuityStats("print-continuity-stats",
cl::desc("print profile function CFG continuity stats"),
cl::init(true), cl::cat(BoltCategory));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd rather leave the pass off by default, in line with other profile analyses/reports available.

Comment on lines 212 to 214
if (Function->empty() || !Function->hasValidProfile() ||
!Function->isSimple())
continue;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Align with shouldOptimize?

Copy link
Contributor

@aaupov aaupov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Let's wait for @maksfb review.

Copy link
Contributor

@maksfb maksfb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please add a brief description in the summary of what profile continuity/discontinuity means.

@@ -0,0 +1,42 @@
//===- bolt/Passes/ContinuityStats.h - function cfg continuity analysis ---*-
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: fix line wrap

@@ -0,0 +1,288 @@
//===- bolt/Passes/ContinuityStats.cpp - function cfg continuity analysis ---*-
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

}
std::sort(FractionECUnreachables.begin(), FractionECUnreachables.end());
int Rank = int(FractionECUnreachables.size() * 0.95);
OS << format("the TOP 5%% function CFG discontinuity is %.2lf%%",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason to capitalize "TOP" here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not really. Changed to "top" in the newest commit.

Comment on lines 30 to 52
cl::opt<unsigned>
NumTopFunctions("num-top-functions",
cl::desc("number of hottest functions to print aggregated "
"CFG discontinuity stats of."),
cl::init(1000), cl::ZeroOrMore, cl::Hidden,
cl::cat(BoltOptCategory));
cl::opt<bool>
PrintBucketedStats("print-bucketed-stats",
cl::desc("print CFG discontinuity stats for the top "
"functions divided into buckets "
"based on their execution counts."),
cl::Hidden, cl::cat(BoltCategory));
cl::opt<unsigned>
NumFunctionsPerBucket("num-functions-per-bucket",
cl::desc("maximum number of functions per bucket."),
cl::init(500), cl::ZeroOrMore, cl::Hidden,
cl::cat(BoltOptCategory));
cl::opt<unsigned>
MinNumFunctions("min-num-functions",
cl::desc("minimum number of hot functions in the binary to "
"trigger profile CFG continuity check."),
cl::init(5), cl::ZeroOrMore, cl::Hidden,
cl::cat(BoltOptCategory));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a lot of pass-specific options that are sharing the global option namespace. Option names like "--num-top-functions" and "--min-num-functions" are ambiguous when taken out of the continuity context.

Rather then make these options pass-specific, I suggest we print the data they control under "-v=1". WDYT?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed all but "-num-top-functions" and changed its name to "-num-functions-for-continuity-check" to make it more descriptive. As suggested, whether or not to print bucketed stats is now controlled by -v.

Comment on lines 158 to 161
static cl::opt<bool> PrintContinuityStats(
"print-continuity-stats",
cl::desc("print profile function CFG continuity stats"),
cl::cat(BoltCategory));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the pass has low overhead and prints a single line by default, I suggest we enable it by default and drop the option. Make sure it doesn't print anything when there is no profile available.

@ShatianWang ShatianWang force-pushed the UpstreamContinuityStats branch from 22b6a21 to 2da1163 Compare October 3, 2024 15:48
Copy link
Contributor

@maksfb maksfb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please add const qualifier where applicable?

if (!Verbose) {
std::sort(FractionECUnreachables.begin(), FractionECUnreachables.end());
int Rank = int(FractionECUnreachables.size() * 0.95);
OS << format("BOLT-INFO: among the hottest %zu functions ",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For consistency, we should print this line under -v too.

& add const qualifiers when applicable
@ShatianWang ShatianWang merged commit 4cab01f into llvm:main Oct 8, 2024
8 checks passed
@ShatianWang ShatianWang deleted the UpstreamContinuityStats branch October 28, 2024 15:13
aaupov added a commit that referenced this pull request Nov 8, 2024
#109683 identified an issue with pre-aggregated profile where a call to
continuation fallthrough edge count is missing (profile discontinuity).

This issue only affects pre-aggregated profile but not perf data since
LBR stack has the necessary information to determine if the trace (fall-
through) starts at call continuation, whereas pre-aggregated fallthrough
lacks this information.

The solution is to look at branch records in pre-aggregated profiles
that correspond to returns and assign counts to call to continuation
fallthrough:
- BranchFrom is in another function or DSO,
- BranchTo may be a call continuation site:
  - not an entry point/landing pad.

Note that we can't directly check if BranchFrom corresponds to a return
instruction if it's in external DSO.

Keep call continuation handling for perf data (`getFallthroughsInTrace`)
[1] as-is due to marginally better performance. The difference is that
return-converted call to continuation fallthrough is slightly more
frequent than other fallthroughs since the former only requires one LBR
address while the latter need two that belong to the profiled binary.
Hence return-converted fallthroughs have larger "weight" which affects
code layout.

[1] `DataAggregator::getFallthroughsInTrace`
https://github.com/llvm/llvm-project/blob/fea18afeed39fe4435d67eee1834f0f34b23013d/bolt/lib/Profile/DataAggregator.cpp#L906-L915

Test Plan: added callcont-fallthru.s

Reviewers: maksfb, ayermolo, ShatianWang, dcci

Reviewed By: maksfb, ShatianWang

Pull Request: #109486
Groverkss pushed a commit to iree-org/llvm-project that referenced this pull request Nov 15, 2024
llvm#109683 identified an issue with pre-aggregated profile where a call to
continuation fallthrough edge count is missing (profile discontinuity).

This issue only affects pre-aggregated profile but not perf data since
LBR stack has the necessary information to determine if the trace (fall-
through) starts at call continuation, whereas pre-aggregated fallthrough
lacks this information.

The solution is to look at branch records in pre-aggregated profiles
that correspond to returns and assign counts to call to continuation
fallthrough:
- BranchFrom is in another function or DSO,
- BranchTo may be a call continuation site:
  - not an entry point/landing pad.

Note that we can't directly check if BranchFrom corresponds to a return
instruction if it's in external DSO.

Keep call continuation handling for perf data (`getFallthroughsInTrace`)
[1] as-is due to marginally better performance. The difference is that
return-converted call to continuation fallthrough is slightly more
frequent than other fallthroughs since the former only requires one LBR
address while the latter need two that belong to the profiled binary.
Hence return-converted fallthroughs have larger "weight" which affects
code layout.

[1] `DataAggregator::getFallthroughsInTrace`
https://github.com/llvm/llvm-project/blob/fea18afeed39fe4435d67eee1834f0f34b23013d/bolt/lib/Profile/DataAggregator.cpp#L906-L915

Test Plan: added callcont-fallthru.s

Reviewers: maksfb, ayermolo, ShatianWang, dcci

Reviewed By: maksfb, ShatianWang

Pull Request: llvm#109486
ShatianWang added a commit that referenced this pull request Feb 28, 2025
Add two additional profile quality stats for CG (call graph) and CFG
(control flow graph) flow conservations besides the CFG discontinuity
stats introduced in #109683. The two new stats quantify how different
"in-flow" is from "out-flow" in the following cases where they should be
equal. The smaller the reported stats, the better the flow conservations
are.

CG flow conservation: for each function that is not a program entry, the
number of times the function is called according to CG ("in-flow")
should be equal to the number of times the transition from an entry
basic block of the function to another basic block within the function
is recorded ("out-flow").

CFG flow conservation: for each basic block that is not a function entry
or exit, the number of times the transition into this basic block from
another basic block within the function is recorded ("in-flow") should
be equal to the number of times the transition from this basic block to
another basic block within the function is recorded ("out-flow").

Use `-v=1` for more detailed bucketed stats, and use `-v=2` to dump
functions / basic blocks with bad flow conservations.
cheezeburglar pushed a commit to cheezeburglar/llvm-project that referenced this pull request Feb 28, 2025
Add two additional profile quality stats for CG (call graph) and CFG
(control flow graph) flow conservations besides the CFG discontinuity
stats introduced in llvm#109683. The two new stats quantify how different
"in-flow" is from "out-flow" in the following cases where they should be
equal. The smaller the reported stats, the better the flow conservations
are.

CG flow conservation: for each function that is not a program entry, the
number of times the function is called according to CG ("in-flow")
should be equal to the number of times the transition from an entry
basic block of the function to another basic block within the function
is recorded ("out-flow").

CFG flow conservation: for each basic block that is not a function entry
or exit, the number of times the transition into this basic block from
another basic block within the function is recorded ("in-flow") should
be equal to the number of times the transition from this basic block to
another basic block within the function is recorded ("out-flow").

Use `-v=1` for more detailed bucketed stats, and use `-v=2` to dump
functions / basic blocks with bad flow conservations.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants