Skip to content

[BOLT] Report flow conservation scores #127954

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Feb 28, 2025

Conversation

ShatianWang
Copy link
Contributor

@ShatianWang ShatianWang commented Feb 20, 2025

Add two additional profile quality stats for CG (call graph) and CFG (control flow graph) flow conservations besides the CFG discontinuity stats introduced in #109683. The two new stats quantify how different "in-flow" is from "out-flow" in the following cases where they should be equal. The smaller the reported stats, the better the flow conservations are.

CG flow conservation: for each function that is not a program entry, the number of times the function is called according to CG ("in-flow") should be equal to the number of times the transition from an entry basic block of the function to another basic block within the function is recorded ("out-flow").

CFG flow conservation: for each basic block that is not a function entry or exit, the number of times the transition into this basic block from another basic block within the function is recorded ("in-flow") should be equal to the number of times the transition from this basic block to another basic block within the function is recorded ("out-flow").

Use -v=1 for more detailed bucketed stats, and use -v=2 to dump functions / basic blocks with bad flow conservations.

@ShatianWang ShatianWang marked this pull request as ready for review February 20, 2025 18:21
@llvmbot llvmbot added the BOLT label Feb 20, 2025
@llvmbot
Copy link
Member

llvmbot commented Feb 20, 2025

@llvm/pr-subscribers-bolt

Author: ShatianWang (ShatianWang)

Changes

Add two additional profile quality stats for CG (call graph) and CFG (control flow graph) flow conservations besides the CFG discontinuity stats introduced in #109683. The two new stats quantify how different "in-flow" is from "out-flow" in the following cases where they should be equal. The smaller the reported stats, the better the flow conservations are.

CG flow conservation: for each function that is not a program entry, the number of times the function is called according to CG ("in-flow") should be equal to the number of times the transition from an entry basic block of the function to another basic block within the function is recorded ("out-flow").

CFG flow conservation: for each basic block that is not a function entry or exit, the number of times the transition into this basic block from another basic block within the function is recorded ("in-flow") should be equal to the number of times the transition from this basic block to another basic block within the function is recorded ("out-flow").

Use -v=1 for more detailed bucketed stats, and use -v=2 to dump functions / basic blocks with bad flow conservations.


Patch is 46.69 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/127954.diff

8 Files Affected:

  • (removed) bolt/include/bolt/Passes/ContinuityStats.h (-61)
  • (added) bolt/include/bolt/Passes/ProfileQualityStats.h (+98)
  • (modified) bolt/lib/Passes/CMakeLists.txt (+1-1)
  • (removed) bolt/lib/Passes/ContinuityStats.cpp (-250)
  • (added) bolt/lib/Passes/ProfileQualityStats.cpp (+579)
  • (modified) bolt/lib/Rewrite/BinaryPassManager.cpp (+23-23)
  • (removed) bolt/test/X86/cfg-discontinuity-reporting.test (-4)
  • (added) bolt/test/X86/profile-quality-reporting.test (+6)
diff --git a/bolt/include/bolt/Passes/ContinuityStats.h b/bolt/include/bolt/Passes/ContinuityStats.h
deleted file mode 100644
index bd4d491ad4a55..0000000000000
--- a/bolt/include/bolt/Passes/ContinuityStats.h
+++ /dev/null
@@ -1,61 +0,0 @@
-//===- bolt/Passes/ContinuityStats.h ----------------------------*- C++ -*-===//
-//
-// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
-// See https://llvm.org/LICENSE.txt for license information.
-// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
-//
-//===----------------------------------------------------------------------===//
-//
-// This pass checks how well the BOLT input profile satisfies the following
-// "CFG continuity" property of a perfect profile:
-//
-//        Each positive-execution-count block in the function’s CFG
-//        should be *reachable* from a positive-execution-count function
-//        entry block through a positive-execution-count path.
-//
-// More specifically, for each of the hottest 1000 functions, the pass
-// calculates the function’s fraction of basic block execution counts
-// that is *unreachable*. It then reports the 95th percentile of the
-// distribution of the 1000 unreachable fractions in a single BOLT-INFO line.
-// The smaller the reported value is, the better the BOLT profile
-// satisfies the CFG continuity property.
-
-// The default value of 1000 above can be changed via the hidden BOLT option
-// `-num-functions-for-continuity-check=[N]`.
-// If more detailed stats are needed, `-v=1` can be used: the hottest N
-// functions will be grouped into 5 equally-sized buckets, from the hottest
-// to the coldest; for each bucket, various summary statistics of the
-// distribution of the unreachable fractions and the raw unreachable execution
-// counts will be reported.
-//
-//===----------------------------------------------------------------------===//
-
-#ifndef BOLT_PASSES_CONTINUITYSTATS_H
-#define BOLT_PASSES_CONTINUITYSTATS_H
-
-#include "bolt/Passes/BinaryPasses.h"
-#include <vector>
-
-namespace llvm {
-
-class raw_ostream;
-
-namespace bolt {
-class BinaryContext;
-
-/// Compute and report to the user the function CFG continuity quality
-class PrintContinuityStats : public BinaryFunctionPass {
-public:
-  explicit PrintContinuityStats(const cl::opt<bool> &PrintPass)
-      : BinaryFunctionPass(PrintPass) {}
-
-  bool shouldOptimize(const BinaryFunction &BF) const override;
-  const char *getName() const override { return "continuity-stats"; }
-  bool shouldPrint(const BinaryFunction &) const override { return false; }
-  Error runOnFunctions(BinaryContext &BC) override;
-};
-
-} // namespace bolt
-} // namespace llvm
-
-#endif // BOLT_PASSES_CONTINUITYSTATS_H
diff --git a/bolt/include/bolt/Passes/ProfileQualityStats.h b/bolt/include/bolt/Passes/ProfileQualityStats.h
new file mode 100644
index 0000000000000..761fd33431a46
--- /dev/null
+++ b/bolt/include/bolt/Passes/ProfileQualityStats.h
@@ -0,0 +1,98 @@
+//===- bolt/Passes/ProfileQualityStats.h ------------------------*- C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+//
+// This pass checks the BOLT input profile quality.
+//
+// Check 1: how well the input profile satisfies the following
+// "CFG continuity" property of a perfect profile:
+//
+//        Each positive-execution-count block in the function’s CFG
+//        is *reachable* from a positive-execution-count function
+//        entry block through a positive-execution-count path.
+//
+// More specifically, for each of the hottest 1000 functions, the pass
+// calculates the function’s fraction of basic block execution counts
+// that is *unreachable*. It then reports the 95th percentile of the
+// distribution of the 1000 unreachable fractions in a single BOLT-INFO line.
+// The smaller the reported value is, the better the BOLT profile
+// satisfies the CFG continuity property.
+//
+// Check 2: how well the input profile satisfies the "call graph flow
+// conservation" property of a perfect profile:
+//
+//        For each function that is not a program entry, the number of times the
+//        function is called is equal to the net CFG outflow of the
+//        function's entry block(s).
+//
+// More specifically, for each of the hottest 1000 functions, the pass obtains
+// A = number of times the function is called, B = the function's entry blocks'
+// inflow, C = the function's entry blocks' outflow, where B and C are computed
+// using the function's weighted CFG. It then computes gap = 1 - MIN(A,C-B) /
+// MAX(A, C-B). The pass reports the 95th percentile of the distribution of the
+// 1000 gaps in a single BOLT-INFO line. The smaller the reported value is, the
+// better the BOLT profile satisfies the call graph flow conservation property.
+//
+// Check 3: how well the input profile satisfies the "function CFG flow
+// conservation property" of a perfect profile:
+//
+//       A non-entry non-exit basic block's inflow is equal to its outflow.
+//
+// More specifically, for each of the hottest 1000 functions, the pass loops
+// over its basic blocks that are non-entry and non-exit, and for each block
+// obtains a block gap = 1 - MIN(block inflow, block outflow, block call count
+// if any) / MAX(block inflow, block outflow, block call count if any). It then
+// aggregates the block gaps into 2 values for the function: "weighted" is the
+// weighted average of the block conservation gaps, where the weights depend on
+// each block's execution count and instruction count; "worst" is the worst
+// (biggest) block gap acorss all basic blocks in the function with an execution
+// count of > 500. The pass then reports the 95th percentile of the weighted and
+// worst values of the 1000 functions in a single BOLT-INFO line. The smaller
+// the reported values are, the better the BOLT profile satisfies the function
+// CFG flow conservation property.
+//
+// The default value of 1000 above can be changed via the hidden BOLT option
+// `-num-functions-for-profile-quality-check=[N]`.
+// The default reporting of the 95th percentile can be changed via the hidden
+// BOLT option `-percentile-for-profile-quality-check=[M]`.
+//
+// If more detailed stats are needed, `-v=1` can be used: the hottest N
+// functions will be grouped into 5 equally-sized buckets, from the hottest
+// to the coldest; for each bucket, various summary statistics of the
+// profile quality will be reported.
+//
+//===----------------------------------------------------------------------===//
+
+#ifndef BOLT_PASSES_PROFILEQUALITYSTATS_H
+#define BOLT_PASSES_PROFILEQUALITYSTATS_H
+
+#include "bolt/Passes/BinaryPasses.h"
+#include <vector>
+
+namespace llvm {
+
+class raw_ostream;
+
+namespace bolt {
+class BinaryContext;
+
+/// Compute and report to the user the profile quality
+class PrintProfileQualityStats : public BinaryFunctionPass {
+public:
+  explicit PrintProfileQualityStats(const cl::opt<bool> &PrintPass)
+      : BinaryFunctionPass(PrintPass) {}
+
+  bool shouldOptimize(const BinaryFunction &BF) const override;
+  const char *getName() const override { return "profile-quality-stats"; }
+  bool shouldPrint(const BinaryFunction &) const override { return false; }
+  Error runOnFunctions(BinaryContext &BC) override;
+};
+
+} // namespace bolt
+} // namespace llvm
+
+#endif // BOLT_PASSES_PROFILEQUALITYSTATS_H
diff --git a/bolt/lib/Passes/CMakeLists.txt b/bolt/lib/Passes/CMakeLists.txt
index 1e3289484a5ba..692c82f2a5336 100644
--- a/bolt/lib/Passes/CMakeLists.txt
+++ b/bolt/lib/Passes/CMakeLists.txt
@@ -26,7 +26,7 @@ add_llvm_library(LLVMBOLTPasses
   PatchEntries.cpp
   PettisAndHansen.cpp
   PLTCall.cpp
-  ContinuityStats.cpp
+  ProfileQualityStats.cpp
   RegAnalysis.cpp
   RegReAssign.cpp
   ReorderAlgorithm.cpp
diff --git a/bolt/lib/Passes/ContinuityStats.cpp b/bolt/lib/Passes/ContinuityStats.cpp
deleted file mode 100644
index b32365b59065d..0000000000000
--- a/bolt/lib/Passes/ContinuityStats.cpp
+++ /dev/null
@@ -1,250 +0,0 @@
-//===- bolt/Passes/ContinuityStats.cpp --------------------------*- C++ -*-===//
-//
-// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
-// See https://llvm.org/LICENSE.txt for license information.
-// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
-//
-//===----------------------------------------------------------------------===//
-//
-// This file implements the continuity stats calculation pass.
-//
-//===----------------------------------------------------------------------===//
-
-#include "bolt/Passes/ContinuityStats.h"
-#include "bolt/Core/BinaryBasicBlock.h"
-#include "bolt/Core/BinaryFunction.h"
-#include "bolt/Utils/CommandLineOpts.h"
-#include "llvm/Support/CommandLine.h"
-#include <queue>
-#include <unordered_map>
-#include <unordered_set>
-
-#define DEBUG_TYPE "bolt-opts"
-
-using namespace llvm;
-using namespace bolt;
-
-namespace opts {
-extern cl::opt<unsigned> Verbosity;
-cl::opt<unsigned> NumFunctionsForContinuityCheck(
-    "num-functions-for-continuity-check",
-    cl::desc("number of hottest functions to print aggregated "
-             "CFG discontinuity stats of."),
-    cl::init(1000), cl::ZeroOrMore, cl::Hidden, cl::cat(BoltOptCategory));
-} // namespace opts
-
-namespace {
-using FunctionListType = std::vector<const BinaryFunction *>;
-using function_iterator = FunctionListType::iterator;
-
-template <typename T>
-void printDistribution(raw_ostream &OS, std::vector<T> &values,
-                       bool Fraction = false) {
-  if (values.empty())
-    return;
-  // Sort values from largest to smallest and print the MAX, TOP 1%, 5%, 10%,
-  // 20%, 50%, 80%, MIN. If Fraction is true, then values are printed as
-  // fractions instead of integers.
-  std::sort(values.begin(), values.end());
-
-  auto printLine = [&](std::string Text, double Percent) {
-    int Rank = int(values.size() * (1.0 - Percent / 100));
-    if (Percent == 0)
-      Rank = values.size() - 1;
-    if (Fraction)
-      OS << "  " << Text << std::string(9 - Text.length(), ' ') << ": "
-         << format("%.2lf%%", values[Rank] * 100) << "\n";
-    else
-      OS << "  " << Text << std::string(9 - Text.length(), ' ') << ": "
-         << values[Rank] << "\n";
-  };
-
-  printLine("MAX", 0);
-  const int percentages[] = {1, 5, 10, 20, 50, 80};
-  for (size_t i = 0; i < sizeof(percentages) / sizeof(percentages[0]); ++i) {
-    printLine("TOP " + std::to_string(percentages[i]) + "%", percentages[i]);
-  }
-  printLine("MIN", 100);
-}
-
-void printCFGContinuityStats(raw_ostream &OS,
-                             iterator_range<function_iterator> &Functions) {
-  // Given a perfect profile, every positive-execution-count BB should be
-  // connected to an entry of the function through a positive-execution-count
-  // directed path in the control flow graph.
-  std::vector<size_t> NumUnreachables;
-  std::vector<size_t> SumECUnreachables;
-  std::vector<double> FractionECUnreachables;
-
-  for (auto it = Functions.begin(); it != Functions.end(); ++it) {
-    const BinaryFunction *Function = *it;
-    if (Function->size() <= 1)
-      continue;
-
-    // Compute the sum of all BB execution counts (ECs).
-    size_t NumPosECBBs = 0;
-    size_t SumAllBBEC = 0;
-    for (const BinaryBasicBlock &BB : *Function) {
-      const size_t BBEC = BB.getKnownExecutionCount();
-      NumPosECBBs += BBEC > 0 ? 1 : 0;
-      SumAllBBEC += BBEC;
-    }
-
-    // Perform BFS on subgraph of CFG induced by positive weight edges.
-    // Compute the number of BBs reachable from the entry(s) of the function and
-    // the sum of their execution counts (ECs).
-    std::unordered_map<unsigned, const BinaryBasicBlock *> IndexToBB;
-    std::unordered_set<unsigned> Visited;
-    std::queue<unsigned> Queue;
-    for (const BinaryBasicBlock &BB : *Function) {
-      // Make sure BB.getIndex() is not already in IndexToBB.
-      assert(IndexToBB.find(BB.getIndex()) == IndexToBB.end());
-      IndexToBB[BB.getIndex()] = &BB;
-      if (BB.isEntryPoint() && BB.getKnownExecutionCount() > 0) {
-        Queue.push(BB.getIndex());
-        Visited.insert(BB.getIndex());
-      }
-    }
-    while (!Queue.empty()) {
-      const unsigned BBIndex = Queue.front();
-      const BinaryBasicBlock *BB = IndexToBB[BBIndex];
-      Queue.pop();
-      auto SuccBIIter = BB->branch_info_begin();
-      for (const BinaryBasicBlock *Succ : BB->successors()) {
-        const uint64_t Count = SuccBIIter->Count;
-        if (Count == BinaryBasicBlock::COUNT_NO_PROFILE || Count == 0) {
-          ++SuccBIIter;
-          continue;
-        }
-        if (!Visited.insert(Succ->getIndex()).second) {
-          ++SuccBIIter;
-          continue;
-        }
-        Queue.push(Succ->getIndex());
-        ++SuccBIIter;
-      }
-    }
-
-    const size_t NumReachableBBs = Visited.size();
-
-    // Loop through Visited, and sum the corresponding BBs' execution counts
-    // (ECs).
-    size_t SumReachableBBEC = 0;
-    for (const unsigned BBIndex : Visited) {
-      const BinaryBasicBlock *BB = IndexToBB[BBIndex];
-      SumReachableBBEC += BB->getKnownExecutionCount();
-    }
-
-    const size_t NumPosECBBsUnreachableFromEntry =
-        NumPosECBBs - NumReachableBBs;
-    const size_t SumUnreachableBBEC = SumAllBBEC - SumReachableBBEC;
-    const double FractionECUnreachable =
-        (double)SumUnreachableBBEC / SumAllBBEC;
-
-    if (opts::Verbosity >= 2 && FractionECUnreachable >= 0.05) {
-      OS << "Non-trivial CFG discontinuity observed in function "
-         << Function->getPrintName() << "\n";
-      LLVM_DEBUG(Function->dump());
-    }
-
-    NumUnreachables.push_back(NumPosECBBsUnreachableFromEntry);
-    SumECUnreachables.push_back(SumUnreachableBBEC);
-    FractionECUnreachables.push_back(FractionECUnreachable);
-  }
-
-  if (FractionECUnreachables.empty())
-    return;
-
-  std::sort(FractionECUnreachables.begin(), FractionECUnreachables.end());
-  const int Rank = int(FractionECUnreachables.size() * 0.95);
-  OS << format("top 5%% function CFG discontinuity is %.2lf%%\n",
-               FractionECUnreachables[Rank] * 100);
-
-  if (opts::Verbosity >= 1) {
-    OS << "abbreviations: EC = execution count, POS BBs = positive EC BBs\n"
-       << "distribution of NUM(unreachable POS BBs) among all focal "
-          "functions\n";
-    printDistribution(OS, NumUnreachables);
-
-    OS << "distribution of SUM_EC(unreachable POS BBs) among all focal "
-          "functions\n";
-    printDistribution(OS, SumECUnreachables);
-
-    OS << "distribution of [(SUM_EC(unreachable POS BBs) / SUM_EC(all "
-          "POS BBs))] among all focal functions\n";
-    printDistribution(OS, FractionECUnreachables, /*Fraction=*/true);
-  }
-}
-
-void printAll(BinaryContext &BC, FunctionListType &ValidFunctions,
-              size_t NumTopFunctions) {
-  // Sort the list of functions by execution counts (reverse).
-  llvm::sort(ValidFunctions,
-             [&](const BinaryFunction *A, const BinaryFunction *B) {
-               return A->getKnownExecutionCount() > B->getKnownExecutionCount();
-             });
-
-  const size_t RealNumTopFunctions =
-      std::min(NumTopFunctions, ValidFunctions.size());
-
-  iterator_range<function_iterator> Functions(
-      ValidFunctions.begin(), ValidFunctions.begin() + RealNumTopFunctions);
-
-  BC.outs() << format("BOLT-INFO: among the hottest %zu functions ",
-                      RealNumTopFunctions);
-  printCFGContinuityStats(BC.outs(), Functions);
-
-  // Print more detailed bucketed stats if requested.
-  if (opts::Verbosity >= 1 && RealNumTopFunctions >= 5) {
-    const size_t PerBucketSize = RealNumTopFunctions / 5;
-    BC.outs() << format(
-        "Detailed stats for 5 buckets, each with  %zu functions:\n",
-        PerBucketSize);
-
-    // For each bucket, print the CFG continuity stats of the functions in the
-    // bucket.
-    for (size_t BucketIndex = 0; BucketIndex < 5; ++BucketIndex) {
-      const size_t StartIndex = BucketIndex * PerBucketSize;
-      const size_t EndIndex = StartIndex + PerBucketSize;
-      iterator_range<function_iterator> Functions(
-          ValidFunctions.begin() + StartIndex,
-          ValidFunctions.begin() + EndIndex);
-      const size_t MaxFunctionExecutionCount =
-          ValidFunctions[StartIndex]->getKnownExecutionCount();
-      const size_t MinFunctionExecutionCount =
-          ValidFunctions[EndIndex - 1]->getKnownExecutionCount();
-      BC.outs() << format("----------------\n|   Bucket %zu:  "
-                          "|\n----------------\n",
-                          BucketIndex + 1)
-                << format(
-                       "execution counts of the %zu functions in the bucket: "
-                       "%zu-%zu\n",
-                       EndIndex - StartIndex, MinFunctionExecutionCount,
-                       MaxFunctionExecutionCount);
-      printCFGContinuityStats(BC.outs(), Functions);
-    }
-  }
-}
-} // namespace
-
-bool PrintContinuityStats::shouldOptimize(const BinaryFunction &BF) const {
-  if (BF.empty() || !BF.hasValidProfile())
-    return false;
-
-  return BinaryFunctionPass::shouldOptimize(BF);
-}
-
-Error PrintContinuityStats::runOnFunctions(BinaryContext &BC) {
-  // Create a list of functions with valid profiles.
-  FunctionListType ValidFunctions;
-  for (const auto &BFI : BC.getBinaryFunctions()) {
-    const BinaryFunction *Function = &BFI.second;
-    if (PrintContinuityStats::shouldOptimize(*Function))
-      ValidFunctions.push_back(Function);
-  }
-  if (ValidFunctions.empty() || opts::NumFunctionsForContinuityCheck == 0)
-    return Error::success();
-
-  printAll(BC, ValidFunctions, opts::NumFunctionsForContinuityCheck);
-  return Error::success();
-}
diff --git a/bolt/lib/Passes/ProfileQualityStats.cpp b/bolt/lib/Passes/ProfileQualityStats.cpp
new file mode 100644
index 0000000000000..a42b4dc27947a
--- /dev/null
+++ b/bolt/lib/Passes/ProfileQualityStats.cpp
@@ -0,0 +1,579 @@
+//===- bolt/Passes/ProfileQualityStats.cpp ----------------------*- C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+//
+// This file implements the profile quality stats calculation pass.
+//
+//===----------------------------------------------------------------------===//
+
+#include "bolt/Passes/ProfileQualityStats.h"
+#include "bolt/Core/BinaryBasicBlock.h"
+#include "bolt/Core/BinaryFunction.h"
+#include "bolt/Utils/CommandLineOpts.h"
+#include "llvm/Support/CommandLine.h"
+#include <queue>
+#include <unordered_map>
+#include <unordered_set>
+
+#define DEBUG_TYPE "bolt-opts"
+
+using namespace llvm;
+using namespace bolt;
+
+namespace opts {
+extern cl::opt<unsigned> Verbosity;
+cl::opt<unsigned> NumFunctionsForProfileQualityCheck(
+    "num-functions-for-profile-quality-check",
+    cl::desc("number of hottest functions to print aggregated "
+             "profile quality stats of."),
+    cl::init(1000), cl::ZeroOrMore, cl::Hidden, cl::cat(BoltOptCategory));
+cl::opt<unsigned> PercentileForProfileQualityCheck(
+    "percentile-for-profile-quality-check",
+    cl::desc("Percentile of profile quality distributions over hottest "
+             "functions to display."),
+    cl::init(95), cl::ZeroOrMore, cl::Hidden, cl::cat(BoltOptCategory));
+} // namespace opts
+
+namespace {
+using FunctionListType = std::vector<const BinaryFunction *>;
+using function_iterator = FunctionListType::iterator;
+
+// BB index -> flow count
+using FlowMapTy = std::unordered_map<unsigned, uint64_t>;
+// Function number -> FlowMapTy
+using TotalFlowMapTy = std::unordered_map<uint64_t, FlowMapTy>;
+// Function number -> flow count
+using FunctionFlowMapTy = std::unordered_map<uint64_t, uint64_t>;
+struct FlowInfo {
+  TotalFlowMapTy TotalIncomingMaps;
+  TotalFlowMapTy TotalOutgoingMaps;
+  TotalFlowMapTy TotalMaxCountMaps;
+  TotalFlowMapTy TotalMinCountMaps;
+  FunctionFlowMapTy CallGraphIncomingMap;
+};
+
+template <typename T>
+void printDistribution(r...
[truncated]

Comment on lines 97 to 98
for (auto it = Functions.begin(); it != Functions.end(); ++it) {
const BinaryFunction *Function = *it;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit:

Suggested change
for (auto it = Functions.begin(); it != Functions.end(); ++it) {
const BinaryFunction *Function = *it;
for (const BinaryFunction *Function : Functions) {

size_t SumAllBBEC = 0;
for (const BinaryBasicBlock &BB : *Function) {
const size_t BBEC = BB.getKnownExecutionCount();
NumPosECBBs += BBEC > 0 ? 1 : 0;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit:

Suggested change
NumPosECBBs += BBEC > 0 ? 1 : 0;
NumPosECBBs += !!BBEC;

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice trick to learn :)

// Make sure BB.getIndex() is not already in IndexToBB.
assert(IndexToBB.find(BB.getIndex()) == IndexToBB.end());
IndexToBB[BB.getIndex()] = &BB;
if (BB.isEntryPoint() && BB.getKnownExecutionCount() > 0) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use Function->forEachEntry

// Perform BFS on subgraph of CFG induced by positive weight edges.
// Compute the number of BBs reachable from the entry(s) of the function and
// the sum of their execution counts (ECs).
std::unordered_map<unsigned, const BinaryBasicBlock *> IndexToBB;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use BF.getLayout() and Layout.getBlock()

Comment on lines 284 to 285
for (auto it = Functions.begin(); it != Functions.end(); ++it) {
const BinaryFunction *Function = *it;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

CallInfoTy Counts;
const MCSymbol *DstSym = BC.MIB->getTargetSymbol(Inst);

// If this is an indirect call use perf data directly.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's a bit misleading to say we use perf data directly as we use attached annotation. I would drop the comment.

Comment on lines 473 to 477
for (const IndirectCallProfile &CSI : Function->getAllCallSites()) {
if (!CSI.Symbol)
continue;
recordCall(nullptr, CSI.Symbol, CSI.Count);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit

Suggested change
for (const IndirectCallProfile &CSI : Function->getAllCallSites()) {
if (!CSI.Symbol)
continue;
recordCall(nullptr, CSI.Symbol, CSI.Count);
}
for (const IndirectCallProfile &CSI : Function->getAllCallSites())
if (CSI.Symbol)
recordCall(nullptr, CSI.Symbol, CSI.Count);

Comment on lines 481 to 492
for (BinaryBasicBlock &BB : *Function) {
for (MCInst &Inst : BB) {
if (!BC.MIB->isCall(Inst))
continue;
// Find call instructions and extract target symbols from each
// one.
const CallInfoTy CallInfo = getCallInfo(&BB, Inst);
for (const TargetDesc &CI : CallInfo) {
recordCall(&BB, CI.first, CI.second);
}
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit

Suggested change
for (BinaryBasicBlock &BB : *Function) {
for (MCInst &Inst : BB) {
if (!BC.MIB->isCall(Inst))
continue;
// Find call instructions and extract target symbols from each
// one.
const CallInfoTy CallInfo = getCallInfo(&BB, Inst);
for (const TargetDesc &CI : CallInfo) {
recordCall(&BB, CI.first, CI.second);
}
}
}
for (const BinaryBasicBlock &BB : *Function)
for (const MCInst &Inst : BB)
if (BC.MIB->isCall(Inst))
// Find call instructions and extract target symbols from each
// one.
for (const TargetDesc &CI : getCallInfo(&BB, Inst))
recordCall(&BB, CI.first, CI.second);

@maksfb maksfb changed the title [BOLT] Flow conservation scores [BOLT] Report flow conservation scores Feb 24, 2025
if (opts::Verbosity >= 2 && FractionECUnreachable >= 0.05) {
OS << "Non-trivial CFG discontinuity observed in function "
<< Function->getPrintName() << "\n";
LLVM_DEBUG(Function->dump());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of using LLVM_DEBUG(), I suggest bumping the verbosity level. I.e. if (opts::Verbosity >= 3)....

Comment on lines 5 to 6
CHECK: among the hottest 5 functions top 5% call graph flow conservation gap is 60.00%
CHECK: among the hottest 5 functions top 5% CFG flow conservation gap is 45.53% (weighted) and 96.87% (worst)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ShatianWang, does it make sense to group stats? I.e. add stats after a header that says profile quality metrics for hottest/top N functions....

Copy link
Contributor

@aaupov aaupov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the effort. Left a few more comments, but otherwise looks good.

Copy link
Contributor

@aaupov aaupov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good now, with a couple of minor nits and the comment about whether or not to include stale functions. Please address before landing.

TotalFlowMapTy &TotalOutgoingMaps = TotalFlowMap.TotalOutgoingMaps;
for (const auto &BFI : BC.getBinaryFunctions()) {
const BinaryFunction *Function = &BFI.second;
if (Function->empty() || !Function->hasValidProfile())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm leaning towards still including functions with stale profiles here

Then you would need to drop !Function->hasValidProfile() condition here

@ShatianWang ShatianWang merged commit 7e33beb into llvm:main Feb 28, 2025
10 checks passed
@ShatianWang ShatianWang deleted the ProfileQualityStatsPR branch February 28, 2025 16:07
cheezeburglar pushed a commit to cheezeburglar/llvm-project that referenced this pull request Feb 28, 2025
Add two additional profile quality stats for CG (call graph) and CFG
(control flow graph) flow conservations besides the CFG discontinuity
stats introduced in llvm#109683. The two new stats quantify how different
"in-flow" is from "out-flow" in the following cases where they should be
equal. The smaller the reported stats, the better the flow conservations
are.

CG flow conservation: for each function that is not a program entry, the
number of times the function is called according to CG ("in-flow")
should be equal to the number of times the transition from an entry
basic block of the function to another basic block within the function
is recorded ("out-flow").

CFG flow conservation: for each basic block that is not a function entry
or exit, the number of times the transition into this basic block from
another basic block within the function is recorded ("in-flow") should
be equal to the number of times the transition from this basic block to
another basic block within the function is recorded ("out-flow").

Use `-v=1` for more detailed bucketed stats, and use `-v=2` to dump
functions / basic blocks with bad flow conservations.
@ShatianWang ShatianWang restored the ProfileQualityStatsPR branch April 3, 2025 14:54
@ShatianWang ShatianWang deleted the ProfileQualityStatsPR branch April 3, 2025 14:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants