Skip to content

Introduce UnpredictableProfileLoader for PMU branch-miss profiles #99027

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

tcreech-intel
Copy link
Contributor

@tcreech-intel tcreech-intel commented Jul 16, 2024

This pass reads IP-based profiles of branch-miss PMU events and uses them to add !unpredictable metadata. This can be thought of as automatically adding __builtin_unpredictable() hints on branch conditions based on PMU feedback.

On Linux, such a profile may be created with something like:

perf record -b -e branch-misses:uppp ...
llvm-profgen --leading-ip-only --perfdata perf.data ...

(See also #99026.)

This branch mispredict profile should be accompanied by an SPGO execution frequency profile.

Branch mispredict profiles are a first step toward more general HWPGO1, allowing feedback from a wider range of hardware events.

Footnotes

  1. https://llvm.org/devmtg/2024-04/slides/TechnicalTalks/Xiao-EnablingHW-BasedPGO.pdf

This pass reads IP-based profiles of branch-miss PMU events and uses
them to add !unpredictable metadata. This can be thought of as
automatically adding __builtin_unpredictable() hints on branch
conditions based on PMU feedback.

On Linux, such a profile may be created with something like:

    perf record -b -e branch-misses:uppp ...
    llvm-profgen --leading-ip-only --perfdata perf.data ...

This branch mispredict profile should be accompanied by an SPGO
execution frequency profile.
@tcreech-intel tcreech-intel marked this pull request as ready for review July 16, 2024 13:06
@llvmbot
Copy link
Member

llvmbot commented Jul 16, 2024

@llvm/pr-subscribers-llvm-transforms

Author: Tim Creech (tcreech-intel)

Changes

This pass reads IP-based profiles of branch-miss PMU events and uses them to add !unpredictable metadata. This can be thought of as automatically adding __builtin_unpredictable() hints on branch conditions based on PMU feedback.

On Linux, such a profile may be created with something like:

perf record -b -e branch-misses:uppp ...
llvm-profgen --leading-ip-only --perfdata perf.data ...

(See also #99026.)

This branch mispredict profile should be accompanied by an SPGO execution frequency profile.


Patch is 38.67 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/99027.diff

17 Files Affected:

  • (added) llvm/include/llvm/Transforms/IPO/UnpredictableProfileLoader.h (+36)
  • (modified) llvm/lib/Passes/PassBuilder.cpp (+1)
  • (modified) llvm/lib/Passes/PassBuilderPipelines.cpp (+4)
  • (modified) llvm/lib/Passes/PassRegistry.def (+1)
  • (modified) llvm/lib/Transforms/IPO/CMakeLists.txt (+1)
  • (added) llvm/lib/Transforms/IPO/UnpredictableProfileLoader.cpp (+220)
  • (modified) llvm/test/Other/new-pm-pgo.ll (+1)
  • (modified) llvm/test/Other/new-pm-thinlto-postlink-samplepgo-defaults.ll (+1)
  • (modified) llvm/test/Other/new-pm-thinlto-prelink-samplepgo-defaults.ll (+1)
  • (added) llvm/test/Transforms/UnpredictableProfileLoader/Inputs/frequency.prof (+3)
  • (added) llvm/test/Transforms/UnpredictableProfileLoader/Inputs/inline.freq.prof (+3)
  • (added) llvm/test/Transforms/UnpredictableProfileLoader/Inputs/inline.misp.prof (+3)
  • (added) llvm/test/Transforms/UnpredictableProfileLoader/Inputs/mispredict.prof (+4)
  • (added) llvm/test/Transforms/UnpredictableProfileLoader/inlined.ll (+113)
  • (added) llvm/test/Transforms/UnpredictableProfileLoader/unpredictable_branch.ll (+87)
  • (added) llvm/test/Transforms/UnpredictableProfileLoader/unpredictable_select.ll (+75)
  • (added) llvm/test/Transforms/UnpredictableProfileLoader/unpredictable_switch.ll (+87)
diff --git a/llvm/include/llvm/Transforms/IPO/UnpredictableProfileLoader.h b/llvm/include/llvm/Transforms/IPO/UnpredictableProfileLoader.h
new file mode 100644
index 0000000000000..703f66bd6706a
--- /dev/null
+++ b/llvm/include/llvm/Transforms/IPO/UnpredictableProfileLoader.h
@@ -0,0 +1,36 @@
+//===-- UnpredictableProfileLoader.h - Unpredictable Profile Loader -------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#ifndef LLVM_TRANSFORMS_IPO_UNPREDICTABLEPROFILELOADER_H
+#define LLVM_TRANSFORMS_IPO_UNPREDICTABLEPROFILELOADER_H
+
+#include "llvm/IR/PassManager.h"
+#include "llvm/ProfileData/SampleProfReader.h"
+
+namespace llvm {
+
+class Module;
+
+struct UnpredictableProfileLoaderPass
+    : PassInfoMixin<UnpredictableProfileLoaderPass> {
+  UnpredictableProfileLoaderPass(StringRef FrequencyProfileFile);
+  UnpredictableProfileLoaderPass();
+  PreservedAnalyses run(Module &M, ModuleAnalysisManager &);
+  std::unique_ptr<SampleProfileReader> FreqReader, MispReader;
+  bool loadSampleProfile(Module &M);
+  bool addUpredictableMetadata(Module &F);
+  bool addUpredictableMetadata(Function &F);
+  ErrorOr<double> getMispredictRatio(const FunctionSamples *FreqSamples,
+                                     const FunctionSamples *MispSamples,
+                                     const Instruction *I);
+  const std::string FrequencyProfileFile;
+};
+
+} // end namespace llvm
+
+#endif // LLVM_TRANSFORMS_IPO_UNPREDICTABLEPROFILELOADER_H
diff --git a/llvm/lib/Passes/PassBuilder.cpp b/llvm/lib/Passes/PassBuilder.cpp
index 929690c2c74d6..dc055ee827d17 100644
--- a/llvm/lib/Passes/PassBuilder.cpp
+++ b/llvm/lib/Passes/PassBuilder.cpp
@@ -176,6 +176,7 @@
 #include "llvm/Transforms/IPO/StripDeadPrototypes.h"
 #include "llvm/Transforms/IPO/StripSymbols.h"
 #include "llvm/Transforms/IPO/SyntheticCountsPropagation.h"
+#include "llvm/Transforms/IPO/UnpredictableProfileLoader.h"
 #include "llvm/Transforms/IPO/WholeProgramDevirt.h"
 #include "llvm/Transforms/InstCombine/InstCombine.h"
 #include "llvm/Transforms/Instrumentation.h"
diff --git a/llvm/lib/Passes/PassBuilderPipelines.cpp b/llvm/lib/Passes/PassBuilderPipelines.cpp
index 4fd5ee1946bb7..50cf9a1c74c9d 100644
--- a/llvm/lib/Passes/PassBuilderPipelines.cpp
+++ b/llvm/lib/Passes/PassBuilderPipelines.cpp
@@ -67,6 +67,7 @@
 #include "llvm/Transforms/IPO/SampleProfile.h"
 #include "llvm/Transforms/IPO/SampleProfileProbe.h"
 #include "llvm/Transforms/IPO/SyntheticCountsPropagation.h"
+#include "llvm/Transforms/IPO/UnpredictableProfileLoader.h"
 #include "llvm/Transforms/IPO/WholeProgramDevirt.h"
 #include "llvm/Transforms/InstCombine/InstCombine.h"
 #include "llvm/Transforms/Instrumentation/CGProfile.h"
@@ -1092,6 +1093,9 @@ PassBuilder::buildModuleSimplificationPipeline(OptimizationLevel Level,
     // Cache ProfileSummaryAnalysis once to avoid the potential need to insert
     // RequireAnalysisPass for PSI before subsequent non-module passes.
     MPM.addPass(RequireAnalysisPass<ProfileSummaryAnalysis, Module>());
+    // Run after inlining decisions made by SampleProfileLoader. This can apply
+    // mispredict metadata to specific inlined callees.
+    MPM.addPass(UnpredictableProfileLoaderPass(PGOOpt->ProfileFile));
     // Do not invoke ICP in the LTOPrelink phase as it makes it hard
     // for the profile annotation to be accurate in the LTO backend.
     if (!isLTOPreLink(Phase))
diff --git a/llvm/lib/Passes/PassRegistry.def b/llvm/lib/Passes/PassRegistry.def
index 3b92823cd283b..6f6252932ebc5 100644
--- a/llvm/lib/Passes/PassRegistry.def
+++ b/llvm/lib/Passes/PassRegistry.def
@@ -140,6 +140,7 @@ MODULE_PASS("strip-nonlinetable-debuginfo", StripNonLineTableDebugInfoPass())
 MODULE_PASS("synthetic-counts-propagation", SyntheticCountsPropagation())
 MODULE_PASS("trigger-crash-module", TriggerCrashModulePass())
 MODULE_PASS("trigger-verifier-error", TriggerVerifierErrorPass())
+MODULE_PASS("unpredictable-profile-loader", UnpredictableProfileLoaderPass())
 MODULE_PASS("tsan-module", ModuleThreadSanitizerPass())
 MODULE_PASS("verify", VerifierPass())
 MODULE_PASS("view-callgraph", CallGraphViewerPass())
diff --git a/llvm/lib/Transforms/IPO/CMakeLists.txt b/llvm/lib/Transforms/IPO/CMakeLists.txt
index 92a9697720efd..4d09d0a70e13f 100644
--- a/llvm/lib/Transforms/IPO/CMakeLists.txt
+++ b/llvm/lib/Transforms/IPO/CMakeLists.txt
@@ -43,6 +43,7 @@ add_llvm_component_library(LLVMipo
   StripSymbols.cpp
   SyntheticCountsPropagation.cpp
   ThinLTOBitcodeWriter.cpp
+  UnpredictableProfileLoader.cpp
   WholeProgramDevirt.cpp
 
   ADDITIONAL_HEADER_DIRS
diff --git a/llvm/lib/Transforms/IPO/UnpredictableProfileLoader.cpp b/llvm/lib/Transforms/IPO/UnpredictableProfileLoader.cpp
new file mode 100644
index 0000000000000..15994232fc10c
--- /dev/null
+++ b/llvm/lib/Transforms/IPO/UnpredictableProfileLoader.cpp
@@ -0,0 +1,220 @@
+//=== UnpredictableProfileLoader.cpp - Unpredictable Profile Loader -------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+//
+// This pass reads a sample profile containing mispredict counts and a sample
+// profile containing execution counts and computes branch mispredict ratios for
+// each conditional instruction. If a sufficiently high mispredict ratio is
+// found !unpredictable metadata is added.
+//
+// Note that this requires that the mispredict and frequency profiles have
+// comparable magnitudes.
+//
+//===----------------------------------------------------------------------===//
+
+#include "llvm/Transforms/IPO/UnpredictableProfileLoader.h"
+#include "llvm/IR/DebugInfoMetadata.h"
+#include "llvm/IR/Function.h"
+#include "llvm/IR/Instructions.h"
+#include "llvm/IR/Module.h"
+#include "llvm/ProfileData/SampleProf.h"
+#include "llvm/ProfileData/SampleProfReader.h"
+#include "llvm/Support/CommandLine.h"
+#include "llvm/Support/VirtualFileSystem.h"
+#include "llvm/Transforms/IPO.h"
+
+using namespace llvm;
+
+#define DEBUG_TYPE "unpredictable-profile-loader"
+
+static cl::opt<std::string> UnpredictableHintsFile(
+    "unpredictable-hints-file",
+    cl::desc("Path to the unpredictability hints profile"), cl::Hidden);
+
+// Typically this file will be provided via PGOOpt. This option is provided
+// primarily for debugging and testing.
+static cl::opt<std::string>
+    FrequencyProfileOption("unpredictable-hints-frequency-profile",
+                           cl::desc("Path to an execution frequency profile to "
+                                    "use as a baseline for unpredictability"),
+                           cl::Hidden);
+
+// This determines the minimum apparent mispredict ratio which should earn a
+// mispredict metadata annotation.
+static cl::opt<double> MinimumRatio(
+    "unpredictable-hints-min-ratio",
+    cl::desc(
+        "Absolute minimum branch miss ratio to apply MD_unpredictable from"),
+    cl::init(0.2), cl::Hidden);
+
+// This option is useful for dealing with two different sampling frequencies.
+static cl::opt<double>
+    RatioFactor("unpredictable-hints-factor",
+                cl::desc("Multiply all ratios by this factor"), cl::init(1.0),
+                cl::ReallyHidden);
+
+// Lookup samples for an Instruction's corresponding location in a
+// FunctionSamples profile. The count returned is directly from the profile
+// representing the number of samples seen.
+ErrorOr<double> UnpredictableProfileLoaderPass::getMispredictRatio(
+    const FunctionSamples *FuncFreqSamples,
+    const FunctionSamples *FuncMispSamples, const Instruction *I) {
+
+  const auto &Loc = I->getDebugLoc();
+  if (!Loc)
+    return std::error_code();
+
+  const FunctionSamples *FreqSamples =
+      FuncFreqSamples->findFunctionSamples(Loc, FreqReader->getRemapper());
+  if (!FreqSamples)
+    return std::error_code();
+  const ErrorOr<uint64_t> FreqCount = FreqSamples->findSamplesAt(
+      FunctionSamples::getOffset(Loc), Loc->getBaseDiscriminator());
+  if (!FreqCount)
+    return std::error_code();
+
+  const FunctionSamples *MispSamples =
+      FuncMispSamples->findFunctionSamples(Loc, MispReader->getRemapper());
+  if (!MispSamples)
+    return std::error_code();
+  const ErrorOr<uint64_t> MispCount = MispSamples->findSamplesAt(
+      FunctionSamples::getOffset(Loc), Loc->getBaseDiscriminator());
+  if (!MispCount)
+    return std::error_code();
+
+  const double Freq = FreqCount.get();
+  if (!Freq)
+    return std::error_code();
+
+  const double Misp = MispCount.get();
+  const double MissRatio = (Misp * RatioFactor) / Freq;
+
+  LLVM_DEBUG(dbgs() << "Computing mispredict ratio of " << format("%0.2f", Misp)
+                    << "/" << format("%0.2f", Freq) << " * "
+                    << format("%0.2f", RatioFactor.getValue()) << " = "
+                    << format("%0.2f", MissRatio) << " for instruction\n"
+                    << *I << "\n");
+  return MissRatio;
+}
+
+// Examine all Select and BranchInsts in a function, adding !unpredictable
+// metadata if they appear in the mispredict profile with sufficient weight.
+bool UnpredictableProfileLoaderPass::addUpredictableMetadata(Function &F) {
+
+  const FunctionSamples *FreqSamples = FreqReader->getSamplesFor(F);
+  if (!FreqSamples)
+    return false;
+
+  const FunctionSamples *MispSamples = MispReader->getSamplesFor(F);
+  if (!MispSamples)
+    return false;
+
+  bool MadeChange = false;
+  for (BasicBlock &BB : F) {
+    for (Instruction &I : BB) {
+      if (!isa<BranchInst>(&I) && !isa<SelectInst>(&I) && !isa<SwitchInst>(&I))
+        continue;
+      if (I.hasMetadata(LLVMContext::MD_unpredictable))
+        continue;
+
+      const ErrorOr<double> RatioOrError =
+          getMispredictRatio(FreqSamples, MispSamples, &I);
+      if (!RatioOrError)
+        continue;
+      const double MissRatio = RatioOrError.get();
+
+      if (MissRatio < MinimumRatio) {
+        LLVM_DEBUG(dbgs() << "\tRatio " << format("%0.2f", MissRatio)
+                          << " is below threshold of "
+                          << format("%0.2f", MinimumRatio.getValue())
+                          << "; ignoring.\n");
+        continue;
+      }
+
+      // In the future we probably want to attach more information here, such as
+      // the mispredict count or ratio.
+      MDNode *MD = MDNode::get(I.getContext(), std::nullopt);
+      I.setMetadata(LLVMContext::MD_unpredictable, MD);
+      MadeChange = true;
+    }
+  }
+
+  return MadeChange;
+}
+
+bool UnpredictableProfileLoaderPass::addUpredictableMetadata(Module &M) {
+  bool MadeChange = false;
+
+  for (Function &F : M)
+    MadeChange |= addUpredictableMetadata(F);
+
+  // Return an indication of whether we changed anything or not.
+  return MadeChange;
+}
+
+bool UnpredictableProfileLoaderPass::loadSampleProfile(Module &M) {
+  if (MispReader && FreqReader)
+    return true;
+
+  assert(!MispReader && !FreqReader &&
+         "Expected both or neither profile readers");
+
+  LLVMContext &Ctx = M.getContext();
+  auto FS = vfs::getRealFileSystem();
+
+  auto ReadProfile = [&Ctx,
+                      &FS](const std::string ProfileFile,
+                           std::unique_ptr<SampleProfileReader> &ReaderPtr) {
+    if (ProfileFile.empty())
+      return false;
+
+    ErrorOr<std::unique_ptr<SampleProfileReader>> ReaderOrErr =
+        SampleProfileReader::create(ProfileFile, Ctx, *FS);
+    if (std::error_code EC = ReaderOrErr.getError()) {
+      std::string Msg = "Could not open profile: " + EC.message();
+      Ctx.diagnose(DiagnosticInfoSampleProfile(ProfileFile, Msg,
+                                               DiagnosticSeverity::DS_Warning));
+      return false;
+    }
+
+    ReaderPtr = std::move(ReaderOrErr.get());
+    ReaderPtr->read();
+
+    return true;
+  };
+
+  if (!ReadProfile(UnpredictableHintsFile, MispReader))
+    return false;
+
+  if (!ReadProfile(FrequencyProfileFile, FreqReader))
+    return false;
+
+  return true;
+}
+
+UnpredictableProfileLoaderPass::UnpredictableProfileLoaderPass()
+    : FrequencyProfileFile(FrequencyProfileOption) {}
+
+UnpredictableProfileLoaderPass::UnpredictableProfileLoaderPass(
+    StringRef PGOProfileFile)
+    : FrequencyProfileFile(FrequencyProfileOption.empty()
+                               ? PGOProfileFile
+                               : FrequencyProfileOption) {}
+
+PreservedAnalyses UnpredictableProfileLoaderPass::run(Module &M,
+                                                      ModuleAnalysisManager &) {
+  if (!loadSampleProfile(M))
+    return PreservedAnalyses::all();
+
+  if (addUpredictableMetadata(M)) {
+    PreservedAnalyses PA;
+    PA.preserveSet<CFGAnalyses>();
+    return PA;
+  }
+
+  return PreservedAnalyses::all();
+}
diff --git a/llvm/test/Other/new-pm-pgo.ll b/llvm/test/Other/new-pm-pgo.ll
index 4f856faacd332..b4be524158497 100644
--- a/llvm/test/Other/new-pm-pgo.ll
+++ b/llvm/test/Other/new-pm-pgo.ll
@@ -25,6 +25,7 @@
 ; SAMPLE_USE_PRE_LINK: Running pass: SROAPass
 ; SAMPLE_USE_PRE_LINK: Running pass: EarlyCSEPass
 ; SAMPLE_USE: Running pass: SampleProfileLoaderPass
+; SAMPLE_USE: Running pass: UnpredictableProfileLoaderPass
 ; SAMPLE_USE_O: Running pass: PGOIndirectCallPromotion
 ; SAMPLE_USE_POST_LINK-NOT: Running pass: GlobalOptPass
 ; SAMPLE_USE_POST_LINK: Running pass: PGOIndirectCallPromotion
diff --git a/llvm/test/Other/new-pm-thinlto-postlink-samplepgo-defaults.ll b/llvm/test/Other/new-pm-thinlto-postlink-samplepgo-defaults.ll
index ac80a31d8fd4b..3339630b42da4 100644
--- a/llvm/test/Other/new-pm-thinlto-postlink-samplepgo-defaults.ll
+++ b/llvm/test/Other/new-pm-thinlto-postlink-samplepgo-defaults.ll
@@ -35,6 +35,7 @@
 ; CHECK-O-NEXT: Running analysis: LazyCallGraphAnalysis
 ; CHECK-O-NEXT: Running analysis: TargetLibraryAnalysis
 ; CHECK-O-NEXT: Running pass: RequireAnalysisPass<{{.*}}ProfileSummaryAnalysis
+; CHECK-O-NEXT: Running pass: UnpredictableProfileLoaderPass
 ; CHECK-O-NEXT: Running pass: PGOIndirectCallPromotion
 ; CHECK-O-NEXT: Running analysis: OptimizationRemarkEmitterAnalysis
 ; CHECK-O-NEXT: Running pass: OpenMPOptPass
diff --git a/llvm/test/Other/new-pm-thinlto-prelink-samplepgo-defaults.ll b/llvm/test/Other/new-pm-thinlto-prelink-samplepgo-defaults.ll
index 210a4ef1f7664..eaef729619cb8 100644
--- a/llvm/test/Other/new-pm-thinlto-prelink-samplepgo-defaults.ll
+++ b/llvm/test/Other/new-pm-thinlto-prelink-samplepgo-defaults.ll
@@ -47,6 +47,7 @@
 ; CHECK-O-NEXT: Running analysis: ProfileSummaryAnalysis
 ; CHECK-O-NEXT: Running analysis: LazyCallGraphAnalysis
 ; CHECK-O-NEXT: Running pass: RequireAnalysisPass<{{.*}}ProfileSummaryAnalysis
+; CHECK-O-NEXT: Running pass: UnpredictableProfileLoaderPass
 ; CHECK-O-NEXT: Running pass: OpenMPOptPass
 ; CHECK-O-NEXT: Running pass: IPSCCPPass
 ; CHECK-O-NEXT: Running pass: CalledValuePropagationPass
diff --git a/llvm/test/Transforms/UnpredictableProfileLoader/Inputs/frequency.prof b/llvm/test/Transforms/UnpredictableProfileLoader/Inputs/frequency.prof
new file mode 100644
index 0000000000000..5bdb6df9f5176
--- /dev/null
+++ b/llvm/test/Transforms/UnpredictableProfileLoader/Inputs/frequency.prof
@@ -0,0 +1,3 @@
+# This is a standard SPGO profile indicating basic block execution frequency.
+sel_arr:1:0
+ 11: 4000
diff --git a/llvm/test/Transforms/UnpredictableProfileLoader/Inputs/inline.freq.prof b/llvm/test/Transforms/UnpredictableProfileLoader/Inputs/inline.freq.prof
new file mode 100644
index 0000000000000..aa8934672cf22
--- /dev/null
+++ b/llvm/test/Transforms/UnpredictableProfileLoader/Inputs/inline.freq.prof
@@ -0,0 +1,3 @@
+caller:1:0
+ 1: callee:1
+  3: 997
diff --git a/llvm/test/Transforms/UnpredictableProfileLoader/Inputs/inline.misp.prof b/llvm/test/Transforms/UnpredictableProfileLoader/Inputs/inline.misp.prof
new file mode 100644
index 0000000000000..d1e06a971e3a8
--- /dev/null
+++ b/llvm/test/Transforms/UnpredictableProfileLoader/Inputs/inline.misp.prof
@@ -0,0 +1,3 @@
+caller:1:0
+ 1: callee:1
+  3: 400
diff --git a/llvm/test/Transforms/UnpredictableProfileLoader/Inputs/mispredict.prof b/llvm/test/Transforms/UnpredictableProfileLoader/Inputs/mispredict.prof
new file mode 100644
index 0000000000000..14769e7de9c30
--- /dev/null
+++ b/llvm/test/Transforms/UnpredictableProfileLoader/Inputs/mispredict.prof
@@ -0,0 +1,4 @@
+# This profile indicates 1000 mispredict samples for instructions 11 source
+# lines into in the sel_arr function.
+sel_arr:1:0
+  11: 1000
diff --git a/llvm/test/Transforms/UnpredictableProfileLoader/inlined.ll b/llvm/test/Transforms/UnpredictableProfileLoader/inlined.ll
new file mode 100644
index 0000000000000..60fb56af2f7ef
--- /dev/null
+++ b/llvm/test/Transforms/UnpredictableProfileLoader/inlined.ll
@@ -0,0 +1,113 @@
+; RUN: opt < %s -passes=unpredictable-profile-loader -unpredictable-hints-file=%S/Inputs/inline.misp.prof -unpredictable-hints-frequency-profile=%S/Inputs/inline.freq.prof -unpredictable-hints-min-ratio=0.1 -S | FileCheck %s
+; RUN: opt < %s -passes=unpredictable-profile-loader -unpredictable-hints-file=%S/Inputs/inline.misp.prof -unpredictable-hints-frequency-profile=%S/Inputs/inline.freq.prof -unpredictable-hints-min-ratio=0.5 -S | FileCheck --check-prefixes=MIN %s
+
+; Test that we can apply branch mispredict profile data when the branch of
+; interest in `callee` has been inlined into `caller`.
+
+; // Original C source:
+; static int callee(double *A, double *B) {
+;   int count = 0;
+;   for(int i=0; i<1000000; ++i)
+;     if(A[i] > 100)
+;       count += B[i] * 3;
+;
+;   return count;
+; }
+;
+; int caller(double *X, double *Y) {
+;   return callee(X, Y);
+; }
+
+; CHECK-LABEL: @caller
+define dso_local i32 @caller(ptr nocapture noundef readonly %X, ptr nocapture noundef readonly %Y) local_unnamed_addr !dbg !7 {
+entry:
+  tail call void @llvm.dbg.value(metadata ptr %X, metadata !14, metadata !DIExpression()), !dbg !16
+  tail call void @llvm.dbg.value(metadata ptr %Y, metadata !15, metadata !DIExpression()), !dbg !16
+  tail call void @llvm.dbg.value(metadata ptr %X, metadata !17, metadata !DIExpression()), !dbg !24
+  tail call void @llvm.dbg.value(metadata ptr %Y, metadata !20, metadata !DIExpression()), !dbg !24
+  tail call void @llvm.dbg.value(metadata i32 0, metadata !21, metadata !DIExpression()), !dbg !24
+  tail call void @llvm.dbg.value(metadata i32 0, metadata !22, metadata !DIExpression()), !dbg !26
+  br label %for.body.i, !dbg !27
+
+for.body.i:                                       ; preds = %for.inc.i, %entry
+  %indvars.iv.i = phi i64 [ 0, %entry ], [ %indvars.iv.next.i, %for.inc.i ]
+  %count.09.i = phi i32 [ 0, %entry ], [ %count.1.i, %for.inc.i ]
+  tail call void @llvm.dbg.value(metadata i64 %indvars.iv.i, metadata !22, metadata !DIExpression()), !dbg !26
+  tail call void @llvm.dbg.value(metadata i32 %count.09.i, metadata !21, metadata !DIExpression()), !dbg !24
+  %arrayidx.i = getelementptr inbounds double, ptr %X, i64 %indvars.iv.i, !dbg !28
+  %0 = load double, ptr %arrayidx.i, align 8, !dbg !28
+  %cmp1.i = fcmp reassoc nsz arcp contract afn ogt double %0, 1.000000e+02, !dbg !35
+; CHECK: br i1 %cmp1.i, label %if.then.i, label %for.inc.i
+; CHECK-SAME: !unpredictable
+; MIN: br i1 %cmp1.i, label %if.then.i, label %for.inc.i
+; MIN-NOT: !unpredictable
+  br i1 %cmp1.i, label %if.then.i, label %for.inc.i, !dbg !36
+
+if.then.i:                                        ; preds = %for.body.i
+  %arrayidx3.i = getelementptr inbounds double, ptr %Y, i64 %indvars.iv.i, !dbg !37
+  %1 = load double, ptr %arrayidx3.i, align 8, !dbg !37
+  %mul.i = fmul reassoc nsz arcp contract afn double %1, 3.000000e+00, !dbg !38
+  %conv.i = sitofp i32 %count.09.i to double, !dbg !39
+  %add.i = fadd reassoc nsz arcp contract afn double %mul.i, %conv.i, !dbg !39
+  %conv4.i = fptosi double %add.i to i32, !dbg !39
+  tail call void @llvm.dbg.value(metadata i32 %conv4.i, metadata !21, metadata !DIExpression()), !dbg !24
+  br label %for.inc.i, !dbg !40
+
+for.inc.i:                                        ; preds = %if.then.i, %for.body.i
+  %count.1.i = phi i32 [ %conv4.i, %if.then.i ], [ %count.0...
[truncated]

@david-xl
Copy link
Contributor

@sapostolakis does CMOV optimization take advantage of the hint?

@david-xl david-xl requested a review from WenleiHe July 16, 2024 18:11
@david-xl david-xl marked this pull request as draft July 16, 2024 18:11
@sapostolakis
Copy link
Contributor

@sapostolakis does CMOV optimization take advantage of the hint?

Yes, it checks the !unpredictable metadata (https://github.com/llvm/llvm-project/blob/main/llvm/lib/CodeGen/SelectOptimize.cpp#L917). So, given that this PR adds !unpredictable metadata, it would work for the select-optimize pass.

@tcreech-intel
Copy link
Contributor Author

!unpredictable metadata is preserved as a MachineInstr flag and may be useful to backends too, such as here:

// Check if we found a X86::CMOVrr instruction. If it is marked as
// unpredictable, skip it and do not convert it to branch.
if (CC != X86::COND_INVALID &&
!I.getFlag(MachineInstr::MIFlag::Unpredictable) &&
(IncludeLoads || !I.mayLoad())) {

Also I see an additional use of IR !unpredictable is proposed in #98495. (Ping @HaohaiWen, @williamweixiao, @tianqingw for awareness/feedback.)

cl::ReallyHidden);

// Lookup samples for an Instruction's corresponding location in a
// FunctionSamples profile. The count returned is directly from the profile
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like it returns MispredictRatio instead of count

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right. Fixed in f74a236.

return MissRatio;
}

// Examine all Select and BranchInsts in a function, adding !unpredictable
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Examine all Branch, Select and Switch Insts in a function

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. Fixed in f74a236.

}

ReaderPtr = std::move(ReaderOrErr.get());
ReaderPtr->read();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we need to check the return value of "ReaderPtr->read()"?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it would be better to check it. Fixed in 565ba9a.

@williamweixiao
Copy link
Contributor

LGTM. please change the PR status.

@WenleiHe WenleiHe requested review from wlei-llvm and mtrofin July 22, 2024 05:50
@WenleiHe
Copy link
Member

WenleiHe commented Jul 22, 2024

Before we get into patch review, could you start an RFC to outline: 1) high level design (also including how this interacts with other PGO, and usage instructions), 2) performance results, 3) future plan?

While I understand the rational behind this optimization, I think it'd be useful to show some results on benchmarks (e.g. on spec, how much additional perf we're getting here on top of sample PGO) before we introduce this in LLVM upstream.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants