[ctx_prof] Simple ICP criteria during module inliner #109881

@mtrofin mtrofin commented Sep 24, 2024

This is mostly for testing: under contextual profiling, we perform indirect call promotion (ICP) for those indirect callsites that have targets marked as `alwaysinline`.

This helped uncover a bug in the way the profile was updated upon ICP: we skipped the update if the target wasn't called in that context, which resulted in incorrect counts for the indirect BB.

Also a drive-by fix to the total/direct count values: they should be 64-bit (as all counters in the contextual profile are).
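
To make the bug concrete, here is a minimal, self-contained sketch of the per-context count split. It is not the LLVM code: `CallsiteTargets`, `PromotedCounts`, `splitCountsOld` and `splitCountsFixed` are hypothetical names, and a plain `std::map` stands in for the contextual profile's callsite data.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// Hypothetical stand-in for one context's view of an indirect callsite:
// target GUID -> entry count observed for that target in this context.
using CallsiteTargets = std::map<uint64_t, uint64_t>;

struct PromotedCounts {
  uint64_t Direct = 0;   // times the promoted (direct) BB would have run
  uint64_t Indirect = 0; // times the fallback (indirect) BB would have run
};

// Sum of all entry counts seen at the callsite, kept in 64 bits.
static uint64_t totalCount(const CallsiteTargets &Targets) {
  uint64_t Total = 0;
  for (const auto &Entry : Targets)
    Total += Entry.second;
  return Total;
}

// Behaviour before the fix (as described above): give up when the promoted
// target was never called in this context, so the indirect BB stays at 0
// even though other targets were called through it.
PromotedCounts splitCountsOld(const CallsiteTargets &Targets,
                              uint64_t CalleeGUID) {
  PromotedCounts Result;
  auto It = Targets.find(CalleeGUID);
  if (It == Targets.end())
    return Result;
  Result.Direct = It->second;
  Result.Indirect = totalCount(Targets) - Result.Direct;
  return Result;
}

// Behaviour after the fix: always compute the total; the direct count is 0
// when the promoted target is absent, so the whole total goes to the
// indirect BB.
PromotedCounts splitCountsFixed(const CallsiteTargets &Targets,
                                uint64_t CalleeGUID) {
  PromotedCounts Result;
  const uint64_t Total = totalCount(Targets);
  auto It = Targets.find(CalleeGUID);
  if (It != Targets.end())
    Result.Direct = It->second;
  assert(Total >= Result.Direct);
  Result.Indirect = Total - Result.Direct;
  return Result;
}
```

With a context that only ever called GUID 9000 ten times, promoting GUID 1000 yields an indirect count of 0 in the old sketch and 10 in the fixed one. The counts are `uint64_t` throughout, in line with the 64-bit note above.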

github-actions bot commented Sep 24, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

@mtrofin mtrofin force-pushed the users/mtrofin/09-23-_ctx_prof_simple_icp_criteria_during_module_inliner branch 2 times, most recently from 0423775 to 8f13507 Compare September 24, 2024 23:14
@mtrofin mtrofin marked this pull request as ready for review September 24, 2024 23:16
@llvmbot llvmbot added llvm:analysis Includes value tracking, cost tables and constant folding llvm:transforms labels Sep 24, 2024
llvmbot commented Sep 24, 2024

@llvm/pr-subscribers-llvm-analysis

Author: Mircea Trofin (mtrofin)

Full diff: https://github.com/llvm/llvm-project/pull/109881.diff

5 Files Affected:

  • (modified) llvm/include/llvm/Analysis/CtxProfAnalysis.h (+13)
  • (modified) llvm/lib/Analysis/CtxProfAnalysis.cpp (+23)
  • (modified) llvm/lib/Transforms/IPO/ModuleInliner.cpp (+35-8)
  • (modified) llvm/lib/Transforms/Utils/CallPromotionUtils.cpp (+19-16)
  • (added) llvm/test/Analysis/CtxProfAnalysis/flatten-icp.ll (+55)
diff --git a/llvm/include/llvm/Analysis/CtxProfAnalysis.h b/llvm/include/llvm/Analysis/CtxProfAnalysis.h
index 0a5beb92fcbcc0..0a9543f037eb58 100644
--- a/llvm/include/llvm/Analysis/CtxProfAnalysis.h
+++ b/llvm/include/llvm/Analysis/CtxProfAnalysis.h
@@ -9,6 +9,7 @@
 #ifndef LLVM_ANALYSIS_CTXPROFANALYSIS_H
 #define LLVM_ANALYSIS_CTXPROFANALYSIS_H
 
+#include "llvm/ADT/SetVector.h"
 #include "llvm/IR/GlobalValue.h"
 #include "llvm/IR/InstrTypes.h"
 #include "llvm/IR/IntrinsicInst.h"
@@ -63,6 +64,13 @@ class PGOContextualProfile {
     return getDefinedFunctionGUID(F) != 0;
   }
 
+  StringRef getFunctionName(GlobalValue::GUID GUID) const {
+    auto It = FuncInfo.find(GUID);
+    if (It == FuncInfo.end())
+      return "";
+    return It->second.Name;
+  }
+
   uint32_t getNumCounters(const Function &F) const {
     assert(isFunctionKnown(F));
     return FuncInfo.find(getDefinedFunctionGUID(F))->second.NextCounterIndex;
@@ -120,6 +128,11 @@ class CtxProfAnalysis : public AnalysisInfoMixin<CtxProfAnalysis> {
 
   /// Get the step instrumentation associated with a `select`
   static InstrProfIncrementInstStep *getSelectInstrumentation(SelectInst &SI);
+
+  // FIXME: refactor to an advisor model, and separate
+  static void collectIndirectCallPromotionList(
+      CallBase &IC, Result &Profile,
+      SetVector<std::pair<CallBase *, Function *>> &Candidates);
 };
 
 class CtxProfAnalysisPrinterPass
diff --git a/llvm/lib/Analysis/CtxProfAnalysis.cpp b/llvm/lib/Analysis/CtxProfAnalysis.cpp
index 7517011395a7d6..873277cf51d6b9 100644
--- a/llvm/lib/Analysis/CtxProfAnalysis.cpp
+++ b/llvm/lib/Analysis/CtxProfAnalysis.cpp
@@ -21,6 +21,7 @@
 #include "llvm/Support/CommandLine.h"
 #include "llvm/Support/JSON.h"
 #include "llvm/Support/MemoryBuffer.h"
+#include "llvm/Transforms/Utils/CallPromotionUtils.h"
 
 #define DEBUG_TYPE "ctx_prof"
 
@@ -309,3 +310,25 @@ const CtxProfFlatProfile PGOContextualProfile::flatten() const {
       });
   return Flat;
 }
+
+void CtxProfAnalysis::collectIndirectCallPromotionList(
+    CallBase &IC, Result &Profile,
+    SetVector<std::pair<CallBase *, Function *>> &Candidates) {
+  const auto *Instr = CtxProfAnalysis::getCallsiteInstrumentation(IC);
+  if (!Instr)
+    return;
+  Module &M = *IC.getParent()->getModule();
+  const uint32_t CallID = Instr->getIndex()->getZExtValue();
+  Profile.visit(
+      [&](const PGOCtxProfContext &Ctx) {
+        const auto &Targets = Ctx.callsites().find(CallID);
+        if (Targets == Ctx.callsites().end())
+          return;
+        for (const auto &[Guid, _] : Targets->second)
+          if (auto Name = Profile.getFunctionName(Guid); !Name.empty())
+            if (auto *Target = M.getFunction(Name))
+              if (Target->hasFnAttribute(Attribute::AlwaysInline))
+                Candidates.insert({&IC, Target});
+      },
+      IC.getCaller());
+}
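
The candidate collection above can be modelled outside LLVM as follows. This is only a sketch under stated assumptions: `ToyFunction`, `ToyContext` and `collectCandidates` are hypothetical, a `std::vector` plus `std::set` stands in for `SetVector`, and the restriction to contexts of the caller (the `IC.getCaller()` argument) is left out.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-ins for the module and profile data structures.
struct ToyFunction {
  std::string Name;
  bool AlwaysInline;
};
using TargetCounts = std::map<uint64_t, uint64_t>;   // target GUID -> count
using ToyContext = std::map<uint32_t, TargetCounts>; // callsite index -> targets

// Visit every context, look up the given callsite, and keep each target that
// resolves to a function in the module marked alwaysinline. Duplicates across
// contexts are dropped and first-seen order is preserved.
std::vector<std::string>
collectCandidates(const std::vector<ToyContext> &Contexts, uint32_t CallID,
                  const std::map<uint64_t, std::string> &GuidToName,
                  const std::map<std::string, ToyFunction> &Module) {
  std::vector<std::string> Ordered;
  std::set<std::string> Seen;
  for (const ToyContext &Ctx : Contexts) {
    auto Targets = Ctx.find(CallID);
    if (Targets == Ctx.end())
      continue; // callsite not observed in this context
    for (const auto &Entry : Targets->second) {
      auto NameIt = GuidToName.find(Entry.first);
      if (NameIt == GuidToName.end())
        continue; // GUID unknown to the profile's function table
      auto FnIt = Module.find(NameIt->second);
      if (FnIt == Module.end() || !FnIt->second.AlwaysInline)
        continue; // not in this module, or not alwaysinline
      if (Seen.insert(FnIt->first).second)
        Ordered.push_back(FnIt->first);
    }
  }
  return Ordered;
}
```

A target seen in several contexts appears once in the result, which is why the promotion loop in the inliner can run once per (callsite, target) pair.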
diff --git a/llvm/lib/Transforms/IPO/ModuleInliner.cpp b/llvm/lib/Transforms/IPO/ModuleInliner.cpp
index 542c319b880747..cf2b34b5a6367b 100644
--- a/llvm/lib/Transforms/IPO/ModuleInliner.cpp
+++ b/llvm/lib/Transforms/IPO/ModuleInliner.cpp
@@ -49,6 +49,13 @@ using namespace llvm;
 STATISTIC(NumInlined, "Number of functions inlined");
 STATISTIC(NumDeleted, "Number of functions deleted because all callers found");
 
+cl::opt<bool> CtxProfPromoteAlwaysInline(
+    "ctx-prof-promote-alwaysinline", cl::init(false), cl::Hidden,
+    cl::desc("If using a contextual profile in this module, and an indirect "
+             "call target is marked as alwaysinline, perform indirect call "
+             "promotion for that target. If multiple targets for an indirect "
+             "call site fit this description, they are all promoted."));
+
 /// Return true if the specified inline history ID
 /// indicates an inline history that includes the specified function.
 static bool inlineHistoryIncludes(
@@ -145,10 +152,11 @@ PreservedAnalyses ModuleInlinerPass::run(Module &M,
   assert(Calls != nullptr && "Expected an initialized InlineOrder");
 
   // Populate the initial list of calls in this module.
+  SetVector<std::pair<CallBase *, Function *>> ICPCandidates;
   for (Function &F : M) {
     auto &ORE = FAM.getResult<OptimizationRemarkEmitterAnalysis>(F);
-    for (Instruction &I : instructions(F))
-      if (auto *CB = dyn_cast<CallBase>(&I))
+    for (Instruction &I : instructions(F)) {
+      if (auto *CB = dyn_cast<CallBase>(&I)) {
         if (Function *Callee = CB->getCalledFunction()) {
           if (!Callee->isDeclaration())
             Calls->push({CB, -1});
@@ -163,7 +171,17 @@ PreservedAnalyses ModuleInlinerPass::run(Module &M,
                      << setIsVerbose();
             });
           }
+        } else if (CtxProfPromoteAlwaysInline && CtxProf &&
+                   CB->isIndirectCall()) {
+          CtxProfAnalysis::collectIndirectCallPromotionList(*CB, CtxProf,
+                                                            ICPCandidates);
         }
+      }
+    }
+  }
+  for (auto &[CB, Target] : ICPCandidates) {
+    if (auto *DirectCB = promoteCallWithIfThenElse(*CB, *Target, CtxProf))
+      Calls->push({DirectCB, -1});
   }
   if (Calls->empty())
     return PreservedAnalyses::all();
@@ -242,13 +260,22 @@ PreservedAnalyses ModuleInlinerPass::run(Module &M,
           // iteration because the next iteration may not happen and we may
           // miss inlining it.
           // FIXME: enable for ctxprof.
-          if (!CtxProf)
-            if (tryPromoteCall(*ICB))
-              NewCallee = ICB->getCalledFunction();
+          if (CtxProfPromoteAlwaysInline && CtxProf) {
+            SetVector<std::pair<CallBase *, Function *>> Candidates;
+            CtxProfAnalysis::collectIndirectCallPromotionList(*ICB, CtxProf,
+                                                              Candidates);
+            for (auto &[DC, _] : Candidates) {
+              assert(!DC->isIndirectCall());
+              assert(!DC->getCalledFunction()->isDeclaration() &&
+                     "CtxProf promotes calls to defined targets only");
+              Calls->push({DC, NewHistoryID});
+            }
+          } else if (tryPromoteCall(*ICB)) {
+            NewCallee = ICB->getCalledFunction();
+            if (NewCallee && !NewCallee->isDeclaration())
+              Calls->push({ICB, NewHistoryID});
+          }
         }
-        if (NewCallee)
-          if (!NewCallee->isDeclaration())
-            Calls->push({ICB, NewHistoryID});
       }
     }
 
diff --git a/llvm/lib/Transforms/Utils/CallPromotionUtils.cpp b/llvm/lib/Transforms/Utils/CallPromotionUtils.cpp
index 5f872c352429c1..3d2fa226ff15b9 100644
--- a/llvm/lib/Transforms/Utils/CallPromotionUtils.cpp
+++ b/llvm/lib/Transforms/Utils/CallPromotionUtils.cpp
@@ -623,34 +623,37 @@ CallBase *llvm::promoteCallWithIfThenElse(CallBase &CB, Function &Callee,
     // All the ctx-es belonging to a function must have the same size counters.
     Ctx.resizeCounters(NewCountersSize);
 
-    // Maybe in this context, the indirect callsite wasn't observed at all
+    // Maybe in this context, the indirect callsite wasn't observed at all. That
+    // would make both direct and indirect BBs cold - which is what we already
+    // have from resizing the counters.
     if (!Ctx.hasCallsite(CSIndex))
       return;
     auto &CSData = Ctx.callsite(CSIndex);
-    auto It = CSData.find(CalleeGUID);
 
-    // Maybe we did notice the indirect callsite, but to other targets.
-    if (It == CSData.end())
-      return;
-
-    assert(CalleeGUID == It->second.guid());
-
-    uint32_t DirectCount = It->second.getEntrycount();
-    uint32_t TotalCount = 0;
+    uint64_t TotalCount = 0;
     for (const auto &[_, V] : CSData)
       TotalCount += V.getEntrycount();
+    uint64_t DirectCount = 0;
+    // If we called the direct target, update the DirectCount. If we didn't, we
+    // still want to update the indirect BB (to which the TotalCount goes, in
+    // that case).
+    if (auto It = CSData.find(CalleeGUID); It != CSData.end()) {
+      assert(CalleeGUID == It->second.guid());
+      DirectCount = It->second.getEntrycount();
+      // This direct target needs to be moved to this caller under the
+      // newly-allocated callsite index.
+      assert(Ctx.callsites().count(NewCSID) == 0);
+      Ctx.ingestContext(NewCSID, std::move(It->second));
+      CSData.erase(CalleeGUID);
+    }
+
     assert(TotalCount >= DirectCount);
-    uint32_t IndirectCount = TotalCount - DirectCount;
+    uint64_t IndirectCount = TotalCount - DirectCount;
     // The ICP's effect is as-if the direct BB would have been taken DirectCount
     // times, and the indirect BB, IndirectCount times
     Ctx.counters()[DirectID] = DirectCount;
     Ctx.counters()[IndirectID] = IndirectCount;
 
-    // This particular indirect target needs to be moved to this caller under
-    // the newly-allocated callsite index.
-    assert(Ctx.callsites().count(NewCSID) == 0);
-    Ctx.ingestContext(NewCSID, std::move(It->second));
-    CSData.erase(CalleeGUID);
   };
   CtxProf.update(ProfileUpdater, &Caller);
   return &DirectCall;
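
For readers unfamiliar with the transformation whose profile is being updated here, the following source-level sketch shows the "if-then-else" shape of a promoted call. It is an analogy in plain C++, not the IR-level implementation; `one`, `other`, `callerBefore` and `callerAfter` are hypothetical names.

```cpp
#include <cassert>

using Fn = int (*)();

int one() { return 1; }
int other() { return 2; }

// Before promotion: a single indirect call through the pointer.
int callerBefore(Fn C) { return C(); }

// After promotion: compare the pointer against the promoted target. The
// "direct BB" calls the target by name (and can now be inlined); the
// "indirect BB" keeps the original indirect call as a fallback.
int callerAfter(Fn C) {
  if (C == &one)
    return one(); // direct BB, weighted by DirectCount
  return C();     // indirect BB, weighted by IndirectCount
}
```

Both versions return the same value for any pointer; only the control flow changes, which is why the two new counters (direct and indirect) have to be filled in for every context of the caller.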
diff --git a/llvm/test/Analysis/CtxProfAnalysis/flatten-icp.ll b/llvm/test/Analysis/CtxProfAnalysis/flatten-icp.ll
new file mode 100644
index 00000000000000..f7529432d4251d
--- /dev/null
+++ b/llvm/test/Analysis/CtxProfAnalysis/flatten-icp.ll
@@ -0,0 +1,55 @@
+; RUN: split-file %s %t
+; RUN: llvm-ctxprof-util fromJSON --input %t/profile.json --output %t/profile.ctxprofdata
+;
+; In the given profile, in one of the contexts where the indirect call is taken,
+; the target we're trying to ICP - GUID:1000 - doesn't appear at all. That should
+; contribute to the count of the "indirect call BB".
+; RUN: opt %t/test.ll -S -passes='require<ctx-prof-analysis>,module-inline,ctx-prof-flatten' -use-ctx-profile=%t/profile.ctxprofdata -ctx-prof-promote-alwaysinline 
+
+; CHECK-LABEL: define i32 @caller(ptr %c)
+; CHECK-NEXT:     [[CND:[0-9]+]] = icmp eq ptr %c, @one
+; CHECK-NEXT:     br i1 [[CND]], label %{{.*}}, label %{{.*}}, !prof ![[BW:[0-9]+]]
+
+; CHECK: ![[BW]] = !{!"branch_weights", i32 10, i32 10}
+
+;--- test.ll
+declare i32 @external(i32 %x)
+define i32 @one() #0 !guid !0 {
+  call void @llvm.instrprof.increment(ptr @one, i64 123, i32 1, i32 0)
+  call void @llvm.instrprof.callsite(ptr @one, i64 123, i32 1, i32 0, ptr @external)
+  %ret = call i32 @external(i32 1)
+  ret i32 %ret
+}
+
+define i32 @caller(ptr %c) #1 !guid !1 {
+  call void @llvm.instrprof.increment(ptr @caller, i64 567, i32 1, i32 0)
+  call void @llvm.instrprof.callsite(ptr @caller, i64 567, i32 1, i32 0, ptr %c)
+  %ret = call i32 %c()
+  ret i32 %ret
+}
+
+define i32 @root(ptr %c) !guid !2 {
+  call void @llvm.instrprof.increment(ptr @root, i64 432, i32 1, i32 0)
+  call void @llvm.instrprof.callsite(ptr @root, i64 432, i32 2, i32 0, ptr @caller)
+  %a = call i32 @caller(ptr %c)
+  call void @llvm.instrprof.callsite(ptr @root, i64 432, i32 2, i32 1, ptr @caller)
+  %b = call i32 @caller(ptr %c)
+  %ret = add i32 %a, %b
+  ret i32 %ret
+
+}
+
+attributes #0 = { alwaysinline }
+attributes #1 = { noinline }
+!0 = !{i64 1000}
+!1 = !{i64 3000}
+!2 = !{i64 4000}
+
+;--- profile.json
+[ {
+  "Guid": 4000, "Counters":[10], "Callsites": [
+    [{"Guid":3000, "Counters":[10], "Callsites":[[{"Guid":1000, "Counters":[10]}]]}],
+    [{"Guid":3000, "Counters":[10], "Callsites":[[{"Guid":9000, "Counters":[10]}]]}]
+  ]
+}
+]
\ No newline at end of file
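
The expected `branch_weights` of 10 and 10 in the test can be traced with a small sketch of flattening, assuming flattening sums each counter index over all contexts of a function. `flattenCounters` is a hypothetical helper, and the counter layout `[entry, direct BB, indirect BB]` for `@caller` after promotion is an assumption made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Sum per-index counters over all contexts of one function. Shorter counter
// vectors are treated as zero-padded.
std::vector<uint64_t>
flattenCounters(const std::vector<std::vector<uint64_t>> &PerContext) {
  std::vector<uint64_t> Flat;
  for (const std::vector<uint64_t> &Counters : PerContext) {
    if (Flat.size() < Counters.size())
      Flat.resize(Counters.size(), 0);
    for (std::size_t I = 0; I < Counters.size(); ++I)
      Flat[I] += Counters[I];
  }
  return Flat;
}
```

In the test profile, `@caller` (GUID 3000) has two contexts. In the first, the indirect call reached GUID 1000 ten times, giving `[10, 10, 0]` after promotion; in the second it reached GUID 9000 ten times, giving `[10, 0, 10]` with the fixed update. Summing gives a direct count of 10 and an indirect count of 10. With the pre-fix behaviour described in the PR, the second context would have contributed nothing to the indirect BB.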

llvmbot commented Sep 24, 2024

@llvm/pr-subscribers-llvm-transforms


@mtrofin mtrofin force-pushed the users/mtrofin/09-23-_ctx_prof_simple_icp_criteria_during_module_inliner branch 3 times, most recently from ba2db15 to 28af688 Compare September 25, 2024 20:57
@mtrofin mtrofin force-pushed the users/mtrofin/09-23-_ctx_prof_simple_icp_criteria_during_module_inliner branch from 28af688 to 0a0a1d5 Compare September 25, 2024 20:59
@kazutakahirata kazutakahirata left a comment

LGTM. Thanks!

mtrofin commented Sep 25, 2024

Merge activity

  • Sep 25, 6:04 PM EDT: @mtrofin started a stack merge that includes this pull request via Graphite.
  • Sep 25, 6:05 PM EDT: @mtrofin merged this pull request with Graphite.

@mtrofin mtrofin merged commit c8365fe into main Sep 25, 2024
5 of 8 checks passed
@mtrofin mtrofin deleted the users/mtrofin/09-23-_ctx_prof_simple_icp_criteria_during_module_inliner branch September 25, 2024 22:05
@chapuni chapuni left a comment

Likely layering violation.

mtrofin commented Sep 26, 2024

Fixed in 3d01af7.

Sterling-Augustine pushed a commit to Sterling-Augustine/llvm-project that referenced this pull request Sep 27, 2024
qiaojbao pushed a commit to GPUOpen-Drivers/llvm-project that referenced this pull request Oct 31, 2024
…8b7a41bc2

Local branch amd-gfx 2c28b7a Merged main:aea06684992873f70c5834e2f455f913e5b8d671 into amd-gfx:617ef4684340
Remote branch main c8365fe [ctx_prof] Simple ICP criteria during module inliner (llvm#109881)