
[ctxprof] Track unhandled call targets #131417


Merged

Conversation

mtrofin
Member

@mtrofin mtrofin commented Mar 15, 2025

Collect profiles for functions we encounter, while collecting a contextual profile, that are not associated with a call site. This is expected to happen for signal handlers, but it also - problematically - currently happens for mem{cpy|move|set}, which are inserted after profile instrumentation.

Collecting a "regular" flat profile in these cases would hide the problem: that we lose better profiling opportunities.
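The runtime side of the change boils down to an insert-if-absent pattern, as seen in `getUnhandledContext` in the diff below: allocate a per-GUID node only the first time a callee is seen under a root, and thread it onto a singly linked list so the writer can later walk all unhandled callees. Here is a minimal sketch of that pattern, using hypothetical `Node`/`Root` types and `std::map` as a stand-in for the sanitizer `DenseMap`:

```cpp
#include <cstdint>
#include <map>

// Hypothetical, simplified node; the real runtime uses ctx_profile::ContextNode.
struct Node {
  uint64_t Guid;
  uint64_t Entries = 0;
  Node *Next = nullptr; // threads all unhandled nodes for the writer
};

struct Root {
  std::map<uint64_t, Node *> Unhandled; // GUID -> per-callee profile
  Node *FirstUnhandledCalleeNode = nullptr;
};

// Get-or-create the unhandled-callee node for Guid under R. A node is
// allocated only on the first encounter (insert returns {iterator, inserted}),
// and is pushed onto the root's list for later serialization.
Node *getUnhandled(Root &R, uint64_t Guid) {
  auto It = R.Unhandled.insert({Guid, nullptr});
  if (It.second) { // first time this GUID appears under this root
    Node *N = new Node{Guid, 0, R.FirstUnhandledCalleeNode};
    R.FirstUnhandledCalleeNode = N;
    It.first->second = N;
  }
  ++It.first->second->Entries; // stands in for the onContextEnter counting
  return It.first->second;
}
```

Repeated calls with the same GUID return the same node and only bump its entry count; new GUIDs are prepended to the list, which is why the writer receives them in reverse order of first encounter.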

Member Author

mtrofin commented Mar 15, 2025

@mtrofin mtrofin force-pushed the users/mtrofin/03-14-_ctxprof_track_unhandled_call_targets branch from 95e5589 to 9f54f19 Compare March 15, 2025 03:28
@mtrofin mtrofin force-pushed the users/mtrofin/03-13-_ctxprof_make_contextroot_an_implementation_detail branch 2 times, most recently from f671b9b to e6d651d Compare March 15, 2025 04:04
@mtrofin mtrofin force-pushed the users/mtrofin/03-14-_ctxprof_track_unhandled_call_targets branch from 9f54f19 to 2712325 Compare March 15, 2025 04:04
@mtrofin mtrofin force-pushed the users/mtrofin/03-13-_ctxprof_make_contextroot_an_implementation_detail branch from e6d651d to 2b2597a Compare March 15, 2025 04:11
@mtrofin mtrofin force-pushed the users/mtrofin/03-14-_ctxprof_track_unhandled_call_targets branch from 2712325 to 37fbe76 Compare March 15, 2025 04:12
@mtrofin mtrofin force-pushed the users/mtrofin/03-13-_ctxprof_make_contextroot_an_implementation_detail branch from 2b2597a to 551cb1a Compare March 15, 2025 05:03
@mtrofin mtrofin force-pushed the users/mtrofin/03-14-_ctxprof_track_unhandled_call_targets branch 2 times, most recently from 0421249 to 5916e7d Compare March 15, 2025 05:32
@mtrofin mtrofin force-pushed the users/mtrofin/03-13-_ctxprof_make_contextroot_an_implementation_detail branch from 551cb1a to 94a272a Compare March 15, 2025 05:32
@mtrofin mtrofin force-pushed the users/mtrofin/03-14-_ctxprof_track_unhandled_call_targets branch from 5916e7d to debbc1f Compare March 17, 2025 15:04
@mtrofin mtrofin force-pushed the users/mtrofin/03-13-_ctxprof_make_contextroot_an_implementation_detail branch from 94a272a to 35e3208 Compare March 17, 2025 15:04
@mtrofin mtrofin force-pushed the users/mtrofin/03-14-_ctxprof_track_unhandled_call_targets branch from debbc1f to 0842b89 Compare March 18, 2025 02:32
@mtrofin mtrofin marked this pull request as ready for review March 18, 2025 03:04
@llvmbot llvmbot added compiler-rt PGO Profile Guided Optimizations llvm:transforms labels Mar 18, 2025
@llvmbot
Member

llvmbot commented Mar 18, 2025

@llvm/pr-subscribers-llvm-transforms

@llvm/pr-subscribers-pgo

Author: Mircea Trofin (mtrofin)

Changes

Collect profiles for functions we encounter, while collecting a contextual profile, that are not associated with a call site. This is expected to happen for signal handlers, but it also - problematically - currently happens for mem{cpy|move|set}, which are inserted after profile instrumentation.

Collecting a "regular" flat profile in these cases would hide the problem: that we lose better profiling opportunities.


Patch is 31.40 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/131417.diff

14 Files Affected:

  • (modified) compiler-rt/lib/ctx_profile/CtxInstrContextNode.h (+1)
  • (modified) compiler-rt/lib/ctx_profile/CtxInstrProfiling.cpp (+26-8)
  • (modified) compiler-rt/lib/ctx_profile/CtxInstrProfiling.h (+15)
  • (modified) compiler-rt/lib/ctx_profile/tests/CtxInstrProfilingTest.cpp (+1-1)
  • (modified) compiler-rt/test/ctx_profile/TestCases/generate-context.cpp (+14-1)
  • (modified) llvm/include/llvm/ProfileData/CtxInstrContextNode.h (+1)
  • (modified) llvm/include/llvm/ProfileData/PGOCtxProfReader.h (+17-9)
  • (modified) llvm/include/llvm/ProfileData/PGOCtxProfWriter.h (+4-2)
  • (modified) llvm/lib/ProfileData/PGOCtxProfReader.cpp (+48-17)
  • (modified) llvm/lib/ProfileData/PGOCtxProfWriter.cpp (+23-4)
  • (modified) llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp (+13)
  • (added) llvm/test/tools/llvm-ctxprof-util/Inputs/valid-with-unhandled.yaml (+26)
  • (modified) llvm/test/tools/llvm-ctxprof-util/llvm-ctxprof-util.test (+15-6)
  • (modified) llvm/unittests/ProfileData/PGOCtxProfReaderWriterTest.cpp (+7-7)
diff --git a/compiler-rt/lib/ctx_profile/CtxInstrContextNode.h b/compiler-rt/lib/ctx_profile/CtxInstrContextNode.h
index 55962df57fb58..a176662b5cb3d 100644
--- a/compiler-rt/lib/ctx_profile/CtxInstrContextNode.h
+++ b/compiler-rt/lib/ctx_profile/CtxInstrContextNode.h
@@ -121,6 +121,7 @@ class ProfileWriter {
 public:
   virtual void startContextSection() = 0;
   virtual void writeContextual(const ctx_profile::ContextNode &RootNode,
+                               const ctx_profile::ContextNode *Unhandled,
                                uint64_t TotalRootEntryCount) = 0;
   virtual void endContextSection() = 0;
 
diff --git a/compiler-rt/lib/ctx_profile/CtxInstrProfiling.cpp b/compiler-rt/lib/ctx_profile/CtxInstrProfiling.cpp
index 6ef7076d93e31..26f22926a5704 100644
--- a/compiler-rt/lib/ctx_profile/CtxInstrProfiling.cpp
+++ b/compiler-rt/lib/ctx_profile/CtxInstrProfiling.cpp
@@ -246,22 +246,37 @@ ContextNode *getFlatProfile(FunctionData &Data, GUID Guid,
 
 ContextNode *getUnhandledContext(FunctionData &Data, GUID Guid,
                                  uint32_t NumCounters) {
-  // 1) if we are under a root (regardless if this thread is collecting or not a
+
+  // 1) if we are currently collecting a contextual profile, fetch a ContextNode
+  // in the `Unhandled` set. We want to do this regardless of `ProfilingStarted`
+  // to (hopefully) offset the penalty of creating these contexts to before
+  // profiling.
+  //
+  // 2) if we are under a root (regardless if this thread is collecting or not a
   // contextual profile for that root), do not collect a flat profile. We want
   // to keep flat profiles only for activations that can't happen under a root,
   // to avoid confusing profiles. We can, for example, combine flattened and
   // flat profiles meaningfully, as we wouldn't double-count anything.
   //
-  // 2) to avoid lengthy startup, don't bother with flat profiles until the
+  // 3) to avoid lengthy startup, don't bother with flat profiles until the
   // profiling started. We would reset them anyway when profiling starts.
   // HOWEVER. This does lose profiling for message pumps: those functions are
   // entered once and never exit. They should be assumed to be entered before
   // profiling starts - because profiling should start after the server is up
   // and running (which is equivalent to "message pumps are set up").
-  if (IsUnderContext || !__sanitizer::atomic_load_relaxed(&ProfilingStarted))
-    return TheScratchContext;
-  return markAsScratch(
-      onContextEnter(*getFlatProfile(Data, Guid, NumCounters)));
+  ContextRoot *R = __llvm_ctx_profile_current_context_root;
+  if (!R) {
+    if (IsUnderContext || !__sanitizer::atomic_load_relaxed(&ProfilingStarted))
+      return TheScratchContext;
+    else
+      return markAsScratch(
+          onContextEnter(*getFlatProfile(Data, Guid, NumCounters)));
+  }
+  auto It = R->Unhandled.insert({Guid, nullptr});
+  if (It.second)
+    It.first->second =
+        getCallsiteSlow(Guid, &R->FirstUnhandledCalleeNode, NumCounters, 0);
+  return markAsScratch(onContextEnter(*It.first->second));
 }
 
 ContextNode *__llvm_ctx_profile_get_context(FunctionData *Data, void *Callee,
@@ -396,6 +411,8 @@ void __llvm_ctx_profile_start_collection() {
       ++NumMemUnits;
 
     resetContextNode(*Root->FirstNode);
+    if (Root->FirstUnhandledCalleeNode)
+      resetContextNode(*Root->FirstUnhandledCalleeNode);
     __sanitizer::atomic_store_relaxed(&Root->TotalEntries, 0);
   }
   __sanitizer::atomic_store_relaxed(&ProfilingStarted, true);
@@ -416,8 +433,9 @@ bool __llvm_ctx_profile_fetch(ProfileWriter &Writer) {
       __sanitizer::Printf("[ctxprof] Contextual Profile is %s\n", "invalid");
       return false;
     }
-    Writer.writeContextual(*Root->FirstNode, __sanitizer::atomic_load_relaxed(
-                                                 &Root->TotalEntries));
+    Writer.writeContextual(
+        *Root->FirstNode, Root->FirstUnhandledCalleeNode,
+        __sanitizer::atomic_load_relaxed(&Root->TotalEntries));
   }
   Writer.endContextSection();
   Writer.startFlatSection();
diff --git a/compiler-rt/lib/ctx_profile/CtxInstrProfiling.h b/compiler-rt/lib/ctx_profile/CtxInstrProfiling.h
index cdbfa571247b7..5478a3b67ac25 100644
--- a/compiler-rt/lib/ctx_profile/CtxInstrProfiling.h
+++ b/compiler-rt/lib/ctx_profile/CtxInstrProfiling.h
@@ -10,6 +10,7 @@
 #define CTX_PROFILE_CTXINSTRPROFILING_H_
 
 #include "CtxInstrContextNode.h"
+#include "sanitizer_common/sanitizer_dense_map.h"
 #include "sanitizer_common/sanitizer_mutex.h"
 #include <sanitizer/common_interface_defs.h>
 
@@ -83,6 +84,20 @@ struct ContextRoot {
   // Count the number of entries - regardless if we could take the `Taken` mutex
   ::__sanitizer::atomic_uint64_t TotalEntries = {};
 
+  // Profiles for functions we encounter when collecting a contextual profile,
+  // that are not associated with a callsite. This is expected to happen for
+  // signal handlers, but it also - problematically - currently happens for
+  // mem{memset|copy|move|set}, which are currently inserted after profile
+  // instrumentation.
+  // `Unhandled` serves 2 purposes:
+  //   1. identifying such cases (like the memops)
+  //   2. collecting a profile for them, which can be at least used as a flat
+  //   profile
+  ::__sanitizer::DenseMap<GUID, ContextNode *> Unhandled;
+  // Keep the unhandled contexts in a list, as we allocate them, as it makes it
+  // simpler to send to the writer when the profile is fetched.
+  ContextNode *FirstUnhandledCalleeNode = nullptr;
+
   // Taken is used to ensure only one thread traverses the contextual graph -
   // either to read it or to write it. On server side, the same entrypoint will
   // be entered by numerous threads, but over time, the profile aggregated by
diff --git a/compiler-rt/lib/ctx_profile/tests/CtxInstrProfilingTest.cpp b/compiler-rt/lib/ctx_profile/tests/CtxInstrProfilingTest.cpp
index ccb8f0e87fcdd..83756fed0d6e6 100644
--- a/compiler-rt/lib/ctx_profile/tests/CtxInstrProfilingTest.cpp
+++ b/compiler-rt/lib/ctx_profile/tests/CtxInstrProfilingTest.cpp
@@ -240,7 +240,7 @@ TEST_F(ContextTest, Dump) {
     TestProfileWriter(ContextRoot *Root, size_t Entries)
         : Root(Root), Entries(Entries) {}
 
-    void writeContextual(const ContextNode &Node,
+    void writeContextual(const ContextNode &Node, const ContextNode *Unhandled,
                          uint64_t TotalRootEntryCount) override {
       EXPECT_EQ(TotalRootEntryCount, Entries);
       EXPECT_EQ(EnteredSectionCount, 1);
diff --git a/compiler-rt/test/ctx_profile/TestCases/generate-context.cpp b/compiler-rt/test/ctx_profile/TestCases/generate-context.cpp
index 319f17debe48f..3dc53637a35d8 100644
--- a/compiler-rt/test/ctx_profile/TestCases/generate-context.cpp
+++ b/compiler-rt/test/ctx_profile/TestCases/generate-context.cpp
@@ -5,7 +5,8 @@
 // RUN: cp %llvm_src/include/llvm/ProfileData/CtxInstrContextNode.h %t_include/
 //
 // Compile with ctx instrumentation "on". We treat "theRoot" as callgraph root.
-// RUN: %clangxx %s %ctxprofilelib -I%t_include -O2 -o %t.bin -mllvm -profile-context-root=theRoot
+// RUN: %clangxx %s %ctxprofilelib -I%t_include -O2 -o %t.bin -mllvm -profile-context-root=theRoot \
+// RUN:   -mllvm -ctx-prof-skip-callsite-instr=skip_me
 //
 // Run the binary, and observe the profile fetch handler's output.
 // RUN: %t.bin | FileCheck %s
@@ -20,11 +21,14 @@ extern "C" bool __llvm_ctx_profile_fetch(ProfileWriter &);
 
 // avoid name mangling
 extern "C" {
+__attribute__((noinline)) void skip_me() {}
+
 __attribute__((noinline)) void someFunction(int I) {
   if (I % 2)
     printf("check odd\n");
   else
     printf("check even\n");
+  skip_me();
 }
 
 // block inlining because the pre-inliner otherwise will inline this - it's
@@ -36,6 +40,7 @@ __attribute__((noinline)) void theRoot() {
   for (auto I = 0; I < 2; ++I) {
     someFunction(I);
   }
+  skip_me();
 }
 
 __attribute__((noinline)) void flatFct() {
@@ -85,9 +90,13 @@ class TestProfileWriter : public ProfileWriter {
   }
 
   void writeContextual(const ContextNode &RootNode,
+                       const ContextNode *Unhandled,
                        uint64_t EntryCount) override {
     std::cout << "Entering Root " << RootNode.guid()
               << " with total entry count " << EntryCount << std::endl;
+    for (const auto *P = Unhandled; P; P = P->next())
+      std::cout << "Unhandled GUID: " << P->guid() << " entered "
+                << P->entrycount() << " times" << std::endl;
     printProfile(RootNode, "", "");
   }
 
@@ -119,6 +128,8 @@ class TestProfileWriter : public ProfileWriter {
 // branches would be taken once, so the second counter is 1.
 // CHECK-NEXT: Entered Context Section
 // CHECK-NEXT: Entering Root 8657661246551306189 with total entry count 1
+// skip_me is entered 4 times: 3 via `someFunction`, and once from `theRoot`
+// CHECK-NEXT: Unhandled GUID: 17928815489886282963 entered 4 times
 // CHECK-NEXT: Guid: 8657661246551306189
 // CHECK-NEXT: Entries: 1
 // CHECK-NEXT: 2 counters and 3 callsites
@@ -135,6 +146,8 @@ class TestProfileWriter : public ProfileWriter {
 // CHECK-NEXT:   Counter values: 2 1
 // CHECK-NEXT: Exited Context Section
 // CHECK-NEXT: Entered Flat Section
+// This is `skip_me`. Entered 3 times via `someFunction`
+// CHECK-NEXT: Flat: 17928815489886282963 3
 // CHECK-NEXT: Flat: 6759619411192316602 3,1
 // This is flatFct (guid: 14569438697463215220)
 // CHECK-NEXT: Flat: 14569438697463215220 1,2
diff --git a/llvm/include/llvm/ProfileData/CtxInstrContextNode.h b/llvm/include/llvm/ProfileData/CtxInstrContextNode.h
index 55962df57fb58..a176662b5cb3d 100644
--- a/llvm/include/llvm/ProfileData/CtxInstrContextNode.h
+++ b/llvm/include/llvm/ProfileData/CtxInstrContextNode.h
@@ -121,6 +121,7 @@ class ProfileWriter {
 public:
   virtual void startContextSection() = 0;
   virtual void writeContextual(const ctx_profile::ContextNode &RootNode,
+                               const ctx_profile::ContextNode *Unhandled,
                                uint64_t TotalRootEntryCount) = 0;
   virtual void endContextSection() = 0;
 
diff --git a/llvm/include/llvm/ProfileData/PGOCtxProfReader.h b/llvm/include/llvm/ProfileData/PGOCtxProfReader.h
index 65be54323998b..37c2f3d856c73 100644
--- a/llvm/include/llvm/ProfileData/PGOCtxProfReader.h
+++ b/llvm/include/llvm/ProfileData/PGOCtxProfReader.h
@@ -74,6 +74,12 @@ class IndexNode {
 };
 } // namespace internal
 
+// Setting initial capacity to 1 because all contexts must have at least 1
+// counter, and then, because all contexts belonging to a function have the same
+// size, there'll be at most one other heap allocation.
+using CtxProfFlatProfile =
+    std::map<GlobalValue::GUID, SmallVector<uint64_t, 1>>;
+
 /// A node (context) in the loaded contextual profile, suitable for mutation
 /// during IPO passes. We generally expect a fraction of counters and
 /// callsites to be populated. We continue to model counters as vectors, but
@@ -93,11 +99,16 @@ class PGOCtxProfContext final : public internal::IndexNode {
   GlobalValue::GUID GUID = 0;
   SmallVector<uint64_t, 16> Counters;
   const std::optional<uint64_t> RootEntryCount;
+  std::optional<CtxProfFlatProfile> Unhandled{};
   CallsiteMapTy Callsites;
 
-  PGOCtxProfContext(GlobalValue::GUID G, SmallVectorImpl<uint64_t> &&Counters,
-                    std::optional<uint64_t> RootEntryCount = std::nullopt)
-      : GUID(G), Counters(std::move(Counters)), RootEntryCount(RootEntryCount) {
+  PGOCtxProfContext(
+      GlobalValue::GUID G, SmallVectorImpl<uint64_t> &&Counters,
+      std::optional<uint64_t> RootEntryCount = std::nullopt,
+      std::optional<CtxProfFlatProfile> &&Unhandled = std::nullopt)
+      : GUID(G), Counters(std::move(Counters)), RootEntryCount(RootEntryCount),
+        Unhandled(std::move(Unhandled)) {
+    assert(RootEntryCount.has_value() == Unhandled.has_value());
   }
 
   Expected<PGOCtxProfContext &>
@@ -121,6 +132,8 @@ class PGOCtxProfContext final : public internal::IndexNode {
   bool isRoot() const { return RootEntryCount.has_value(); }
   uint64_t getTotalRootEntryCount() const { return RootEntryCount.value(); }
 
+  const CtxProfFlatProfile &getUnhandled() const { return Unhandled.value(); }
+
   uint64_t getEntrycount() const {
     assert(!Counters.empty() &&
            "Functions are expected to have at their entry BB instrumented, so "
@@ -170,12 +183,6 @@ class PGOCtxProfContext final : public internal::IndexNode {
   }
 };
 
-// Setting initial capacity to 1 because all contexts must have at least 1
-// counter, and then, because all contexts belonging to a function have the same
-// size, there'll be at most one other heap allocation.
-using CtxProfFlatProfile =
-    std::map<GlobalValue::GUID, SmallVector<uint64_t, 1>>;
-
 using CtxProfContextualProfiles =
     std::map<GlobalValue::GUID, PGOCtxProfContext>;
 struct PGOCtxProfile {
@@ -205,6 +212,7 @@ class PGOCtxProfileReader final {
 
   Error loadContexts(CtxProfContextualProfiles &P);
   Error loadFlatProfiles(CtxProfFlatProfile &P);
+  Error loadFlatProfileList(CtxProfFlatProfile &P);
 
 public:
   PGOCtxProfileReader(StringRef Buffer)
diff --git a/llvm/include/llvm/ProfileData/PGOCtxProfWriter.h b/llvm/include/llvm/ProfileData/PGOCtxProfWriter.h
index b2bb8fea10cfe..36ad5622a8544 100644
--- a/llvm/include/llvm/ProfileData/PGOCtxProfWriter.h
+++ b/llvm/include/llvm/ProfileData/PGOCtxProfWriter.h
@@ -36,7 +36,8 @@ enum PGOCtxProfileBlockIDs {
   ContextNodeBlockID = ContextRootBlockID + 1,
   FlatProfilesSectionBlockID = ContextNodeBlockID + 1,
   FlatProfileBlockID = FlatProfilesSectionBlockID + 1,
-  LAST_VALID = FlatProfileBlockID
+  UnhandledBlockID = FlatProfileBlockID + 1,
+  LAST_VALID = UnhandledBlockID
 };
 
 /// Write one or more ContextNodes to the provided raw_fd_stream.
@@ -94,6 +95,7 @@ class PGOCtxProfileWriter final : public ctx_profile::ProfileWriter {
 
   void startContextSection() override;
   void writeContextual(const ctx_profile::ContextNode &RootNode,
+                       const ctx_profile::ContextNode *Unhandled,
                        uint64_t TotalRootEntryCount) override;
   void endContextSection() override;
 
@@ -104,7 +106,7 @@ class PGOCtxProfileWriter final : public ctx_profile::ProfileWriter {
 
   // constants used in writing which a reader may find useful.
   static constexpr unsigned CodeLen = 2;
-  static constexpr uint32_t CurrentVersion = 3;
+  static constexpr uint32_t CurrentVersion = 4;
   static constexpr unsigned VBREncodingBits = 6;
   static constexpr StringRef ContainerMagic = "CTXP";
 };
diff --git a/llvm/lib/ProfileData/PGOCtxProfReader.cpp b/llvm/lib/ProfileData/PGOCtxProfReader.cpp
index f53f2956a7b7e..ec801d43c8588 100644
--- a/llvm/lib/ProfileData/PGOCtxProfReader.cpp
+++ b/llvm/lib/ProfileData/PGOCtxProfReader.cpp
@@ -97,7 +97,7 @@ PGOCtxProfileReader::readProfile(PGOCtxProfileBlockIDs Kind) {
   std::optional<SmallVector<uint64_t, 16>> Counters;
   std::optional<uint32_t> CallsiteIndex;
   std::optional<uint64_t> TotalEntryCount;
-
+  std::optional<CtxProfFlatProfile> Unhandled;
   SmallVector<uint64_t, 1> RecordValues;
 
   const bool ExpectIndex = Kind == PGOCtxProfileBlockIDs::ContextNodeBlockID;
@@ -108,14 +108,24 @@ PGOCtxProfileReader::readProfile(PGOCtxProfileBlockIDs Kind) {
   auto GotAllWeNeed = [&]() {
     return Guid.has_value() && Counters.has_value() &&
            (!ExpectIndex || CallsiteIndex.has_value()) &&
-           (!IsRoot || TotalEntryCount.has_value());
+           (!IsRoot || TotalEntryCount.has_value()) &&
+           (!IsRoot || Unhandled.has_value());
   };
+
   while (!GotAllWeNeed()) {
     RecordValues.clear();
     EXPECT_OR_RET(Entry, advance());
-    if (Entry->Kind != BitstreamEntry::Record)
+    if (Entry->Kind != BitstreamEntry::Record) {
+      if (IsRoot && Entry->Kind == BitstreamEntry::SubBlock &&
+          Entry->ID == PGOCtxProfileBlockIDs::UnhandledBlockID) {
+        RET_ON_ERR(enterBlockWithID(PGOCtxProfileBlockIDs::UnhandledBlockID));
+        Unhandled = CtxProfFlatProfile();
+        RET_ON_ERR(loadFlatProfileList(*Unhandled));
+        continue;
+      }
       return wrongValue(
           "Expected records before encountering more subcontexts");
+    }
     EXPECT_OR_RET(ReadRecord,
                   Cursor.readRecord(bitc::UNABBREV_RECORD, RecordValues));
     switch (*ReadRecord) {
@@ -152,7 +162,8 @@ PGOCtxProfileReader::readProfile(PGOCtxProfileBlockIDs Kind) {
     }
   }
 
-  PGOCtxProfContext Ret(*Guid, std::move(*Counters), TotalEntryCount);
+  PGOCtxProfContext Ret(*Guid, std::move(*Counters), TotalEntryCount,
+                        std::move(Unhandled));
 
   while (canEnterBlockWithID(PGOCtxProfileBlockIDs::ContextNodeBlockID)) {
     EXPECT_OR_RET(SC, readProfile(PGOCtxProfileBlockIDs::ContextNodeBlockID));
@@ -212,9 +223,7 @@ Error PGOCtxProfileReader::loadContexts(CtxProfContextualProfiles &P) {
   return Error::success();
 }
 
-Error PGOCtxProfileReader::loadFlatProfiles(CtxProfFlatProfile &P) {
-  RET_ON_ERR(
-      enterBlockWithID(PGOCtxProfileBlockIDs::FlatProfilesSectionBlockID));
+Error PGOCtxProfileReader::loadFlatProfileList(CtxProfFlatProfile &P) {
   while (canEnterBlockWithID(PGOCtxProfileBlockIDs::FlatProfileBlockID)) {
     EXPECT_OR_RET(E, readProfile(PGOCtxProfileBlockIDs::FlatProfileBlockID));
     auto Guid = E->second.guid();
@@ -224,6 +233,12 @@ Error PGOCtxProfileReader::loadFlatProfiles(CtxProfFlatProfile &P) {
   return Error::success();
 }
 
+Error PGOCtxProfileReader::loadFlatProfiles(CtxProfFlatProfile &P) {
+  RET_ON_ERR(
+      enterBlockWithID(PGOCtxProfileBlockIDs::FlatProfilesSectionBlockID));
+  return loadFlatProfileList(P);
+}
+
 Expected<PGOCtxProfile> PGOCtxProfileReader::loadProfiles() {
   RET_ON_ERR(readMetadata());
   PGOCtxProfile Ret;
@@ -287,10 +302,13 @@ void toYaml(yaml::Output &Out,
   Out.endSequence();
 }
 
+void toYaml(yaml::Output &Out, const CtxProfFlatProfile &Flat);
+
 void toYaml(yaml::Output &Out, GlobalValue::GUID Guid,
             const SmallVectorImpl<uint64_t> &Counters,
             const PGOCtxProfContext::CallsiteMapTy &Callsites,
-            std::optional<uint64_t> TotalRootEntryCount = std::nullopt) {
+            std::optional<uint64_t> TotalRootEntryCount = std::nullopt,
+            CtxProfFlatProfile Unhandled = {}) {
   yaml::EmptyContext Empty;
   Out.beginMapping();
   void *SaveInfo = nullptr;
@@ -318,6 +336,14 @@ void toYaml(yaml::Output &Out, GlobalValue::GUID Guid,
     Out.endFlowSequence();
     Out.postflightKey(nullptr);
   }
+
+  if (!Unhandled.empty()) {
+    assert(TotalRootEntryCount.has_value());
+    Out.preflightKey("Unhandled", false, false, UseDefault, SaveInfo);
+    toYaml(Out, Unhandled);
+    Out.postflightKey(nullptr);
+  }
+
   if (!Callsites.empty()) {
     Out.preflightKey("Callsites", true, false, UseDefault, SaveInfo);
     toYaml(Out, Callsites);
@@ -326,10 +352,22 @@ void toYaml(yaml::Output &Out, GlobalValue::GUID Guid,
   Out.endMapping();
 }
 
+void toYaml(yaml::Output &Out, const CtxProfFlatProfile &Flat) {
+  void *SaveInfo = nullptr;
+  Out.beginSequence();
+  size_t ElemID = 0;
+  for (const auto &[Guid, Counters] : Flat) {
+    Out.preflightElement(ElemID++, SaveInfo);
+    toYaml(Out, Guid, Counters, {});
+    Out.postflightElement(nullptr);
+  }
+  Out.endSequence();
+}
+
 void toYaml(yaml::Output &Out, const PGOCtxProfContext &Ctx) {
   if (Ctx.isRoot())
     toYaml(Out, Ctx.guid(), Ctx.counters(), Ctx.callsites(),
-           Ctx.getTotalRootEntryCount());
+           Ctx.getTotalRootEntryCount(), Ctx.getUnhandled());
   else
     toYaml(Out, Ctx.guid(), Ctx.counters(), Ctx.callsites());
 }
@@ -348,14 +386,7 @@ void llvm::convertCtxProfToYaml(raw_ostream &OS, const PGOCtxProfile &Profile) {
   }
   if (!Profile.FlatProfiles.empty()) {
     Out.preflightKey("FlatProfiles", false, false, UseDefault, SaveInfo);
-    Out.beginSequence();
-    size_t ElemID = 0;
-    for (const auto &[Guid, Counters] : Profile.FlatProfiles) {
-      Out.preflightElement(ElemID++, SaveInfo);
-      toYaml(Out, Guid, Counters, {});
-      Out.postflightElement(nullptr);
-    }
-    Out.endSequence();
+    toYaml(Out, Profile.FlatProfiles);
     Out.postflightKey(nullptr);
   }
   Out.endMapping();
diff --git a/llvm/lib/ProfileData/PGOCtxProfWriter.cpp b/llvm/lib/ProfileData/PGOCtxProfWriter.cpp
index 910842647591d..fe41438b7d0a7 100644
--- a/llvm/li...
[truncated]

for (const auto &U : DC.Unhandled) {
SerializableCtxRepresentation Unhandled;
Unhandled.Guid = U.first;
Unhandled.Counters.insert(Unhandled.Counters.begin(), U.second.begin(),
Contributor


Repeatedly inserting in the front is inefficient. How about reserve(size), append and then reverse the vector?

Member Author


wdym. this copies the counter values from U.second to Unhandled. insert with a first/last iterator pair will - afaik - allocate the diff (last - first) (caveats about difference being computable, but the source is also a vector). Also why would I reverse?

There is a more efficient alternative - having a createNode that takes the guid and the counters separately - but not quite worth it for the yaml converter, it's a test utility.

Contributor


Since they are both vectors, the allocation is optional (depending on the capacity of Unhandled.Counters) but there will always be a copy to shift the original elements to the right. This is what I was referring to as inefficient. But since you mention this is a utility for a test, it doesn't matter much.

Member Author


there are no original elements tho... maybe this is clearer as assignment
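As a side note on the exchange above: with an empty destination vector, the range-`insert` and `assign` forms are equivalent - there are no pre-existing elements to shift - and `assign` reads more clearly. A small sketch demonstrating the equivalence (hypothetical helper names, plain `std::vector` rather than the types in the patch):

```cpp
#include <cstdint>
#include <vector>

// Copy Src into a fresh vector via range-insert at begin(). On an empty
// destination this is one allocation plus one copy; nothing is shifted.
std::vector<uint64_t> copyViaInsert(const std::vector<uint64_t> &Src) {
  std::vector<uint64_t> Dst;
  Dst.insert(Dst.begin(), Src.begin(), Src.end());
  return Dst;
}

// The same copy expressed as an assignment - equivalent here, and clearer.
std::vector<uint64_t> copyViaAssign(const std::vector<uint64_t> &Src) {
  std::vector<uint64_t> Dst;
  Dst.assign(Src.begin(), Src.end());
  return Dst;
}
```

The shifting cost the reviewer describes only arises when the destination already holds elements, which is not the case for the freshly constructed `Unhandled.Counters`.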

@mtrofin mtrofin force-pushed the users/mtrofin/03-14-_ctxprof_track_unhandled_call_targets branch from 0842b89 to 82e70ee Compare March 18, 2025 05:08
Contributor

@snehasish snehasish left a comment


lgtm


@mtrofin mtrofin force-pushed the users/mtrofin/03-14-_ctxprof_track_unhandled_call_targets branch 2 times, most recently from ac74a3b to 881ef87 Compare March 18, 2025 21:15
@mtrofin mtrofin force-pushed the users/mtrofin/03-13-_ctxprof_make_contextroot_an_implementation_detail branch from 35e3208 to 64ae963 Compare March 18, 2025 21:15
Base automatically changed from users/mtrofin/03-13-_ctxprof_make_contextroot_an_implementation_detail to main March 19, 2025 05:03
@mtrofin mtrofin force-pushed the users/mtrofin/03-14-_ctxprof_track_unhandled_call_targets branch 2 times, most recently from 4ccf0b9 to 43a2e22 Compare March 19, 2025 05:17
@mtrofin mtrofin force-pushed the users/mtrofin/03-14-_ctxprof_track_unhandled_call_targets branch from 43a2e22 to ff1615a Compare March 19, 2025 15:39
Member Author

mtrofin commented Mar 19, 2025

Merge activity

  • Mar 19, 4:49 PM EDT: A user started a stack merge that includes this pull request via Graphite.
  • Mar 19, 4:51 PM EDT: A user merged this pull request with Graphite.

@mtrofin mtrofin merged commit 0668bb2 into main Mar 19, 2025
11 checks passed
@mtrofin mtrofin deleted the users/mtrofin/03-14-_ctxprof_track_unhandled_call_targets branch March 19, 2025 20:51