
[ctxprof] Track unhandled call targets #131417


Merged

Conversation

mtrofin
Member

@mtrofin mtrofin commented Mar 15, 2025

Collect profiles for functions we encounter, while collecting a contextual profile, that are not associated with a call site. This is expected to happen for signal handlers, but it also - problematically - currently happens for mem{cpy|move|set}, which are inserted after profile instrumentation.

Collecting a "regular" flat profile in these cases would hide the problem: that we lose better profiling opportunities.
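The runtime side of the change boils down to an insert-if-absent pattern, as seen in `getUnhandledContext` in the diff below: allocate a per-GUID node only the first time a callee is seen under a root, and thread it onto a singly linked list so the writer can later walk all unhandled callees. Here is a minimal sketch of that pattern, using hypothetical `Node`/`Root` types and `std::map` as a stand-in for the sanitizer `DenseMap`:

```cpp
#include <cstdint>
#include <map>

// Hypothetical, simplified node; the real runtime uses ctx_profile::ContextNode.
struct Node {
  uint64_t Guid;
  uint64_t Entries = 0;
  Node *Next = nullptr; // threads all unhandled nodes for the writer
};

struct Root {
  std::map<uint64_t, Node *> Unhandled; // GUID -> per-callee profile
  Node *FirstUnhandledCalleeNode = nullptr;
};

// Get-or-create the unhandled-callee node for Guid under R. A node is
// allocated only on the first encounter (insert returns {iterator, inserted}),
// and is pushed onto the root's list for later serialization.
Node *getUnhandled(Root &R, uint64_t Guid) {
  auto It = R.Unhandled.insert({Guid, nullptr});
  if (It.second) { // first time this GUID appears under this root
    Node *N = new Node{Guid, 0, R.FirstUnhandledCalleeNode};
    R.FirstUnhandledCalleeNode = N;
    It.first->second = N;
  }
  ++It.first->second->Entries; // stands in for the onContextEnter counting
  return It.first->second;
}
```

Repeated calls with the same GUID return the same node and only bump its entry count; new GUIDs are prepended to the list, which is why the writer receives them in reverse order of first encounter.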

Member Author

mtrofin commented Mar 15, 2025

@mtrofin mtrofin force-pushed the users/mtrofin/03-14-_ctxprof_track_unhandled_call_targets branch from 95e5589 to 9f54f19 Compare March 15, 2025 03:28
@mtrofin mtrofin force-pushed the users/mtrofin/03-13-_ctxprof_make_contextroot_an_implementation_detail branch 2 times, most recently from f671b9b to e6d651d Compare March 15, 2025 04:04
@mtrofin mtrofin force-pushed the users/mtrofin/03-14-_ctxprof_track_unhandled_call_targets branch from 9f54f19 to 2712325 Compare March 15, 2025 04:04
@mtrofin mtrofin force-pushed the users/mtrofin/03-13-_ctxprof_make_contextroot_an_implementation_detail branch from e6d651d to 2b2597a Compare March 15, 2025 04:11
@mtrofin mtrofin force-pushed the users/mtrofin/03-14-_ctxprof_track_unhandled_call_targets branch from 2712325 to 37fbe76 Compare March 15, 2025 04:12
@mtrofin mtrofin force-pushed the users/mtrofin/03-13-_ctxprof_make_contextroot_an_implementation_detail branch from 2b2597a to 551cb1a Compare March 15, 2025 05:03
@mtrofin mtrofin force-pushed the users/mtrofin/03-14-_ctxprof_track_unhandled_call_targets branch 2 times, most recently from 0421249 to 5916e7d Compare March 15, 2025 05:32
@mtrofin mtrofin force-pushed the users/mtrofin/03-13-_ctxprof_make_contextroot_an_implementation_detail branch from 551cb1a to 94a272a Compare March 15, 2025 05:32
@mtrofin mtrofin force-pushed the users/mtrofin/03-14-_ctxprof_track_unhandled_call_targets branch from 5916e7d to debbc1f Compare March 17, 2025 15:04
@mtrofin mtrofin force-pushed the users/mtrofin/03-13-_ctxprof_make_contextroot_an_implementation_detail branch from 94a272a to 35e3208 Compare March 17, 2025 15:04
@mtrofin mtrofin force-pushed the users/mtrofin/03-14-_ctxprof_track_unhandled_call_targets branch from debbc1f to 0842b89 Compare March 18, 2025 02:32
@mtrofin mtrofin marked this pull request as ready for review March 18, 2025 03:04
@llvmbot llvmbot added compiler-rt PGO Profile Guided Optimizations llvm:transforms labels Mar 18, 2025
@llvmbot
Member

llvmbot commented Mar 18, 2025

@llvm/pr-subscribers-llvm-transforms

@llvm/pr-subscribers-pgo

Author: Mircea Trofin (mtrofin)

Changes

Collect profiles for functions we encounter, while collecting a contextual profile, that are not associated with a call site. This is expected to happen for signal handlers, but it also - problematically - currently happens for mem{cpy|move|set}, which are inserted after profile instrumentation.

Collecting a "regular" flat profile in these cases would hide the problem: that we lose better profiling opportunities.


Patch is 31.40 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/131417.diff

14 Files Affected:

  • (modified) compiler-rt/lib/ctx_profile/CtxInstrContextNode.h (+1)
  • (modified) compiler-rt/lib/ctx_profile/CtxInstrProfiling.cpp (+26-8)
  • (modified) compiler-rt/lib/ctx_profile/CtxInstrProfiling.h (+15)
  • (modified) compiler-rt/lib/ctx_profile/tests/CtxInstrProfilingTest.cpp (+1-1)
  • (modified) compiler-rt/test/ctx_profile/TestCases/generate-context.cpp (+14-1)
  • (modified) llvm/include/llvm/ProfileData/CtxInstrContextNode.h (+1)
  • (modified) llvm/include/llvm/ProfileData/PGOCtxProfReader.h (+17-9)
  • (modified) llvm/include/llvm/ProfileData/PGOCtxProfWriter.h (+4-2)
  • (modified) llvm/lib/ProfileData/PGOCtxProfReader.cpp (+48-17)
  • (modified) llvm/lib/ProfileData/PGOCtxProfWriter.cpp (+23-4)
  • (modified) llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp (+13)
  • (added) llvm/test/tools/llvm-ctxprof-util/Inputs/valid-with-unhandled.yaml (+26)
  • (modified) llvm/test/tools/llvm-ctxprof-util/llvm-ctxprof-util.test (+15-6)
  • (modified) llvm/unittests/ProfileData/PGOCtxProfReaderWriterTest.cpp (+7-7)
diff --git a/compiler-rt/lib/ctx_profile/CtxInstrContextNode.h b/compiler-rt/lib/ctx_profile/CtxInstrContextNode.h
index 55962df57fb58..a176662b5cb3d 100644
--- a/compiler-rt/lib/ctx_profile/CtxInstrContextNode.h
+++ b/compiler-rt/lib/ctx_profile/CtxInstrContextNode.h
@@ -121,6 +121,7 @@ class ProfileWriter {
 public:
   virtual void startContextSection() = 0;
   virtual void writeContextual(const ctx_profile::ContextNode &RootNode,
+                               const ctx_profile::ContextNode *Unhandled,
                                uint64_t TotalRootEntryCount) = 0;
   virtual void endContextSection() = 0;
 
diff --git a/compiler-rt/lib/ctx_profile/CtxInstrProfiling.cpp b/compiler-rt/lib/ctx_profile/CtxInstrProfiling.cpp
index 6ef7076d93e31..26f22926a5704 100644
--- a/compiler-rt/lib/ctx_profile/CtxInstrProfiling.cpp
+++ b/compiler-rt/lib/ctx_profile/CtxInstrProfiling.cpp
@@ -246,22 +246,37 @@ ContextNode *getFlatProfile(FunctionData &Data, GUID Guid,
 
 ContextNode *getUnhandledContext(FunctionData &Data, GUID Guid,
                                  uint32_t NumCounters) {
-  // 1) if we are under a root (regardless if this thread is collecting or not a
+
+  // 1) if we are currently collecting a contextual profile, fetch a ContextNode
+  // in the `Unhandled` set. We want to do this regardless of `ProfilingStarted`
+  // to (hopefully) offset the penalty of creating these contexts to before
+  // profiling.
+  //
+  // 2) if we are under a root (regardless if this thread is collecting or not a
   // contextual profile for that root), do not collect a flat profile. We want
   // to keep flat profiles only for activations that can't happen under a root,
   // to avoid confusing profiles. We can, for example, combine flattened and
   // flat profiles meaningfully, as we wouldn't double-count anything.
   //
-  // 2) to avoid lengthy startup, don't bother with flat profiles until the
+  // 3) to avoid lengthy startup, don't bother with flat profiles until the
   // profiling started. We would reset them anyway when profiling starts.
   // HOWEVER. This does lose profiling for message pumps: those functions are
   // entered once and never exit. They should be assumed to be entered before
   // profiling starts - because profiling should start after the server is up
   // and running (which is equivalent to "message pumps are set up").
-  if (IsUnderContext || !__sanitizer::atomic_load_relaxed(&ProfilingStarted))
-    return TheScratchContext;
-  return markAsScratch(
-      onContextEnter(*getFlatProfile(Data, Guid, NumCounters)));
+  ContextRoot *R = __llvm_ctx_profile_current_context_root;
+  if (!R) {
+    if (IsUnderContext || !__sanitizer::atomic_load_relaxed(&ProfilingStarted))
+      return TheScratchContext;
+    else
+      return markAsScratch(
+          onContextEnter(*getFlatProfile(Data, Guid, NumCounters)));
+  }
+  auto It = R->Unhandled.insert({Guid, nullptr});
+  if (It.second)
+    It.first->second =
+        getCallsiteSlow(Guid, &R->FirstUnhandledCalleeNode, NumCounters, 0);
+  return markAsScratch(onContextEnter(*It.first->second));
 }
 
 ContextNode *__llvm_ctx_profile_get_context(FunctionData *Data, void *Callee,
@@ -396,6 +411,8 @@ void __llvm_ctx_profile_start_collection() {
       ++NumMemUnits;
 
     resetContextNode(*Root->FirstNode);
+    if (Root->FirstUnhandledCalleeNode)
+      resetContextNode(*Root->FirstUnhandledCalleeNode);
     __sanitizer::atomic_store_relaxed(&Root->TotalEntries, 0);
   }
   __sanitizer::atomic_store_relaxed(&ProfilingStarted, true);
@@ -416,8 +433,9 @@ bool __llvm_ctx_profile_fetch(ProfileWriter &Writer) {
       __sanitizer::Printf("[ctxprof] Contextual Profile is %s\n", "invalid");
       return false;
     }
-    Writer.writeContextual(*Root->FirstNode, __sanitizer::atomic_load_relaxed(
-                                                 &Root->TotalEntries));
+    Writer.writeContextual(
+        *Root->FirstNode, Root->FirstUnhandledCalleeNode,
+        __sanitizer::atomic_load_relaxed(&Root->TotalEntries));
   }
   Writer.endContextSection();
   Writer.startFlatSection();
diff --git a/compiler-rt/lib/ctx_profile/CtxInstrProfiling.h b/compiler-rt/lib/ctx_profile/CtxInstrProfiling.h
index cdbfa571247b7..5478a3b67ac25 100644
--- a/compiler-rt/lib/ctx_profile/CtxInstrProfiling.h
+++ b/compiler-rt/lib/ctx_profile/CtxInstrProfiling.h
@@ -10,6 +10,7 @@
 #define CTX_PROFILE_CTXINSTRPROFILING_H_
 
 #include "CtxInstrContextNode.h"
+#include "sanitizer_common/sanitizer_dense_map.h"
 #include "sanitizer_common/sanitizer_mutex.h"
 #include <sanitizer/common_interface_defs.h>
 
@@ -83,6 +84,20 @@ struct ContextRoot {
   // Count the number of entries - regardless if we could take the `Taken` mutex
   ::__sanitizer::atomic_uint64_t TotalEntries = {};
 
+  // Profiles for functions we encounter when collecting a contextual profile,
+  // that are not associated with a callsite. This is expected to happen for
+  // signal handlers, but it also - problematically - currently happens for
+  // mem{memset|copy|move|set}, which are currently inserted after profile
+  // instrumentation.
+  // `Unhandled` serves 2 purposes:
+  //   1. identifying such cases (like the memops)
+  //   2. collecting a profile for them, which can be at least used as a flat
+  //   profile
+  ::__sanitizer::DenseMap<GUID, ContextNode *> Unhandled;
+  // Keep the unhandled contexts in a list, as we allocate them, as it makes it
+  // simpler to send to the writer when the profile is fetched.
+  ContextNode *FirstUnhandledCalleeNode = nullptr;
+
   // Taken is used to ensure only one thread traverses the contextual graph -
   // either to read it or to write it. On server side, the same entrypoint will
   // be entered by numerous threads, but over time, the profile aggregated by
diff --git a/compiler-rt/lib/ctx_profile/tests/CtxInstrProfilingTest.cpp b/compiler-rt/lib/ctx_profile/tests/CtxInstrProfilingTest.cpp
index ccb8f0e87fcdd..83756fed0d6e6 100644
--- a/compiler-rt/lib/ctx_profile/tests/CtxInstrProfilingTest.cpp
+++ b/compiler-rt/lib/ctx_profile/tests/CtxInstrProfilingTest.cpp
@@ -240,7 +240,7 @@ TEST_F(ContextTest, Dump) {
     TestProfileWriter(ContextRoot *Root, size_t Entries)
         : Root(Root), Entries(Entries) {}
 
-    void writeContextual(const ContextNode &Node,
+    void writeContextual(const ContextNode &Node, const ContextNode *Unhandled,
                          uint64_t TotalRootEntryCount) override {
       EXPECT_EQ(TotalRootEntryCount, Entries);
       EXPECT_EQ(EnteredSectionCount, 1);
diff --git a/compiler-rt/test/ctx_profile/TestCases/generate-context.cpp b/compiler-rt/test/ctx_profile/TestCases/generate-context.cpp
index 319f17debe48f..3dc53637a35d8 100644
--- a/compiler-rt/test/ctx_profile/TestCases/generate-context.cpp
+++ b/compiler-rt/test/ctx_profile/TestCases/generate-context.cpp
@@ -5,7 +5,8 @@
 // RUN: cp %llvm_src/include/llvm/ProfileData/CtxInstrContextNode.h %t_include/
 //
 // Compile with ctx instrumentation "on". We treat "theRoot" as callgraph root.
-// RUN: %clangxx %s %ctxprofilelib -I%t_include -O2 -o %t.bin -mllvm -profile-context-root=theRoot
+// RUN: %clangxx %s %ctxprofilelib -I%t_include -O2 -o %t.bin -mllvm -profile-context-root=theRoot \
+// RUN:   -mllvm -ctx-prof-skip-callsite-instr=skip_me
 //
 // Run the binary, and observe the profile fetch handler's output.
 // RUN: %t.bin | FileCheck %s
@@ -20,11 +21,14 @@ extern "C" bool __llvm_ctx_profile_fetch(ProfileWriter &);
 
 // avoid name mangling
 extern "C" {
+__attribute__((noinline)) void skip_me() {}
+
 __attribute__((noinline)) void someFunction(int I) {
   if (I % 2)
     printf("check odd\n");
   else
     printf("check even\n");
+  skip_me();
 }
 
 // block inlining because the pre-inliner otherwise will inline this - it's
@@ -36,6 +40,7 @@ __attribute__((noinline)) void theRoot() {
   for (auto I = 0; I < 2; ++I) {
     someFunction(I);
   }
+  skip_me();
 }
 
 __attribute__((noinline)) void flatFct() {
@@ -85,9 +90,13 @@ class TestProfileWriter : public ProfileWriter {
   }
 
   void writeContextual(const ContextNode &RootNode,
+                       const ContextNode *Unhandled,
                        uint64_t EntryCount) override {
     std::cout << "Entering Root " << RootNode.guid()
               << " with total entry count " << EntryCount << std::endl;
+    for (const auto *P = Unhandled; P; P = P->next())
+      std::cout << "Unhandled GUID: " << P->guid() << " entered "
+                << P->entrycount() << " times" << std::endl;
     printProfile(RootNode, "", "");
   }
 
@@ -119,6 +128,8 @@ class TestProfileWriter : public ProfileWriter {
 // branches would be taken once, so the second counter is 1.
 // CHECK-NEXT: Entered Context Section
 // CHECK-NEXT: Entering Root 8657661246551306189 with total entry count 1
+// skip_me is entered 4 times: 3 via `someFunction`, and once from `theRoot`
+// CHECK-NEXT: Unhandled GUID: 17928815489886282963 entered 4 times
 // CHECK-NEXT: Guid: 8657661246551306189
 // CHECK-NEXT: Entries: 1
 // CHECK-NEXT: 2 counters and 3 callsites
@@ -135,6 +146,8 @@ class TestProfileWriter : public ProfileWriter {
 // CHECK-NEXT:   Counter values: 2 1
 // CHECK-NEXT: Exited Context Section
 // CHECK-NEXT: Entered Flat Section
+// This is `skip_me`. Entered 3 times via `someFunction`
+// CHECK-NEXT: Flat: 17928815489886282963 3
 // CHECK-NEXT: Flat: 6759619411192316602 3,1
 // This is flatFct (guid: 14569438697463215220)
 // CHECK-NEXT: Flat: 14569438697463215220 1,2
diff --git a/llvm/include/llvm/ProfileData/CtxInstrContextNode.h b/llvm/include/llvm/ProfileData/CtxInstrContextNode.h
index 55962df57fb58..a176662b5cb3d 100644
--- a/llvm/include/llvm/ProfileData/CtxInstrContextNode.h
+++ b/llvm/include/llvm/ProfileData/CtxInstrContextNode.h
@@ -121,6 +121,7 @@ class ProfileWriter {
 public:
   virtual void startContextSection() = 0;
   virtual void writeContextual(const ctx_profile::ContextNode &RootNode,
+                               const ctx_profile::ContextNode *Unhandled,
                                uint64_t TotalRootEntryCount) = 0;
   virtual void endContextSection() = 0;
 
diff --git a/llvm/include/llvm/ProfileData/PGOCtxProfReader.h b/llvm/include/llvm/ProfileData/PGOCtxProfReader.h
index 65be54323998b..37c2f3d856c73 100644
--- a/llvm/include/llvm/ProfileData/PGOCtxProfReader.h
+++ b/llvm/include/llvm/ProfileData/PGOCtxProfReader.h
@@ -74,6 +74,12 @@ class IndexNode {
 };
 } // namespace internal
 
+// Setting initial capacity to 1 because all contexts must have at least 1
+// counter, and then, because all contexts belonging to a function have the same
+// size, there'll be at most one other heap allocation.
+using CtxProfFlatProfile =
+    std::map<GlobalValue::GUID, SmallVector<uint64_t, 1>>;
+
 /// A node (context) in the loaded contextual profile, suitable for mutation
 /// during IPO passes. We generally expect a fraction of counters and
 /// callsites to be populated. We continue to model counters as vectors, but
@@ -93,11 +99,16 @@ class PGOCtxProfContext final : public internal::IndexNode {
   GlobalValue::GUID GUID = 0;
   SmallVector<uint64_t, 16> Counters;
   const std::optional<uint64_t> RootEntryCount;
+  std::optional<CtxProfFlatProfile> Unhandled{};
   CallsiteMapTy Callsites;
 
-  PGOCtxProfContext(GlobalValue::GUID G, SmallVectorImpl<uint64_t> &&Counters,
-                    std::optional<uint64_t> RootEntryCount = std::nullopt)
-      : GUID(G), Counters(std::move(Counters)), RootEntryCount(RootEntryCount) {
+  PGOCtxProfContext(
+      GlobalValue::GUID G, SmallVectorImpl<uint64_t> &&Counters,
+      std::optional<uint64_t> RootEntryCount = std::nullopt,
+      std::optional<CtxProfFlatProfile> &&Unhandled = std::nullopt)
+      : GUID(G), Counters(std::move(Counters)), RootEntryCount(RootEntryCount),
+        Unhandled(std::move(Unhandled)) {
+    assert(RootEntryCount.has_value() == Unhandled.has_value());
   }
 
   Expected<PGOCtxProfContext &>
@@ -121,6 +132,8 @@ class PGOCtxProfContext final : public internal::IndexNode {
   bool isRoot() const { return RootEntryCount.has_value(); }
   uint64_t getTotalRootEntryCount() const { return RootEntryCount.value(); }
 
+  const CtxProfFlatProfile &getUnhandled() const { return Unhandled.value(); }
+
   uint64_t getEntrycount() const {
     assert(!Counters.empty() &&
            "Functions are expected to have at their entry BB instrumented, so "
@@ -170,12 +183,6 @@ class PGOCtxProfContext final : public internal::IndexNode {
   }
 };
 
-// Setting initial capacity to 1 because all contexts must have at least 1
-// counter, and then, because all contexts belonging to a function have the same
-// size, there'll be at most one other heap allocation.
-using CtxProfFlatProfile =
-    std::map<GlobalValue::GUID, SmallVector<uint64_t, 1>>;
-
 using CtxProfContextualProfiles =
     std::map<GlobalValue::GUID, PGOCtxProfContext>;
 struct PGOCtxProfile {
@@ -205,6 +212,7 @@ class PGOCtxProfileReader final {
 
   Error loadContexts(CtxProfContextualProfiles &P);
   Error loadFlatProfiles(CtxProfFlatProfile &P);
+  Error loadFlatProfileList(CtxProfFlatProfile &P);
 
 public:
   PGOCtxProfileReader(StringRef Buffer)
diff --git a/llvm/include/llvm/ProfileData/PGOCtxProfWriter.h b/llvm/include/llvm/ProfileData/PGOCtxProfWriter.h
index b2bb8fea10cfe..36ad5622a8544 100644
--- a/llvm/include/llvm/ProfileData/PGOCtxProfWriter.h
+++ b/llvm/include/llvm/ProfileData/PGOCtxProfWriter.h
@@ -36,7 +36,8 @@ enum PGOCtxProfileBlockIDs {
   ContextNodeBlockID = ContextRootBlockID + 1,
   FlatProfilesSectionBlockID = ContextNodeBlockID + 1,
   FlatProfileBlockID = FlatProfilesSectionBlockID + 1,
-  LAST_VALID = FlatProfileBlockID
+  UnhandledBlockID = FlatProfileBlockID + 1,
+  LAST_VALID = UnhandledBlockID
 };
 
 /// Write one or more ContextNodes to the provided raw_fd_stream.
@@ -94,6 +95,7 @@ class PGOCtxProfileWriter final : public ctx_profile::ProfileWriter {
 
   void startContextSection() override;
   void writeContextual(const ctx_profile::ContextNode &RootNode,
+                       const ctx_profile::ContextNode *Unhandled,
                        uint64_t TotalRootEntryCount) override;
   void endContextSection() override;
 
@@ -104,7 +106,7 @@ class PGOCtxProfileWriter final : public ctx_profile::ProfileWriter {
 
   // constants used in writing which a reader may find useful.
   static constexpr unsigned CodeLen = 2;
-  static constexpr uint32_t CurrentVersion = 3;
+  static constexpr uint32_t CurrentVersion = 4;
   static constexpr unsigned VBREncodingBits = 6;
   static constexpr StringRef ContainerMagic = "CTXP";
 };
diff --git a/llvm/lib/ProfileData/PGOCtxProfReader.cpp b/llvm/lib/ProfileData/PGOCtxProfReader.cpp
index f53f2956a7b7e..ec801d43c8588 100644
--- a/llvm/lib/ProfileData/PGOCtxProfReader.cpp
+++ b/llvm/lib/ProfileData/PGOCtxProfReader.cpp
@@ -97,7 +97,7 @@ PGOCtxProfileReader::readProfile(PGOCtxProfileBlockIDs Kind) {
   std::optional<SmallVector<uint64_t, 16>> Counters;
   std::optional<uint32_t> CallsiteIndex;
   std::optional<uint64_t> TotalEntryCount;
-
+  std::optional<CtxProfFlatProfile> Unhandled;
   SmallVector<uint64_t, 1> RecordValues;
 
   const bool ExpectIndex = Kind == PGOCtxProfileBlockIDs::ContextNodeBlockID;
@@ -108,14 +108,24 @@ PGOCtxProfileReader::readProfile(PGOCtxProfileBlockIDs Kind) {
   auto GotAllWeNeed = [&]() {
     return Guid.has_value() && Counters.has_value() &&
            (!ExpectIndex || CallsiteIndex.has_value()) &&
-           (!IsRoot || TotalEntryCount.has_value());
+           (!IsRoot || TotalEntryCount.has_value()) &&
+           (!IsRoot || Unhandled.has_value());
   };
+
   while (!GotAllWeNeed()) {
     RecordValues.clear();
     EXPECT_OR_RET(Entry, advance());
-    if (Entry->Kind != BitstreamEntry::Record)
+    if (Entry->Kind != BitstreamEntry::Record) {
+      if (IsRoot && Entry->Kind == BitstreamEntry::SubBlock &&
+          Entry->ID == PGOCtxProfileBlockIDs::UnhandledBlockID) {
+        RET_ON_ERR(enterBlockWithID(PGOCtxProfileBlockIDs::UnhandledBlockID));
+        Unhandled = CtxProfFlatProfile();
+        RET_ON_ERR(loadFlatProfileList(*Unhandled));
+        continue;
+      }
       return wrongValue(
           "Expected records before encountering more subcontexts");
+    }
     EXPECT_OR_RET(ReadRecord,
                   Cursor.readRecord(bitc::UNABBREV_RECORD, RecordValues));
     switch (*ReadRecord) {
@@ -152,7 +162,8 @@ PGOCtxProfileReader::readProfile(PGOCtxProfileBlockIDs Kind) {
     }
   }
 
-  PGOCtxProfContext Ret(*Guid, std::move(*Counters), TotalEntryCount);
+  PGOCtxProfContext Ret(*Guid, std::move(*Counters), TotalEntryCount,
+                        std::move(Unhandled));
 
   while (canEnterBlockWithID(PGOCtxProfileBlockIDs::ContextNodeBlockID)) {
     EXPECT_OR_RET(SC, readProfile(PGOCtxProfileBlockIDs::ContextNodeBlockID));
@@ -212,9 +223,7 @@ Error PGOCtxProfileReader::loadContexts(CtxProfContextualProfiles &P) {
   return Error::success();
 }
 
-Error PGOCtxProfileReader::loadFlatProfiles(CtxProfFlatProfile &P) {
-  RET_ON_ERR(
-      enterBlockWithID(PGOCtxProfileBlockIDs::FlatProfilesSectionBlockID));
+Error PGOCtxProfileReader::loadFlatProfileList(CtxProfFlatProfile &P) {
   while (canEnterBlockWithID(PGOCtxProfileBlockIDs::FlatProfileBlockID)) {
     EXPECT_OR_RET(E, readProfile(PGOCtxProfileBlockIDs::FlatProfileBlockID));
     auto Guid = E->second.guid();
@@ -224,6 +233,12 @@ Error PGOCtxProfileReader::loadFlatProfiles(CtxProfFlatProfile &P) {
   return Error::success();
 }
 
+Error PGOCtxProfileReader::loadFlatProfiles(CtxProfFlatProfile &P) {
+  RET_ON_ERR(
+      enterBlockWithID(PGOCtxProfileBlockIDs::FlatProfilesSectionBlockID));
+  return loadFlatProfileList(P);
+}
+
 Expected<PGOCtxProfile> PGOCtxProfileReader::loadProfiles() {
   RET_ON_ERR(readMetadata());
   PGOCtxProfile Ret;
@@ -287,10 +302,13 @@ void toYaml(yaml::Output &Out,
   Out.endSequence();
 }
 
+void toYaml(yaml::Output &Out, const CtxProfFlatProfile &Flat);
+
 void toYaml(yaml::Output &Out, GlobalValue::GUID Guid,
             const SmallVectorImpl<uint64_t> &Counters,
             const PGOCtxProfContext::CallsiteMapTy &Callsites,
-            std::optional<uint64_t> TotalRootEntryCount = std::nullopt) {
+            std::optional<uint64_t> TotalRootEntryCount = std::nullopt,
+            CtxProfFlatProfile Unhandled = {}) {
   yaml::EmptyContext Empty;
   Out.beginMapping();
   void *SaveInfo = nullptr;
@@ -318,6 +336,14 @@ void toYaml(yaml::Output &Out, GlobalValue::GUID Guid,
     Out.endFlowSequence();
     Out.postflightKey(nullptr);
   }
+
+  if (!Unhandled.empty()) {
+    assert(TotalRootEntryCount.has_value());
+    Out.preflightKey("Unhandled", false, false, UseDefault, SaveInfo);
+    toYaml(Out, Unhandled);
+    Out.postflightKey(nullptr);
+  }
+
   if (!Callsites.empty()) {
     Out.preflightKey("Callsites", true, false, UseDefault, SaveInfo);
     toYaml(Out, Callsites);
@@ -326,10 +352,22 @@ void toYaml(yaml::Output &Out, GlobalValue::GUID Guid,
   Out.endMapping();
 }
 
+void toYaml(yaml::Output &Out, const CtxProfFlatProfile &Flat) {
+  void *SaveInfo = nullptr;
+  Out.beginSequence();
+  size_t ElemID = 0;
+  for (const auto &[Guid, Counters] : Flat) {
+    Out.preflightElement(ElemID++, SaveInfo);
+    toYaml(Out, Guid, Counters, {});
+    Out.postflightElement(nullptr);
+  }
+  Out.endSequence();
+}
+
 void toYaml(yaml::Output &Out, const PGOCtxProfContext &Ctx) {
   if (Ctx.isRoot())
     toYaml(Out, Ctx.guid(), Ctx.counters(), Ctx.callsites(),
-           Ctx.getTotalRootEntryCount());
+           Ctx.getTotalRootEntryCount(), Ctx.getUnhandled());
   else
     toYaml(Out, Ctx.guid(), Ctx.counters(), Ctx.callsites());
 }
@@ -348,14 +386,7 @@ void llvm::convertCtxProfToYaml(raw_ostream &OS, const PGOCtxProfile &Profile) {
   }
   if (!Profile.FlatProfiles.empty()) {
     Out.preflightKey("FlatProfiles", false, false, UseDefault, SaveInfo);
-    Out.beginSequence();
-    size_t ElemID = 0;
-    for (const auto &[Guid, Counters] : Profile.FlatProfiles) {
-      Out.preflightElement(ElemID++, SaveInfo);
-      toYaml(Out, Guid, Counters, {});
-      Out.postflightElement(nullptr);
-    }
-    Out.endSequence();
+    toYaml(Out, Profile.FlatProfiles);
     Out.postflightKey(nullptr);
   }
   Out.endMapping();
diff --git a/llvm/lib/ProfileData/PGOCtxProfWriter.cpp b/llvm/lib/ProfileData/PGOCtxProfWriter.cpp
index 910842647591d..fe41438b7d0a7 100644
--- a/llvm/li...
[truncated]

for (const auto &U : DC.Unhandled) {
SerializableCtxRepresentation Unhandled;
Unhandled.Guid = U.first;
Unhandled.Counters.insert(Unhandled.Counters.begin(), U.second.begin(),
Contributor


Repeatedly inserting in the front is inefficient. How about reserve(size), append and then reverse the vector?

Member Author


wdym. this copies the counter values from U.second to Unhandled. insert with a first/last iterator pair will - afaik - allocate the diff (last - first) (caveats about difference being computable, but the source is also a vector). Also why would I reverse?

There is a more efficient alternative - having a createNode that takes the guid and the counters separately - but not quite worth it for the yaml converter, it's a test utility.

Contributor


Since they are both vectors, the allocation is optional (depending on the capacity of Unhandled.Counters) but there will always be a copy to shift the original elements to the right. This is what I was referring to as inefficient. But since you mention this is a utility for a test, it doesn't matter much.

Member Author


there are no original elements tho... maybe this is clearer as assignment
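As a side note on the exchange above: with an empty destination vector, the range-`insert` and `assign` forms are equivalent - there are no pre-existing elements to shift - and `assign` reads more clearly. A small sketch demonstrating the equivalence (hypothetical helper names, plain `std::vector` rather than the types in the patch):

```cpp
#include <cstdint>
#include <vector>

// Copy Src into a fresh vector via range-insert at begin(). On an empty
// destination this is one allocation plus one copy; nothing is shifted.
std::vector<uint64_t> copyViaInsert(const std::vector<uint64_t> &Src) {
  std::vector<uint64_t> Dst;
  Dst.insert(Dst.begin(), Src.begin(), Src.end());
  return Dst;
}

// The same copy expressed as an assignment - equivalent here, and clearer.
std::vector<uint64_t> copyViaAssign(const std::vector<uint64_t> &Src) {
  std::vector<uint64_t> Dst;
  Dst.assign(Src.begin(), Src.end());
  return Dst;
}
```

The shifting cost the reviewer describes only arises when the destination already holds elements, which is not the case for the freshly constructed `Unhandled.Counters`.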

@mtrofin mtrofin force-pushed the users/mtrofin/03-14-_ctxprof_track_unhandled_call_targets branch from 0842b89 to 82e70ee Compare March 18, 2025 05:08
Contributor

@snehasish snehasish left a comment


lgtm


@mtrofin mtrofin force-pushed the users/mtrofin/03-14-_ctxprof_track_unhandled_call_targets branch 2 times, most recently from ac74a3b to 881ef87 Compare March 18, 2025 21:15
@mtrofin mtrofin force-pushed the users/mtrofin/03-13-_ctxprof_make_contextroot_an_implementation_detail branch from 35e3208 to 64ae963 Compare March 18, 2025 21:15
Base automatically changed from users/mtrofin/03-13-_ctxprof_make_contextroot_an_implementation_detail to main March 19, 2025 05:03
@mtrofin mtrofin force-pushed the users/mtrofin/03-14-_ctxprof_track_unhandled_call_targets branch 2 times, most recently from 4ccf0b9 to 43a2e22 Compare March 19, 2025 05:17
@mtrofin mtrofin force-pushed the users/mtrofin/03-14-_ctxprof_track_unhandled_call_targets branch from 43a2e22 to ff1615a Compare March 19, 2025 15:39
Member Author

mtrofin commented Mar 19, 2025

Merge activity

  • Mar 19, 4:49 PM EDT: A user started a stack merge that includes this pull request via Graphite.
  • Mar 19, 4:51 PM EDT: A user merged this pull request with Graphite.

@mtrofin mtrofin merged commit 0668bb2 into main Mar 19, 2025
11 checks passed
@mtrofin mtrofin deleted the users/mtrofin/03-14-_ctxprof_track_unhandled_call_targets branch March 19, 2025 20:51