[MemProf] Prune unneeded non-cold contexts #124823

Merged
merged 5 commits into llvm:main on Jan 29, 2025

Conversation

teresajohnson
Contributor

We can take advantage of the fact that we subsequently only clone cold
allocation contexts, since not cold behavior is the default, and
significantly reduce the amount of metadata (and later ThinLTO summary
and MemProfContextDisambiguation graph nodes) by pruning unnecessary not
cold contexts when building metadata from the trie.

Specifically, we only need to keep notcold contexts that overlap the
longest with cold allocations, to know how deeply to clone those
contexts to expose the cold allocation behavior.
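
For illustration, a minimal sketch using the CallStackTrie unit-test API, with the contexts from the header-comment example added in this patch (the kept/pruned annotations follow that example):

CallStackTrie Trie;
Trie.addCallStack(AllocationType::NotCold, {1, 3});    // pruned: its callee (node 1) is not the deepest ambiguous point
Trie.addCallStack(AllocationType::Cold, {1, 2, 4});    // kept: cold contexts are always emitted
Trie.addCallStack(AllocationType::NotCold, {1, 2, 5}); // kept: deepest notcold overlap with a cold context
Trie.addCallStack(AllocationType::NotCold, {1, 2, 6}); // pruned: overlap already covered by {1, 2, 5}
// After Trie.buildAndAttachMIBMetadata(Call), the !memprof metadata holds just
// two MIBs: cold {1, 2, 4} and notcold {1, 2, 5}.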

For a large target this reduced ThinLTO bitcode object sizes by about
35%. It reduced the ThinLTO indexing time by about half and the peak
ThinLTO indexing memory by about 20%.

@llvmbot added the PGO (Profile Guided Optimizations), llvm:analysis (Includes value tracking, cost tables and constant folding), and llvm:transforms labels on Jan 28, 2025
@llvmbot
Member

llvmbot commented Jan 28, 2025

@llvm/pr-subscribers-pgo

@llvm/pr-subscribers-llvm-transforms

Author: Teresa Johnson (teresajohnson)

Changes

We can take advantage of the fact that we subsequently only clone cold
allocation contexts, since not cold behavior is the default, and
significantly reduce the amount of metadata (and later ThinLTO summary
and MemProfContextDisambiguation graph nodes) by pruning unnecessary not
cold contexts when building metadata from the trie.

Specifically, we only need to keep notcold contexts that overlap the
longest with cold allocations, to know how deeply to clone those
contexts to expose the cold allocation behavior.

For a large target this reduced ThinLTO bitcode object sizes by about
35%. It reduced the ThinLTO indexing time by about half and the peak
ThinLTO indexing memory by about 20%.


Patch is 24.33 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/124823.diff

5 Files Affected:

  • (modified) llvm/include/llvm/Analysis/MemoryProfileInfo.h (+33-1)
  • (modified) llvm/lib/Analysis/MemoryProfileInfo.cpp (+47-11)
  • (modified) llvm/test/Transforms/PGOProfile/memprof.ll (+7-12)
  • (modified) llvm/test/Transforms/PGOProfile/memprof_match_hot_cold_new_calls.ll (+1-3)
  • (modified) llvm/unittests/Analysis/MemoryProfileInfoTest.cpp (+108-19)
diff --git a/llvm/include/llvm/Analysis/MemoryProfileInfo.h b/llvm/include/llvm/Analysis/MemoryProfileInfo.h
index deb7ab134c1617..f75783a4fef50e 100644
--- a/llvm/include/llvm/Analysis/MemoryProfileInfo.h
+++ b/llvm/include/llvm/Analysis/MemoryProfileInfo.h
@@ -56,6 +56,37 @@ class CallStackTrie {
     // Allocation types for call context sharing the context prefix at this
     // node.
     uint8_t AllocTypes;
+    // Updated as we add allocations to note if this is the deepest point in the
+    // trie that has an ambiguous allocation type (both Cold and NotCold). It is
+    // used to prune unneeded NotCold contexts, taking advantage of the fact
+    // that we later will only clone Cold contexts, as NotCold is the allocation
+    // default. We only need to keep as metadata the NotCold contexts that
+    // overlap the longest with Cold allocations, so that we know how deeply we
+    // need to clone. For example, assume we add the following contexts to the
+    // trie:
+    //    1 3 (notcold)
+    //    1 2 4 (cold)
+    //    1 2 5 (notcold)
+    //    1 2 6 (notcold)
+    // the trie looks like:
+    //         1
+    //        / \
+    //       2   3
+    //      /|\
+    //     4 5 6
+    //
+    // It is sufficient to prune all but one not cold context (either 1,2,5 or
+    // 1,2,6; we arbitrarily keep the first one we encounter, which will be
+    // 1,2,5). We'll initially have DeepestAmbiguousAllocType set false for trie
+    // node 1 after the trie is built, and true for node 2. This indicates that
+    // the not cold context ending in 3 is not needed (its immediate callee has
+    // this value set false). The first notcold context we encounter when
+    // iterating the callers of node 2 will be the context ending in 5 (since
+    // std::map iteration is in sorted order of key), which will see that this
+    // field is true for its callee, so we will keep it. But at that point we
+    // set the callee's flag to false which prevents adding the not cold context
+    // ending in 6 unnecessarily.
+    bool DeepestAmbiguousAllocType = true;
     // If the user has requested reporting of hinted sizes, keep track of the
     // associated full stack id and profiled sizes. Can have more than one
     // after trimming (e.g. when building from metadata). This is only placed on
@@ -103,7 +134,8 @@ class CallStackTrie {
   bool buildMIBNodes(CallStackTrieNode *Node, LLVMContext &Ctx,
                      std::vector<uint64_t> &MIBCallStack,
                      std::vector<Metadata *> &MIBNodes,
-                     bool CalleeHasAmbiguousCallerContext);
+                     bool CalleeHasAmbiguousCallerContext,
+                     bool &CalleeDeepestAmbiguousAllocType);
 
 public:
   CallStackTrie() = default;
diff --git a/llvm/lib/Analysis/MemoryProfileInfo.cpp b/llvm/lib/Analysis/MemoryProfileInfo.cpp
index 5553a2e2dd24ba..409748ec80fe79 100644
--- a/llvm/lib/Analysis/MemoryProfileInfo.cpp
+++ b/llvm/lib/Analysis/MemoryProfileInfo.cpp
@@ -51,6 +51,13 @@ cl::opt<bool> MemProfReportHintedSizes(
     "memprof-report-hinted-sizes", cl::init(false), cl::Hidden,
     cl::desc("Report total allocation sizes of hinted allocations"));
 
+// This is useful if we have enabled reporting of hinted sizes, and want to get
+// information from the indexing step for all contexts (especially for testing),
+// or have specified a value less than 100% for -memprof-cloning-cold-threshold.
+cl::opt<bool> MemProfKeepAllNotColdContexts(
+    "memprof-keep-all-not-cold-contexts", cl::init(false), cl::Hidden,
+    cl::desc("Disable pruning of non-cold contexts unneeded for cold cloning"));
+
 AllocationType llvm::memprof::getAllocType(uint64_t TotalLifetimeAccessDensity,
                                            uint64_t AllocCount,
                                            uint64_t TotalLifetime) {
@@ -156,10 +163,16 @@ void CallStackTrie::addCallStack(
       continue;
     }
     // Update existing caller node if it exists.
+    CallStackTrieNode *Prev = nullptr;
     auto Next = Curr->Callers.find(StackId);
     if (Next != Curr->Callers.end()) {
+      Prev = Curr;
       Curr = Next->second;
       Curr->addAllocType(AllocType);
+      // If this node has an ambiguous alloc type, its callee is not the deepest
+      // point where we have an ambiguous allocation type.
+      if (!hasSingleAllocType(Curr->AllocTypes))
+        Prev->DeepestAmbiguousAllocType = false;
       continue;
     }
     // Otherwise add a new caller node.
@@ -243,14 +256,35 @@ void CallStackTrie::convertHotToNotCold(CallStackTrieNode *Node) {
 bool CallStackTrie::buildMIBNodes(CallStackTrieNode *Node, LLVMContext &Ctx,
                                   std::vector<uint64_t> &MIBCallStack,
                                   std::vector<Metadata *> &MIBNodes,
-                                  bool CalleeHasAmbiguousCallerContext) {
+                                  bool CalleeHasAmbiguousCallerContext,
+                                  bool &CalleeDeepestAmbiguousAllocType) {
   // Trim context below the first node in a prefix with a single alloc type.
   // Add an MIB record for the current call stack prefix.
   if (hasSingleAllocType(Node->AllocTypes)) {
-    std::vector<ContextTotalSize> ContextSizeInfo;
-    collectContextSizeInfo(Node, ContextSizeInfo);
-    MIBNodes.push_back(createMIBNode(
-        Ctx, MIBCallStack, (AllocationType)Node->AllocTypes, ContextSizeInfo));
+    // Because we only clone cold contexts (we don't clone for exposing NotCold
+    // contexts as that is the default allocation behavior), we create MIB
+    // metadata for this context iff:
+    // 1) It is cold.
+    // 2) The immediate callee is the deepest point where we have an ambiguous
+    //    allocation type (i.e. the other callers that are cold need to know
+    //    that we have a not cold context overlapping to this point so that we
+    //    know how deep to clone).
+    // 3) MemProfKeepAllNotColdContexts is enabled, which is useful if we are
+    //    reporting hinted sizes, and want to get information from the indexing
+    //    step for all contexts, or have specified a value less than 100% for
+    //    -memprof-cloning-cold-threshold.
+    if ((AllocationType)Node->AllocTypes == AllocationType::Cold ||
+        CalleeDeepestAmbiguousAllocType || MemProfKeepAllNotColdContexts) {
+      std::vector<ContextTotalSize> ContextSizeInfo;
+      collectContextSizeInfo(Node, ContextSizeInfo);
+      MIBNodes.push_back(createMIBNode(Ctx, MIBCallStack,
+                                       (AllocationType)Node->AllocTypes,
+                                       ContextSizeInfo));
+      // If we just emitted an MIB for a not cold caller, we don't need to emit
+      // another one for the callee to correctly disambiguate its cold callers.
+      if ((AllocationType)Node->AllocTypes != AllocationType::Cold)
+        CalleeDeepestAmbiguousAllocType = false;
+    }
     return true;
   }
 
@@ -261,9 +295,9 @@ bool CallStackTrie::buildMIBNodes(CallStackTrieNode *Node, LLVMContext &Ctx,
     bool AddedMIBNodesForAllCallerContexts = true;
     for (auto &Caller : Node->Callers) {
       MIBCallStack.push_back(Caller.first);
-      AddedMIBNodesForAllCallerContexts &=
-          buildMIBNodes(Caller.second, Ctx, MIBCallStack, MIBNodes,
-                        NodeHasAmbiguousCallerContext);
+      AddedMIBNodesForAllCallerContexts &= buildMIBNodes(
+          Caller.second, Ctx, MIBCallStack, MIBNodes,
+          NodeHasAmbiguousCallerContext, Node->DeepestAmbiguousAllocType);
       // Remove Caller.
       MIBCallStack.pop_back();
     }
@@ -338,9 +372,11 @@ bool CallStackTrie::buildAndAttachMIBMetadata(CallBase *CI) {
   std::vector<Metadata *> MIBNodes;
   assert(!Alloc->Callers.empty() && "addCallStack has not been called yet");
   // The last parameter is meant to say whether the callee of the given node
-  // has more than one caller. Here the node being passed in is the alloc
-  // and it has no callees. So it's false.
-  if (buildMIBNodes(Alloc, Ctx, MIBCallStack, MIBNodes, false)) {
+  // is the deepest point where we have ambiguous alloc types. Here the
+  // node being passed in is the alloc and it has no callees. So it's false.
+  bool DeepestAmbiguousAllocType = true;
+  if (buildMIBNodes(Alloc, Ctx, MIBCallStack, MIBNodes, false,
+                    DeepestAmbiguousAllocType)) {
     assert(MIBCallStack.size() == 1 &&
            "Should only be left with Alloc's location in stack");
     CI->setMetadata(LLVMContext::MD_memprof, MDNode::get(Ctx, MIBNodes));
diff --git a/llvm/test/Transforms/PGOProfile/memprof.ll b/llvm/test/Transforms/PGOProfile/memprof.ll
index acf70880becd19..f0421ba60cffca 100644
--- a/llvm/test/Transforms/PGOProfile/memprof.ll
+++ b/llvm/test/Transforms/PGOProfile/memprof.ll
@@ -64,14 +64,14 @@
 ; RUN: opt < %s -passes='pgo-instr-use,memprof-use<profile-filename=%t.pgomemprofdata>' -pgo-test-profile-file=%t.pgomemprofdata -pgo-warn-missing-function -S 2>&1 | FileCheck %s --check-prefixes=MEMPROF,ALL,PGO
 
 ;; Check that the total sizes are reported if requested.
-; RUN: opt < %s -passes='memprof-use<profile-filename=%t.memprofdata>' -pgo-warn-missing-function -S -memprof-report-hinted-sizes 2>&1 | FileCheck %s --check-prefixes=TOTALSIZESSINGLE,TOTALSIZES
+; RUN: opt < %s -passes='memprof-use<profile-filename=%t.memprofdata>' -pgo-warn-missing-function -S -memprof-report-hinted-sizes -memprof-keep-all-not-cold-contexts 2>&1 | FileCheck %s --check-prefixes=TOTALSIZESSINGLE,TOTALSIZES
 
 ;; Check that we hint additional allocations with a threshold < 100%
 ; RUN: opt < %s -passes='memprof-use<profile-filename=%t.memprofdata>' -pgo-warn-missing-function -S -memprof-report-hinted-sizes -memprof-matching-cold-threshold=60 2>&1 | FileCheck %s --check-prefixes=TOTALSIZESSINGLE,TOTALSIZESTHRESH60
 
 ;; Make sure that the -memprof-cloning-cold-threshold flag is enough to cause
 ;; the size metadata to be generated for the LTO link.
-; RUN: opt < %s -passes='memprof-use<profile-filename=%t.memprofdata>' -pgo-warn-missing-function -S -memprof-cloning-cold-threshold=80 2>&1 | FileCheck %s --check-prefixes=TOTALSIZES
+; RUN: opt < %s -passes='memprof-use<profile-filename=%t.memprofdata>' -pgo-warn-missing-function -S -memprof-cloning-cold-threshold=80 -memprof-keep-all-not-cold-contexts 2>&1 | FileCheck %s --check-prefixes=TOTALSIZES
 
 ;; Make sure we emit a random hotness seed if requested.
 ; RUN: llvm-profdata merge -memprof-random-hotness %S/Inputs/memprof.memprofraw --profiled-binary %S/Inputs/memprof.exe -o %t.memprofdatarand 2>&1 | FileCheck %s --check-prefix=RAND
@@ -339,7 +339,7 @@ for.end:                                          ; preds = %for.cond
 
 ; MEMPROF: #[[A1]] = { builtin allocsize(0) "memprof"="notcold" }
 ; MEMPROF: #[[A2]] = { builtin allocsize(0) "memprof"="cold" }
-; MEMPROF: ![[M1]] = !{![[MIB1:[0-9]+]], ![[MIB2:[0-9]+]], ![[MIB3:[0-9]+]], ![[MIB4:[0-9]+]], ![[MIB5:[0-9]+]]}
+; MEMPROF: ![[M1]] = !{![[MIB1:[0-9]+]], ![[MIB2:[0-9]+]], ![[MIB3:[0-9]+]], ![[MIB4:[0-9]+]]}
 ; MEMPROF: ![[MIB1]] = !{![[STACK1:[0-9]+]], !"cold"}
 ; MEMPROF: ![[STACK1]] = !{i64 2732490490862098848, i64 748269490701775343}
 ; MEMPROF: ![[MIB2]] = !{![[STACK2:[0-9]+]], !"cold"}
@@ -348,8 +348,6 @@ for.end:                                          ; preds = %for.cond
 ; MEMPROF: ![[STACK3]] = !{i64 2732490490862098848, i64 2104812325165620841, i64 6281715513834610934, i64 6281715513834610934, i64 6281715513834610934, i64 6281715513834610934}
 ; MEMPROF: ![[MIB4]] = !{![[STACK4:[0-9]+]], !"cold"}
 ; MEMPROF: ![[STACK4]] = !{i64 2732490490862098848, i64 8467819354083268568}
-; MEMPROF: ![[MIB5]] = !{![[STACK5:[0-9]+]], !"notcold"}
-; MEMPROF: ![[STACK5]] = !{i64 2732490490862098848, i64 8690657650969109624}
 ; MEMPROF: ![[C1]] = !{i64 2732490490862098848}
 ; MEMPROF: ![[C2]] = !{i64 8467819354083268568}
 ; MEMPROF: ![[C3]] = !{i64 9086428284934609951}
@@ -390,17 +388,15 @@ for.end:                                          ; preds = %for.cond
 
 ; MEMPROFNOCOLINFO: #[[A1]] = { builtin allocsize(0) "memprof"="notcold" }
 ; MEMPROFNOCOLINFO: #[[A2]] = { builtin allocsize(0) "memprof"="cold" }
-; MEMPROFNOCOLINFO: ![[M1]] = !{![[MIB1:[0-9]+]], ![[MIB2:[0-9]+]], ![[MIB3:[0-9]+]], ![[MIB4:[0-9]+]], ![[MIB5:[0-9]+]]}
+; MEMPROFNOCOLINFO: ![[M1]] = !{![[MIB1:[0-9]+]], ![[MIB2:[0-9]+]], ![[MIB3:[0-9]+]], ![[MIB4:[0-9]+]]}
 ; MEMPROFNOCOLINFO: ![[MIB1]] = !{![[STACK1:[0-9]+]], !"cold"}
 ; MEMPROFNOCOLINFO: ![[STACK1]] = !{i64 5281664982037379640, i64 6362220161075421157, i64 -5772587307814069790, i64 -5772587307814069790, i64 -5772587307814069790, i64 3577763375057267810}
 ; MEMPROFNOCOLINFO: ![[MIB2]] = !{![[STACK2:[0-9]+]], !"notcold"}
 ; MEMPROFNOCOLINFO: ![[STACK2]] = !{i64 5281664982037379640, i64 6362220161075421157, i64 -5772587307814069790, i64 -5772587307814069790, i64 -5772587307814069790, i64 -5772587307814069790}
-; MEMPROFNOCOLINFO: ![[MIB3]] = !{![[STACK3:[0-9]+]], !"notcold"}
-; MEMPROFNOCOLINFO: ![[STACK3]] = !{i64 5281664982037379640, i64 -6896091699916449732}
+; MEMPROFNOCOLINFO: ![[MIB3]] = !{![[STACK3:[0-9]+]], !"cold"}
+; MEMPROFNOCOLINFO: ![[STACK3]] = !{i64 5281664982037379640, i64 -6871734214936418908}
 ; MEMPROFNOCOLINFO: ![[MIB4]] = !{![[STACK4:[0-9]+]], !"cold"}
-; MEMPROFNOCOLINFO: ![[STACK4]] = !{i64 5281664982037379640, i64 -6871734214936418908}
-; MEMPROFNOCOLINFO: ![[MIB5]] = !{![[STACK5:[0-9]+]], !"cold"}
-; MEMPROFNOCOLINFO: ![[STACK5]] = !{i64 5281664982037379640, i64 -6201180255894224618}
+; MEMPROFNOCOLINFO: ![[STACK4]] = !{i64 5281664982037379640, i64 -6201180255894224618}
 ; MEMPROFNOCOLINFO: ![[C1]] = !{i64 5281664982037379640}
 ; MEMPROFNOCOLINFO: ![[C2]] = !{i64 -6871734214936418908}
 ; MEMPROFNOCOLINFO: ![[C3]] = !{i64 -5588766871448036195}
@@ -420,7 +416,6 @@ for.end:                                          ; preds = %for.cond
 ; MEMPROFRAND2: !"cold"
 ; MEMPROFRAND2: !"cold"
 ; MEMPROFRAND2: !"notcold"
-; MEMPROFRAND2: !"notcold"
 
 ; MEMPROFSTATS:  8 memprof - Number of alloc contexts in memory profile.
 ; MEMPROFSTATS: 10 memprof - Number of callsites in memory profile.
diff --git a/llvm/test/Transforms/PGOProfile/memprof_match_hot_cold_new_calls.ll b/llvm/test/Transforms/PGOProfile/memprof_match_hot_cold_new_calls.ll
index 83753ed6376d3f..4aa05116226623 100644
--- a/llvm/test/Transforms/PGOProfile/memprof_match_hot_cold_new_calls.ll
+++ b/llvm/test/Transforms/PGOProfile/memprof_match_hot_cold_new_calls.ll
@@ -105,7 +105,7 @@ entry:
 
 declare noundef ptr @_ZnwmSt11align_val_tRKSt9nothrow_t12__hot_cold_t(i64 noundef, i64 noundef, ptr noundef nonnull align 1 dereferenceable(1), i8 noundef zeroext)
 
-; MEMPROF: ![[M1]] = !{![[MIB1:[0-9]+]], ![[MIB2:[0-9]+]], ![[MIB3:[0-9]+]], ![[MIB4:[0-9]+]], ![[MIB5:[0-9]+]]}
+; MEMPROF: ![[M1]] = !{![[MIB1:[0-9]+]], ![[MIB2:[0-9]+]], ![[MIB3:[0-9]+]], ![[MIB4:[0-9]+]]}
 ; MEMPROF: ![[MIB1]] = !{![[STACK1:[0-9]+]], !"cold"}
 ; MEMPROF: ![[STACK1]] = !{i64 2732490490862098848, i64 748269490701775343}
 ; MEMPROF: ![[MIB2]] = !{![[STACK2:[0-9]+]], !"cold"}
@@ -114,8 +114,6 @@ declare noundef ptr @_ZnwmSt11align_val_tRKSt9nothrow_t12__hot_cold_t(i64 nounde
 ; MEMPROF: ![[STACK3]] = !{i64 2732490490862098848, i64 2104812325165620841, i64 6281715513834610934, i64 6281715513834610934, i64 6281715513834610934, i64 6281715513834610934}
 ; MEMPROF: ![[MIB4]] = !{![[STACK4:[0-9]+]], !"cold"}
 ; MEMPROF: ![[STACK4]] = !{i64 2732490490862098848, i64 8467819354083268568}
-; MEMPROF: ![[MIB5]] = !{![[STACK5:[0-9]+]], !"notcold"}
-; MEMPROF: ![[STACK5]] = !{i64 2732490490862098848, i64 8690657650969109624}
 ; MEMPROF: ![[C1]] = !{i64 2732490490862098848}
 
 !llvm.dbg.cu = !{!0}
diff --git a/llvm/unittests/Analysis/MemoryProfileInfoTest.cpp b/llvm/unittests/Analysis/MemoryProfileInfoTest.cpp
index b4e81e69116e8a..17dbafff8ee940 100644
--- a/llvm/unittests/Analysis/MemoryProfileInfoTest.cpp
+++ b/llvm/unittests/Analysis/MemoryProfileInfoTest.cpp
@@ -15,6 +15,7 @@
 #include "llvm/IR/ModuleSummaryIndex.h"
 #include "llvm/Support/CommandLine.h"
 #include "llvm/Support/SourceMgr.h"
+#include "gmock/gmock.h"
 #include "gtest/gtest.h"
 #include <cstring>
 #include <sys/types.h>
@@ -220,6 +221,48 @@ declare dso_local noalias noundef i8* @malloc(i64 noundef)
   EXPECT_EQ(Call4->getFnAttr("memprof").getValueAsString(), "notcold");
 }
 
+// TODO: Use this matcher in existing tests.
+// ExpectedVals should be a vector of expected MIBs and their allocation type
+// and stack id contents in order, of type:
+//  std::vector<std::pair<AllocationType, std::vector<unsigned>>>
+MATCHER_P(MemprofMetadataEquals, ExpectedVals, "Matching !memprof contents") {
+  auto PrintAndFail = [&]() {
+    std::string Buffer;
+    llvm::raw_string_ostream OS(Buffer);
+    OS << "Expected:\n";
+    for (auto &[ExpectedAllocType, ExpectedStackIds] : ExpectedVals) {
+      OS << "\t" << getAllocTypeAttributeString(ExpectedAllocType) << " { ";
+      for (auto Id : ExpectedStackIds)
+        OS << Id << " ";
+      OS << "}\n";
+    }
+    OS << "Got:\n";
+    arg->printTree(OS);
+    *result_listener << "!memprof metadata differs!\n" << Buffer;
+    return false;
+  };
+
+  if (ExpectedVals.size() != arg->getNumOperands())
+    return PrintAndFail();
+
+  for (size_t I = 0; I < ExpectedVals.size(); I++) {
+    const auto &[ExpectedAllocType, ExpectedStackIds] = ExpectedVals[I];
+    MDNode *MIB = dyn_cast<MDNode>(arg->getOperand(I));
+    if (getMIBAllocType(MIB) != ExpectedAllocType)
+      return PrintAndFail();
+    MDNode *StackMD = getMIBStackNode(MIB);
+    EXPECT_NE(StackMD, nullptr);
+    if (StackMD->getNumOperands() != ExpectedStackIds.size())
+      return PrintAndFail();
+    for (size_t J = 0; J < ExpectedStackIds.size(); J++) {
+      auto *StackId = mdconst::dyn_extract<ConstantInt>(StackMD->getOperand(J));
+      if (StackId->getZExtValue() != ExpectedStackIds[J])
+        return PrintAndFail();
+    }
+  }
+  return true;
+}
+
 // Test CallStackTrie::addCallStack interface taking allocation type and list of
 // call stack ids.
 // Test that an allocation call reached by both cold and non cold call stacks
@@ -344,6 +387,8 @@ declare dso_local noalias noundef i8* @malloc(i64 noundef)
   CallStackTrie Trie;
   Trie.addCallStack(AllocationType::Cold, {1, 2});
   Trie.addCallStack(AllocationType::NotCold, {1, 3});
+  // This will be pruned as it is unnecessary to determine how to clone the
+  // cold allocation.
   Trie.addCallStack(AllocationType::Hot, {1, 4});
 
   CallBase *Call = findCall(*Func, "call");
@@ -352,7 +397,7 @@ declare dso_local noalias noundef i8* @malloc(i64 noundef)
   EXPECT_FALSE(Call->hasFnAttr("memprof"));
   EXPECT_TRUE(Call->hasMetadata(LLVMContext::MD_memprof));
   MDNode *MemProfMD = Call->getMetadata(LLVMContext::MD_memprof);
-  ASSERT_EQ(MemProfMD->getNumOperands(), 3u);
+  ASSERT_EQ(MemProfMD->getNumOperands(), 2u);
   for (auto &MIBOp : MemProfMD->operands()) {
     MDNode *MIB = dyn_cast<MDNode>(MIBOp);
     MDNode *StackMD = getMIBStackNode(MIB);
@@ -365,10 +410,6 @@ declare dso_local noalias noundef i8* @malloc(i64 noundef)
       EXPECT_EQ(getMIBAllocType(MIB), AllocationType::Cold);
     } else if (StackId->getZExtValue() == 3u) {
       EXPECT_EQ(getMIBAllocType(MIB), AllocationType::NotCold);
-    } else {
-      ASSERT_EQ(StackId->getZExtValue(), 4u);
-      // Hot contexts are converted to NotCold when building the metadata.
-      EXPECT_EQ(getMIBAllocType(MIB), AllocationType::NotCold);
     }
   }
 }
@@ -404,8 +445,8 @@ declare dso_local noalias noundef i8* @malloc(i64 noundef)
   // with the non-cold context {1, 5}.
   Trie.addCallStack(AllocationType::NotCold, {1, 5, 6});
   Trie.addCallStack(AllocationType::NotCold, {1, 5, 7});
-  // We should be able to trim the following two and combine into a single MIB
-  // with the hot context {1, 8}.
+  // These will be pruned as they are unnecessary to determine how to clone the
+  // cold allocation.
   Trie.addCallStack(AllocationType::Hot, {1, 8, 9});
   Trie.addCallStack(AllocationType::Hot, {1, 8, 10});
 
@@ -415,7 +456,7 @@ declare dso_local noalias noundef i8* @malloc(i64 noundef)
   EXPECT_FALSE(Call->hasFnAttr("memprof"));
   EXPECT_TRUE(Call->hasMetadata(LLVMContext::MD_memprof));
   MDNode *MemProfMD = Call->getMetadata(LLVMContext::MD_memprof);
-  ASSERT_EQ(MemProfMD->getNumOperands(), 3u);
+  ASSERT_EQ(MemProfMD->getNumOperands(), 2u);
   for (auto &MIBOp : Me...
[truncated]

@llvmbot
Member

llvmbot commented Jan 28, 2025

@llvm/pr-subscribers-llvm-analysis

Contributor

@snehasish snehasish left a comment

Still looking through the test.

// Similarly, the last parameter is meant to say whether the callee of the
// given node is the deepest point where we have ambiguous alloc types, which
// is also false as the alloc has no callees.
bool DeepestAmbiguousAllocType = true;
Contributor

If we change buildMIBNodes to take a pointer to a bool instead of a reference (defaulting to nullptr) then we can avoid the change (and redundant var) here. Inside buildMIBNodes we'd check and fill in the bool only if the pointer was not null. Wdyt?
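
A sketch of that alternative (hypothetical; per the follow-up below, the PR ultimately kept the reference parameter):

bool buildMIBNodes(CallStackTrieNode *Node, LLVMContext &Ctx,
                   std::vector<uint64_t> &MIBCallStack,
                   std::vector<Metadata *> &MIBNodes,
                   bool CalleeHasAmbiguousCallerContext,
                   bool *CalleeDeepestAmbiguousAllocType = nullptr);

// Inside buildMIBNodes, the flag would be consulted and cleared only when a
// caller passed it in:
if (CalleeDeepestAmbiguousAllocType &&
    (AllocationType)Node->AllocTypes != AllocationType::Cold)
  *CalleeDeepestAmbiguousAllocType = false;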

Contributor Author

ack

Contributor Author

I started to do this, but it made the uses inside buildMIBNodes more complicated, and I think it ends up harder to reason about overall. So I'd prefer to leave as-is.

// or have specified a value less than 100% for -memprof-cloning-cold-threshold.
cl::opt<bool> MemProfKeepAllNotColdContexts(
"memprof-keep-all-not-cold-contexts", cl::init(false), cl::Hidden,
cl::desc("Disable pruning of non-cold contexts unneeded for cold cloning"));
Contributor

nit: The current description is a bit hard to parse; how about "Keep all non-cold contexts (increases cloning overheads)" instead?

Also should the default be true (existing behaviour) since internal users set -memprof-cloning-cold-threshold to less than 100?

Contributor Author

I will change the description. But I wanted to point out that we are not currently using -memprof-cloning-cold-threshold, only -memprof-matching-cold-threshold, which doesn't rely on the metadata. I would like to keep the current default here, to reduce overhead by default.

Contributor

Thanks for clarifying we only use the matching option for now.

Contributor Author

Updated the flag description per suggestion

// reporting hinted sizes, and want to get information from the indexing
// step for all contexts, or have specified a value less than 100% for
// -memprof-cloning-cold-threshold.
if ((AllocationType)Node->AllocTypes == AllocationType::Cold ||
Contributor

Can this be rewritten as Node->hasAllocType(AllocationType::Cold) to use the new helpers?

Same for the usage below.
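
That is, a sketch of the suggested form (equivalent on this path, because hasSingleAllocType(Node->AllocTypes) already holds here; assumes the hasAllocType helper the review refers to):

if (Node->hasAllocType(AllocationType::Cold) ||
    CalleeDeepestAmbiguousAllocType || MemProfKeepAllNotColdContexts) {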

Contributor Author

ack

Contributor Author

Done here and below

Contributor

@snehasish snehasish left a comment

lgtm with some minor comments.

if (getMIBAllocType(MIB) != ExpectedAllocType)
return PrintAndFail();
MDNode *StackMD = getMIBStackNode(MIB);
EXPECT_NE(StackMD, nullptr);
Contributor

This should be ASSERT since EXPECT doesn't stop the test and we would deref the nullptr on the next line.

Contributor Author

I had ASSERT_NE here first but I get a compiler error:

llvm/unittests/Analysis/MemoryProfileInfoTest.cpp:254:5: error: cannot initialize return object of type 'bool' with an rvalue of type 'void'
ASSERT_NE(StackMD, nullptr);
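
gtest's fatal assertions expand to a return statement on failure, which is ill-formed inside MATCHER_P, whose generated body must return bool; that is what the error above is saying. A minimal sketch of an explicit check that keeps the hard stop while reusing the matcher's existing failure path:

MDNode *StackMD = getMIBStackNode(MIB);
if (!StackMD) // explicit null check; ASSERT_NE cannot be used in a bool-returning matcher
  return PrintAndFail();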

{AllocationType::NotCold, {1, 2, 3, 8, 9, 14}}};

CallBase *Call = findCall(*Func, "call");
Trie.buildAndAttachMIBMetadata(Call);
Contributor

Should we assert Call is not a nullptr before we pass it to the function?
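
A minimal sketch of such a guard in the test body (fatal assertions are fine there, since TEST functions return void):

CallBase *Call = findCall(*Func, "call");
ASSERT_NE(Call, nullptr);
Trie.buildAndAttachMIBMetadata(Call);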

Contributor Author

Done

// Neither of these two NotCold contexts are needed as the Cold contexts they
// overlap with are covered by longer overlapping NotCold contexts.
Trie.addCallStack(AllocationType::NotCold, {1, 2, 3, 12});
Trie.addCallStack(AllocationType::NotCold, {1, 2, 11});
Contributor

Does it make sense to add one more test for the edge case like this -- Trie.addCallStack(AllocationType::NotCold, {1, 16});? I noticed that all of the interesting stuff starts from node 2 onwards.

Contributor Author

Done

@teresajohnson teresajohnson merged commit ae6d5dd into llvm:main Jan 29, 2025
5 of 6 checks passed
@llvm-ci
Collaborator

llvm-ci commented Jan 29, 2025

LLVM Buildbot has detected a new failure on builder lldb-remote-linux-ubuntu running on as-builder-9 while building llvm at step 16 "test-check-lldb-api".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/195/builds/4190

Here is the relevant piece of the build log for the reference
Step 16 (test-check-lldb-api) failure: Test just built components: check-lldb-api completed (failure)
...
PASS: lldb-api :: types/TestCharType.py (1214 of 1223)
PASS: lldb-api :: types/TestCharTypeExpr.py (1215 of 1223)
PASS: lldb-api :: types/TestIntegerType.py (1216 of 1223)
PASS: lldb-api :: types/TestRecursiveTypes.py (1217 of 1223)
PASS: lldb-api :: types/TestIntegerTypeExpr.py (1218 of 1223)
PASS: lldb-api :: types/TestShortType.py (1219 of 1223)
PASS: lldb-api :: types/TestShortTypeExpr.py (1220 of 1223)
PASS: lldb-api :: types/TestLongTypes.py (1221 of 1223)
PASS: lldb-api :: types/TestLongTypesExpr.py (1222 of 1223)
TIMEOUT: lldb-api :: python_api/process/cancel_attach/TestCancelAttach.py (1223 of 1223)
******************** TEST 'lldb-api :: python_api/process/cancel_attach/TestCancelAttach.py' FAILED ********************
Script:
--
/usr/bin/python3.12 /home/buildbot/worker/as-builder-9/lldb-remote-linux-ubuntu/llvm-project/lldb/test/API/dotest.py -u CXXFLAGS -u CFLAGS --env LLVM_LIBS_DIR=/home/buildbot/worker/as-builder-9/lldb-remote-linux-ubuntu/build/./lib --env LLVM_INCLUDE_DIR=/home/buildbot/worker/as-builder-9/lldb-remote-linux-ubuntu/build/include --env LLVM_TOOLS_DIR=/home/buildbot/worker/as-builder-9/lldb-remote-linux-ubuntu/build/./bin --libcxx-include-dir /home/buildbot/worker/as-builder-9/lldb-remote-linux-ubuntu/build/include/c++/v1 --libcxx-include-target-dir /home/buildbot/worker/as-builder-9/lldb-remote-linux-ubuntu/build/include/aarch64-unknown-linux-gnu/c++/v1 --libcxx-library-dir /home/buildbot/worker/as-builder-9/lldb-remote-linux-ubuntu/build/./lib/aarch64-unknown-linux-gnu --arch aarch64 --build-dir /home/buildbot/worker/as-builder-9/lldb-remote-linux-ubuntu/build/lldb-test-build.noindex --lldb-module-cache-dir /home/buildbot/worker/as-builder-9/lldb-remote-linux-ubuntu/build/lldb-test-build.noindex/module-cache-lldb/lldb-api --clang-module-cache-dir /home/buildbot/worker/as-builder-9/lldb-remote-linux-ubuntu/build/lldb-test-build.noindex/module-cache-clang/lldb-api --executable /home/buildbot/worker/as-builder-9/lldb-remote-linux-ubuntu/build/./bin/lldb --compiler /home/buildbot/worker/as-builder-9/lldb-remote-linux-ubuntu/build/bin/clang --dsymutil /home/buildbot/worker/as-builder-9/lldb-remote-linux-ubuntu/build/./bin/dsymutil --make /usr/bin/gmake --llvm-tools-dir /home/buildbot/worker/as-builder-9/lldb-remote-linux-ubuntu/build/./bin --lldb-obj-root /home/buildbot/worker/as-builder-9/lldb-remote-linux-ubuntu/build/tools/lldb --lldb-libs-dir /home/buildbot/worker/as-builder-9/lldb-remote-linux-ubuntu/build/./lib --platform-url connect://jetson-agx-2198.lab.llvm.org:1234 --platform-working-dir /home/ubuntu/lldb-tests --sysroot /mnt/fs/jetson-agx-ubuntu --env ARCH_CFLAGS=-mcpu=cortex-a78 --platform-name remote-linux --skip-category=lldb-server /home/buildbot/worker/as-builder-9/lldb-remote-linux-ubuntu/llvm-project/lldb/test/API/python_api/process/cancel_attach -p TestCancelAttach.py
--
Exit Code: -9
Timeout: Reached timeout of 600 seconds

Command Output (stdout):
--
lldb version 21.0.0git (https://github.com/llvm/llvm-project.git revision ae6d5dd58bcae9ced3b3c5b058876d3564017337)
  clang revision ae6d5dd58bcae9ced3b3c5b058876d3564017337
  llvm revision ae6d5dd58bcae9ced3b3c5b058876d3564017337

--
Command Output (stderr):
--
WARNING:root:Custom libc++ is not supported for remote runs: ignoring --libcxx arguments
FAIL: LLDB (/home/buildbot/worker/as-builder-9/lldb-remote-linux-ubuntu/build/bin/clang-aarch64) :: test_scripted_implementation (TestCancelAttach.AttachCancelTestCase.test_scripted_implementation)

--

********************
Slowest Tests:
--------------------------------------------------------------------------
600.04s: lldb-api :: python_api/process/cancel_attach/TestCancelAttach.py
180.97s: lldb-api :: commands/command/script_alias/TestCommandScriptAlias.py
70.55s: lldb-api :: commands/process/attach/TestProcessAttach.py
40.45s: lldb-api :: functionalities/data-formatter/data-formatter-stl/libcxx-simulators/string/TestDataFormatterLibcxxStringSimulator.py
35.19s: lldb-api :: functionalities/completion/TestCompletion.py
34.25s: lldb-api :: functionalities/single-thread-step/TestSingleThreadStepTimeout.py
21.10s: lldb-api :: commands/statistics/basic/TestStats.py
20.70s: lldb-api :: functionalities/gdb_remote_client/TestPlatformClient.py
19.25s: lldb-api :: functionalities/thread/state/TestThreadStates.py
18.41s: lldb-api :: commands/dwim-print/TestDWIMPrint.py
15.00s: lldb-api :: functionalities/data-formatter/data-formatter-stl/generic/set/TestDataFormatterGenericSet.py
14.75s: lldb-api :: commands/expression/expr-in-syscall/TestExpressionInSyscall.py
14.36s: lldb-api :: functionalities/inline-stepping/TestInlineStepping.py
13.92s: lldb-api :: python_api/find_in_memory/TestFindRangesInMemory.py
