Skip to content

[MemProf] Prune unneeded non-cold contexts #124823

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 5 commits into from
Jan 29, 2025
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
34 changes: 33 additions & 1 deletion llvm/include/llvm/Analysis/MemoryProfileInfo.h
Original file line number Diff line number Diff line change
Expand Up @@ -56,6 +56,37 @@ class CallStackTrie {
// Allocation types for call context sharing the context prefix at this
// node.
uint8_t AllocTypes;
// Updated as we add allocations to note if this is the deepest point in the
// trie that has an ambiguous allocation type (both Cold and NotCold). It is
// used to prune unneeded NotCold contexts, taking advantage of the fact
// that we later will only clone Cold contexts, as NotCold is the allocation
// default. We only need to keep as metadata the NotCold contexts that
// overlap the longest with Cold allocations, so that we know how deeply we
// need to clone. For example, assume we add the following contexts to the
// trie:
// 1 3 (notcold)
// 1 2 4 (cold)
// 1 2 5 (notcold)
// 1 2 6 (notcold)
// the trie looks like:
// 1
// / \
// 2 3
// /|\
// 4 5 6
//
// It is sufficient to prune all but one not cold contexts (either 1,2,5 or
// 1,2,6, we arbitrarily keep the first one we encounter which will be
// 1,2,5). We'll initially have DeepestAmbiguousAllocType set false for trie
// node 1 after the trie is built, and true for node 2. This indicates that
// the not cold context ending in 3 is not needed (its immediate callee has
// this value set false). The first notcold context we encounter when
// iterating the callers of node 2 will be the context ending in 5 (since
// std::map iteration is in sorted order of key), which will see that this
// field is true for its callee, so we will keep it. But at that point we
// set the callee's flag to false which prevents adding the not cold context
// ending in 6 unnecessarily.
bool DeepestAmbiguousAllocType = true;
// If the user has requested reporting of hinted sizes, keep track of the
// associated full stack id and profiled sizes. Can have more than one
// after trimming (e.g. when building from metadata). This is only placed on
Expand Down Expand Up @@ -103,7 +134,8 @@ class CallStackTrie {
bool buildMIBNodes(CallStackTrieNode *Node, LLVMContext &Ctx,
std::vector<uint64_t> &MIBCallStack,
std::vector<Metadata *> &MIBNodes,
bool CalleeHasAmbiguousCallerContext);
bool CalleeHasAmbiguousCallerContext,
bool &CalleeDeepestAmbiguousAllocType);

public:
CallStackTrie() = default;
Expand Down
64 changes: 52 additions & 12 deletions llvm/lib/Analysis/MemoryProfileInfo.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -51,6 +51,13 @@ cl::opt<bool> MemProfReportHintedSizes(
"memprof-report-hinted-sizes", cl::init(false), cl::Hidden,
cl::desc("Report total allocation sizes of hinted allocations"));

// This is useful if we have enabled reporting of hinted sizes, and want to get
// information from the indexing step for all contexts (especially for testing),
// or have specified a value less than 100% for -memprof-cloning-cold-threshold.
cl::opt<bool> MemProfKeepAllNotColdContexts(
"memprof-keep-all-not-cold-contexts", cl::init(false), cl::Hidden,
cl::desc("Keep all non-cold contexts (increases cloning overheads)"));

AllocationType llvm::memprof::getAllocType(uint64_t TotalLifetimeAccessDensity,
uint64_t AllocCount,
uint64_t TotalLifetime) {
Expand Down Expand Up @@ -156,10 +163,16 @@ void CallStackTrie::addCallStack(
continue;
}
// Update existing caller node if it exists.
CallStackTrieNode *Prev = nullptr;
auto Next = Curr->Callers.find(StackId);
if (Next != Curr->Callers.end()) {
Prev = Curr;
Curr = Next->second;
Curr->addAllocType(AllocType);
// If this node has an ambiguous alloc type, its callee is not the deepest
// point where we have an ambigous allocation type.
if (!hasSingleAllocType(Curr->AllocTypes))
Prev->DeepestAmbiguousAllocType = false;
continue;
}
// Otherwise add a new caller node.
Expand Down Expand Up @@ -243,14 +256,35 @@ void CallStackTrie::convertHotToNotCold(CallStackTrieNode *Node) {
bool CallStackTrie::buildMIBNodes(CallStackTrieNode *Node, LLVMContext &Ctx,
std::vector<uint64_t> &MIBCallStack,
std::vector<Metadata *> &MIBNodes,
bool CalleeHasAmbiguousCallerContext) {
bool CalleeHasAmbiguousCallerContext,
bool &CalleeDeepestAmbiguousAllocType) {
// Trim context below the first node in a prefix with a single alloc type.
// Add an MIB record for the current call stack prefix.
if (hasSingleAllocType(Node->AllocTypes)) {
std::vector<ContextTotalSize> ContextSizeInfo;
collectContextSizeInfo(Node, ContextSizeInfo);
MIBNodes.push_back(createMIBNode(
Ctx, MIBCallStack, (AllocationType)Node->AllocTypes, ContextSizeInfo));
// Because we only clone cold contexts (we don't clone for exposing NotCold
// contexts as that is the default allocation behavior), we create MIB
// metadata for this context if any of the following are true:
// 1) It is cold.
// 2) The immediate callee is the deepest point where we have an ambiguous
// allocation type (i.e. the other callers that are cold need to know
// that we have a not cold context overlapping to this point so that we
// know how deep to clone).
// 3) MemProfKeepAllNotColdContexts is enabled, which is useful if we are
// reporting hinted sizes, and want to get information from the indexing
// step for all contexts, or have specified a value less than 100% for
// -memprof-cloning-cold-threshold.
if (Node->hasAllocType(AllocationType::Cold) ||
CalleeDeepestAmbiguousAllocType || MemProfKeepAllNotColdContexts) {
std::vector<ContextTotalSize> ContextSizeInfo;
collectContextSizeInfo(Node, ContextSizeInfo);
MIBNodes.push_back(createMIBNode(Ctx, MIBCallStack,
(AllocationType)Node->AllocTypes,
ContextSizeInfo));
// If we just emitted an MIB for a not cold caller, don't need to emit
// another one for the callee to correctly disambiguate its cold callers.
if (!Node->hasAllocType(AllocationType::Cold))
CalleeDeepestAmbiguousAllocType = false;
}
return true;
}

Expand All @@ -261,9 +295,9 @@ bool CallStackTrie::buildMIBNodes(CallStackTrieNode *Node, LLVMContext &Ctx,
bool AddedMIBNodesForAllCallerContexts = true;
for (auto &Caller : Node->Callers) {
MIBCallStack.push_back(Caller.first);
AddedMIBNodesForAllCallerContexts &=
buildMIBNodes(Caller.second, Ctx, MIBCallStack, MIBNodes,
NodeHasAmbiguousCallerContext);
AddedMIBNodesForAllCallerContexts &= buildMIBNodes(
Caller.second, Ctx, MIBCallStack, MIBNodes,
NodeHasAmbiguousCallerContext, Node->DeepestAmbiguousAllocType);
// Remove Caller.
MIBCallStack.pop_back();
}
Expand Down Expand Up @@ -337,10 +371,16 @@ bool CallStackTrie::buildAndAttachMIBMetadata(CallBase *CI) {
MIBCallStack.push_back(AllocStackId);
std::vector<Metadata *> MIBNodes;
assert(!Alloc->Callers.empty() && "addCallStack has not been called yet");
// The last parameter is meant to say whether the callee of the given node
// has more than one caller. Here the node being passed in is the alloc
// and it has no callees. So it's false.
if (buildMIBNodes(Alloc, Ctx, MIBCallStack, MIBNodes, false)) {
// The CalleeHasAmbiguousCallerContext flag is meant to say whether the
// callee of the given node has more than one caller. Here the node being
// passed in is the alloc and it has no callees. So it's false.
// Similarly, the last parameter is meant to say whether the callee of the
// given node is the deepest point where we have ambiguous alloc types, which
// is also false as the alloc has no callees.
bool DeepestAmbiguousAllocType = true;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we change buildMIBNodes to take a pointer to a bool instead of a reference (defaulting to nullptr) then we can avoid the change (and redundant var) here. Inside buildMIBNodes we'd check and fill in the bool only if the pointer was not null. Wdyt?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ack

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I started to do this but it made the uses more complicated inside buildMIBNodes and I think it ends up more complicated and hard to reason about overall. So I'd prefer to leave as-is.

if (buildMIBNodes(Alloc, Ctx, MIBCallStack, MIBNodes,
/*CalleeHasAmbiguousCallerContext=*/false,
DeepestAmbiguousAllocType)) {
assert(MIBCallStack.size() == 1 &&
"Should only be left with Alloc's location in stack");
CI->setMetadata(LLVMContext::MD_memprof, MDNode::get(Ctx, MIBNodes));
Expand Down
19 changes: 7 additions & 12 deletions llvm/test/Transforms/PGOProfile/memprof.ll
Original file line number Diff line number Diff line change
Expand Up @@ -64,14 +64,14 @@
; RUN: opt < %s -passes='pgo-instr-use,memprof-use<profile-filename=%t.pgomemprofdata>' -pgo-test-profile-file=%t.pgomemprofdata -pgo-warn-missing-function -S 2>&1 | FileCheck %s --check-prefixes=MEMPROF,ALL,PGO

;; Check that the total sizes are reported if requested.
; RUN: opt < %s -passes='memprof-use<profile-filename=%t.memprofdata>' -pgo-warn-missing-function -S -memprof-report-hinted-sizes 2>&1 | FileCheck %s --check-prefixes=TOTALSIZESSINGLE,TOTALSIZES
; RUN: opt < %s -passes='memprof-use<profile-filename=%t.memprofdata>' -pgo-warn-missing-function -S -memprof-report-hinted-sizes -memprof-keep-all-not-cold-contexts 2>&1 | FileCheck %s --check-prefixes=TOTALSIZESSINGLE,TOTALSIZES

;; Check that we hint additional allocations with a threshold < 100%
; RUN: opt < %s -passes='memprof-use<profile-filename=%t.memprofdata>' -pgo-warn-missing-function -S -memprof-report-hinted-sizes -memprof-matching-cold-threshold=60 2>&1 | FileCheck %s --check-prefixes=TOTALSIZESSINGLE,TOTALSIZESTHRESH60

;; Make sure that the -memprof-cloning-cold-threshold flag is enough to cause
;; the size metadata to be generated for the LTO link.
; RUN: opt < %s -passes='memprof-use<profile-filename=%t.memprofdata>' -pgo-warn-missing-function -S -memprof-cloning-cold-threshold=80 2>&1 | FileCheck %s --check-prefixes=TOTALSIZES
; RUN: opt < %s -passes='memprof-use<profile-filename=%t.memprofdata>' -pgo-warn-missing-function -S -memprof-cloning-cold-threshold=80 -memprof-keep-all-not-cold-contexts 2>&1 | FileCheck %s --check-prefixes=TOTALSIZES

;; Make sure we emit a random hotness seed if requested.
; RUN: llvm-profdata merge -memprof-random-hotness %S/Inputs/memprof.memprofraw --profiled-binary %S/Inputs/memprof.exe -o %t.memprofdatarand 2>&1 | FileCheck %s --check-prefix=RAND
Expand Down Expand Up @@ -339,7 +339,7 @@ for.end: ; preds = %for.cond

; MEMPROF: #[[A1]] = { builtin allocsize(0) "memprof"="notcold" }
; MEMPROF: #[[A2]] = { builtin allocsize(0) "memprof"="cold" }
; MEMPROF: ![[M1]] = !{![[MIB1:[0-9]+]], ![[MIB2:[0-9]+]], ![[MIB3:[0-9]+]], ![[MIB4:[0-9]+]], ![[MIB5:[0-9]+]]}
; MEMPROF: ![[M1]] = !{![[MIB1:[0-9]+]], ![[MIB2:[0-9]+]], ![[MIB3:[0-9]+]], ![[MIB4:[0-9]+]]}
; MEMPROF: ![[MIB1]] = !{![[STACK1:[0-9]+]], !"cold"}
; MEMPROF: ![[STACK1]] = !{i64 2732490490862098848, i64 748269490701775343}
; MEMPROF: ![[MIB2]] = !{![[STACK2:[0-9]+]], !"cold"}
Expand All @@ -348,8 +348,6 @@ for.end: ; preds = %for.cond
; MEMPROF: ![[STACK3]] = !{i64 2732490490862098848, i64 2104812325165620841, i64 6281715513834610934, i64 6281715513834610934, i64 6281715513834610934, i64 6281715513834610934}
; MEMPROF: ![[MIB4]] = !{![[STACK4:[0-9]+]], !"cold"}
; MEMPROF: ![[STACK4]] = !{i64 2732490490862098848, i64 8467819354083268568}
; MEMPROF: ![[MIB5]] = !{![[STACK5:[0-9]+]], !"notcold"}
; MEMPROF: ![[STACK5]] = !{i64 2732490490862098848, i64 8690657650969109624}
; MEMPROF: ![[C1]] = !{i64 2732490490862098848}
; MEMPROF: ![[C2]] = !{i64 8467819354083268568}
; MEMPROF: ![[C3]] = !{i64 9086428284934609951}
Expand Down Expand Up @@ -390,17 +388,15 @@ for.end: ; preds = %for.cond

; MEMPROFNOCOLINFO: #[[A1]] = { builtin allocsize(0) "memprof"="notcold" }
; MEMPROFNOCOLINFO: #[[A2]] = { builtin allocsize(0) "memprof"="cold" }
; MEMPROFNOCOLINFO: ![[M1]] = !{![[MIB1:[0-9]+]], ![[MIB2:[0-9]+]], ![[MIB3:[0-9]+]], ![[MIB4:[0-9]+]], ![[MIB5:[0-9]+]]}
; MEMPROFNOCOLINFO: ![[M1]] = !{![[MIB1:[0-9]+]], ![[MIB2:[0-9]+]], ![[MIB3:[0-9]+]], ![[MIB4:[0-9]+]]}
; MEMPROFNOCOLINFO: ![[MIB1]] = !{![[STACK1:[0-9]+]], !"cold"}
; MEMPROFNOCOLINFO: ![[STACK1]] = !{i64 5281664982037379640, i64 6362220161075421157, i64 -5772587307814069790, i64 -5772587307814069790, i64 -5772587307814069790, i64 3577763375057267810}
; MEMPROFNOCOLINFO: ![[MIB2]] = !{![[STACK2:[0-9]+]], !"notcold"}
; MEMPROFNOCOLINFO: ![[STACK2]] = !{i64 5281664982037379640, i64 6362220161075421157, i64 -5772587307814069790, i64 -5772587307814069790, i64 -5772587307814069790, i64 -5772587307814069790}
; MEMPROFNOCOLINFO: ![[MIB3]] = !{![[STACK3:[0-9]+]], !"notcold"}
; MEMPROFNOCOLINFO: ![[STACK3]] = !{i64 5281664982037379640, i64 -6896091699916449732}
; MEMPROFNOCOLINFO: ![[MIB3]] = !{![[STACK3:[0-9]+]], !"cold"}
; MEMPROFNOCOLINFO: ![[STACK3]] = !{i64 5281664982037379640, i64 -6871734214936418908}
; MEMPROFNOCOLINFO: ![[MIB4]] = !{![[STACK4:[0-9]+]], !"cold"}
; MEMPROFNOCOLINFO: ![[STACK4]] = !{i64 5281664982037379640, i64 -6871734214936418908}
; MEMPROFNOCOLINFO: ![[MIB5]] = !{![[STACK5:[0-9]+]], !"cold"}
; MEMPROFNOCOLINFO: ![[STACK5]] = !{i64 5281664982037379640, i64 -6201180255894224618}
; MEMPROFNOCOLINFO: ![[STACK4]] = !{i64 5281664982037379640, i64 -6201180255894224618}
; MEMPROFNOCOLINFO: ![[C1]] = !{i64 5281664982037379640}
; MEMPROFNOCOLINFO: ![[C2]] = !{i64 -6871734214936418908}
; MEMPROFNOCOLINFO: ![[C3]] = !{i64 -5588766871448036195}
Expand All @@ -420,7 +416,6 @@ for.end: ; preds = %for.cond
; MEMPROFRAND2: !"cold"
; MEMPROFRAND2: !"cold"
; MEMPROFRAND2: !"notcold"
; MEMPROFRAND2: !"notcold"

; MEMPROFSTATS: 8 memprof - Number of alloc contexts in memory profile.
; MEMPROFSTATS: 10 memprof - Number of callsites in memory profile.
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -105,7 +105,7 @@ entry:

declare noundef ptr @_ZnwmSt11align_val_tRKSt9nothrow_t12__hot_cold_t(i64 noundef, i64 noundef, ptr noundef nonnull align 1 dereferenceable(1), i8 noundef zeroext)

; MEMPROF: ![[M1]] = !{![[MIB1:[0-9]+]], ![[MIB2:[0-9]+]], ![[MIB3:[0-9]+]], ![[MIB4:[0-9]+]], ![[MIB5:[0-9]+]]}
; MEMPROF: ![[M1]] = !{![[MIB1:[0-9]+]], ![[MIB2:[0-9]+]], ![[MIB3:[0-9]+]], ![[MIB4:[0-9]+]]}
; MEMPROF: ![[MIB1]] = !{![[STACK1:[0-9]+]], !"cold"}
; MEMPROF: ![[STACK1]] = !{i64 2732490490862098848, i64 748269490701775343}
; MEMPROF: ![[MIB2]] = !{![[STACK2:[0-9]+]], !"cold"}
Expand All @@ -114,8 +114,6 @@ declare noundef ptr @_ZnwmSt11align_val_tRKSt9nothrow_t12__hot_cold_t(i64 nounde
; MEMPROF: ![[STACK3]] = !{i64 2732490490862098848, i64 2104812325165620841, i64 6281715513834610934, i64 6281715513834610934, i64 6281715513834610934, i64 6281715513834610934}
; MEMPROF: ![[MIB4]] = !{![[STACK4:[0-9]+]], !"cold"}
; MEMPROF: ![[STACK4]] = !{i64 2732490490862098848, i64 8467819354083268568}
; MEMPROF: ![[MIB5]] = !{![[STACK5:[0-9]+]], !"notcold"}
; MEMPROF: ![[STACK5]] = !{i64 2732490490862098848, i64 8690657650969109624}
; MEMPROF: ![[C1]] = !{i64 2732490490862098848}

!llvm.dbg.cu = !{!0}
Expand Down
Loading