Commit 02d6aad

[MemProf] Reduce unnecessary context id computation (NFC) (#109857)
One of the memory reduction techniques was to compute node context ids on the fly. This reduced memory at the expense of some compile time increase.

For a large binary we were spending a lot of time invoking getContextIds on the node during assignStackNodesPostOrder, because we were iterating through the stack ids for a call from leaf to root (first to last node in the parlance used in that code).

However, all calls for a given entry in the StackIdToMatchingCalls map share the same last node, so we can borrow the approach used by similar code in updateStackNodes and compute the context ids on the last node once, then iterate each call's stack ids in reverse order while reusing the last node's context ids.

This reduced the thin link time by 43% for a large target. It isn't clear why there wasn't a similar increase measured when introducing the node context id recomputation, but the compile time was longer to start with then.
1 parent 4a9da96 commit 02d6aad
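
The idea in the commit message can be illustrated with a small standalone sketch. This is not the LLVM code: ToyGraph, makeToyGraph, intersectInto and intersectPerCall are hypothetical names invented for illustration, std::set stands in for DenseSet, and the sketch starts each call from the last node's ids rather than from saved per-call ids. It only shows the shape of the optimization: the expensive per-node id computation is hoisted out of the per-call loop, and each call's stack ids are then walked in reverse, starting after the shared last id.

```cpp
#include <cassert>
#include <cstdint>
#include <iterator>
#include <map>
#include <set>
#include <utility>
#include <vector>

using IdSet = std::set<uint32_t>;

// Hypothetical stand-in for a graph whose node context ids are not stored
// but recomputed on demand as the union of the ids on the node's edges.
struct ToyGraph {
  // Edge context ids keyed by (callee stack id, caller stack id).
  std::map<std::pair<uint64_t, uint64_t>, IdSet> Edges;
  // Counts how often the expensive per-node recomputation runs.
  mutable unsigned NodeIdComputations = 0;

  // Expensive: scans every edge and unions those touching the node.
  IdSet getContextIds(uint64_t Node) const {
    ++NodeIdComputations;
    IdSet Result;
    for (const auto &[Key, Ids] : Edges)
      if (Key.first == Node || Key.second == Node)
        Result.insert(Ids.begin(), Ids.end());
    return Result;
  }

  // Returns the ids on the edge Callee -> Caller, or nullptr if absent.
  const IdSet *findEdge(uint64_t Callee, uint64_t Caller) const {
    auto It = Edges.find({Callee, Caller});
    return It == Edges.end() ? nullptr : &It->second;
  }
};

// In-place intersection: keep only the elements of A that are also in B.
static void intersectInto(IdSet &A, const IdSet &B) {
  for (auto It = A.begin(); It != A.end();) {
    if (B.count(*It))
      ++It;
    else
      It = A.erase(It);
  }
}

// Each call is a leaf-to-root stack id sequence, and all calls share the same
// last id. Returns, per call, the ids common to the whole sequence, or an
// empty set if the call is skipped.
std::vector<IdSet>
intersectPerCall(const ToyGraph &G,
                 const std::vector<std::vector<uint64_t>> &Calls) {
  assert(!Calls.empty() && !Calls.front().empty());
  const uint64_t LastId = Calls.front().back();
  // Computed once, because the last node is shared by every call.
  const IdSet LastNodeIds = G.getContextIds(LastId);

  std::vector<IdSet> Out;
  for (const auto &Ids : Calls) {
    assert(!Ids.empty() && Ids.back() == LastId);
    IdSet Cur = LastNodeIds;
    uint64_t Prev = LastId;
    bool Skip = false;
    // Walk backwards, starting after the last id handled above.
    for (auto It = Ids.rbegin() + 1; It != Ids.rend(); ++It) {
      const IdSet *Edge = G.findEdge(*It, Prev);
      if (!Edge) {
        Skip = true;
        break;
      }
      Prev = *It;
      intersectInto(Cur, *Edge);
      if (Cur.empty()) {
        Skip = true;
        break;
      }
    }
    Out.push_back(Skip ? IdSet{} : Cur);
  }
  return Out;
}

// Builds a tiny example graph: 1 -> 2 -> 3 and 4 -> 3 (callee -> caller).
ToyGraph makeToyGraph() {
  ToyGraph G;
  G.Edges[{1, 2}] = {10, 11};
  G.Edges[{2, 3}] = {10, 11, 12};
  G.Edges[{4, 3}] = {12};
  return G;
}
```

With the graph from makeToyGraph and the calls {1,2,3}, {2,3}, {4,3} and {5,3}, the per-node computation runs a single time no matter how many calls share last id 3, and the call through the unknown id 5 is skipped because no edge connects it to the last node.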

1 file changed, 30 additions and 14 deletions

llvm/lib/Transforms/IPO/MemProfContextDisambiguation.cpp

Lines changed: 30 additions & 14 deletions
@@ -1362,12 +1362,22 @@ void CallsiteContextGraph<DerivedCCG, FuncTy, CallTy>::
     }
   }
 
+#ifndef NDEBUG
   // Find the node for the last stack id, which should be the same
   // across all calls recorded for this id, and is this node's id.
   uint64_t LastId = Node->OrigStackOrAllocId;
   ContextNode *LastNode = getNodeForStackId(LastId);
   // We should only have kept stack ids that had nodes.
   assert(LastNode);
+  assert(LastNode == Node);
+#else
+  ContextNode *LastNode = Node;
+#endif
+
+  // Compute the last node's context ids once, as it is shared by all calls in
+  // this entry.
+  DenseSet<uint32_t> LastNodeContextIds = LastNode->getContextIds();
+  assert(!LastNodeContextIds.empty());
 
   for (unsigned I = 0; I < Calls.size(); I++) {
     auto &[Call, Ids, Func, SavedContextIds] = Calls[I];
@@ -1389,40 +1399,43 @@ void CallsiteContextGraph<DerivedCCG, FuncTy, CallTy>::
 
     assert(LastId == Ids.back());
 
-    ContextNode *FirstNode = getNodeForStackId(Ids[0]);
-    assert(FirstNode);
-
     // Recompute the context ids for this stack id sequence (the
     // intersection of the context ids of the corresponding nodes).
     // Start with the ids we saved in the map for this call, which could be
     // duplicated context ids. We have to recompute as we might have overlap
     // overlap between the saved context ids for different last nodes, and
     // removed them already during the post order traversal.
-    set_intersect(SavedContextIds, FirstNode->getContextIds());
-    ContextNode *PrevNode = nullptr;
-    for (auto Id : Ids) {
+    set_intersect(SavedContextIds, LastNodeContextIds);
+    ContextNode *PrevNode = LastNode;
+    bool Skip = false;
+    // Iterate backwards through the stack Ids, starting after the last Id
+    // in the list, which was handled once outside for all Calls.
+    for (auto IdIter = Ids.rbegin() + 1; IdIter != Ids.rend(); IdIter++) {
+      auto Id = *IdIter;
       ContextNode *CurNode = getNodeForStackId(Id);
       // We should only have kept stack ids that had nodes and weren't
       // recursive.
       assert(CurNode);
       assert(!CurNode->Recursive);
-      if (!PrevNode) {
-        PrevNode = CurNode;
-        continue;
-      }
-      auto *Edge = CurNode->findEdgeFromCallee(PrevNode);
+
+      auto *Edge = CurNode->findEdgeFromCaller(PrevNode);
       if (!Edge) {
-        SavedContextIds.clear();
+        Skip = true;
         break;
       }
       PrevNode = CurNode;
+
+      // Update the context ids, which is the intersection of the ids along
+      // all edges in the sequence.
       set_intersect(SavedContextIds, Edge->getContextIds());
 
       // If we now have no context ids for clone, skip this call.
-      if (SavedContextIds.empty())
+      if (SavedContextIds.empty()) {
+        Skip = true;
         break;
+      }
     }
-    if (SavedContextIds.empty())
+    if (Skip)
       continue;
 
     // Create new context node.
@@ -1433,6 +1446,9 @@ void CallsiteContextGraph<DerivedCCG, FuncTy, CallTy>::
     NonAllocationCallToContextNodeMap[Call] = NewNode;
     NewNode->AllocTypes = computeAllocType(SavedContextIds);
 
+    ContextNode *FirstNode = getNodeForStackId(Ids[0]);
+    assert(FirstNode);
+
     // Connect to callees of innermost stack frame in inlined call chain.
     // This updates context ids for FirstNode's callee's to reflect those
     // moved to NewNode.
