[MemProf] Reduce unnecessary context id computation (NFC) #109857
Conversation
One of the memory reduction techniques was to compute node context ids on the fly. This reduced memory at the expense of some compile time increase.

For a large binary we were spending a lot of time invoking getContextIds on the node during assignStackNodesPostOrder, because we were iterating through the stack ids for a call from leaf to root (first to last node in the parlance used in that code). However, all calls for a given entry in the StackIdToMatchingCalls map share the same last node, so we can borrow the approach used by similar code in updateStackNodes and compute the context ids on the last node once, then iterate each call's stack ids in reverse order while reusing the last node's context ids.

This reduced the thin link time by 43% for a large target. It isn't clear why there wasn't a similar increase measured when introducing the node context id recomputation, but the compile time was longer to start with then.
@llvm/pr-subscribers-llvm-transforms

Author: Teresa Johnson (teresajohnson)

Full diff: https://github.com/llvm/llvm-project/pull/109857.diff

1 file affected
diff --git a/llvm/lib/Transforms/IPO/MemProfContextDisambiguation.cpp b/llvm/lib/Transforms/IPO/MemProfContextDisambiguation.cpp
index 6927fe538e367b..9f324a6739c9ba 100644
--- a/llvm/lib/Transforms/IPO/MemProfContextDisambiguation.cpp
+++ b/llvm/lib/Transforms/IPO/MemProfContextDisambiguation.cpp
@@ -1362,12 +1362,22 @@ void CallsiteContextGraph<DerivedCCG, FuncTy, CallTy>::
}
}
+#ifndef NDEBUG
// Find the node for the last stack id, which should be the same
// across all calls recorded for this id, and is this node's id.
uint64_t LastId = Node->OrigStackOrAllocId;
ContextNode *LastNode = getNodeForStackId(LastId);
// We should only have kept stack ids that had nodes.
assert(LastNode);
+ assert(LastNode == Node);
+#else
+ ContextNode *LastNode = Node;
+#endif
+
+ // Compute the last node's context ids once, as it is shared by all calls in
+ // this entry.
+ DenseSet<uint32_t> LastNodeContextIds = LastNode->getContextIds();
+ assert(!LastNodeContextIds.empty());
for (unsigned I = 0; I < Calls.size(); I++) {
auto &[Call, Ids, Func, SavedContextIds] = Calls[I];
@@ -1398,31 +1408,36 @@ void CallsiteContextGraph<DerivedCCG, FuncTy, CallTy>::
// duplicated context ids. We have to recompute as we might have overlap
// overlap between the saved context ids for different last nodes, and
// removed them already during the post order traversal.
- set_intersect(SavedContextIds, FirstNode->getContextIds());
- ContextNode *PrevNode = nullptr;
- for (auto Id : Ids) {
+ set_intersect(SavedContextIds, LastNodeContextIds);
+ ContextNode *PrevNode = LastNode;
+ bool Skip = false;
+ // Iterate backwards through the stack Ids, starting after the last Id
+ // in the list, which was handled once outside for all Calls.
+ for (auto IdIter = Ids.rbegin() + 1; IdIter != Ids.rend(); IdIter++) {
+ auto Id = *IdIter;
ContextNode *CurNode = getNodeForStackId(Id);
- // We should only have kept stack ids that had nodes and weren't
- // recursive.
+ // We should only have kept stack ids that had nodes.
assert(CurNode);
assert(!CurNode->Recursive);
- if (!PrevNode) {
- PrevNode = CurNode;
- continue;
- }
- auto *Edge = CurNode->findEdgeFromCallee(PrevNode);
+
+ auto *Edge = CurNode->findEdgeFromCaller(PrevNode);
if (!Edge) {
- SavedContextIds.clear();
+ Skip = true;
break;
}
PrevNode = CurNode;
+
+ // Update the context ids, which is the intersection of the ids along
+ // all edges in the sequence.
set_intersect(SavedContextIds, Edge->getContextIds());
// If we now have no context ids for clone, skip this call.
- if (SavedContextIds.empty())
+ if (SavedContextIds.empty()) {
+ Skip = true;
break;
+ }
}
- if (SavedContextIds.empty())
+ if (Skip)
continue;
// Create new context node.
lgtm
> // duplicated context ids. We have to recompute as we might have overlap
> // overlap between the saved context ids for different last nodes, and
> // removed them already during the post order traversal.
Does this comment need to be updated? I guess "recompute" doesn't refer to the getContextIds() call that is now elided.

Also can L1402 ContextNode *FirstNode = ... now be moved closer to where it is used?
> Does this comment need to be updated? I guess "recompute" doesn't refer to the getContextIds() call that is now elided.
Right on the latter - it doesn't refer to getContextIds but rather to the recomputation that remains here.
> Also can L1402 ContextNode *FirstNode = ... now be moved closer to where it is used?
Done