[MemProf] Streamline and avoid unnecessary context id duplication #107918

teresajohnson · 2024-09-09T21:27:34Z

Sort the list of calls such that those with the same stack ids are also
sorted by function. This allows processing of all matching calls (that
can share a context node) in bulk as they are all adjacent.

This has 2 benefits:

It reduces unnecessary work, specifically the handling to intersect
the context ids with those along the graph edges for the stack ids,
for calls that we know can share a node.
It simplifies detecting when we have matching stack ids but don't
need to duplicate context ids. Specifically, we were previously
still duplicating context ids whenever we saw another call with the
same stack ids, but that isn't necessary if they will share a context
node. With this change we now only duplicate context ids if we see
some that not only have the same ids but also are in different
functions.

This change reduced the amount of context id duplication and provided
reductions in both both peak memory (~~8%) and time (~~%5) for a large
target.

Sort the list of calls such that those with the same stack ids are also sorted by function. This allows processing of all matching calls (that can share a context node) in bulk as they are all adjacent. This has 2 benefits: 1. It reduces unnecessary work, specifically the handling to intersect the context ids with those along the graph edges for the stack ids, for calls that we know can share a node. 2. It simplifies detecting when we have matching stack ids but don't need to duplicate context ids. Specifically, we were previously still duplicating context ids whenever we saw another call with the same stack ids, but that isn't necessary if they will share a context node. With this change we now only duplicate context ids if we see some that not only have the same ids but also are in different functions. This change reduced the amount of context id duplication and provided reductions in both both peak memory (~8%) and time (~%5) for a large target.

llvmbot · 2024-09-09T21:28:07Z

@llvm/pr-subscribers-llvm-transforms

Author: Teresa Johnson (teresajohnson)

Changes

Sort the list of calls such that those with the same stack ids are also
sorted by function. This allows processing of all matching calls (that
can share a context node) in bulk as they are all adjacent.

This has 2 benefits:

It reduces unnecessary work, specifically the handling to intersect
the context ids with those along the graph edges for the stack ids,
for calls that we know can share a node.
It simplifies detecting when we have matching stack ids but don't
need to duplicate context ids. Specifically, we were previously
still duplicating context ids whenever we saw another call with the
same stack ids, but that isn't necessary if they will share a context
node. With this change we now only duplicate context ids if we see
some that not only have the same ids but also are in different
functions.

This change reduced the amount of context id duplication and provided
reductions in both both peak memory (~~8%) and time (~~%5) for a large
target.

Full diff: https://github.com/llvm/llvm-project/pull/107918.diff

1 Files Affected:

(modified) llvm/lib/Transforms/IPO/MemProfContextDisambiguation.cpp (+61-35)

diff --git a/llvm/lib/Transforms/IPO/MemProfContextDisambiguation.cpp b/llvm/lib/Transforms/IPO/MemProfContextDisambiguation.cpp
index f8c150a675e645..5c09bb1800cb2b 100644
--- a/llvm/lib/Transforms/IPO/MemProfContextDisambiguation.cpp
+++ b/llvm/lib/Transforms/IPO/MemProfContextDisambiguation.cpp
@@ -1467,14 +1467,26 @@ void CallsiteContextGraph<DerivedCCG, FuncTy, CallTy>::updateStackNodes() {
     // of length, and within each length, lexicographically by stack id. The
     // latter is so that we can specially handle calls that have identical stack
     // id sequences (either due to cloning or artificially because of the MIB
-    // context pruning).
-    std::stable_sort(Calls.begin(), Calls.end(),
-                     [](const CallContextInfo &A, const CallContextInfo &B) {
-                       auto &IdsA = std::get<1>(A);
-                       auto &IdsB = std::get<1>(B);
-                       return IdsA.size() > IdsB.size() ||
-                              (IdsA.size() == IdsB.size() && IdsA < IdsB);
-                     });
+    // context pruning). Those with the same Ids are then sorted by function to
+    // facilitate efficiently mapping them to the same context node.
+    // Because the functions are pointers, to ensure a stable sort first assign
+    // each function pointer to its first index in the Calls array, and then use
+    // that to sort by.
+    DenseMap<const FuncTy *, unsigned> FuncToIndex;
+    for (const auto &[Idx, CallCtxInfo] : enumerate(Calls))
+      FuncToIndex.insert({std::get<2>(CallCtxInfo), Idx});
+    std::stable_sort(
+        Calls.begin(), Calls.end(),
+        [&FuncToIndex](const CallContextInfo &A, const CallContextInfo &B) {
+          auto &IdsA = std::get<1>(A);
+          auto &IdsB = std::get<1>(B);
+          auto *FuncA = std::get<2>(A);
+          auto *FuncB = std::get<2>(B);
+          return IdsA.size() > IdsB.size() ||
+                 (IdsA.size() == IdsB.size() &&
+                  (IdsA < IdsB ||
+                   (IdsA == IdsB && FuncToIndex[FuncA] < FuncToIndex[FuncB])));
+        });
 
     // Find the node for the last stack id, which should be the same
     // across all calls recorded for this id, and is the id for this
@@ -1492,18 +1504,35 @@ void CallsiteContextGraph<DerivedCCG, FuncTy, CallTy>::updateStackNodes() {
     DenseSet<uint32_t> LastNodeContextIds = LastNode->getContextIds();
     assert(!LastNodeContextIds.empty());
 
-    // Map from function to the first call from the below list (with matching
-    // stack ids) found in that function. Note that calls from different
-    // functions can have the same stack ids because this is the list of stack
-    // ids that had (possibly pruned) nodes after building the graph from the
-    // allocation MIBs.
-    DenseMap<const FuncTy *, CallInfo> FuncToCallMap;
+#ifndef NDEBUG
+    // Save the set of functions seen for a particular set of the same stack
+    // ids. This is used to ensure that they have been correctly sorted to be
+    // adjacent in the Calls list, since we rely on that to efficiently place
+    // all such matching calls onto the same context node.
+    DenseSet<const FuncTy *> MatchingIdsFuncSet;
+#endif
 
     for (unsigned I = 0; I < Calls.size(); I++) {
       auto &[Call, Ids, Func, SavedContextIds] = Calls[I];
       assert(SavedContextIds.empty());
       assert(LastId == Ids.back());
 
+#ifndef NDEBUG
+      // If this call has a different set of ids than the last one, clear the
+      // set used to ensure they are sorted properly.
+      if (I > 0 && Ids != std::get<1>(Calls[I - 1]))
+        MatchingIdsFuncSet.clear();
+      else
+        // If the prior call had the same stack ids this set would not be empty.
+        // Check if we already have a call that "matches" because it is located
+        // in the same function. If the Calls list was sorted properly we should
+        // not encounter this situation as all such entries should be adjacent
+        // and processed in bulk further below.
+        assert(!MatchingIdsFuncSet.contains(Func));
+
+      MatchingIdsFuncSet.insert(Func);
+#endif
+
       // First compute the context ids for this stack id sequence (the
       // intersection of the context ids of the corresponding nodes).
       // Start with the remaining saved ids for the last node.
@@ -1572,22 +1601,26 @@ void CallsiteContextGraph<DerivedCCG, FuncTy, CallTy>::updateStackNodes() {
           continue;
       }
 
-      // If the prior call had the same stack ids this map would not be empty.
-      // Check if we already have a call that "matches" because it is located
-      // in the same function.
-      if (FuncToCallMap.contains(Func)) {
-        // Record the matching call found for this call, and skip it. We
-        // will subsequently combine it into the same node.
-        CallToMatchingCall[Call] = FuncToCallMap[Func];
-        continue;
-      }
-
       // Check if the next set of stack ids is the same (since the Calls vector
       // of tuples is sorted by the stack ids we can just look at the next one).
+      // If so, save them in the CallToMatchingCall map so that they get
+      // assigned to the same context node, and skip them.
       bool DuplicateContextIds = false;
-      if (I + 1 < Calls.size()) {
-        auto NextIds = std::get<1>(Calls[I + 1]);
-        DuplicateContextIds = Ids == NextIds;
+      for (unsigned J = I + 1; J < Calls.size(); J++) {
+        auto &NextIds = std::get<1>(Calls[J]);
+        if (NextIds != Ids)
+          break;
+        auto *NextFunc = std::get<2>(Calls[J]);
+        if (NextFunc != Func) {
+          // We have another Call with the same ids but that cannot share this
+          // node, must duplicate ids for it.
+          DuplicateContextIds = true;
+          break;
+        }
+        auto &NextCall = std::get<0>(Calls[J]);
+        CallToMatchingCall[NextCall] = Call;
+        // Update I so that it gets incremented correctly to skip this call.
+        I = J;
       }
 
       // If we don't have duplicate context ids, then we can assign all the
@@ -1611,14 +1644,7 @@ void CallsiteContextGraph<DerivedCCG, FuncTy, CallTy>::updateStackNodes() {
         set_subtract(LastNodeContextIds, StackSequenceContextIds);
         if (LastNodeContextIds.empty())
           break;
-        // No longer possibly in a sequence of calls with duplicate stack ids,
-        // clear the map.
-        FuncToCallMap.clear();
-      } else
-        // Record the call with its function, so we can locate it the next time
-        // we find a call from this function when processing the calls with the
-        // same stack ids.
-        FuncToCallMap[Func] = Call;
+      }
     }
   }

snehasish

lgtm.

nit: We should convert the callcontextinfo from a tuple to a struct at some point to make it easier to read the code here without having to refer to the decl.

teresajohnson · 2024-09-10T17:11:59Z

nit: We should convert the callcontextinfo from a tuple to a struct at some point to make it easier to read the code here without having to refer to the decl.

I agree, I'll prep an nfc patch to do that.

llvm-ci · 2024-09-10T17:23:55Z

LLVM Buildbot has detected a new failure on builder clangd-ubuntu-tsan running on clangd-ubuntu-clang while building llvm at step 6 "test-build-clangd-clangd-index-server-clangd-indexer-check-clangd".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/134/builds/5014

Here is the relevant piece of the build log for the reference

Step 6 (test-build-clangd-clangd-index-server-clangd-indexer-check-clangd) failure: test (failure)
******************** TEST 'Clangd :: utf8.test' FAILED ********************
Exit Code: 66

Command Output (stderr):
--
RUN: at line 1: clangd -lit-test < /vol/worker/clangd-ubuntu-clang/clangd-ubuntu-tsan/llvm-project/clang-tools-extra/clangd/test/utf8.test | /vol/worker/clangd-ubuntu-clang/clangd-ubuntu-tsan/build/bin/FileCheck -strict-whitespace /vol/worker/clangd-ubuntu-clang/clangd-ubuntu-tsan/llvm-project/clang-tools-extra/clangd/test/utf8.test
+ clangd -lit-test
+ /vol/worker/clangd-ubuntu-clang/clangd-ubuntu-tsan/build/bin/FileCheck -strict-whitespace /vol/worker/clangd-ubuntu-clang/clangd-ubuntu-tsan/llvm-project/clang-tools-extra/clangd/test/utf8.test
WARNING: ThreadSanitizer: unexpected memory mapping 0x79ffff072000-0x79ffff500000
FATAL: ThreadSanitizer: unexpectedly found incompatible memory layout.
FATAL: Please file a bug.
I[17:22:46.171] clangd version 20.0.0git (https://github.com/llvm/llvm-project.git 524a028f69cdf25503912c396ebda7ebf0065ed2)
I[17:22:46.171] Features: linux+debug+tsan+grpc
I[17:22:46.171] PID: 84610
I[17:22:46.171] Working directory: /vol/worker/clangd-ubuntu-clang/clangd-ubuntu-tsan/build/tools/clang/tools/extra/clangd/test
I[17:22:46.171] argv[0]: clangd
I[17:22:46.171] argv[1]: -lit-test
I[17:22:46.171] Starting LSP over stdin/stdout
V[17:22:46.171] <<< {
  "id": 0,
  "jsonrpc": "2.0",
  "method": "initialize",
  "params": {
    "capabilities": {
      "offsetEncoding": [
        "utf-8",
        "utf-16"
      ]
    },
    "processId": 123,
    "rootPath": "clangd",
    "trace": "off"
  }
}

I[17:22:46.171] <-- initialize(0)
I[17:22:46.173] --> reply:initialize(0) 1 ms
==================
WARNING: ThreadSanitizer: signal-unsafe call inside of a signal (pid=84610)
    #0 free <null> (clangd+0xd37eaf) (BuildId: 8c0d4391a55323945b9df670c6f4c15bddec5b8a)
    #1 __call_tls_dtors <null> (libc.so.6+0x438b3) (BuildId: f7307432a8b162377e77a182b6cc2e53d771ec4b)
    #2 SignalHandler(int) /vol/worker/clangd-ubuntu-clang/clangd-ubuntu-tsan/llvm-project/llvm/lib/Support/Unix/Signals.inc (clangd+0xf63c8f) (BuildId: 8c0d4391a55323945b9df670c6f4c15bddec5b8a)
    #3 __tsan::CallUserSignalHandler(__tsan::ThreadState*, bool, bool, int, __sanitizer::__sanitizer_siginfo*, void*) tsan_interceptors_posix.cpp.o (clangd+0xd40d65) (BuildId: 8c0d4391a55323945b9df670c6f4c15bddec5b8a)
    #4 llvm::raw_fd_ostream::write_impl(char const*, unsigned long) /vol/worker/clangd-ubuntu-clang/clangd-ubuntu-tsan/llvm-project/llvm/lib/Support/raw_ostream.cpp:766:19 (clangd+0xf49e92) (BuildId: 8c0d4391a55323945b9df670c6f4c15bddec5b8a)
    #5 llvm::raw_ostream::flush_nonempty() /vol/worker/clangd-ubuntu-clang/clangd-ubuntu-tsan/llvm-project/llvm/lib/Support/raw_ostream.cpp:224:3 (clangd+0xf4838b) (BuildId: 8c0d4391a55323945b9df670c6f4c15bddec5b8a)
    #6 flush /vol/worker/clangd-ubuntu-clang/clangd-ubuntu-tsan/llvm-project/llvm/include/llvm/Support/raw_ostream.h:200:7 (clangd+0x2eba3f4) (BuildId: 8c0d4391a55323945b9df670c6f4c15bddec5b8a)
    #7 clang::clangd::(anonymous namespace)::JSONTransport::sendMessage(llvm::json::Value) /vol/worker/clangd-ubuntu-clang/clangd-ubuntu-tsan/llvm-project/clang-tools-extra/clangd/JSONTransport.cpp:141:9 (clangd+0x2eba3f4)
    #8 clang::clangd::(anonymous namespace)::JSONTransport::reply(llvm::json::Value, llvm::Expected<llvm::json::Value>) /vol/worker/clangd-ubuntu-clang/clangd-ubuntu-tsan/llvm-project/clang-tools-extra/clangd/JSONTransport.cpp:89:7 (clangd+0x2eb71c3) (BuildId: 8c0d4391a55323945b9df670c6f4c15bddec5b8a)
    #9 clang::clangd::ClangdLSPServer::MessageHandler::ReplyOnce::operator()(llvm::Expected<llvm::json::Value>) /vol/worker/clangd-ubuntu-clang/clangd-ubuntu-tsan/llvm-project/clang-tools-extra/clangd/ClangdLSPServer.cpp:381:24 (clangd+0x2c75c1a) (BuildId: 8c0d4391a55323945b9df670c6f4c15bddec5b8a)
    #10 void llvm::detail::UniqueFunctionBase<void, llvm::Expected<llvm::json::Value>>::CallImpl<clang::clangd::ClangdLSPServer::MessageHandler::ReplyOnce>(void*, llvm::Expected<llvm::json::Value>&) /vol/worker/clangd-ubuntu-clang/clangd-ubuntu-tsan/llvm-project/llvm/include/llvm/ADT/FunctionExtras.h:222:12 (clangd+0x2c76cb0) (BuildId: 8c0d4391a55323945b9df670c6f4c15bddec5b8a)
...

As suggested in llvm#107918, improve readability by converting this tuple to a struct.

As suggested in #107918, improve readability by converting this tuple to a struct.

…tion (llvm#107918)" This reverts commit 524a028, but manually so that follow on PR108086 / ae5f1a7 is retained (NFC patch to convert tuple to a struct).

…tion (#107918)" (#108652) This reverts commit 524a028, but manually so that follow on PR108086 / ae5f1a7 is retained (NFC patch to convert tuple to a struct).

…ation (llvm#107918)" (llvm#108652) This reverts commit 12d4769, reapplying 524a028 but with fixes for failures seen in broader testing.

…ation (#107918)" (#110036) This reverts commit 12d4769, reapplying 524a028 but with fixes for failures seen in broader testing.

…ation (llvm#107918)" (llvm#110036) This reverts commit 12d4769, reapplying 524a028 but with fixes for failures seen in broader testing.

llvmbot added the llvm:transforms label Sep 9, 2024

teresajohnson requested a review from snehasish September 9, 2024 21:28

snehasish approved these changes Sep 10, 2024

View reviewed changes

teresajohnson merged commit 524a028 into llvm:main Sep 10, 2024
10 checks passed

teresajohnson mentioned this pull request Sep 10, 2024

[MemProf] Convert CallContextInfo to a struct (NFC) #108086

Merged

teresajohnson added a commit to teresajohnson/llvm-project that referenced this pull request Sep 10, 2024

[MemProf] Convert CallContextInfo to a struct (NFC)

90bf8a6

As suggested in llvm#107918, improve readability by converting this tuple to a struct.

teresajohnson added a commit that referenced this pull request Sep 10, 2024

[MemProf] Convert CallContextInfo to a struct (NFC) (#108086)

ae5f1a7

As suggested in #107918, improve readability by converting this tuple to a struct.

teresajohnson mentioned this pull request Sep 25, 2024

Reapply "[MemProf] Streamline and avoid unnecessary context id duplication (#107918)" #110036

Merged

teresajohnson added a commit that referenced this pull request Sep 26, 2024

Reapply "[MemProf] Streamline and avoid unnecessary context id duplic…

9483ff9

…ation (#107918)" (#110036) This reverts commit 12d4769, reapplying 524a028 but with fixes for failures seen in broader testing.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[MemProf] Streamline and avoid unnecessary context id duplication #107918

[MemProf] Streamline and avoid unnecessary context id duplication #107918

Uh oh!

teresajohnson commented Sep 9, 2024

Uh oh!

llvmbot commented Sep 9, 2024

Uh oh!

snehasish left a comment

Uh oh!

Uh oh!

teresajohnson commented Sep 10, 2024

Uh oh!

llvm-ci commented Sep 10, 2024

Uh oh!

Uh oh!

[MemProf] Streamline and avoid unnecessary context id duplication #107918

[MemProf] Streamline and avoid unnecessary context id duplication #107918

Uh oh!

Conversation

teresajohnson commented Sep 9, 2024

Uh oh!

llvmbot commented Sep 9, 2024

Uh oh!

snehasish left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

teresajohnson commented Sep 10, 2024

Uh oh!

llvm-ci commented Sep 10, 2024

Uh oh!

Uh oh!