Skip to content

[MemProf] Avoid duplicate edges between nodes #113337

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Oct 25, 2024

Conversation

teresajohnson
Copy link
Contributor

The recent change to add support for cloning indirect calls
inadvertantly caused duplicate edges to be created between the same
caller/callee pair. This is due to the new moveCalleeEdgeToNewCaller
not properly guarding the addition of a new edge (ironically I was
testing for that in an assertion, but failed to handle that case
specially otherwise). Now simply move the context ids over to any
existing edge.

This issue in turn led to some assumptions in cloning being violated,
resulting in a later crash.

Add a test for this case to checkNode.

The recent change to add support for cloning indirect calls
inadvertantly caused duplicate edges to be created between the same
caller/callee pair. This is due to the new moveCalleeEdgeToNewCaller
not properly guarding the addition of a new edge (ironically I was
testing for that in an assertion, but failed to handle that case
specially otherwise). Now simply move the context ids over to any
existing edge.

This issue in turn led to some assumptions in cloning being violated,
resulting in a later crash.

Add a test for this case to checkNode.
@llvmbot llvmbot added LTO Link time optimization (regular/full LTO or ThinLTO) llvm:transforms labels Oct 22, 2024
@llvmbot
Copy link
Member

llvmbot commented Oct 22, 2024

@llvm/pr-subscribers-lto

@llvm/pr-subscribers-llvm-transforms

Author: Teresa Johnson (teresajohnson)

Changes

The recent change to add support for cloning indirect calls
inadvertantly caused duplicate edges to be created between the same
caller/callee pair. This is due to the new moveCalleeEdgeToNewCaller
not properly guarding the addition of a new edge (ironically I was
testing for that in an assertion, but failed to handle that case
specially otherwise). Now simply move the context ids over to any
existing edge.

This issue in turn led to some assumptions in cloning being violated,
resulting in a later crash.

Add a test for this case to checkNode.


Full diff: https://github.com/llvm/llvm-project/pull/113337.diff

2 Files Affected:

  • (modified) llvm/lib/Transforms/IPO/MemProfContextDisambiguation.cpp (+15-1)
  • (modified) llvm/test/ThinLTO/X86/memprof-icp.ll (+28-6)
diff --git a/llvm/lib/Transforms/IPO/MemProfContextDisambiguation.cpp b/llvm/lib/Transforms/IPO/MemProfContextDisambiguation.cpp
index 4efd683dfca363..8c651fc8a342a4 100644
--- a/llvm/lib/Transforms/IPO/MemProfContextDisambiguation.cpp
+++ b/llvm/lib/Transforms/IPO/MemProfContextDisambiguation.cpp
@@ -1352,6 +1352,12 @@ static void checkNode(const ContextNode<DerivedCCG, FuncTy, CallTy> *Node,
     }
     assert(NodeContextIds == CalleeEdgeContextIds);
   }
+  // Make sure we don't end up with duplicate edges between the same caller and
+  // callee.
+  DenseSet<ContextNode<DerivedCCG, FuncTy, CallTy> *> NodeSet;
+  for (const auto &E : Node->CalleeEdges)
+    NodeSet.insert(E->Callee);
+  assert(NodeSet.size() == Node->CalleeEdges.size());
 }
 
 template <typename DerivedCCG, typename FuncTy, typename CallTy>
@@ -3125,7 +3131,15 @@ void CallsiteContextGraph<DerivedCCG, FuncTy, CallTy>::
     // from the same callers as the old node. That should be true in the current
     // use case, where we will remove None-type edges after copying over all
     // caller edges from the callee.
-    assert(IsNewNode || NewCaller->findEdgeFromCaller(OldCallerEdge->Caller));
+    auto *ExistingCallerEdge =
+        NewCaller->findEdgeFromCaller(OldCallerEdge->Caller);
+    assert(IsNewNode || ExistingCallerEdge);
+    if (ExistingCallerEdge) {
+      ExistingCallerEdge->getContextIds().insert(EdgeContextIdsToMove.begin(),
+                                                 EdgeContextIdsToMove.end());
+      ExistingCallerEdge->AllocTypes |= computeAllocType(EdgeContextIdsToMove);
+      continue;
+    }
     auto NewEdge = std::make_shared<ContextEdge>(
         NewCaller, OldCallerEdge->Caller,
         computeAllocType(EdgeContextIdsToMove), EdgeContextIdsToMove);
diff --git a/llvm/test/ThinLTO/X86/memprof-icp.ll b/llvm/test/ThinLTO/X86/memprof-icp.ll
index f17e19e1f77ef2..99e07189876556 100644
--- a/llvm/test/ThinLTO/X86/memprof-icp.ll
+++ b/llvm/test/ThinLTO/X86/memprof-icp.ll
@@ -186,9 +186,13 @@
 ; REMARKS-MAIN: created clone _ZN2B03barEj.memprof.1
 ; REMARKS-MAIN: call in clone _ZN2B03barEj marked with memprof allocation attribute notcold
 ; REMARKS-MAIN: call in clone _ZN2B03barEj.memprof.1 marked with memprof allocation attribute cold
+; REMARKS-MAIN: call in clone _ZN2B03barEj marked with memprof allocation attribute notcold
+; REMARKS-MAIN: call in clone _ZN2B03barEj.memprof.1 marked with memprof allocation attribute cold
 ; REMARKS-MAIN: created clone _ZN1B3barEj.memprof.1
 ; REMARKS-MAIN: call in clone _ZN1B3barEj marked with memprof allocation attribute notcold
 ; REMARKS-MAIN: call in clone _ZN1B3barEj.memprof.1 marked with memprof allocation attribute cold
+; REMARKS-MAIN: call in clone _ZN1B3barEj marked with memprof allocation attribute notcold
+; REMARKS-MAIN: call in clone _ZN1B3barEj.memprof.1 marked with memprof allocation attribute cold
 ; REMARKS-FOO: created clone _Z3fooR2B0j.memprof.1
 ;; In each version of foo we should have promoted the indirect call to two conditional
 ;; direct calls, one to B::bar and one to B0::bar. The cloned version of foo should call
@@ -208,10 +212,10 @@
 ; REMARKS-FOO: call in clone _ZN1B3barEj marked with memprof allocation attribute notcold
 ; REMARKS-FOO: call in clone _ZN1B3barEj.memprof.1 marked with memprof allocation attribute cold
 
-; STATS: 2 memprof-context-disambiguation - Number of cold static allocations (possibly cloned) during whole program analysis
-; STATS-BE: 4 memprof-context-disambiguation - Number of cold static allocations (possibly cloned) during ThinLTO backend
-; STATS: 2 memprof-context-disambiguation - Number of not cold static allocations (possibly cloned) during whole program analysis
-; STATS-BE: 4 memprof-context-disambiguation - Number of not cold static allocations (possibly cloned) during ThinLTO backend
+; STATS: 4 memprof-context-disambiguation - Number of cold static allocations (possibly cloned) during whole program analysis
+; STATS-BE: 8 memprof-context-disambiguation - Number of cold static allocations (possibly cloned) during ThinLTO backend
+; STATS: 4 memprof-context-disambiguation - Number of not cold static allocations (possibly cloned) during whole program analysis
+; STATS-BE: 8 memprof-context-disambiguation - Number of not cold static allocations (possibly cloned) during ThinLTO backend
 ; STATS: 3 memprof-context-disambiguation - Number of function clones created during whole program analysis
 ; STATS-BE: 5 memprof-context-disambiguation - Number of function clones created during ThinLTO backend
 
@@ -247,8 +251,8 @@
 ; IR: attributes #[[NOTCOLD]] = {{.*}} "memprof"="notcold"
 ; IR: attributes #[[COLD]] = {{.*}} "memprof"="cold"
 
-; STATS-BE-DISTRIB: 2 memprof-context-disambiguation - Number of cold static allocations (possibly cloned) during ThinLTO backend
-; STATS-BE-DISTRIB: 2 memprof-context-disambiguation - Number of not cold static allocations (possibly cloned) during ThinLTO backend
+; STATS-BE-DISTRIB: 4 memprof-context-disambiguation - Number of cold static allocations (possibly cloned) during ThinLTO backend
+; STATS-BE-DISTRIB: 4 memprof-context-disambiguation - Number of not cold static allocations (possibly cloned) during ThinLTO backend
 ; STATS-BE-DISTRIB: 3 memprof-context-disambiguation - Number of function clones created during ThinLTO backend
 
 ;--- foo.ll
@@ -298,6 +302,9 @@ declare i32 @_Z3fooR2B0j(ptr, i32)
 define i32 @_ZN2B03barEj(ptr %this, i32 %s) {
 entry:
   %call = tail call ptr @_Znwm(i64 noundef 4) #0, !memprof !33, !callsite !38
+  ;; Second allocation in this function, to ensure that indirect edges to the
+  ;; same callee are partitioned correctly.
+  %call2 = tail call ptr @_Znwm(i64 noundef 4) #0, !memprof !45, !callsite !50
   store volatile i32 0, ptr %call, align 4
   ret i32 0
 }
@@ -311,6 +318,9 @@ declare void @_ZdlPvm()
 define i32 @_ZN1B3barEj(ptr %this, i32 %s) {
 entry:
   %call = tail call ptr @_Znwm(i64 noundef 4) #0, !memprof !39, !callsite !44
+  ;; Second allocation in this function, to ensure that indirect edges to the
+  ;; same callee are partitioned correctly.
+  %call2 = tail call ptr @_Znwm(i64 noundef 4) #0, !memprof !51, !callsite !56
   store volatile i32 0, ptr %call, align 4
   ret i32 0
 }
@@ -367,3 +377,15 @@ attributes #0 = { builtin allocsize(0) }
 !42 = !{!43, !"cold"}
 !43 = !{i64 4457553070050523782, i64 -2101080423462424381, i64 -6490791336773930154}
 !44 = !{i64 4457553070050523782}
+!45 = !{!46, !48}
+!46 = !{!47, !"notcold"}
+!47 = !{i64 456, i64 -2101080423462424381, i64 5188446645037944434}
+!48 = !{!49, !"cold"}
+!49 = !{i64 456, i64 -2101080423462424381, i64 5583420417449503557}
+!50 = !{i64 456}
+!51 = !{!52, !54}
+!52 = !{!53, !"notcold"}
+!53 = !{i64 789, i64 -2101080423462424381, i64 132626519179914298}
+!54 = !{!55, !"cold"}
+!55 = !{i64 789, i64 -2101080423462424381, i64 -6490791336773930154}
+!56 = !{i64 789}

Copy link
Contributor

@snehasish snehasish left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@@ -1352,6 +1352,12 @@ static void checkNode(const ContextNode<DerivedCCG, FuncTy, CallTy> *Node,
}
assert(NodeContextIds == CalleeEdgeContextIds);
}
// Make sure we don't end up with duplicate edges between the same caller and
// callee.
DenseSet<ContextNode<DerivedCCG, FuncTy, CallTy> *> NodeSet;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we only use the NodeSet to check in the assert, I think this will needlessly run in an optmized build. Should we wrap this in an #ifndef NDEBUG?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, that actually goes for a bunch of stuff in this function. Note this checking is off by default and only invoked under one of the -memprof-verify-* options, so in general it won't add overhead. But in fact I think I should change this into some kind of error that doesn't rely on debug builds, so that it can be utilized for release builds as well. I am going to put this new handling under the ifndef as you suggest, but also add a TODO to make it a non-debug only checking failure.

@teresajohnson teresajohnson merged commit 144ddca into llvm:main Oct 25, 2024
5 of 6 checks passed
NoumanAmir657 pushed a commit to NoumanAmir657/llvm-project that referenced this pull request Nov 4, 2024
The recent change to add support for cloning indirect calls
inadvertantly caused duplicate edges to be created between the same
caller/callee pair. This is due to the new moveCalleeEdgeToNewCaller
not properly guarding the addition of a new edge (ironically I was
testing for that in an assertion, but failed to handle that case
specially otherwise). Now simply move the context ids over to any
existing edge.

This issue in turn led to some assumptions in cloning being violated,
resulting in a later crash.

Add a test for this case to checkNode.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
llvm:transforms LTO Link time optimization (regular/full LTO or ThinLTO)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants