Skip to content

[CGSCC] Fix compile time blowup with large RefSCCs #94815

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Jun 11, 2024
Merged

Conversation

aeubanks
Copy link
Contributor

@aeubanks aeubanks commented Jun 7, 2024

In some modules, e.g. Kotlin-generated IR, we end up with a huge RefSCC and the call graph updates done as a result of the inliner take a long time. This is due to RefSCC::removeInternalRefEdges() getting called many times, each time removing one function from the RefSCC, but each call to removeInternalRefEdges() is proportional to the size of the RefSCC.

There are two places that call removeInternalRefEdges(), in updateCGAndAnalysisManagerForPass() and LazyCallGraph::removeDeadFunction().

  1. Since LazyCallGraph can deal with spurious (edges that exist in the graph but not in the IR) ref edges, we can simply not call removeInternalRefEdges() in updateCGAndAnalysisManagerForPass().

  2. LazyCallGraph::removeDeadFunction() still ends up taking the brunt of compile time with the above change for the original reason. So instead we batch all the dead function removals so we can call removeInternalRefEdges() just once. This requires some changes to callers of removeDeadFunction() to not actually erase the function from the module, but defer it to when we batch delete dead functions at the end of the CGSCC run, leaving the function body as "unreachable" in the meantime. We still need to ensure that call edges are accurate. I had also tried deleting dead functions after visiting a RefSCC, but deleting them all at once at the end was simpler.

Many test changes are due to not performing unnecessary revisits of an SCC (the CGSCC infrastructure deems ref edge refinements as unimportant when it comes to revisiting SCCs, although that seems to not be consistently true given these changes) because we don't remove some ref edges. Specifically for devirt-invalidated.ll this seems to expose an inlining order issue with the inliner. Probably unimportant for this type of intentionally weird call graph.

Compile time: https://llvm-compile-time-tracker.com/compare.php?from=6f2c61071c274a1b5e212e6ad4114641ec7c7fc3&to=b08c90d05e290dd065755ea776ceaf1420680224&stat=instructions:u

In some modules, e.g. Kotlin-generated IR, we end up with a huge RefSCC and the call graph updates done as a result of the inliner take a long time. This is due to RefSCC::removeInternalRefEdges() getting called many times, each time removing one function from the RefSCC, but each call to removeInternalRefEdges() is proportional to the size of the RefSCC.

There are two places that call removeInternalRefEdges(), in updateCGAndAnalysisManagerForPass() and LazyCallGraph::removeDeadFunction().

1) Since LazyCallGraph can deal with spurious (edges that exist in the graph but not in the IR) ref edges, we can simply not call removeInternalRefEdges() in updateCGAndAnalysisManagerForPass().

2) LazyCallGraph::removeDeadFunction() still ends up taking the brunt of compile time with the above change for the original reason. So instead we batch all the dead function removals so we can call removeInternalRefEdges() just once. This requires some changes to callers of removeDeadFunction() to not actually erase the function from the module, but defer it to when we batch delete dead functions at the end of the CGSCC run, leaving the function body as "unreachable" in the meantime. We still need to ensure that call edges are accurate. I had also tried deleting dead functions after visiting a RefSCC, but deleting them all at once at the end was simpler.

Many test changes are due to not performing unnecessary revisits of an SCC (the CGSCC infrastructure deems ref edge refinements as unimportant when it comes to revisiting SCCs, although that seems to not be consistently true given these changes) because we don't remove some ref edges. Specifically for devirt-invalidated.ll this seems to expose an inlining order issue with the inliner. Probably unimportant for this type of intentionally weird call graph.
@llvmbot
Copy link
Member

llvmbot commented Jun 7, 2024

@llvm/pr-subscribers-llvm-transforms

Author: Arthur Eubanks (aeubanks)

Changes

In some modules, e.g. Kotlin-generated IR, we end up with a huge RefSCC and the call graph updates done as a result of the inliner take a long time. This is due to RefSCC::removeInternalRefEdges() getting called many times, each time removing one function from the RefSCC, but each call to removeInternalRefEdges() is proportional to the size of the RefSCC.

There are two places that call removeInternalRefEdges(), in updateCGAndAnalysisManagerForPass() and LazyCallGraph::removeDeadFunction().

  1. Since LazyCallGraph can deal with spurious (edges that exist in the graph but not in the IR) ref edges, we can simply not call removeInternalRefEdges() in updateCGAndAnalysisManagerForPass().

  2. LazyCallGraph::removeDeadFunction() still ends up taking the brunt of compile time with the above change for the original reason. So instead we batch all the dead function removals so we can call removeInternalRefEdges() just once. This requires some changes to callers of removeDeadFunction() to not actually erase the function from the module, but defer it to when we batch delete dead functions at the end of the CGSCC run, leaving the function body as "unreachable" in the meantime. We still need to ensure that call edges are accurate. I had also tried deleting dead functions after visiting a RefSCC, but deleting them all at once at the end was simpler.

Many test changes are due to not performing unnecessary revisits of an SCC (the CGSCC infrastructure deems ref edge refinements as unimportant when it comes to revisiting SCCs, although that seems to not be consistently true given these changes) because we don't remove some ref edges. Specifically for devirt-invalidated.ll this seems to expose an inlining order issue with the inliner. Probably unimportant for this type of intentionally weird call graph.


Patch is 25.62 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/94815.diff

13 Files Affected:

  • (modified) llvm/include/llvm/Analysis/CGSCCPassManager.h (+4)
  • (modified) llvm/include/llvm/Analysis/LazyCallGraph.h (+11-11)
  • (modified) llvm/lib/Analysis/CGSCCPassManager.cpp (+9-33)
  • (modified) llvm/lib/Analysis/LazyCallGraph.cpp (+66-59)
  • (modified) llvm/lib/Transforms/IPO/Inliner.cpp (+14-15)
  • (modified) llvm/lib/Transforms/Utils/CallGraphUpdater.cpp (+8-4)
  • (modified) llvm/test/Other/cgscc-refscc-mutation-order.ll (-2)
  • (modified) llvm/test/Other/devirt-invalidated.ll (-2)
  • (modified) llvm/test/Transforms/Attributor/ArgumentPromotion/live_called_from_dead.ll (-1)
  • (modified) llvm/test/Transforms/Attributor/ArgumentPromotion/live_called_from_dead_2.ll (-1)
  • (modified) llvm/test/Transforms/Inline/cgscc-cycle-debug.ll (-1)
  • (modified) llvm/unittests/Analysis/CGSCCPassManagerTest.cpp (+5-7)
  • (modified) llvm/unittests/Analysis/LazyCallGraphTest.cpp (+17-12)
diff --git a/llvm/include/llvm/Analysis/CGSCCPassManager.h b/llvm/include/llvm/Analysis/CGSCCPassManager.h
index 5654ad46d6eab..b19d53621ac86 100644
--- a/llvm/include/llvm/Analysis/CGSCCPassManager.h
+++ b/llvm/include/llvm/Analysis/CGSCCPassManager.h
@@ -306,6 +306,10 @@ struct CGSCCUpdateResult {
   SmallDenseSet<std::pair<LazyCallGraph::Node *, LazyCallGraph::SCC *>, 4>
       &InlinedInternalEdges;
 
+  /// Functions that a pass has considered to be dead to be removed at the end
+  /// of the call graph walk in batch.
+  SmallVector<Function *, 4> &DeadFunctions;
+
   /// Weak VHs to keep track of indirect calls for the purposes of detecting
   /// devirtualization.
   ///
diff --git a/llvm/include/llvm/Analysis/LazyCallGraph.h b/llvm/include/llvm/Analysis/LazyCallGraph.h
index ac8ca207d312b..a8bbf2c578af9 100644
--- a/llvm/include/llvm/Analysis/LazyCallGraph.h
+++ b/llvm/include/llvm/Analysis/LazyCallGraph.h
@@ -832,7 +832,7 @@ class LazyCallGraph {
     /// self-edges and edge removals which result in a spanning tree with no
     /// more cycles.
     [[nodiscard]] SmallVector<RefSCC *, 1>
-    removeInternalRefEdge(Node &SourceN, ArrayRef<Node *> TargetNs);
+    removeInternalRefEdges(ArrayRef<std::pair<Node *, Node *>> Edges);
 
     /// A convenience wrapper around the above to handle trivial cases of
     /// inserting a new call edge.
@@ -1056,18 +1056,18 @@ class LazyCallGraph {
   /// once SCCs have started to be formed. These routines have strict contracts
   /// but may be called at any point.
 
-  /// Remove a dead function from the call graph (typically to delete it).
+  /// Remove dead functions from the call graph.
   ///
-  /// Note that the function must have an empty use list, and the call graph
-  /// must be up-to-date prior to calling this. That means it is by itself in
-  /// a maximal SCC which is by itself in a maximal RefSCC, etc. No structural
-  /// changes result from calling this routine other than potentially removing
-  /// entry points into the call graph.
+  /// These functions should have already been passed to markDeadFunction().
+  /// This is done as a batch to prevent compile time blowup as a result of
+  /// handling a single function at a time.
+  void removeDeadFunctions(ArrayRef<Function *> DeadFs);
+
+  /// Mark a function as dead to be removed later by removeDeadFunctions().
   ///
-  /// If SCC formation has begun, this function must not be part of the current
-  /// DFS in order to call this safely. Typically, the function will have been
-  /// fully visited by the DFS prior to calling this routine.
-  void removeDeadFunction(Function &F);
+  /// The function body should have no incoming or outgoing call or ref edges.
+  /// For example, a function with a single "unreachable" instruction.
+  void markDeadFunction(Function &F);
 
   /// Add a new function split/outlined from an existing function.
   ///
diff --git a/llvm/lib/Analysis/CGSCCPassManager.cpp b/llvm/lib/Analysis/CGSCCPassManager.cpp
index 8ae5c3dee6103..2ed1d98f80068 100644
--- a/llvm/lib/Analysis/CGSCCPassManager.cpp
+++ b/llvm/lib/Analysis/CGSCCPassManager.cpp
@@ -158,10 +158,12 @@ ModuleToPostOrderCGSCCPassAdaptor::run(Module &M, ModuleAnalysisManager &AM) {
   SmallDenseSet<std::pair<LazyCallGraph::Node *, LazyCallGraph::SCC *>, 4>
       InlinedInternalEdges;
 
+  SmallVector<Function *, 4> DeadFunctions;
+
   CGSCCUpdateResult UR = {
-      RCWorklist,           CWorklist, InvalidRefSCCSet,
-      InvalidSCCSet,        nullptr,   PreservedAnalyses::all(),
-      InlinedInternalEdges, {}};
+      RCWorklist,           CWorklist,     InvalidRefSCCSet,
+      InvalidSCCSet,        nullptr,       PreservedAnalyses::all(),
+      InlinedInternalEdges, DeadFunctions, {}};
 
   // Request PassInstrumentation from analysis manager, will use it to run
   // instrumenting callbacks for the passes later.
@@ -340,6 +342,10 @@ ModuleToPostOrderCGSCCPassAdaptor::run(Module &M, ModuleAnalysisManager &AM) {
     } while (!RCWorklist.empty());
   }
 
+  CG.removeDeadFunctions(DeadFunctions);
+  for (Function *DeadF : DeadFunctions)
+    DeadF->eraseFromParent();
+
 #if defined(EXPENSIVE_CHECKS)
   // Verify that the call graph is still valid.
   CG.verify();
@@ -1030,36 +1036,6 @@ static LazyCallGraph::SCC &updateCGAndAnalysisManagerForPass(
     return true;
   });
 
-  // Now do a batch removal of the internal ref edges left.
-  auto NewRefSCCs = RC->removeInternalRefEdge(N, DeadTargets);
-  if (!NewRefSCCs.empty()) {
-    // The old RefSCC is dead, mark it as such.
-    UR.InvalidatedRefSCCs.insert(RC);
-
-    // Note that we don't bother to invalidate analyses as ref-edge
-    // connectivity is not really observable in any way and is intended
-    // exclusively to be used for ordering of transforms rather than for
-    // analysis conclusions.
-
-    // Update RC to the "bottom".
-    assert(G.lookupSCC(N) == C && "Changed the SCC when splitting RefSCCs!");
-    RC = &C->getOuterRefSCC();
-    assert(G.lookupRefSCC(N) == RC && "Failed to update current RefSCC!");
-
-    // The RC worklist is in reverse postorder, so we enqueue the new ones in
-    // RPO except for the one which contains the source node as that is the
-    // "bottom" we will continue processing in the bottom-up walk.
-    assert(NewRefSCCs.front() == RC &&
-           "New current RefSCC not first in the returned list!");
-    for (RefSCC *NewRC : llvm::reverse(llvm::drop_begin(NewRefSCCs))) {
-      assert(NewRC != RC && "Should not encounter the current RefSCC further "
-                            "in the postorder list of new RefSCCs.");
-      UR.RCWorklist.insert(NewRC);
-      LLVM_DEBUG(dbgs() << "Enqueuing a new RefSCC in the update worklist: "
-                        << *NewRC << "\n");
-    }
-  }
-
   // Next demote all the call edges that are now ref edges. This helps make
   // the SCCs small which should minimize the work below as we don't want to
   // form cycles that this would break.
diff --git a/llvm/lib/Analysis/LazyCallGraph.cpp b/llvm/lib/Analysis/LazyCallGraph.cpp
index 48a7ca0061600..11d2b8118bd5a 100644
--- a/llvm/lib/Analysis/LazyCallGraph.cpp
+++ b/llvm/lib/Analysis/LazyCallGraph.cpp
@@ -1160,8 +1160,8 @@ void LazyCallGraph::RefSCC::removeOutgoingEdge(Node &SourceN, Node &TargetN) {
 }
 
 SmallVector<LazyCallGraph::RefSCC *, 1>
-LazyCallGraph::RefSCC::removeInternalRefEdge(Node &SourceN,
-                                             ArrayRef<Node *> TargetNs) {
+LazyCallGraph::RefSCC::removeInternalRefEdges(
+    ArrayRef<std::pair<Node *, Node *>> Edges) {
   // We return a list of the resulting *new* RefSCCs in post-order.
   SmallVector<RefSCC *, 1> Result;
 
@@ -1179,25 +1179,21 @@ LazyCallGraph::RefSCC::removeInternalRefEdge(Node &SourceN,
 #endif
 
   // First remove the actual edges.
-  for (Node *TargetN : TargetNs) {
-    assert(!(*SourceN)[*TargetN].isCall() &&
+  for (auto [SourceN, TargetN] : Edges) {
+    assert(!(**SourceN)[*TargetN].isCall() &&
            "Cannot remove a call edge, it must first be made a ref edge");
 
-    bool Removed = SourceN->removeEdgeInternal(*TargetN);
+    bool Removed = (*SourceN)->removeEdgeInternal(*TargetN);
     (void)Removed;
     assert(Removed && "Target not in the edge set for this caller?");
   }
 
   // Direct self references don't impact the ref graph at all.
-  if (llvm::all_of(TargetNs,
-                   [&](Node *TargetN) { return &SourceN == TargetN; }))
-    return Result;
-
   // If all targets are in the same SCC as the source, because no call edges
   // were removed there is no RefSCC structure change.
-  SCC &SourceC = *G->lookupSCC(SourceN);
-  if (llvm::all_of(TargetNs, [&](Node *TargetN) {
-        return G->lookupSCC(*TargetN) == &SourceC;
+  if (llvm::all_of(Edges, [&](std::pair<Node *, Node *> E) {
+        return E.first == E.second ||
+               G->lookupSCC(*E.first) == G->lookupSCC(*E.second);
       }))
     return Result;
 
@@ -1499,7 +1495,7 @@ void LazyCallGraph::removeEdge(Node &SourceN, Node &TargetN) {
   assert(Removed && "Target not in the edge set for this caller?");
 }
 
-void LazyCallGraph::removeDeadFunction(Function &F) {
+void LazyCallGraph::markDeadFunction(Function &F) {
   // FIXME: This is unnecessarily restrictive. We should be able to remove
   // functions which recursively call themselves.
   assert(F.hasZeroLiveUses() &&
@@ -1515,57 +1511,68 @@ void LazyCallGraph::removeDeadFunction(Function &F) {
 
   Node &N = *NI->second;
 
-  // Cannot remove a function which has yet to be visited in the DFS walk, so
-  // if we have a node at all then we must have an SCC and RefSCC.
-  auto CI = SCCMap.find(&N);
-  assert(CI != SCCMap.end() &&
-         "Tried to remove a node without an SCC after DFS walk started!");
-  SCC &C = *CI->second;
-  RefSCC *RC = &C.getOuterRefSCC();
-
-  // In extremely rare cases, we can delete a dead function which is still in a
-  // non-trivial RefSCC. This can happen due to spurious ref edges sticking
-  // around after an IR function reference is removed.
-  if (RC->size() != 1) {
-    SmallVector<Node *, 0> NodesInRC;
-    for (SCC &OtherC : *RC) {
-      for (Node &OtherN : OtherC)
-        NodesInRC.push_back(&OtherN);
+  // Remove all call edges out of dead function.
+  for (Edge E : *N) {
+    if (E.isCall())
+      N->setEdgeKind(E.getNode(), Edge::Ref);
+  }
+}
+
+void LazyCallGraph::removeDeadFunctions(ArrayRef<Function *> DeadFs) {
+  if (DeadFs.empty())
+    return;
+
+  // Group dead functions by the RefSCC they're in.
+  DenseMap<RefSCC *, SmallVector<Node *, 1>> RCs;
+  for (Function *DeadF : DeadFs) {
+    Node *N = lookup(*DeadF);
+#ifndef NDEBUG
+    for (Edge &E : **N) {
+      assert(!E.isCall() &&
+             "dead function shouldn't have any outgoing call edges");
     }
-    for (Node *OtherN : NodesInRC) {
-      if ((*OtherN)->lookup(N)) {
-        auto NewRefSCCs =
-            RC->removeInternalRefEdge(*OtherN, ArrayRef<Node *>(&N));
-        // If we've split into multiple RefSCCs, RC is now invalid and the
-        // RefSCC containing C will be different.
-        if (!NewRefSCCs.empty())
-          RC = &C.getOuterRefSCC();
+#endif
+    RefSCC *RC = lookupRefSCC(*N);
+    RCs[RC].push_back(N);
+  }
+  // Remove outgoing edges from all dead functions. Dead functions should
+  // already have had their call edges removed in markDeadFunction(), so we only
+  // need to worry about spurious ref edges.
+  for (auto [RC, DeadNs] : RCs) {
+    SmallVector<std::pair<Node *, Node *>> InternalEdgesToRemove;
+    for (Node *DeadN : DeadNs) {
+      if (lookupRefSCC(*DeadN) != RC)
+        continue;
+      for (Edge &E : **DeadN) {
+        if (lookupRefSCC(E.getNode()) == RC)
+          InternalEdgesToRemove.push_back({DeadN, &E.getNode()});
+        else
+          RC->removeOutgoingEdge(*DeadN, E.getNode());
       }
     }
+    // We ignore the returned RefSCCs since at this point we're done with CGSCC
+    // iteration and don't need to add it to any worklists.
+    (void)RC->removeInternalRefEdges(InternalEdgesToRemove);
+    for (Node *DeadN : DeadNs) {
+      RefSCC *DeadRC = lookupRefSCC(*DeadN);
+      assert(DeadRC->size() == 1);
+      assert(DeadRC->begin()->size() == 1);
+      DeadRC->clear();
+      DeadRC->G = nullptr;
+    }
   }
+  // Clean up data structures.
+  for (Function *DeadF : DeadFs) {
+    Node &N = *lookup(*DeadF);
 
-  NodeMap.erase(NI);
-  EntryEdges.removeEdgeInternal(N);
-  SCCMap.erase(CI);
-
-  // This node must be the only member of its SCC as it has no callers, and
-  // that SCC must be the only member of a RefSCC as it has no references.
-  // Validate these properties first.
-  assert(C.size() == 1 && "Dead functions must be in a singular SCC");
-  assert(RC->size() == 1 && "Dead functions must be in a singular RefSCC");
-
-  // Finally clear out all the data structures from the node down through the
-  // components. postorder_ref_scc_iterator will skip empty RefSCCs, so no need
-  // to adjust LazyCallGraph data structures.
-  N.clear();
-  N.G = nullptr;
-  N.F = nullptr;
-  C.clear();
-  RC->clear();
-  RC->G = nullptr;
-
-  // Nothing to delete as all the objects are allocated in stable bump pointer
-  // allocators.
+    EntryEdges.removeEdgeInternal(N);
+    SCCMap.erase(SCCMap.find(&N));
+    NodeMap.erase(NodeMap.find(DeadF));
+
+    N.clear();
+    N.G = nullptr;
+    N.F = nullptr;
+  }
 }
 
 // Gets the Edge::Kind from one function to another by looking at the function's
diff --git a/llvm/lib/Transforms/IPO/Inliner.cpp b/llvm/lib/Transforms/IPO/Inliner.cpp
index a9747aebf67bb..1a7b9bc8e3e77 100644
--- a/llvm/lib/Transforms/IPO/Inliner.cpp
+++ b/llvm/lib/Transforms/IPO/Inliner.cpp
@@ -197,6 +197,14 @@ InlinerPass::getAdvisor(const ModuleAnalysisManagerCGSCCProxy::Result &MAM,
   return *IAA->getAdvisor();
 }
 
+void makeFunctionBodyUnreachable(Function &F) {
+  F.dropAllReferences();
+  for (BasicBlock &BB : make_early_inc_range(F))
+    BB.eraseFromParent();
+  BasicBlock *BB = BasicBlock::Create(F.getContext(), "", &F);
+  new UnreachableInst(F.getContext(), BB);
+}
+
 PreservedAnalyses InlinerPass::run(LazyCallGraph::SCC &InitialC,
                                    CGSCCAnalysisManager &AM, LazyCallGraph &CG,
                                    CGSCCUpdateResult &UR) {
@@ -448,11 +456,9 @@ PreservedAnalyses InlinerPass::run(LazyCallGraph::SCC &InitialC,
                              }),
               Calls.end());
 
-          // Clear the body and queue the function itself for deletion when we
-          // finish inlining and call graph updates.
-          // Note that after this point, it is an error to do anything other
-          // than use the callee's address or delete it.
-          Callee.dropAllReferences();
+          // Clear the body and queue the function itself for call graph
+          // updating when we finish inlining.
+          makeFunctionBodyUnreachable(Callee);
           assert(!is_contained(DeadFunctions, &Callee) &&
                  "Cannot put cause a function to become dead twice!");
           DeadFunctions.push_back(&Callee);
@@ -530,7 +536,7 @@ PreservedAnalyses InlinerPass::run(LazyCallGraph::SCC &InitialC,
   if (!DeadFunctionsInComdats.empty()) {
     filterDeadComdatFunctions(DeadFunctionsInComdats);
     for (auto *Callee : DeadFunctionsInComdats)
-      Callee->dropAllReferences();
+      makeFunctionBodyUnreachable(*Callee);
     DeadFunctions.append(DeadFunctionsInComdats);
   }
 
@@ -542,25 +548,18 @@ PreservedAnalyses InlinerPass::run(LazyCallGraph::SCC &InitialC,
   // that is OK as all we do is delete things and add pointers to unordered
   // sets.
   for (Function *DeadF : DeadFunctions) {
+    CG.markDeadFunction(*DeadF);
     // Get the necessary information out of the call graph and nuke the
     // function there. Also, clear out any cached analyses.
     auto &DeadC = *CG.lookupSCC(*CG.lookup(*DeadF));
     FAM.clear(*DeadF, DeadF->getName());
     AM.clear(DeadC, DeadC.getName());
-    auto &DeadRC = DeadC.getOuterRefSCC();
-    CG.removeDeadFunction(*DeadF);
 
     // Mark the relevant parts of the call graph as invalid so we don't visit
     // them.
     UR.InvalidatedSCCs.insert(&DeadC);
-    UR.InvalidatedRefSCCs.insert(&DeadRC);
-
-    // If the updated SCC was the one containing the deleted function, clear it.
-    if (&DeadC == UR.UpdatedC)
-      UR.UpdatedC = nullptr;
 
-    // And delete the actual function from the module.
-    M.getFunctionList().erase(DeadF);
+    UR.DeadFunctions.push_back(DeadF);
 
     ++NumDeleted;
   }
diff --git a/llvm/lib/Transforms/Utils/CallGraphUpdater.cpp b/llvm/lib/Transforms/Utils/CallGraphUpdater.cpp
index d0b9884aa9099..e9f37d4044cb0 100644
--- a/llvm/lib/Transforms/Utils/CallGraphUpdater.cpp
+++ b/llvm/lib/Transforms/Utils/CallGraphUpdater.cpp
@@ -67,16 +67,20 @@ bool CallGraphUpdater::finalize() {
 
         FAM.clear(*DeadFn, DeadFn->getName());
         AM->clear(*DeadSCC, DeadSCC->getName());
-        LCG->removeDeadFunction(*DeadFn);
+        LCG->markDeadFunction(*DeadFn);
 
         // Mark the relevant parts of the call graph as invalid so we don't
         // visit them.
         UR->InvalidatedSCCs.insert(DeadSCC);
         UR->InvalidatedRefSCCs.insert(&DeadRC);
+        UR->DeadFunctions.push_back(DeadFn);
+      } else {
+        // The CGSCC infrastructure batch deletes functions at the end of the
+        // call graph walk, so only erase the function if we're not using that
+        // infrastructure.
+        // The function is now really dead and de-attached from everything.
+        DeadFn->eraseFromParent();
       }
-
-      // The function is now really dead and de-attached from everything.
-      DeadFn->eraseFromParent();
     }
   }
 
diff --git a/llvm/test/Other/cgscc-refscc-mutation-order.ll b/llvm/test/Other/cgscc-refscc-mutation-order.ll
index 13a46503c1f4c..aa20735715763 100644
--- a/llvm/test/Other/cgscc-refscc-mutation-order.ll
+++ b/llvm/test/Other/cgscc-refscc-mutation-order.ll
@@ -15,8 +15,6 @@
 ; CHECK-NOT: InstCombinePass
 ; CHECK: Running pass: InstCombinePass on f4
 ; CHECK-NOT: InstCombinePass
-; CHECK: Running pass: InstCombinePass on f1
-; CHECK-NOT: InstCombinePass
 
 @a1 = alias void (), ptr @f1
 @a2 = alias void (), ptr @f2
diff --git a/llvm/test/Other/devirt-invalidated.ll b/llvm/test/Other/devirt-invalidated.ll
index c3ed5e53b3b04..7926641dda97b 100644
--- a/llvm/test/Other/devirt-invalidated.ll
+++ b/llvm/test/Other/devirt-invalidated.ll
@@ -1,8 +1,6 @@
 ; RUN: opt -passes='devirt<0>(inline)' < %s -S | FileCheck %s
 
-; CHECK-NOT: internal
 ; CHECK: define void @e()
-; CHECK-NOT: internal
 
 define void @e() {
 entry:
diff --git a/llvm/test/Transforms/Attributor/ArgumentPromotion/live_called_from_dead.ll b/llvm/test/Transforms/Attributor/ArgumentPromotion/live_called_from_dead.ll
index 2df81d6cb1832..1c34fff8dd755 100644
--- a/llvm/test/Transforms/Attributor/ArgumentPromotion/live_called_from_dead.ll
+++ b/llvm/test/Transforms/Attributor/ArgumentPromotion/live_called_from_dead.ll
@@ -37,7 +37,6 @@ define internal i32 @caller(ptr %B) {
 ; CGSCC-LABEL: define {{[^@]+}}@caller
 ; CGSCC-SAME: () #[[ATTR0]] {
 ; CGSCC-NEXT:    [[A:%.*]] = alloca i32, align 4
-; CGSCC-NEXT:    [[A2:%.*]] = alloca i8, i32 0, align 4
 ; CGSCC-NEXT:    [[A1:%.*]] = alloca i8, i32 0, align 4
 ; CGSCC-NEXT:    ret i32 0
 ;
diff --git a/llvm/test/Transforms/Attributor/ArgumentPromotion/live_called_from_dead_2.ll b/llvm/test/Transforms/Attributor/ArgumentPromotion/live_called_from_dead_2.ll
index 7c28de24beea2..b42647840f7cf 100644
--- a/llvm/test/Transforms/Attributor/ArgumentPromotion/live_called_from_dead_2.ll
+++ b/llvm/test/Transforms/Attributor/ArgumentPromotion/live_called_from_dead_2.ll
@@ -54,7 +54,6 @@ define internal i32 @caller(ptr %B) {
 ; CGSCC-LABEL: define {{[^@]+}}@caller
 ; CGSCC-SAME: (ptr noalias nocapture nofree noundef nonnull writeonly align 4 dereferenceable(4) [[B:%.*]]) #[[ATTR0]] {
 ; CGSCC-NEXT:    [[A:%.*]] = alloca i32, align 4
-; CGSCC-NEXT:    [[A2:%.*]] = alloca i8, i32 0, align 4
 ; CGSCC-NEXT:    [[A1:%.*]] = alloca i8, i32 0, align 4
 ; CGSCC-NEXT:    [[C:%.*]] = call i32 @test(ptr noalias nocapture nofree noundef nonnull writeonly align 4 dereferenceable(4) [[B]]) #[[ATTR2:[0-9]+]]
 ; CGSCC-NEXT:    ret i32 0
diff --git a/llvm/test/Transforms/Inline/cgscc-cycle-debug.ll b/llvm/test/Transforms/Inline/cgscc-cycle-debug.ll
index 40a6b0577e7dd..e79700e8dac62 100644
--- a/llvm/test/Transforms/Inline/cgscc-cycle-debug.ll
+++ b/llvm/test/Transforms/Inline/cgscc-cycle-debug.ll
@@ -13,7 +13,6 @@
 ; CHECK: Running an SCC pass across the RefSCC: [(test1_a, test1_b, test1_c)]
 ; CHECK: Enqueuing the existing SCC in the worklist:(test1_b)
 ; CHECK: Enqueuing a newly formed SCC:(test1_c)
-; CHECK: Enqueuing a new RefSCC in the update worklist: [(test1_b)]
 ; CHECK: Switch an internal ref edge to a call edge from 'test1_a' to 'test1_c'
 ; CHECK: Switch an internal ref edge to a call edge from 'test1_a' to 'test1_a'
 ; CHECK: Re-running SCC passes after a refinement of the current SCC: (test1_c, test1_a)
diff --git a/llvm/unittests/Analysis/CGSCCPassManagerTest.cpp b/llvm/unittests/Analysis/CGSCCPassManagerTest.cpp
index b33567dd602b0..aab148c12c416 100644
--- a/llvm/unittests/Analysis/CGSCCPassManagerTest.cpp
+++ b/llvm/unittests/Analysis/CGSCCPassManagerTest.cpp
@@ -1659,18 +1659,16 @@ TEST_F(CGSCCPassManagerTest, TestUpdateCGAndAnalysisM...
[truncated]

@llvmbot
Copy link
Member

llvmbot commented Jun 7, 2024

@llvm/pr-subscribers-llvm-analysis

Author: Arthur Eubanks (aeubanks)

Changes

In some modules, e.g. Kotlin-generated IR, we end up with a huge RefSCC and the call graph updates done as a result of the inliner take a long time. This is due to RefSCC::removeInternalRefEdges() getting called many times, each time removing one function from the RefSCC, but each call to removeInternalRefEdges() is proportional to the size of the RefSCC.

There are two places that call removeInternalRefEdges(), in updateCGAndAnalysisManagerForPass() and LazyCallGraph::removeDeadFunction().

  1. Since LazyCallGraph can deal with spurious (edges that exist in the graph but not in the IR) ref edges, we can simply not call removeInternalRefEdges() in updateCGAndAnalysisManagerForPass().

  2. LazyCallGraph::removeDeadFunction() still ends up taking the brunt of compile time with the above change for the original reason. So instead we batch all the dead function removals so we can call removeInternalRefEdges() just once. This requires some changes to callers of removeDeadFunction() to not actually erase the function from the module, but defer it to when we batch delete dead functions at the end of the CGSCC run, leaving the function body as "unreachable" in the meantime. We still need to ensure that call edges are accurate. I had also tried deleting dead functions after visiting a RefSCC, but deleting them all at once at the end was simpler.

Many test changes are due to not performing unnecessary revisits of an SCC (the CGSCC infrastructure deems ref edge refinements as unimportant when it comes to revisiting SCCs, although that seems to not be consistently true given these changes) because we don't remove some ref edges. Specifically for devirt-invalidated.ll this seems to expose an inlining order issue with the inliner. Probably unimportant for this type of intentionally weird call graph.


Patch is 25.62 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/94815.diff

13 Files Affected:

  • (modified) llvm/include/llvm/Analysis/CGSCCPassManager.h (+4)
  • (modified) llvm/include/llvm/Analysis/LazyCallGraph.h (+11-11)
  • (modified) llvm/lib/Analysis/CGSCCPassManager.cpp (+9-33)
  • (modified) llvm/lib/Analysis/LazyCallGraph.cpp (+66-59)
  • (modified) llvm/lib/Transforms/IPO/Inliner.cpp (+14-15)
  • (modified) llvm/lib/Transforms/Utils/CallGraphUpdater.cpp (+8-4)
  • (modified) llvm/test/Other/cgscc-refscc-mutation-order.ll (-2)
  • (modified) llvm/test/Other/devirt-invalidated.ll (-2)
  • (modified) llvm/test/Transforms/Attributor/ArgumentPromotion/live_called_from_dead.ll (-1)
  • (modified) llvm/test/Transforms/Attributor/ArgumentPromotion/live_called_from_dead_2.ll (-1)
  • (modified) llvm/test/Transforms/Inline/cgscc-cycle-debug.ll (-1)
  • (modified) llvm/unittests/Analysis/CGSCCPassManagerTest.cpp (+5-7)
  • (modified) llvm/unittests/Analysis/LazyCallGraphTest.cpp (+17-12)
diff --git a/llvm/include/llvm/Analysis/CGSCCPassManager.h b/llvm/include/llvm/Analysis/CGSCCPassManager.h
index 5654ad46d6eab..b19d53621ac86 100644
--- a/llvm/include/llvm/Analysis/CGSCCPassManager.h
+++ b/llvm/include/llvm/Analysis/CGSCCPassManager.h
@@ -306,6 +306,10 @@ struct CGSCCUpdateResult {
   SmallDenseSet<std::pair<LazyCallGraph::Node *, LazyCallGraph::SCC *>, 4>
       &InlinedInternalEdges;
 
+  /// Functions that a pass has considered to be dead to be removed at the end
+  /// of the call graph walk in batch.
+  SmallVector<Function *, 4> &DeadFunctions;
+
   /// Weak VHs to keep track of indirect calls for the purposes of detecting
   /// devirtualization.
   ///
diff --git a/llvm/include/llvm/Analysis/LazyCallGraph.h b/llvm/include/llvm/Analysis/LazyCallGraph.h
index ac8ca207d312b..a8bbf2c578af9 100644
--- a/llvm/include/llvm/Analysis/LazyCallGraph.h
+++ b/llvm/include/llvm/Analysis/LazyCallGraph.h
@@ -832,7 +832,7 @@ class LazyCallGraph {
     /// self-edges and edge removals which result in a spanning tree with no
     /// more cycles.
     [[nodiscard]] SmallVector<RefSCC *, 1>
-    removeInternalRefEdge(Node &SourceN, ArrayRef<Node *> TargetNs);
+    removeInternalRefEdges(ArrayRef<std::pair<Node *, Node *>> Edges);
 
     /// A convenience wrapper around the above to handle trivial cases of
     /// inserting a new call edge.
@@ -1056,18 +1056,18 @@ class LazyCallGraph {
   /// once SCCs have started to be formed. These routines have strict contracts
   /// but may be called at any point.
 
-  /// Remove a dead function from the call graph (typically to delete it).
+  /// Remove dead functions from the call graph.
   ///
-  /// Note that the function must have an empty use list, and the call graph
-  /// must be up-to-date prior to calling this. That means it is by itself in
-  /// a maximal SCC which is by itself in a maximal RefSCC, etc. No structural
-  /// changes result from calling this routine other than potentially removing
-  /// entry points into the call graph.
+  /// These functions should have already been passed to markDeadFunction().
+  /// This is done as a batch to prevent compile time blowup as a result of
+  /// handling a single function at a time.
+  void removeDeadFunctions(ArrayRef<Function *> DeadFs);
+
+  /// Mark a function as dead to be removed later by removeDeadFunctions().
   ///
-  /// If SCC formation has begun, this function must not be part of the current
-  /// DFS in order to call this safely. Typically, the function will have been
-  /// fully visited by the DFS prior to calling this routine.
-  void removeDeadFunction(Function &F);
+  /// The function body should have no incoming or outgoing call or ref edges.
+  /// For example, a function with a single "unreachable" instruction.
+  void markDeadFunction(Function &F);
 
   /// Add a new function split/outlined from an existing function.
   ///
diff --git a/llvm/lib/Analysis/CGSCCPassManager.cpp b/llvm/lib/Analysis/CGSCCPassManager.cpp
index 8ae5c3dee6103..2ed1d98f80068 100644
--- a/llvm/lib/Analysis/CGSCCPassManager.cpp
+++ b/llvm/lib/Analysis/CGSCCPassManager.cpp
@@ -158,10 +158,12 @@ ModuleToPostOrderCGSCCPassAdaptor::run(Module &M, ModuleAnalysisManager &AM) {
   SmallDenseSet<std::pair<LazyCallGraph::Node *, LazyCallGraph::SCC *>, 4>
       InlinedInternalEdges;
 
+  SmallVector<Function *, 4> DeadFunctions;
+
   CGSCCUpdateResult UR = {
-      RCWorklist,           CWorklist, InvalidRefSCCSet,
-      InvalidSCCSet,        nullptr,   PreservedAnalyses::all(),
-      InlinedInternalEdges, {}};
+      RCWorklist,           CWorklist,     InvalidRefSCCSet,
+      InvalidSCCSet,        nullptr,       PreservedAnalyses::all(),
+      InlinedInternalEdges, DeadFunctions, {}};
 
   // Request PassInstrumentation from analysis manager, will use it to run
   // instrumenting callbacks for the passes later.
@@ -340,6 +342,10 @@ ModuleToPostOrderCGSCCPassAdaptor::run(Module &M, ModuleAnalysisManager &AM) {
     } while (!RCWorklist.empty());
   }
 
+  CG.removeDeadFunctions(DeadFunctions);
+  for (Function *DeadF : DeadFunctions)
+    DeadF->eraseFromParent();
+
 #if defined(EXPENSIVE_CHECKS)
   // Verify that the call graph is still valid.
   CG.verify();
@@ -1030,36 +1036,6 @@ static LazyCallGraph::SCC &updateCGAndAnalysisManagerForPass(
     return true;
   });
 
-  // Now do a batch removal of the internal ref edges left.
-  auto NewRefSCCs = RC->removeInternalRefEdge(N, DeadTargets);
-  if (!NewRefSCCs.empty()) {
-    // The old RefSCC is dead, mark it as such.
-    UR.InvalidatedRefSCCs.insert(RC);
-
-    // Note that we don't bother to invalidate analyses as ref-edge
-    // connectivity is not really observable in any way and is intended
-    // exclusively to be used for ordering of transforms rather than for
-    // analysis conclusions.
-
-    // Update RC to the "bottom".
-    assert(G.lookupSCC(N) == C && "Changed the SCC when splitting RefSCCs!");
-    RC = &C->getOuterRefSCC();
-    assert(G.lookupRefSCC(N) == RC && "Failed to update current RefSCC!");
-
-    // The RC worklist is in reverse postorder, so we enqueue the new ones in
-    // RPO except for the one which contains the source node as that is the
-    // "bottom" we will continue processing in the bottom-up walk.
-    assert(NewRefSCCs.front() == RC &&
-           "New current RefSCC not first in the returned list!");
-    for (RefSCC *NewRC : llvm::reverse(llvm::drop_begin(NewRefSCCs))) {
-      assert(NewRC != RC && "Should not encounter the current RefSCC further "
-                            "in the postorder list of new RefSCCs.");
-      UR.RCWorklist.insert(NewRC);
-      LLVM_DEBUG(dbgs() << "Enqueuing a new RefSCC in the update worklist: "
-                        << *NewRC << "\n");
-    }
-  }
-
   // Next demote all the call edges that are now ref edges. This helps make
   // the SCCs small which should minimize the work below as we don't want to
   // form cycles that this would break.
diff --git a/llvm/lib/Analysis/LazyCallGraph.cpp b/llvm/lib/Analysis/LazyCallGraph.cpp
index 48a7ca0061600..11d2b8118bd5a 100644
--- a/llvm/lib/Analysis/LazyCallGraph.cpp
+++ b/llvm/lib/Analysis/LazyCallGraph.cpp
@@ -1160,8 +1160,8 @@ void LazyCallGraph::RefSCC::removeOutgoingEdge(Node &SourceN, Node &TargetN) {
 }
 
 SmallVector<LazyCallGraph::RefSCC *, 1>
-LazyCallGraph::RefSCC::removeInternalRefEdge(Node &SourceN,
-                                             ArrayRef<Node *> TargetNs) {
+LazyCallGraph::RefSCC::removeInternalRefEdges(
+    ArrayRef<std::pair<Node *, Node *>> Edges) {
   // We return a list of the resulting *new* RefSCCs in post-order.
   SmallVector<RefSCC *, 1> Result;
 
@@ -1179,25 +1179,21 @@ LazyCallGraph::RefSCC::removeInternalRefEdge(Node &SourceN,
 #endif
 
   // First remove the actual edges.
-  for (Node *TargetN : TargetNs) {
-    assert(!(*SourceN)[*TargetN].isCall() &&
+  for (auto [SourceN, TargetN] : Edges) {
+    assert(!(**SourceN)[*TargetN].isCall() &&
            "Cannot remove a call edge, it must first be made a ref edge");
 
-    bool Removed = SourceN->removeEdgeInternal(*TargetN);
+    bool Removed = (*SourceN)->removeEdgeInternal(*TargetN);
     (void)Removed;
     assert(Removed && "Target not in the edge set for this caller?");
   }
 
   // Direct self references don't impact the ref graph at all.
-  if (llvm::all_of(TargetNs,
-                   [&](Node *TargetN) { return &SourceN == TargetN; }))
-    return Result;
-
   // If all targets are in the same SCC as the source, because no call edges
   // were removed there is no RefSCC structure change.
-  SCC &SourceC = *G->lookupSCC(SourceN);
-  if (llvm::all_of(TargetNs, [&](Node *TargetN) {
-        return G->lookupSCC(*TargetN) == &SourceC;
+  if (llvm::all_of(Edges, [&](std::pair<Node *, Node *> E) {
+        return E.first == E.second ||
+               G->lookupSCC(*E.first) == G->lookupSCC(*E.second);
       }))
     return Result;
 
@@ -1499,7 +1495,7 @@ void LazyCallGraph::removeEdge(Node &SourceN, Node &TargetN) {
   assert(Removed && "Target not in the edge set for this caller?");
 }
 
-void LazyCallGraph::removeDeadFunction(Function &F) {
+void LazyCallGraph::markDeadFunction(Function &F) {
   // FIXME: This is unnecessarily restrictive. We should be able to remove
   // functions which recursively call themselves.
   assert(F.hasZeroLiveUses() &&
@@ -1515,57 +1511,68 @@ void LazyCallGraph::removeDeadFunction(Function &F) {
 
   Node &N = *NI->second;
 
-  // Cannot remove a function which has yet to be visited in the DFS walk, so
-  // if we have a node at all then we must have an SCC and RefSCC.
-  auto CI = SCCMap.find(&N);
-  assert(CI != SCCMap.end() &&
-         "Tried to remove a node without an SCC after DFS walk started!");
-  SCC &C = *CI->second;
-  RefSCC *RC = &C.getOuterRefSCC();
-
-  // In extremely rare cases, we can delete a dead function which is still in a
-  // non-trivial RefSCC. This can happen due to spurious ref edges sticking
-  // around after an IR function reference is removed.
-  if (RC->size() != 1) {
-    SmallVector<Node *, 0> NodesInRC;
-    for (SCC &OtherC : *RC) {
-      for (Node &OtherN : OtherC)
-        NodesInRC.push_back(&OtherN);
+  // Remove all call edges out of dead function.
+  for (Edge E : *N) {
+    if (E.isCall())
+      N->setEdgeKind(E.getNode(), Edge::Ref);
+  }
+}
+
+void LazyCallGraph::removeDeadFunctions(ArrayRef<Function *> DeadFs) {
+  if (DeadFs.empty())
+    return;
+
+  // Group dead functions by the RefSCC they're in.
+  DenseMap<RefSCC *, SmallVector<Node *, 1>> RCs;
+  for (Function *DeadF : DeadFs) {
+    Node *N = lookup(*DeadF);
+#ifndef NDEBUG
+    for (Edge &E : **N) {
+      assert(!E.isCall() &&
+             "dead function shouldn't have any outgoing call edges");
     }
-    for (Node *OtherN : NodesInRC) {
-      if ((*OtherN)->lookup(N)) {
-        auto NewRefSCCs =
-            RC->removeInternalRefEdge(*OtherN, ArrayRef<Node *>(&N));
-        // If we've split into multiple RefSCCs, RC is now invalid and the
-        // RefSCC containing C will be different.
-        if (!NewRefSCCs.empty())
-          RC = &C.getOuterRefSCC();
+#endif
+    RefSCC *RC = lookupRefSCC(*N);
+    RCs[RC].push_back(N);
+  }
+  // Remove outgoing edges from all dead functions. Dead functions should
+  // already have had their call edges removed in markDeadFunction(), so we only
+  // need to worry about spurious ref edges.
+  for (auto [RC, DeadNs] : RCs) {
+    SmallVector<std::pair<Node *, Node *>> InternalEdgesToRemove;
+    for (Node *DeadN : DeadNs) {
+      if (lookupRefSCC(*DeadN) != RC)
+        continue;
+      for (Edge &E : **DeadN) {
+        if (lookupRefSCC(E.getNode()) == RC)
+          InternalEdgesToRemove.push_back({DeadN, &E.getNode()});
+        else
+          RC->removeOutgoingEdge(*DeadN, E.getNode());
       }
     }
+    // We ignore the returned RefSCCs since at this point we're done with CGSCC
+    // iteration and don't need to add it to any worklists.
+    (void)RC->removeInternalRefEdges(InternalEdgesToRemove);
+    for (Node *DeadN : DeadNs) {
+      RefSCC *DeadRC = lookupRefSCC(*DeadN);
+      assert(DeadRC->size() == 1);
+      assert(DeadRC->begin()->size() == 1);
+      DeadRC->clear();
+      DeadRC->G = nullptr;
+    }
   }
+  // Clean up data structures.
+  for (Function *DeadF : DeadFs) {
+    Node &N = *lookup(*DeadF);
 
-  NodeMap.erase(NI);
-  EntryEdges.removeEdgeInternal(N);
-  SCCMap.erase(CI);
-
-  // This node must be the only member of its SCC as it has no callers, and
-  // that SCC must be the only member of a RefSCC as it has no references.
-  // Validate these properties first.
-  assert(C.size() == 1 && "Dead functions must be in a singular SCC");
-  assert(RC->size() == 1 && "Dead functions must be in a singular RefSCC");
-
-  // Finally clear out all the data structures from the node down through the
-  // components. postorder_ref_scc_iterator will skip empty RefSCCs, so no need
-  // to adjust LazyCallGraph data structures.
-  N.clear();
-  N.G = nullptr;
-  N.F = nullptr;
-  C.clear();
-  RC->clear();
-  RC->G = nullptr;
-
-  // Nothing to delete as all the objects are allocated in stable bump pointer
-  // allocators.
+    EntryEdges.removeEdgeInternal(N);
+    SCCMap.erase(SCCMap.find(&N));
+    NodeMap.erase(NodeMap.find(DeadF));
+
+    N.clear();
+    N.G = nullptr;
+    N.F = nullptr;
+  }
 }
 
 // Gets the Edge::Kind from one function to another by looking at the function's
diff --git a/llvm/lib/Transforms/IPO/Inliner.cpp b/llvm/lib/Transforms/IPO/Inliner.cpp
index a9747aebf67bb..1a7b9bc8e3e77 100644
--- a/llvm/lib/Transforms/IPO/Inliner.cpp
+++ b/llvm/lib/Transforms/IPO/Inliner.cpp
@@ -197,6 +197,14 @@ InlinerPass::getAdvisor(const ModuleAnalysisManagerCGSCCProxy::Result &MAM,
   return *IAA->getAdvisor();
 }
 
+void makeFunctionBodyUnreachable(Function &F) {
+  F.dropAllReferences();
+  for (BasicBlock &BB : make_early_inc_range(F))
+    BB.eraseFromParent();
+  BasicBlock *BB = BasicBlock::Create(F.getContext(), "", &F);
+  new UnreachableInst(F.getContext(), BB);
+}
+
 PreservedAnalyses InlinerPass::run(LazyCallGraph::SCC &InitialC,
                                    CGSCCAnalysisManager &AM, LazyCallGraph &CG,
                                    CGSCCUpdateResult &UR) {
@@ -448,11 +456,9 @@ PreservedAnalyses InlinerPass::run(LazyCallGraph::SCC &InitialC,
                              }),
               Calls.end());
 
-          // Clear the body and queue the function itself for deletion when we
-          // finish inlining and call graph updates.
-          // Note that after this point, it is an error to do anything other
-          // than use the callee's address or delete it.
-          Callee.dropAllReferences();
+          // Clear the body and queue the function itself for call graph
+          // updating when we finish inlining.
+          makeFunctionBodyUnreachable(Callee);
           assert(!is_contained(DeadFunctions, &Callee) &&
                  "Cannot put cause a function to become dead twice!");
           DeadFunctions.push_back(&Callee);
@@ -530,7 +536,7 @@ PreservedAnalyses InlinerPass::run(LazyCallGraph::SCC &InitialC,
   if (!DeadFunctionsInComdats.empty()) {
     filterDeadComdatFunctions(DeadFunctionsInComdats);
     for (auto *Callee : DeadFunctionsInComdats)
-      Callee->dropAllReferences();
+      makeFunctionBodyUnreachable(*Callee);
     DeadFunctions.append(DeadFunctionsInComdats);
   }
 
@@ -542,25 +548,18 @@ PreservedAnalyses InlinerPass::run(LazyCallGraph::SCC &InitialC,
   // that is OK as all we do is delete things and add pointers to unordered
   // sets.
   for (Function *DeadF : DeadFunctions) {
+    CG.markDeadFunction(*DeadF);
     // Get the necessary information out of the call graph and nuke the
     // function there. Also, clear out any cached analyses.
     auto &DeadC = *CG.lookupSCC(*CG.lookup(*DeadF));
     FAM.clear(*DeadF, DeadF->getName());
     AM.clear(DeadC, DeadC.getName());
-    auto &DeadRC = DeadC.getOuterRefSCC();
-    CG.removeDeadFunction(*DeadF);
 
     // Mark the relevant parts of the call graph as invalid so we don't visit
     // them.
     UR.InvalidatedSCCs.insert(&DeadC);
-    UR.InvalidatedRefSCCs.insert(&DeadRC);
-
-    // If the updated SCC was the one containing the deleted function, clear it.
-    if (&DeadC == UR.UpdatedC)
-      UR.UpdatedC = nullptr;
 
-    // And delete the actual function from the module.
-    M.getFunctionList().erase(DeadF);
+    UR.DeadFunctions.push_back(DeadF);
 
     ++NumDeleted;
   }
diff --git a/llvm/lib/Transforms/Utils/CallGraphUpdater.cpp b/llvm/lib/Transforms/Utils/CallGraphUpdater.cpp
index d0b9884aa9099..e9f37d4044cb0 100644
--- a/llvm/lib/Transforms/Utils/CallGraphUpdater.cpp
+++ b/llvm/lib/Transforms/Utils/CallGraphUpdater.cpp
@@ -67,16 +67,20 @@ bool CallGraphUpdater::finalize() {
 
         FAM.clear(*DeadFn, DeadFn->getName());
         AM->clear(*DeadSCC, DeadSCC->getName());
-        LCG->removeDeadFunction(*DeadFn);
+        LCG->markDeadFunction(*DeadFn);
 
         // Mark the relevant parts of the call graph as invalid so we don't
         // visit them.
         UR->InvalidatedSCCs.insert(DeadSCC);
         UR->InvalidatedRefSCCs.insert(&DeadRC);
+        UR->DeadFunctions.push_back(DeadFn);
+      } else {
+        // The CGSCC infrastructure batch deletes functions at the end of the
+        // call graph walk, so only erase the function if we're not using that
+        // infrastructure.
+        // The function is now really dead and de-attached from everything.
+        DeadFn->eraseFromParent();
       }
-
-      // The function is now really dead and de-attached from everything.
-      DeadFn->eraseFromParent();
     }
   }
 
diff --git a/llvm/test/Other/cgscc-refscc-mutation-order.ll b/llvm/test/Other/cgscc-refscc-mutation-order.ll
index 13a46503c1f4c..aa20735715763 100644
--- a/llvm/test/Other/cgscc-refscc-mutation-order.ll
+++ b/llvm/test/Other/cgscc-refscc-mutation-order.ll
@@ -15,8 +15,6 @@
 ; CHECK-NOT: InstCombinePass
 ; CHECK: Running pass: InstCombinePass on f4
 ; CHECK-NOT: InstCombinePass
-; CHECK: Running pass: InstCombinePass on f1
-; CHECK-NOT: InstCombinePass
 
 @a1 = alias void (), ptr @f1
 @a2 = alias void (), ptr @f2
diff --git a/llvm/test/Other/devirt-invalidated.ll b/llvm/test/Other/devirt-invalidated.ll
index c3ed5e53b3b04..7926641dda97b 100644
--- a/llvm/test/Other/devirt-invalidated.ll
+++ b/llvm/test/Other/devirt-invalidated.ll
@@ -1,8 +1,6 @@
 ; RUN: opt -passes='devirt<0>(inline)' < %s -S | FileCheck %s
 
-; CHECK-NOT: internal
 ; CHECK: define void @e()
-; CHECK-NOT: internal
 
 define void @e() {
 entry:
diff --git a/llvm/test/Transforms/Attributor/ArgumentPromotion/live_called_from_dead.ll b/llvm/test/Transforms/Attributor/ArgumentPromotion/live_called_from_dead.ll
index 2df81d6cb1832..1c34fff8dd755 100644
--- a/llvm/test/Transforms/Attributor/ArgumentPromotion/live_called_from_dead.ll
+++ b/llvm/test/Transforms/Attributor/ArgumentPromotion/live_called_from_dead.ll
@@ -37,7 +37,6 @@ define internal i32 @caller(ptr %B) {
 ; CGSCC-LABEL: define {{[^@]+}}@caller
 ; CGSCC-SAME: () #[[ATTR0]] {
 ; CGSCC-NEXT:    [[A:%.*]] = alloca i32, align 4
-; CGSCC-NEXT:    [[A2:%.*]] = alloca i8, i32 0, align 4
 ; CGSCC-NEXT:    [[A1:%.*]] = alloca i8, i32 0, align 4
 ; CGSCC-NEXT:    ret i32 0
 ;
diff --git a/llvm/test/Transforms/Attributor/ArgumentPromotion/live_called_from_dead_2.ll b/llvm/test/Transforms/Attributor/ArgumentPromotion/live_called_from_dead_2.ll
index 7c28de24beea2..b42647840f7cf 100644
--- a/llvm/test/Transforms/Attributor/ArgumentPromotion/live_called_from_dead_2.ll
+++ b/llvm/test/Transforms/Attributor/ArgumentPromotion/live_called_from_dead_2.ll
@@ -54,7 +54,6 @@ define internal i32 @caller(ptr %B) {
 ; CGSCC-LABEL: define {{[^@]+}}@caller
 ; CGSCC-SAME: (ptr noalias nocapture nofree noundef nonnull writeonly align 4 dereferenceable(4) [[B:%.*]]) #[[ATTR0]] {
 ; CGSCC-NEXT:    [[A:%.*]] = alloca i32, align 4
-; CGSCC-NEXT:    [[A2:%.*]] = alloca i8, i32 0, align 4
 ; CGSCC-NEXT:    [[A1:%.*]] = alloca i8, i32 0, align 4
 ; CGSCC-NEXT:    [[C:%.*]] = call i32 @test(ptr noalias nocapture nofree noundef nonnull writeonly align 4 dereferenceable(4) [[B]]) #[[ATTR2:[0-9]+]]
 ; CGSCC-NEXT:    ret i32 0
diff --git a/llvm/test/Transforms/Inline/cgscc-cycle-debug.ll b/llvm/test/Transforms/Inline/cgscc-cycle-debug.ll
index 40a6b0577e7dd..e79700e8dac62 100644
--- a/llvm/test/Transforms/Inline/cgscc-cycle-debug.ll
+++ b/llvm/test/Transforms/Inline/cgscc-cycle-debug.ll
@@ -13,7 +13,6 @@
 ; CHECK: Running an SCC pass across the RefSCC: [(test1_a, test1_b, test1_c)]
 ; CHECK: Enqueuing the existing SCC in the worklist:(test1_b)
 ; CHECK: Enqueuing a newly formed SCC:(test1_c)
-; CHECK: Enqueuing a new RefSCC in the update worklist: [(test1_b)]
 ; CHECK: Switch an internal ref edge to a call edge from 'test1_a' to 'test1_c'
 ; CHECK: Switch an internal ref edge to a call edge from 'test1_a' to 'test1_a'
 ; CHECK: Re-running SCC passes after a refinement of the current SCC: (test1_c, test1_a)
diff --git a/llvm/unittests/Analysis/CGSCCPassManagerTest.cpp b/llvm/unittests/Analysis/CGSCCPassManagerTest.cpp
index b33567dd602b0..aab148c12c416 100644
--- a/llvm/unittests/Analysis/CGSCCPassManagerTest.cpp
+++ b/llvm/unittests/Analysis/CGSCCPassManagerTest.cpp
@@ -1659,18 +1659,16 @@ TEST_F(CGSCCPassManagerTest, TestUpdateCGAndAnalysisM...
[truncated]

@aeubanks aeubanks requested review from rnk, nikic and alinas June 7, 2024 23:13
Copy link
Contributor

@nikic nikic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks reasonable to me, but I'm not really familiar with the LazyCallGraph infrastructure.

for (auto [RC, DeadNs] : RCs) {
SmallVector<std::pair<Node *, Node *>> InternalEdgesToRemove;
for (Node *DeadN : DeadNs) {
if (lookupRefSCC(*DeadN) != RC)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this assert lookupRefSCC(*DeadN) == RC?
I'm missing when this can happen.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah yes you're right, I've removed the check

@aeubanks aeubanks merged commit 71497cc into llvm:main Jun 11, 2024
5 of 6 checks passed
@aeubanks aeubanks deleted the cgscc branch June 11, 2024 16:50
SvyatoslavScherbina pushed a commit to Kotlin/llvm-project that referenced this pull request Jun 12, 2024
In some modules, e.g. Kotlin-generated IR, we end up with a huge RefSCC
and the call graph updates done as a result of the inliner take a long
time. This is due to RefSCC::removeInternalRefEdges() getting called
many times, each time removing one function from the RefSCC, but each
call to removeInternalRefEdges() is proportional to the size of the
RefSCC.

There are two places that call removeInternalRefEdges(), in
updateCGAndAnalysisManagerForPass() and
LazyCallGraph::removeDeadFunction().

1) Since LazyCallGraph can deal with spurious (edges that exist in the
graph but not in the IR) ref edges, we can simply not call
removeInternalRefEdges() in updateCGAndAnalysisManagerForPass().

2) LazyCallGraph::removeDeadFunction() still ends up taking the brunt of
compile time with the above change for the original reason. So instead
we batch all the dead function removals so we can call
removeInternalRefEdges() just once. This requires some changes to
callers of removeDeadFunction() to not actually erase the function from
the module, but defer it to when we batch delete dead functions at the
end of the CGSCC run, leaving the function body as "unreachable" in the
meantime. We still need to ensure that call edges are accurate. I had
also tried deleting dead functions after visiting a RefSCC, but deleting
them all at once at the end was simpler.

Many test changes are due to not performing unnecessary revisits of an
SCC (the CGSCC infrastructure deems ref edge refinements as unimportant
when it comes to revisiting SCCs, although that seems to not be
consistently true given these changes) because we don't remove some ref
edges. Specifically for devirt-invalidated.ll this seems to expose an
inlining order issue with the inliner. Probably unimportant for this
type of intentionally weird call graph.

Compile time:
https://llvm-compile-time-tracker.com/compare.php?from=6f2c61071c274a1b5e212e6ad4114641ec7c7fc3&to=b08c90d05e290dd065755ea776ceaf1420680224&stat=instructions:u

(cherry picked from commit 71497cc)
SvyatoslavScherbina pushed a commit to Kotlin/llvm-project that referenced this pull request Jun 12, 2024
In some modules, e.g. Kotlin-generated IR, we end up with a huge RefSCC
and the call graph updates done as a result of the inliner take a long
time. This is due to RefSCC::removeInternalRefEdges() getting called
many times, each time removing one function from the RefSCC, but each
call to removeInternalRefEdges() is proportional to the size of the
RefSCC.

There are two places that call removeInternalRefEdges(), in
updateCGAndAnalysisManagerForPass() and
LazyCallGraph::removeDeadFunction().

1) Since LazyCallGraph can deal with spurious (edges that exist in the
graph but not in the IR) ref edges, we can simply not call
removeInternalRefEdges() in updateCGAndAnalysisManagerForPass().

2) LazyCallGraph::removeDeadFunction() still ends up taking the brunt of
compile time with the above change for the original reason. So instead
we batch all the dead function removals so we can call
removeInternalRefEdges() just once. This requires some changes to
callers of removeDeadFunction() to not actually erase the function from
the module, but defer it to when we batch delete dead functions at the
end of the CGSCC run, leaving the function body as "unreachable" in the
meantime. We still need to ensure that call edges are accurate. I had
also tried deleting dead functions after visiting a RefSCC, but deleting
them all at once at the end was simpler.

Many test changes are due to not performing unnecessary revisits of an
SCC (the CGSCC infrastructure deems ref edge refinements as unimportant
when it comes to revisiting SCCs, although that seems to not be
consistently true given these changes) because we don't remove some ref
edges. Specifically for devirt-invalidated.ll this seems to expose an
inlining order issue with the inliner. Probably unimportant for this
type of intentionally weird call graph.

Compile time:
https://llvm-compile-time-tracker.com/compare.php?from=6f2c61071c274a1b5e212e6ad4114641ec7c7fc3&to=b08c90d05e290dd065755ea776ceaf1420680224&stat=instructions:u

(cherry picked from commit 71497cc)
aeubanks added a commit to aeubanks/llvm-project that referenced this pull request Jun 13, 2024
After llvm#94815, this is only used within ModuleToPostOrderCGSCCPassAdaptor::run(), so keep it local to that function.
aeubanks added a commit to aeubanks/llvm-project that referenced this pull request Jun 21, 2024
With llvm#94815, the nodes belonging to dead functions are no longer invalidated, but kept around to batch delete at the end of the call graph walk.

The ML inliner needs to be updated to handle this. This fixes some asserts getting hit, e.g. https://crbug.com/348376263.
@SvyatoslavScherbina
Copy link

Hi folks! Thank you, @aeubanks, for this improvement that fixed one of the blockers (KT-68417) for updating LLVM in Kotlin (KT-49279).
Thanks to @alinas as well, of course.

aeubanks added a commit that referenced this pull request Jun 24, 2024
After #94815, this is only used within
ModuleToPostOrderCGSCCPassAdaptor::run(), so keep it local to that
function.
aeubanks added a commit to aeubanks/llvm-project that referenced this pull request Jul 1, 2024
As opposed to using Node::isDead(), which is no longer accurate after llvm#94815.

This is only used in diagnostics.
aeubanks added a commit that referenced this pull request Jul 2, 2024
As opposed to using Node::isDead(), which is no longer accurate after
#94815.

This is only used in diagnostics.
@wlei-llvm
Copy link
Contributor

Hi @aeubanks , some of our services is broken with below assertion and we bisected to this commit, could you take a look? I think you probably needs a reduced repro from me, I can work on that, but just post here first in case you find some thing obvious to fix. Thanks in advance!

clang++: /home/wlei/local/upstream/llvm-project/llvm/lib/Analysis/CGSCCPassManager.cpp:1175: LazyCallGraph::SCC &updateCGAndAnalysisManagerForPass(llvm::LazyCallGraph &, LazyCallGraph::SCC &, LazyCallGraph::Node &, llvm::CGSCCAnalysisManager &, llvm::CGSCCUpdateResult &, llvm::FunctionAnalysisManager &, bool): Assertion `!UR.InvalidatedRefSCCs.count(RC) && "Invalidated the current RefSCC!"' failed.
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace, preprocessed source, and associated run script.
Stack dump:
0.	Program arguments: /home/wlei/local/upstream/llvm-build/bin/clang++ @/tmp/fbcc.89nih15w/compiler.argsfile
1.	<eof> parser at end of file
2.	Optimizer
 #0 0x00007f650c9fe948 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) /home/wlei/local/upstream/llvm-project/llvm/lib/Support/Unix/Signals.inc:723:13
 #1 0x00007f650c9fca20 llvm::sys::RunSignalHandlers() /home/wlei/local/upstream/llvm-project/llvm/lib/Support/Signals.cpp:106:18
 #2 0x00007f650c943dd8 (anonymous namespace)::CrashRecoveryContextImpl::HandleCrash(int, unsigned long) /home/wlei/local/upstream/llvm-project/llvm/lib/Support/CrashRecoveryContext.cpp:73:5
 #3 0x00007f650c943dd8 CrashRecoverySignalHandler(int) /home/wlei/local/upstream/llvm-project/llvm/lib/Support/CrashRecoveryContext.cpp:390:51
 #4 0x00007f650c03e6f0 __restore_rt (/lib64/libc.so.6+0x3e6f0)
 #5 0x00007f650c08b94c __pthread_kill_implementation (/lib64/libc.so.6+0x8b94c)
 #6 0x00007f650c03e646 gsignal (/lib64/libc.so.6+0x3e646)
 #7 0x00007f650c0287f3 abort (/lib64/libc.so.6+0x287f3)
 #8 0x00007f650c02871b _nl_load_domain.cold (/lib64/libc.so.6+0x2871b)
 #9 0x00007f650c037386 (/lib64/libc.so.6+0x37386)
#10 0x00007f650d50202e updateCGAndAnalysisManagerForPass(llvm::LazyCallGraph&, llvm::LazyCallGraph::SCC&, llvm::LazyCallGraph::Node&, llvm::AnalysisManager<llvm::LazyCallGraph::SCC, llvm::LazyCallGraph&>&, llvm::CGSCCUpdateResult&, llvm::AnalysisManager<llvm::Function>&, bool) /home/wlei/local/upstream/llvm-project/llvm/lib/Analysis/CGSCCPassManager.cpp:1176:3
#11 0x00007f650d4fdef4 llvm::updateCGAndAnalysisManagerForFunctionPass(llvm::LazyCallGraph&, llvm::LazyCallGraph::SCC&, llvm::LazyCallGraph::Node&, llvm::AnalysisManager<llvm::LazyCallGraph::SCC, llvm::LazyCallGraph&>&, llvm::CGSCCUpdateResult&, llvm::AnalysisManager<llvm::Function>&) /home/wlei/local/upstream/llvm-project/llvm/lib/Analysis/CGSCCPassManager.cpp:1190:10
#12 0x00007f650d4fdef4 llvm::CGSCCToFunctionPassAdaptor::run(llvm::LazyCallGraph::SCC&, llvm::AnalysisManager<llvm::LazyCallGraph::SCC, llvm::LazyCallGraph&>&, llvm::LazyCallGraph&, llvm::CGSCCUpdateResult&) /home/wlei/local/upstream/llvm-project/llvm/lib/Analysis/CGSCCPassManager.cpp:568:19
#13 0x00007f650c37bc4d llvm::detail::PassModel<llvm::LazyCallGraph::SCC, llvm::CGSCCToFunctionPassAdaptor, llvm::AnalysisManager<llvm::LazyCallGraph::SCC, llvm::LazyCallGraph&>, llvm::LazyCallGraph&, llvm::CGSCCUpdateResult&>::run(llvm::LazyCallGraph::SCC&, llvm::AnalysisManager<llvm::LazyCallGraph::SCC, llvm::LazyCallGraph&>&, llvm::LazyCallGraph&, llvm::CGSCCUpdateResult&) /home/wlei/local/upstream/llvm-project/llvm/include/llvm/IR/PassManagerInternal.h:90:5
#14 0x00007f650d4f970b llvm::PassManager<llvm::LazyCallGraph::SCC, llvm::AnalysisManager<llvm::LazyCallGraph::SCC, llvm::LazyCallGraph&>, llvm::LazyCallGraph&, llvm::CGSCCUpdateResult&>::run(llvm::LazyCallGraph::SCC&, llvm::AnalysisManager<llvm::LazyCallGraph::SCC, llvm::LazyCallGraph&>&, llvm::LazyCallGraph&, llvm::CGSCCUpdateResult&) /home/wlei/local/upstream/llvm-project/llvm/lib/Analysis/CGSCCPassManager.cpp:90:12
... 

@aeubanks
Copy link
Contributor Author

aeubanks commented Jul 2, 2024

@wlei-llvm yes a reduced repro would be helpful, I can't figure out what's wrong with just the assert

aeubanks added a commit that referenced this pull request Jul 3, 2024
With #94815, the nodes belonging to dead functions are no longer
invalidated, but kept around to batch delete at the end of the call
graph walk.

The ML inliner needs to be updated to handle this. This fixes some
asserts getting hit, e.g. https://crbug.com/348376263.
lravenclaw pushed a commit to lravenclaw/llvm-project that referenced this pull request Jul 3, 2024
As opposed to using Node::isDead(), which is no longer accurate after
llvm#94815.

This is only used in diagnostics.
kbluck pushed a commit to kbluck/llvm-project that referenced this pull request Jul 6, 2024
As opposed to using Node::isDead(), which is no longer accurate after
llvm#94815.

This is only used in diagnostics.
kbluck pushed a commit to kbluck/llvm-project that referenced this pull request Jul 6, 2024
With llvm#94815, the nodes belonging to dead functions are no longer
invalidated, but kept around to batch delete at the end of the call
graph walk.

The ML inliner needs to be updated to handle this. This fixes some
asserts getting hit, e.g. https://crbug.com/348376263.
@wlei-llvm
Copy link
Contributor

@wlei-llvm yes a reduced repro would be helpful, I can't figure out what's wrong with just the assert

Hi @aeubanks , here is the reduced repro:

cmd:

opt -disable-output -passes="cgscc(devirt<4>(inline,openmp-opt-cgscc,function<>(simplifycfg<>)))" < reduced.ll

reduced.ll:

target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-i128:128-f80:128-n8:16:32:64-S128"
target triple = "x86_64-redhat-linux-gnu"
define internal fastcc void @foo() #0 personality ptr null {
  br label %1
1:                                                ; preds = %2, %0
  ret void
2:                                                ; No predecessors!
  call fastcc void @bar()
  br label %1
}
define fastcc void @baz() personality ptr null {
  switch i64 0, label %3 [
    i64 0, label %3
    i64 1, label %3
    i64 2, label %3
    i64 3, label %4
    i64 4, label %3
    i64 5, label %3
    i64 6, label %3
    i64 7, label %3
    i64 8, label %3
    i64 9, label %3
    i64 10, label %3
    i64 11, label %3
    i64 12, label %3
    i64 13, label %3
    i64 14, label %3
    i64 15, label %3
    i64 16, label %3
  ]
1:                                                ; No predecessors!
  %2 = landingpad { ptr, i32 }
          cleanup
  call fastcc void @foo()
  resume { ptr, i32 } zeroinitializer
3:                                                ; preds = %4, %0, %0, %0, %0, %0, %0, %0, %0, %0, %0, %0, %0, %0, %0, %0, %0, %0
  ret void
4:                                                ; preds = %0
  call fastcc void @baz()
  br label %3
}
define fastcc void @bar() personality ptr null {
  call fastcc void @baz()
  ret void
}
; uselistorder directives
uselistorder ptr null, { 3, 4, 0, 5, 6, 1, 7, 8, 2 }
attributes #0 = { "target-cpu"="haswell" }
!llvm.module.flags = !{!0}
!0 = !{i32 7, !"openmp", i32 50}

Please take a look, thanks!

@aeubanks
Copy link
Contributor Author

aeubanks commented Jul 9, 2024

I have a fix for the assert incoming

AlexisPerry pushed a commit to llvm-project-tlp/llvm-project that referenced this pull request Jul 9, 2024
After llvm#94815, this is only used within
ModuleToPostOrderCGSCCPassAdaptor::run(), so keep it local to that
function.
aeubanks added a commit that referenced this pull request Jul 10, 2024
The RefSCC that a function marked dead is in may still contain valid
SCCs that we want to visit. We rely on InvalidatedSCCs to skip SCCs
containing dead functions.

The addition of RefSCCs in CallGraphUpdater to InvalidatedRefSCCs was
causing asserts as reported in #94815. Fix some more CallGraphUpdater
function deletion methods as well.
@aeubanks
Copy link
Contributor Author

#98213 should fix the assert, let me know if you are still seeing issues. sorry for the trouble (we really need to remove CallGraphUpdater...)

@wlei-llvm
Copy link
Contributor

#98213 should fix the assert, let me know if you are still seeing issues. sorry for the trouble (we really need to remove CallGraphUpdater...)

Thanks for the quick fix, I will test it on our services shortly.

aaryanshukla pushed a commit to aaryanshukla/llvm-project that referenced this pull request Jul 14, 2024
The RefSCC that a function marked dead is in may still contain valid
SCCs that we want to visit. We rely on InvalidatedSCCs to skip SCCs
containing dead functions.

The addition of RefSCCs in CallGraphUpdater to InvalidatedRefSCCs was
causing asserts as reported in llvm#94815. Fix some more CallGraphUpdater
function deletion methods as well.
@wlei-llvm
Copy link
Contributor

#98213 should fix the assert, let me know if you are still seeing issues. sorry for the trouble (we really need to remove CallGraphUpdater...)

Thanks for the quick fix, I will test it on our services shortly.

I confirm it fixes the assertion errors on our services, thanks again!

anton-bannykh pushed a commit to Kotlin/llvm-project that referenced this pull request Aug 23, 2024
After llvm#94815, this is only used within
ModuleToPostOrderCGSCCPassAdaptor::run(), so keep it local to that
function.

(cherry picked from commit b312cbf)
anton-bannykh pushed a commit to Kotlin/llvm-project that referenced this pull request Aug 23, 2024
With llvm#94815, the nodes belonging to dead functions are no longer
invalidated, but kept around to batch delete at the end of the call
graph walk.

The ML inliner needs to be updated to handle this. This fixes some
asserts getting hit, e.g. https://crbug.com/348376263.

(cherry picked from commit 94471e6)
anton-bannykh pushed a commit to Kotlin/llvm-project that referenced this pull request Aug 23, 2024
The RefSCC that a function marked dead is in may still contain valid
SCCs that we want to visit. We rely on InvalidatedSCCs to skip SCCs
containing dead functions.

The addition of RefSCCs in CallGraphUpdater to InvalidatedRefSCCs was
causing asserts as reported in llvm#94815. Fix some more CallGraphUpdater
function deletion methods as well.

(cherry picked from commit 8d800e6)
anton-bannykh pushed a commit to Kotlin/llvm-project that referenced this pull request Aug 23, 2024
After llvm#94815, this is only used within
ModuleToPostOrderCGSCCPassAdaptor::run(), so keep it local to that
function.

(cherry picked from commit b312cbf)
anton-bannykh pushed a commit to Kotlin/llvm-project that referenced this pull request Aug 23, 2024
With llvm#94815, the nodes belonging to dead functions are no longer
invalidated, but kept around to batch delete at the end of the call
graph walk.

The ML inliner needs to be updated to handle this. This fixes some
asserts getting hit, e.g. https://crbug.com/348376263.

(cherry picked from commit 94471e6)
anton-bannykh pushed a commit to Kotlin/llvm-project that referenced this pull request Aug 23, 2024
The RefSCC that a function marked dead is in may still contain valid
SCCs that we want to visit. We rely on InvalidatedSCCs to skip SCCs
containing dead functions.

The addition of RefSCCs in CallGraphUpdater to InvalidatedRefSCCs was
causing asserts as reported in llvm#94815. Fix some more CallGraphUpdater
function deletion methods as well.

(cherry picked from commit 8d800e6)
anton-bannykh pushed a commit to Kotlin/llvm-project that referenced this pull request Aug 26, 2024
After llvm#94815, this is only used within
ModuleToPostOrderCGSCCPassAdaptor::run(), so keep it local to that
function.

(cherry picked from commit b312cbf)
anton-bannykh pushed a commit to Kotlin/llvm-project that referenced this pull request Aug 26, 2024
With llvm#94815, the nodes belonging to dead functions are no longer
invalidated, but kept around to batch delete at the end of the call
graph walk.

The ML inliner needs to be updated to handle this. This fixes some
asserts getting hit, e.g. https://crbug.com/348376263.

(cherry picked from commit 94471e6)
anton-bannykh pushed a commit to Kotlin/llvm-project that referenced this pull request Aug 26, 2024
The RefSCC that a function marked dead is in may still contain valid
SCCs that we want to visit. We rely on InvalidatedSCCs to skip SCCs
containing dead functions.

The addition of RefSCCs in CallGraphUpdater to InvalidatedRefSCCs was
causing asserts as reported in llvm#94815. Fix some more CallGraphUpdater
function deletion methods as well.

(cherry picked from commit 8d800e6)
anton-bannykh pushed a commit to Kotlin/llvm-project that referenced this pull request Aug 26, 2024
After llvm#94815, this is only used within
ModuleToPostOrderCGSCCPassAdaptor::run(), so keep it local to that
function.

(cherry picked from commit b312cbf)
anton-bannykh pushed a commit to Kotlin/llvm-project that referenced this pull request Aug 26, 2024
With llvm#94815, the nodes belonging to dead functions are no longer
invalidated, but kept around to batch delete at the end of the call
graph walk.

The ML inliner needs to be updated to handle this. This fixes some
asserts getting hit, e.g. https://crbug.com/348376263.

(cherry picked from commit 94471e6)
anton-bannykh pushed a commit to Kotlin/llvm-project that referenced this pull request Aug 26, 2024
The RefSCC that a function marked dead is in may still contain valid
SCCs that we want to visit. We rely on InvalidatedSCCs to skip SCCs
containing dead functions.

The addition of RefSCCs in CallGraphUpdater to InvalidatedRefSCCs was
causing asserts as reported in llvm#94815. Fix some more CallGraphUpdater
function deletion methods as well.

(cherry picked from commit 8d800e6)
luxufan pushed a commit to luxufan/llvm-project that referenced this pull request Mar 31, 2025
In some modules, e.g. Kotlin-generated IR, we end up with a huge RefSCC
and the call graph updates done as a result of the inliner take a long
time. This is due to RefSCC::removeInternalRefEdges() getting called
many times, each time removing one function from the RefSCC, but each
call to removeInternalRefEdges() is proportional to the size of the
RefSCC.

There are two places that call removeInternalRefEdges(), in
updateCGAndAnalysisManagerForPass() and
LazyCallGraph::removeDeadFunction().

1) Since LazyCallGraph can deal with spurious (edges that exist in the
graph but not in the IR) ref edges, we can simply not call
removeInternalRefEdges() in updateCGAndAnalysisManagerForPass().

2) LazyCallGraph::removeDeadFunction() still ends up taking the brunt of
compile time with the above change for the original reason. So instead
we batch all the dead function removals so we can call
removeInternalRefEdges() just once. This requires some changes to
callers of removeDeadFunction() to not actually erase the function from
the module, but defer it to when we batch delete dead functions at the
end of the CGSCC run, leaving the function body as "unreachable" in the
meantime. We still need to ensure that call edges are accurate. I had
also tried deleting dead functions after visiting a RefSCC, but deleting
them all at once at the end was simpler.

Many test changes are due to not performing unnecessary revisits of an
SCC (the CGSCC infrastructure deems ref edge refinements as unimportant
when it comes to revisiting SCCs, although that seems to not be
consistently true given these changes) because we don't remove some ref
edges. Specifically for devirt-invalidated.ll this seems to expose an
inlining order issue with the inliner. Probably unimportant for this
type of intentionally weird call graph.

Compile time:
https://llvm-compile-time-tracker.com/compare.php?from=6f2c61071c274a1b5e212e6ad4114641ec7c7fc3&to=b08c90d05e290dd065755ea776ceaf1420680224&stat=instructions:u
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants