[CGSCC] Fix compile time blowup with large RefSCCs #94815

aeubanks · 2024-06-07T23:12:32Z

In some modules, e.g. Kotlin-generated IR, we end up with a huge RefSCC and the call graph updates done as a result of the inliner take a long time. This is due to RefSCC::removeInternalRefEdges() getting called many times, each time removing one function from the RefSCC, but each call to removeInternalRefEdges() is proportional to the size of the RefSCC.

There are two places that call removeInternalRefEdges(), in updateCGAndAnalysisManagerForPass() and LazyCallGraph::removeDeadFunction().

Since LazyCallGraph can deal with spurious (edges that exist in the graph but not in the IR) ref edges, we can simply not call removeInternalRefEdges() in updateCGAndAnalysisManagerForPass().
LazyCallGraph::removeDeadFunction() still ends up taking the brunt of compile time with the above change for the original reason. So instead we batch all the dead function removals so we can call removeInternalRefEdges() just once. This requires some changes to callers of removeDeadFunction() to not actually erase the function from the module, but defer it to when we batch delete dead functions at the end of the CGSCC run, leaving the function body as "unreachable" in the meantime. We still need to ensure that call edges are accurate. I had also tried deleting dead functions after visiting a RefSCC, but deleting them all at once at the end was simpler.

Many test changes are due to not performing unnecessary revisits of an SCC (the CGSCC infrastructure deems ref edge refinements as unimportant when it comes to revisiting SCCs, although that seems to not be consistently true given these changes) because we don't remove some ref edges. Specifically for devirt-invalidated.ll this seems to expose an inlining order issue with the inliner. Probably unimportant for this type of intentionally weird call graph.

Compile time: https://llvm-compile-time-tracker.com/compare.php?from=6f2c61071c274a1b5e212e6ad4114641ec7c7fc3&to=b08c90d05e290dd065755ea776ceaf1420680224&stat=instructions:u

In some modules, e.g. Kotlin-generated IR, we end up with a huge RefSCC and the call graph updates done as a result of the inliner take a long time. This is due to RefSCC::removeInternalRefEdges() getting called many times, each time removing one function from the RefSCC, but each call to removeInternalRefEdges() is proportional to the size of the RefSCC. There are two places that call removeInternalRefEdges(), in updateCGAndAnalysisManagerForPass() and LazyCallGraph::removeDeadFunction(). 1) Since LazyCallGraph can deal with spurious (edges that exist in the graph but not in the IR) ref edges, we can simply not call removeInternalRefEdges() in updateCGAndAnalysisManagerForPass(). 2) LazyCallGraph::removeDeadFunction() still ends up taking the brunt of compile time with the above change for the original reason. So instead we batch all the dead function removals so we can call removeInternalRefEdges() just once. This requires some changes to callers of removeDeadFunction() to not actually erase the function from the module, but defer it to when we batch delete dead functions at the end of the CGSCC run, leaving the function body as "unreachable" in the meantime. We still need to ensure that call edges are accurate. I had also tried deleting dead functions after visiting a RefSCC, but deleting them all at once at the end was simpler. Many test changes are due to not performing unnecessary revisits of an SCC (the CGSCC infrastructure deems ref edge refinements as unimportant when it comes to revisiting SCCs, although that seems to not be consistently true given these changes) because we don't remove some ref edges. Specifically for devirt-invalidated.ll this seems to expose an inlining order issue with the inliner. Probably unimportant for this type of intentionally weird call graph.

llvmbot · 2024-06-07T23:13:03Z

@llvm/pr-subscribers-llvm-transforms

Author: Arthur Eubanks (aeubanks)

Changes

In some modules, e.g. Kotlin-generated IR, we end up with a huge RefSCC and the call graph updates done as a result of the inliner take a long time. This is due to RefSCC::removeInternalRefEdges() getting called many times, each time removing one function from the RefSCC, but each call to removeInternalRefEdges() is proportional to the size of the RefSCC.

There are two places that call removeInternalRefEdges(), in updateCGAndAnalysisManagerForPass() and LazyCallGraph::removeDeadFunction().

Since LazyCallGraph can deal with spurious (edges that exist in the graph but not in the IR) ref edges, we can simply not call removeInternalRefEdges() in updateCGAndAnalysisManagerForPass().
LazyCallGraph::removeDeadFunction() still ends up taking the brunt of compile time with the above change for the original reason. So instead we batch all the dead function removals so we can call removeInternalRefEdges() just once. This requires some changes to callers of removeDeadFunction() to not actually erase the function from the module, but defer it to when we batch delete dead functions at the end of the CGSCC run, leaving the function body as "unreachable" in the meantime. We still need to ensure that call edges are accurate. I had also tried deleting dead functions after visiting a RefSCC, but deleting them all at once at the end was simpler.

Many test changes are due to not performing unnecessary revisits of an SCC (the CGSCC infrastructure deems ref edge refinements as unimportant when it comes to revisiting SCCs, although that seems to not be consistently true given these changes) because we don't remove some ref edges. Specifically for devirt-invalidated.ll this seems to expose an inlining order issue with the inliner. Probably unimportant for this type of intentionally weird call graph.

Patch is 25.62 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/94815.diff

13 Files Affected:

(modified) llvm/include/llvm/Analysis/CGSCCPassManager.h (+4)
(modified) llvm/include/llvm/Analysis/LazyCallGraph.h (+11-11)
(modified) llvm/lib/Analysis/CGSCCPassManager.cpp (+9-33)
(modified) llvm/lib/Analysis/LazyCallGraph.cpp (+66-59)
(modified) llvm/lib/Transforms/IPO/Inliner.cpp (+14-15)
(modified) llvm/lib/Transforms/Utils/CallGraphUpdater.cpp (+8-4)
(modified) llvm/test/Other/cgscc-refscc-mutation-order.ll (-2)
(modified) llvm/test/Other/devirt-invalidated.ll (-2)
(modified) llvm/test/Transforms/Attributor/ArgumentPromotion/live_called_from_dead.ll (-1)
(modified) llvm/test/Transforms/Attributor/ArgumentPromotion/live_called_from_dead_2.ll (-1)
(modified) llvm/test/Transforms/Inline/cgscc-cycle-debug.ll (-1)
(modified) llvm/unittests/Analysis/CGSCCPassManagerTest.cpp (+5-7)
(modified) llvm/unittests/Analysis/LazyCallGraphTest.cpp (+17-12)

diff --git a/llvm/include/llvm/Analysis/CGSCCPassManager.h b/llvm/include/llvm/Analysis/CGSCCPassManager.h
index 5654ad46d6eab..b19d53621ac86 100644
--- a/llvm/include/llvm/Analysis/CGSCCPassManager.h
+++ b/llvm/include/llvm/Analysis/CGSCCPassManager.h
@@ -306,6 +306,10 @@ struct CGSCCUpdateResult {
   SmallDenseSet<std::pair<LazyCallGraph::Node *, LazyCallGraph::SCC *>, 4>
       &InlinedInternalEdges;
 
+  /// Functions that a pass has considered to be dead to be removed at the end
+  /// of the call graph walk in batch.
+  SmallVector<Function *, 4> &DeadFunctions;
+
   /// Weak VHs to keep track of indirect calls for the purposes of detecting
   /// devirtualization.
   ///
diff --git a/llvm/include/llvm/Analysis/LazyCallGraph.h b/llvm/include/llvm/Analysis/LazyCallGraph.h
index ac8ca207d312b..a8bbf2c578af9 100644
--- a/llvm/include/llvm/Analysis/LazyCallGraph.h
+++ b/llvm/include/llvm/Analysis/LazyCallGraph.h
@@ -832,7 +832,7 @@ class LazyCallGraph {
     /// self-edges and edge removals which result in a spanning tree with no
     /// more cycles.
     [[nodiscard]] SmallVector<RefSCC *, 1>
-    removeInternalRefEdge(Node &SourceN, ArrayRef<Node *> TargetNs);
+    removeInternalRefEdges(ArrayRef<std::pair<Node *, Node *>> Edges);
 
     /// A convenience wrapper around the above to handle trivial cases of
     /// inserting a new call edge.
@@ -1056,18 +1056,18 @@ class LazyCallGraph {
   /// once SCCs have started to be formed. These routines have strict contracts
   /// but may be called at any point.
 
-  /// Remove a dead function from the call graph (typically to delete it).
+  /// Remove dead functions from the call graph.
   ///
-  /// Note that the function must have an empty use list, and the call graph
-  /// must be up-to-date prior to calling this. That means it is by itself in
-  /// a maximal SCC which is by itself in a maximal RefSCC, etc. No structural
-  /// changes result from calling this routine other than potentially removing
-  /// entry points into the call graph.
+  /// These functions should have already been passed to markDeadFunction().
+  /// This is done as a batch to prevent compile time blowup as a result of
+  /// handling a single function at a time.
+  void removeDeadFunctions(ArrayRef<Function *> DeadFs);
+
+  /// Mark a function as dead to be removed later by removeDeadFunctions().
   ///
-  /// If SCC formation has begun, this function must not be part of the current
-  /// DFS in order to call this safely. Typically, the function will have been
-  /// fully visited by the DFS prior to calling this routine.
-  void removeDeadFunction(Function &F);
+  /// The function body should have no incoming or outgoing call or ref edges.
+  /// For example, a function with a single "unreachable" instruction.
+  void markDeadFunction(Function &F);
 
   /// Add a new function split/outlined from an existing function.
   ///
diff --git a/llvm/lib/Analysis/CGSCCPassManager.cpp b/llvm/lib/Analysis/CGSCCPassManager.cpp
index 8ae5c3dee6103..2ed1d98f80068 100644
--- a/llvm/lib/Analysis/CGSCCPassManager.cpp
+++ b/llvm/lib/Analysis/CGSCCPassManager.cpp
@@ -158,10 +158,12 @@ ModuleToPostOrderCGSCCPassAdaptor::run(Module &M, ModuleAnalysisManager &AM) {
   SmallDenseSet<std::pair<LazyCallGraph::Node *, LazyCallGraph::SCC *>, 4>
       InlinedInternalEdges;
 
+  SmallVector<Function *, 4> DeadFunctions;
+
   CGSCCUpdateResult UR = {
-      RCWorklist,           CWorklist, InvalidRefSCCSet,
-      InvalidSCCSet,        nullptr,   PreservedAnalyses::all(),
-      InlinedInternalEdges, {}};
+      RCWorklist,           CWorklist,     InvalidRefSCCSet,
+      InvalidSCCSet,        nullptr,       PreservedAnalyses::all(),
+      InlinedInternalEdges, DeadFunctions, {}};
 
   // Request PassInstrumentation from analysis manager, will use it to run
   // instrumenting callbacks for the passes later.
@@ -340,6 +342,10 @@ ModuleToPostOrderCGSCCPassAdaptor::run(Module &M, ModuleAnalysisManager &AM) {
     } while (!RCWorklist.empty());
   }
 
+  CG.removeDeadFunctions(DeadFunctions);
+  for (Function *DeadF : DeadFunctions)
+    DeadF->eraseFromParent();
+
 #if defined(EXPENSIVE_CHECKS)
   // Verify that the call graph is still valid.
   CG.verify();
@@ -1030,36 +1036,6 @@ static LazyCallGraph::SCC &updateCGAndAnalysisManagerForPass(
     return true;
   });
 
-  // Now do a batch removal of the internal ref edges left.
-  auto NewRefSCCs = RC->removeInternalRefEdge(N, DeadTargets);
-  if (!NewRefSCCs.empty()) {
-    // The old RefSCC is dead, mark it as such.
-    UR.InvalidatedRefSCCs.insert(RC);
-
-    // Note that we don't bother to invalidate analyses as ref-edge
-    // connectivity is not really observable in any way and is intended
-    // exclusively to be used for ordering of transforms rather than for
-    // analysis conclusions.
-
-    // Update RC to the "bottom".
-    assert(G.lookupSCC(N) == C && "Changed the SCC when splitting RefSCCs!");
-    RC = &C->getOuterRefSCC();
-    assert(G.lookupRefSCC(N) == RC && "Failed to update current RefSCC!");
-
-    // The RC worklist is in reverse postorder, so we enqueue the new ones in
-    // RPO except for the one which contains the source node as that is the
-    // "bottom" we will continue processing in the bottom-up walk.
-    assert(NewRefSCCs.front() == RC &&
-           "New current RefSCC not first in the returned list!");
-    for (RefSCC *NewRC : llvm::reverse(llvm::drop_begin(NewRefSCCs))) {
-      assert(NewRC != RC && "Should not encounter the current RefSCC further "
-                            "in the postorder list of new RefSCCs.");
-      UR.RCWorklist.insert(NewRC);
-      LLVM_DEBUG(dbgs() << "Enqueuing a new RefSCC in the update worklist: "
-                        << *NewRC << "\n");
-    }
-  }
-
   // Next demote all the call edges that are now ref edges. This helps make
   // the SCCs small which should minimize the work below as we don't want to
   // form cycles that this would break.
diff --git a/llvm/lib/Analysis/LazyCallGraph.cpp b/llvm/lib/Analysis/LazyCallGraph.cpp
index 48a7ca0061600..11d2b8118bd5a 100644
--- a/llvm/lib/Analysis/LazyCallGraph.cpp
+++ b/llvm/lib/Analysis/LazyCallGraph.cpp
@@ -1160,8 +1160,8 @@ void LazyCallGraph::RefSCC::removeOutgoingEdge(Node &SourceN, Node &TargetN) {
 }
 
 SmallVector<LazyCallGraph::RefSCC *, 1>
-LazyCallGraph::RefSCC::removeInternalRefEdge(Node &SourceN,
-                                             ArrayRef<Node *> TargetNs) {
+LazyCallGraph::RefSCC::removeInternalRefEdges(
+    ArrayRef<std::pair<Node *, Node *>> Edges) {
   // We return a list of the resulting *new* RefSCCs in post-order.
   SmallVector<RefSCC *, 1> Result;
 
@@ -1179,25 +1179,21 @@ LazyCallGraph::RefSCC::removeInternalRefEdge(Node &SourceN,
 #endif
 
   // First remove the actual edges.
-  for (Node *TargetN : TargetNs) {
-    assert(!(*SourceN)[*TargetN].isCall() &&
+  for (auto [SourceN, TargetN] : Edges) {
+    assert(!(**SourceN)[*TargetN].isCall() &&
            "Cannot remove a call edge, it must first be made a ref edge");
 
-    bool Removed = SourceN->removeEdgeInternal(*TargetN);
+    bool Removed = (*SourceN)->removeEdgeInternal(*TargetN);
     (void)Removed;
     assert(Removed && "Target not in the edge set for this caller?");
   }
 
   // Direct self references don't impact the ref graph at all.
-  if (llvm::all_of(TargetNs,
-                   [&](Node *TargetN) { return &SourceN == TargetN; }))
-    return Result;
-
   // If all targets are in the same SCC as the source, because no call edges
   // were removed there is no RefSCC structure change.
-  SCC &SourceC = *G->lookupSCC(SourceN);
-  if (llvm::all_of(TargetNs, [&](Node *TargetN) {
-        return G->lookupSCC(*TargetN) == &SourceC;
+  if (llvm::all_of(Edges, [&](std::pair<Node *, Node *> E) {
+        return E.first == E.second ||
+               G->lookupSCC(*E.first) == G->lookupSCC(*E.second);
       }))
     return Result;
 
@@ -1499,7 +1495,7 @@ void LazyCallGraph::removeEdge(Node &SourceN, Node &TargetN) {
   assert(Removed && "Target not in the edge set for this caller?");
 }
 
-void LazyCallGraph::removeDeadFunction(Function &F) {
+void LazyCallGraph::markDeadFunction(Function &F) {
   // FIXME: This is unnecessarily restrictive. We should be able to remove
   // functions which recursively call themselves.
   assert(F.hasZeroLiveUses() &&
@@ -1515,57 +1511,68 @@ void LazyCallGraph::removeDeadFunction(Function &F) {
 
   Node &N = *NI->second;
 
-  // Cannot remove a function which has yet to be visited in the DFS walk, so
-  // if we have a node at all then we must have an SCC and RefSCC.
-  auto CI = SCCMap.find(&N);
-  assert(CI != SCCMap.end() &&
-         "Tried to remove a node without an SCC after DFS walk started!");
-  SCC &C = *CI->second;
-  RefSCC *RC = &C.getOuterRefSCC();
-
-  // In extremely rare cases, we can delete a dead function which is still in a
-  // non-trivial RefSCC. This can happen due to spurious ref edges sticking
-  // around after an IR function reference is removed.
-  if (RC->size() != 1) {
-    SmallVector<Node *, 0> NodesInRC;
-    for (SCC &OtherC : *RC) {
-      for (Node &OtherN : OtherC)
-        NodesInRC.push_back(&OtherN);
+  // Remove all call edges out of dead function.
+  for (Edge E : *N) {
+    if (E.isCall())
+      N->setEdgeKind(E.getNode(), Edge::Ref);
+  }
+}
+
+void LazyCallGraph::removeDeadFunctions(ArrayRef<Function *> DeadFs) {
+  if (DeadFs.empty())
+    return;
+
+  // Group dead functions by the RefSCC they're in.
+  DenseMap<RefSCC *, SmallVector<Node *, 1>> RCs;
+  for (Function *DeadF : DeadFs) {
+    Node *N = lookup(*DeadF);
+#ifndef NDEBUG
+    for (Edge &E : **N) {
+      assert(!E.isCall() &&
+             "dead function shouldn't have any outgoing call edges");
     }
-    for (Node *OtherN : NodesInRC) {
-      if ((*OtherN)->lookup(N)) {
-        auto NewRefSCCs =
-            RC->removeInternalRefEdge(*OtherN, ArrayRef<Node *>(&N));
-        // If we've split into multiple RefSCCs, RC is now invalid and the
-        // RefSCC containing C will be different.
-        if (!NewRefSCCs.empty())
-          RC = &C.getOuterRefSCC();
+#endif
+    RefSCC *RC = lookupRefSCC(*N);
+    RCs[RC].push_back(N);
+  }
+  // Remove outgoing edges from all dead functions. Dead functions should
+  // already have had their call edges removed in markDeadFunction(), so we only
+  // need to worry about spurious ref edges.
+  for (auto [RC, DeadNs] : RCs) {
+    SmallVector<std::pair<Node *, Node *>> InternalEdgesToRemove;
+    for (Node *DeadN : DeadNs) {
+      if (lookupRefSCC(*DeadN) != RC)
+        continue;
+      for (Edge &E : **DeadN) {
+        if (lookupRefSCC(E.getNode()) == RC)
+          InternalEdgesToRemove.push_back({DeadN, &E.getNode()});
+        else
+          RC->removeOutgoingEdge(*DeadN, E.getNode());
       }
     }
+    // We ignore the returned RefSCCs since at this point we're done with CGSCC
+    // iteration and don't need to add it to any worklists.
+    (void)RC->removeInternalRefEdges(InternalEdgesToRemove);
+    for (Node *DeadN : DeadNs) {
+      RefSCC *DeadRC = lookupRefSCC(*DeadN);
+      assert(DeadRC->size() == 1);
+      assert(DeadRC->begin()->size() == 1);
+      DeadRC->clear();
+      DeadRC->G = nullptr;
+    }
   }
+  // Clean up data structures.
+  for (Function *DeadF : DeadFs) {
+    Node &N = *lookup(*DeadF);
 
-  NodeMap.erase(NI);
-  EntryEdges.removeEdgeInternal(N);
-  SCCMap.erase(CI);
-
-  // This node must be the only member of its SCC as it has no callers, and
-  // that SCC must be the only member of a RefSCC as it has no references.
-  // Validate these properties first.
-  assert(C.size() == 1 && "Dead functions must be in a singular SCC");
-  assert(RC->size() == 1 && "Dead functions must be in a singular RefSCC");
-
-  // Finally clear out all the data structures from the node down through the
-  // components. postorder_ref_scc_iterator will skip empty RefSCCs, so no need
-  // to adjust LazyCallGraph data structures.
-  N.clear();
-  N.G = nullptr;
-  N.F = nullptr;
-  C.clear();
-  RC->clear();
-  RC->G = nullptr;
-
-  // Nothing to delete as all the objects are allocated in stable bump pointer
-  // allocators.
+    EntryEdges.removeEdgeInternal(N);
+    SCCMap.erase(SCCMap.find(&N));
+    NodeMap.erase(NodeMap.find(DeadF));
+
+    N.clear();
+    N.G = nullptr;
+    N.F = nullptr;
+  }
 }
 
 // Gets the Edge::Kind from one function to another by looking at the function's
diff --git a/llvm/lib/Transforms/IPO/Inliner.cpp b/llvm/lib/Transforms/IPO/Inliner.cpp
index a9747aebf67bb..1a7b9bc8e3e77 100644
--- a/llvm/lib/Transforms/IPO/Inliner.cpp
+++ b/llvm/lib/Transforms/IPO/Inliner.cpp
@@ -197,6 +197,14 @@ InlinerPass::getAdvisor(const ModuleAnalysisManagerCGSCCProxy::Result &MAM,
   return *IAA->getAdvisor();
 }
 
+void makeFunctionBodyUnreachable(Function &F) {
+  F.dropAllReferences();
+  for (BasicBlock &BB : make_early_inc_range(F))
+    BB.eraseFromParent();
+  BasicBlock *BB = BasicBlock::Create(F.getContext(), "", &F);
+  new UnreachableInst(F.getContext(), BB);
+}
+
 PreservedAnalyses InlinerPass::run(LazyCallGraph::SCC &InitialC,
                                    CGSCCAnalysisManager &AM, LazyCallGraph &CG,
                                    CGSCCUpdateResult &UR) {
@@ -448,11 +456,9 @@ PreservedAnalyses InlinerPass::run(LazyCallGraph::SCC &InitialC,
                              }),
               Calls.end());
 
-          // Clear the body and queue the function itself for deletion when we
-          // finish inlining and call graph updates.
-          // Note that after this point, it is an error to do anything other
-          // than use the callee's address or delete it.
-          Callee.dropAllReferences();
+          // Clear the body and queue the function itself for call graph
+          // updating when we finish inlining.
+          makeFunctionBodyUnreachable(Callee);
           assert(!is_contained(DeadFunctions, &Callee) &&
                  "Cannot put cause a function to become dead twice!");
           DeadFunctions.push_back(&Callee);
@@ -530,7 +536,7 @@ PreservedAnalyses InlinerPass::run(LazyCallGraph::SCC &InitialC,
   if (!DeadFunctionsInComdats.empty()) {
     filterDeadComdatFunctions(DeadFunctionsInComdats);
     for (auto *Callee : DeadFunctionsInComdats)
-      Callee->dropAllReferences();
+      makeFunctionBodyUnreachable(*Callee);
     DeadFunctions.append(DeadFunctionsInComdats);
   }
 
@@ -542,25 +548,18 @@ PreservedAnalyses InlinerPass::run(LazyCallGraph::SCC &InitialC,
   // that is OK as all we do is delete things and add pointers to unordered
   // sets.
   for (Function *DeadF : DeadFunctions) {
+    CG.markDeadFunction(*DeadF);
     // Get the necessary information out of the call graph and nuke the
     // function there. Also, clear out any cached analyses.
     auto &DeadC = *CG.lookupSCC(*CG.lookup(*DeadF));
     FAM.clear(*DeadF, DeadF->getName());
     AM.clear(DeadC, DeadC.getName());
-    auto &DeadRC = DeadC.getOuterRefSCC();
-    CG.removeDeadFunction(*DeadF);
 
     // Mark the relevant parts of the call graph as invalid so we don't visit
     // them.
     UR.InvalidatedSCCs.insert(&DeadC);
-    UR.InvalidatedRefSCCs.insert(&DeadRC);
-
-    // If the updated SCC was the one containing the deleted function, clear it.
-    if (&DeadC == UR.UpdatedC)
-      UR.UpdatedC = nullptr;
 
-    // And delete the actual function from the module.
-    M.getFunctionList().erase(DeadF);
+    UR.DeadFunctions.push_back(DeadF);
 
     ++NumDeleted;
   }
diff --git a/llvm/lib/Transforms/Utils/CallGraphUpdater.cpp b/llvm/lib/Transforms/Utils/CallGraphUpdater.cpp
index d0b9884aa9099..e9f37d4044cb0 100644
--- a/llvm/lib/Transforms/Utils/CallGraphUpdater.cpp
+++ b/llvm/lib/Transforms/Utils/CallGraphUpdater.cpp
@@ -67,16 +67,20 @@ bool CallGraphUpdater::finalize() {
 
         FAM.clear(*DeadFn, DeadFn->getName());
         AM->clear(*DeadSCC, DeadSCC->getName());
-        LCG->removeDeadFunction(*DeadFn);
+        LCG->markDeadFunction(*DeadFn);
 
         // Mark the relevant parts of the call graph as invalid so we don't
         // visit them.
         UR->InvalidatedSCCs.insert(DeadSCC);
         UR->InvalidatedRefSCCs.insert(&DeadRC);
+        UR->DeadFunctions.push_back(DeadFn);
+      } else {
+        // The CGSCC infrastructure batch deletes functions at the end of the
+        // call graph walk, so only erase the function if we're not using that
+        // infrastructure.
+        // The function is now really dead and de-attached from everything.
+        DeadFn->eraseFromParent();
       }
-
-      // The function is now really dead and de-attached from everything.
-      DeadFn->eraseFromParent();
     }
   }
 
diff --git a/llvm/test/Other/cgscc-refscc-mutation-order.ll b/llvm/test/Other/cgscc-refscc-mutation-order.ll
index 13a46503c1f4c..aa20735715763 100644
--- a/llvm/test/Other/cgscc-refscc-mutation-order.ll
+++ b/llvm/test/Other/cgscc-refscc-mutation-order.ll
@@ -15,8 +15,6 @@
 ; CHECK-NOT: InstCombinePass
 ; CHECK: Running pass: InstCombinePass on f4
 ; CHECK-NOT: InstCombinePass
-; CHECK: Running pass: InstCombinePass on f1
-; CHECK-NOT: InstCombinePass
 
 @a1 = alias void (), ptr @f1
 @a2 = alias void (), ptr @f2
diff --git a/llvm/test/Other/devirt-invalidated.ll b/llvm/test/Other/devirt-invalidated.ll
index c3ed5e53b3b04..7926641dda97b 100644
--- a/llvm/test/Other/devirt-invalidated.ll
+++ b/llvm/test/Other/devirt-invalidated.ll
@@ -1,8 +1,6 @@
 ; RUN: opt -passes='devirt<0>(inline)' < %s -S | FileCheck %s
 
-; CHECK-NOT: internal
 ; CHECK: define void @e()
-; CHECK-NOT: internal
 
 define void @e() {
 entry:
diff --git a/llvm/test/Transforms/Attributor/ArgumentPromotion/live_called_from_dead.ll b/llvm/test/Transforms/Attributor/ArgumentPromotion/live_called_from_dead.ll
index 2df81d6cb1832..1c34fff8dd755 100644
--- a/llvm/test/Transforms/Attributor/ArgumentPromotion/live_called_from_dead.ll
+++ b/llvm/test/Transforms/Attributor/ArgumentPromotion/live_called_from_dead.ll
@@ -37,7 +37,6 @@ define internal i32 @caller(ptr %B) {
 ; CGSCC-LABEL: define {{[^@]+}}@caller
 ; CGSCC-SAME: () #[[ATTR0]] {
 ; CGSCC-NEXT:    [[A:%.*]] = alloca i32, align 4
-; CGSCC-NEXT:    [[A2:%.*]] = alloca i8, i32 0, align 4
 ; CGSCC-NEXT:    [[A1:%.*]] = alloca i8, i32 0, align 4
 ; CGSCC-NEXT:    ret i32 0
 ;
diff --git a/llvm/test/Transforms/Attributor/ArgumentPromotion/live_called_from_dead_2.ll b/llvm/test/Transforms/Attributor/ArgumentPromotion/live_called_from_dead_2.ll
index 7c28de24beea2..b42647840f7cf 100644
--- a/llvm/test/Transforms/Attributor/ArgumentPromotion/live_called_from_dead_2.ll
+++ b/llvm/test/Transforms/Attributor/ArgumentPromotion/live_called_from_dead_2.ll
@@ -54,7 +54,6 @@ define internal i32 @caller(ptr %B) {
 ; CGSCC-LABEL: define {{[^@]+}}@caller
 ; CGSCC-SAME: (ptr noalias nocapture nofree noundef nonnull writeonly align 4 dereferenceable(4) [[B:%.*]]) #[[ATTR0]] {
 ; CGSCC-NEXT:    [[A:%.*]] = alloca i32, align 4
-; CGSCC-NEXT:    [[A2:%.*]] = alloca i8, i32 0, align 4
 ; CGSCC-NEXT:    [[A1:%.*]] = alloca i8, i32 0, align 4
 ; CGSCC-NEXT:    [[C:%.*]] = call i32 @test(ptr noalias nocapture nofree noundef nonnull writeonly align 4 dereferenceable(4) [[B]]) #[[ATTR2:[0-9]+]]
 ; CGSCC-NEXT:    ret i32 0
diff --git a/llvm/test/Transforms/Inline/cgscc-cycle-debug.ll b/llvm/test/Transforms/Inline/cgscc-cycle-debug.ll
index 40a6b0577e7dd..e79700e8dac62 100644
--- a/llvm/test/Transforms/Inline/cgscc-cycle-debug.ll
+++ b/llvm/test/Transforms/Inline/cgscc-cycle-debug.ll
@@ -13,7 +13,6 @@
 ; CHECK: Running an SCC pass across the RefSCC: [(test1_a, test1_b, test1_c)]
 ; CHECK: Enqueuing the existing SCC in the worklist:(test1_b)
 ; CHECK: Enqueuing a newly formed SCC:(test1_c)
-; CHECK: Enqueuing a new RefSCC in the update worklist: [(test1_b)]
 ; CHECK: Switch an internal ref edge to a call edge from 'test1_a' to 'test1_c'
 ; CHECK: Switch an internal ref edge to a call edge from 'test1_a' to 'test1_a'
 ; CHECK: Re-running SCC passes after a refinement of the current SCC: (test1_c, test1_a)
diff --git a/llvm/unittests/Analysis/CGSCCPassManagerTest.cpp b/llvm/unittests/Analysis/CGSCCPassManagerTest.cpp
index b33567dd602b0..aab148c12c416 100644
--- a/llvm/unittests/Analysis/CGSCCPassManagerTest.cpp
+++ b/llvm/unittests/Analysis/CGSCCPassManagerTest.cpp
@@ -1659,18 +1659,16 @@ TEST_F(CGSCCPassManagerTest, TestUpdateCGAndAnalysisM...
[truncated]

llvmbot · 2024-06-07T23:13:03Z

@llvm/pr-subscribers-llvm-analysis

Author: Arthur Eubanks (aeubanks)

Changes

In some modules, e.g. Kotlin-generated IR, we end up with a huge RefSCC and the call graph updates done as a result of the inliner take a long time. This is due to RefSCC::removeInternalRefEdges() getting called many times, each time removing one function from the RefSCC, but each call to removeInternalRefEdges() is proportional to the size of the RefSCC.

There are two places that call removeInternalRefEdges(), in updateCGAndAnalysisManagerForPass() and LazyCallGraph::removeDeadFunction().

Since LazyCallGraph can deal with spurious (edges that exist in the graph but not in the IR) ref edges, we can simply not call removeInternalRefEdges() in updateCGAndAnalysisManagerForPass().
LazyCallGraph::removeDeadFunction() still ends up taking the brunt of compile time with the above change for the original reason. So instead we batch all the dead function removals so we can call removeInternalRefEdges() just once. This requires some changes to callers of removeDeadFunction() to not actually erase the function from the module, but defer it to when we batch delete dead functions at the end of the CGSCC run, leaving the function body as "unreachable" in the meantime. We still need to ensure that call edges are accurate. I had also tried deleting dead functions after visiting a RefSCC, but deleting them all at once at the end was simpler.

Many test changes are due to not performing unnecessary revisits of an SCC (the CGSCC infrastructure deems ref edge refinements as unimportant when it comes to revisiting SCCs, although that seems to not be consistently true given these changes) because we don't remove some ref edges. Specifically for devirt-invalidated.ll this seems to expose an inlining order issue with the inliner. Probably unimportant for this type of intentionally weird call graph.

Patch is 25.62 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/94815.diff

13 Files Affected:

(modified) llvm/include/llvm/Analysis/CGSCCPassManager.h (+4)
(modified) llvm/include/llvm/Analysis/LazyCallGraph.h (+11-11)
(modified) llvm/lib/Analysis/CGSCCPassManager.cpp (+9-33)
(modified) llvm/lib/Analysis/LazyCallGraph.cpp (+66-59)
(modified) llvm/lib/Transforms/IPO/Inliner.cpp (+14-15)
(modified) llvm/lib/Transforms/Utils/CallGraphUpdater.cpp (+8-4)
(modified) llvm/test/Other/cgscc-refscc-mutation-order.ll (-2)
(modified) llvm/test/Other/devirt-invalidated.ll (-2)
(modified) llvm/test/Transforms/Attributor/ArgumentPromotion/live_called_from_dead.ll (-1)
(modified) llvm/test/Transforms/Attributor/ArgumentPromotion/live_called_from_dead_2.ll (-1)
(modified) llvm/test/Transforms/Inline/cgscc-cycle-debug.ll (-1)
(modified) llvm/unittests/Analysis/CGSCCPassManagerTest.cpp (+5-7)
(modified) llvm/unittests/Analysis/LazyCallGraphTest.cpp (+17-12)

diff --git a/llvm/include/llvm/Analysis/CGSCCPassManager.h b/llvm/include/llvm/Analysis/CGSCCPassManager.h
index 5654ad46d6eab..b19d53621ac86 100644
--- a/llvm/include/llvm/Analysis/CGSCCPassManager.h
+++ b/llvm/include/llvm/Analysis/CGSCCPassManager.h
@@ -306,6 +306,10 @@ struct CGSCCUpdateResult {
   SmallDenseSet<std::pair<LazyCallGraph::Node *, LazyCallGraph::SCC *>, 4>
       &InlinedInternalEdges;
 
+  /// Functions that a pass has considered to be dead to be removed at the end
+  /// of the call graph walk in batch.
+  SmallVector<Function *, 4> &DeadFunctions;
+
   /// Weak VHs to keep track of indirect calls for the purposes of detecting
   /// devirtualization.
   ///
diff --git a/llvm/include/llvm/Analysis/LazyCallGraph.h b/llvm/include/llvm/Analysis/LazyCallGraph.h
index ac8ca207d312b..a8bbf2c578af9 100644
--- a/llvm/include/llvm/Analysis/LazyCallGraph.h
+++ b/llvm/include/llvm/Analysis/LazyCallGraph.h
@@ -832,7 +832,7 @@ class LazyCallGraph {
     /// self-edges and edge removals which result in a spanning tree with no
     /// more cycles.
     [[nodiscard]] SmallVector<RefSCC *, 1>
-    removeInternalRefEdge(Node &SourceN, ArrayRef<Node *> TargetNs);
+    removeInternalRefEdges(ArrayRef<std::pair<Node *, Node *>> Edges);
 
     /// A convenience wrapper around the above to handle trivial cases of
     /// inserting a new call edge.
@@ -1056,18 +1056,18 @@ class LazyCallGraph {
   /// once SCCs have started to be formed. These routines have strict contracts
   /// but may be called at any point.
 
-  /// Remove a dead function from the call graph (typically to delete it).
+  /// Remove dead functions from the call graph.
   ///
-  /// Note that the function must have an empty use list, and the call graph
-  /// must be up-to-date prior to calling this. That means it is by itself in
-  /// a maximal SCC which is by itself in a maximal RefSCC, etc. No structural
-  /// changes result from calling this routine other than potentially removing
-  /// entry points into the call graph.
+  /// These functions should have already been passed to markDeadFunction().
+  /// This is done as a batch to prevent compile time blowup as a result of
+  /// handling a single function at a time.
+  void removeDeadFunctions(ArrayRef<Function *> DeadFs);
+
+  /// Mark a function as dead to be removed later by removeDeadFunctions().
   ///
-  /// If SCC formation has begun, this function must not be part of the current
-  /// DFS in order to call this safely. Typically, the function will have been
-  /// fully visited by the DFS prior to calling this routine.
-  void removeDeadFunction(Function &F);
+  /// The function body should have no incoming or outgoing call or ref edges.
+  /// For example, a function with a single "unreachable" instruction.
+  void markDeadFunction(Function &F);
 
   /// Add a new function split/outlined from an existing function.
   ///
diff --git a/llvm/lib/Analysis/CGSCCPassManager.cpp b/llvm/lib/Analysis/CGSCCPassManager.cpp
index 8ae5c3dee6103..2ed1d98f80068 100644
--- a/llvm/lib/Analysis/CGSCCPassManager.cpp
+++ b/llvm/lib/Analysis/CGSCCPassManager.cpp
@@ -158,10 +158,12 @@ ModuleToPostOrderCGSCCPassAdaptor::run(Module &M, ModuleAnalysisManager &AM) {
   SmallDenseSet<std::pair<LazyCallGraph::Node *, LazyCallGraph::SCC *>, 4>
       InlinedInternalEdges;
 
+  SmallVector<Function *, 4> DeadFunctions;
+
   CGSCCUpdateResult UR = {
-      RCWorklist,           CWorklist, InvalidRefSCCSet,
-      InvalidSCCSet,        nullptr,   PreservedAnalyses::all(),
-      InlinedInternalEdges, {}};
+      RCWorklist,           CWorklist,     InvalidRefSCCSet,
+      InvalidSCCSet,        nullptr,       PreservedAnalyses::all(),
+      InlinedInternalEdges, DeadFunctions, {}};
 
   // Request PassInstrumentation from analysis manager, will use it to run
   // instrumenting callbacks for the passes later.
@@ -340,6 +342,10 @@ ModuleToPostOrderCGSCCPassAdaptor::run(Module &M, ModuleAnalysisManager &AM) {
     } while (!RCWorklist.empty());
   }
 
+  CG.removeDeadFunctions(DeadFunctions);
+  for (Function *DeadF : DeadFunctions)
+    DeadF->eraseFromParent();
+
 #if defined(EXPENSIVE_CHECKS)
   // Verify that the call graph is still valid.
   CG.verify();
@@ -1030,36 +1036,6 @@ static LazyCallGraph::SCC &updateCGAndAnalysisManagerForPass(
     return true;
   });
 
-  // Now do a batch removal of the internal ref edges left.
-  auto NewRefSCCs = RC->removeInternalRefEdge(N, DeadTargets);
-  if (!NewRefSCCs.empty()) {
-    // The old RefSCC is dead, mark it as such.
-    UR.InvalidatedRefSCCs.insert(RC);
-
-    // Note that we don't bother to invalidate analyses as ref-edge
-    // connectivity is not really observable in any way and is intended
-    // exclusively to be used for ordering of transforms rather than for
-    // analysis conclusions.
-
-    // Update RC to the "bottom".
-    assert(G.lookupSCC(N) == C && "Changed the SCC when splitting RefSCCs!");
-    RC = &C->getOuterRefSCC();
-    assert(G.lookupRefSCC(N) == RC && "Failed to update current RefSCC!");
-
-    // The RC worklist is in reverse postorder, so we enqueue the new ones in
-    // RPO except for the one which contains the source node as that is the
-    // "bottom" we will continue processing in the bottom-up walk.
-    assert(NewRefSCCs.front() == RC &&
-           "New current RefSCC not first in the returned list!");
-    for (RefSCC *NewRC : llvm::reverse(llvm::drop_begin(NewRefSCCs))) {
-      assert(NewRC != RC && "Should not encounter the current RefSCC further "
-                            "in the postorder list of new RefSCCs.");
-      UR.RCWorklist.insert(NewRC);
-      LLVM_DEBUG(dbgs() << "Enqueuing a new RefSCC in the update worklist: "
-                        << *NewRC << "\n");
-    }
-  }
-
   // Next demote all the call edges that are now ref edges. This helps make
   // the SCCs small which should minimize the work below as we don't want to
   // form cycles that this would break.
diff --git a/llvm/lib/Analysis/LazyCallGraph.cpp b/llvm/lib/Analysis/LazyCallGraph.cpp
index 48a7ca0061600..11d2b8118bd5a 100644
--- a/llvm/lib/Analysis/LazyCallGraph.cpp
+++ b/llvm/lib/Analysis/LazyCallGraph.cpp
@@ -1160,8 +1160,8 @@ void LazyCallGraph::RefSCC::removeOutgoingEdge(Node &SourceN, Node &TargetN) {
 }
 
 SmallVector<LazyCallGraph::RefSCC *, 1>
-LazyCallGraph::RefSCC::removeInternalRefEdge(Node &SourceN,
-                                             ArrayRef<Node *> TargetNs) {
+LazyCallGraph::RefSCC::removeInternalRefEdges(
+    ArrayRef<std::pair<Node *, Node *>> Edges) {
   // We return a list of the resulting *new* RefSCCs in post-order.
   SmallVector<RefSCC *, 1> Result;
 
@@ -1179,25 +1179,21 @@ LazyCallGraph::RefSCC::removeInternalRefEdge(Node &SourceN,
 #endif
 
   // First remove the actual edges.
-  for (Node *TargetN : TargetNs) {
-    assert(!(*SourceN)[*TargetN].isCall() &&
+  for (auto [SourceN, TargetN] : Edges) {
+    assert(!(**SourceN)[*TargetN].isCall() &&
            "Cannot remove a call edge, it must first be made a ref edge");
 
-    bool Removed = SourceN->removeEdgeInternal(*TargetN);
+    bool Removed = (*SourceN)->removeEdgeInternal(*TargetN);
     (void)Removed;
     assert(Removed && "Target not in the edge set for this caller?");
   }
 
   // Direct self references don't impact the ref graph at all.
-  if (llvm::all_of(TargetNs,
-                   [&](Node *TargetN) { return &SourceN == TargetN; }))
-    return Result;
-
   // If all targets are in the same SCC as the source, because no call edges
   // were removed there is no RefSCC structure change.
-  SCC &SourceC = *G->lookupSCC(SourceN);
-  if (llvm::all_of(TargetNs, [&](Node *TargetN) {
-        return G->lookupSCC(*TargetN) == &SourceC;
+  if (llvm::all_of(Edges, [&](std::pair<Node *, Node *> E) {
+        return E.first == E.second ||
+               G->lookupSCC(*E.first) == G->lookupSCC(*E.second);
       }))
     return Result;
 
@@ -1499,7 +1495,7 @@ void LazyCallGraph::removeEdge(Node &SourceN, Node &TargetN) {
   assert(Removed && "Target not in the edge set for this caller?");
 }
 
-void LazyCallGraph::removeDeadFunction(Function &F) {
+void LazyCallGraph::markDeadFunction(Function &F) {
   // FIXME: This is unnecessarily restrictive. We should be able to remove
   // functions which recursively call themselves.
   assert(F.hasZeroLiveUses() &&
@@ -1515,57 +1511,68 @@ void LazyCallGraph::removeDeadFunction(Function &F) {
 
   Node &N = *NI->second;
 
-  // Cannot remove a function which has yet to be visited in the DFS walk, so
-  // if we have a node at all then we must have an SCC and RefSCC.
-  auto CI = SCCMap.find(&N);
-  assert(CI != SCCMap.end() &&
-         "Tried to remove a node without an SCC after DFS walk started!");
-  SCC &C = *CI->second;
-  RefSCC *RC = &C.getOuterRefSCC();
-
-  // In extremely rare cases, we can delete a dead function which is still in a
-  // non-trivial RefSCC. This can happen due to spurious ref edges sticking
-  // around after an IR function reference is removed.
-  if (RC->size() != 1) {
-    SmallVector<Node *, 0> NodesInRC;
-    for (SCC &OtherC : *RC) {
-      for (Node &OtherN : OtherC)
-        NodesInRC.push_back(&OtherN);
+  // Remove all call edges out of dead function.
+  for (Edge E : *N) {
+    if (E.isCall())
+      N->setEdgeKind(E.getNode(), Edge::Ref);
+  }
+}
+
+void LazyCallGraph::removeDeadFunctions(ArrayRef<Function *> DeadFs) {
+  if (DeadFs.empty())
+    return;
+
+  // Group dead functions by the RefSCC they're in.
+  DenseMap<RefSCC *, SmallVector<Node *, 1>> RCs;
+  for (Function *DeadF : DeadFs) {
+    Node *N = lookup(*DeadF);
+#ifndef NDEBUG
+    for (Edge &E : **N) {
+      assert(!E.isCall() &&
+             "dead function shouldn't have any outgoing call edges");
     }
-    for (Node *OtherN : NodesInRC) {
-      if ((*OtherN)->lookup(N)) {
-        auto NewRefSCCs =
-            RC->removeInternalRefEdge(*OtherN, ArrayRef<Node *>(&N));
-        // If we've split into multiple RefSCCs, RC is now invalid and the
-        // RefSCC containing C will be different.
-        if (!NewRefSCCs.empty())
-          RC = &C.getOuterRefSCC();
+#endif
+    RefSCC *RC = lookupRefSCC(*N);
+    RCs[RC].push_back(N);
+  }
+  // Remove outgoing edges from all dead functions. Dead functions should
+  // already have had their call edges removed in markDeadFunction(), so we only
+  // need to worry about spurious ref edges.
+  for (auto [RC, DeadNs] : RCs) {
+    SmallVector<std::pair<Node *, Node *>> InternalEdgesToRemove;
+    for (Node *DeadN : DeadNs) {
+      if (lookupRefSCC(*DeadN) != RC)
+        continue;
+      for (Edge &E : **DeadN) {
+        if (lookupRefSCC(E.getNode()) == RC)
+          InternalEdgesToRemove.push_back({DeadN, &E.getNode()});
+        else
+          RC->removeOutgoingEdge(*DeadN, E.getNode());
       }
     }
+    // We ignore the returned RefSCCs since at this point we're done with CGSCC
+    // iteration and don't need to add it to any worklists.
+    (void)RC->removeInternalRefEdges(InternalEdgesToRemove);
+    for (Node *DeadN : DeadNs) {
+      RefSCC *DeadRC = lookupRefSCC(*DeadN);
+      assert(DeadRC->size() == 1);
+      assert(DeadRC->begin()->size() == 1);
+      DeadRC->clear();
+      DeadRC->G = nullptr;
+    }
   }
+  // Clean up data structures.
+  for (Function *DeadF : DeadFs) {
+    Node &N = *lookup(*DeadF);
 
-  NodeMap.erase(NI);
-  EntryEdges.removeEdgeInternal(N);
-  SCCMap.erase(CI);
-
-  // This node must be the only member of its SCC as it has no callers, and
-  // that SCC must be the only member of a RefSCC as it has no references.
-  // Validate these properties first.
-  assert(C.size() == 1 && "Dead functions must be in a singular SCC");
-  assert(RC->size() == 1 && "Dead functions must be in a singular RefSCC");
-
-  // Finally clear out all the data structures from the node down through the
-  // components. postorder_ref_scc_iterator will skip empty RefSCCs, so no need
-  // to adjust LazyCallGraph data structures.
-  N.clear();
-  N.G = nullptr;
-  N.F = nullptr;
-  C.clear();
-  RC->clear();
-  RC->G = nullptr;
-
-  // Nothing to delete as all the objects are allocated in stable bump pointer
-  // allocators.
+    EntryEdges.removeEdgeInternal(N);
+    SCCMap.erase(SCCMap.find(&N));
+    NodeMap.erase(NodeMap.find(DeadF));
+
+    N.clear();
+    N.G = nullptr;
+    N.F = nullptr;
+  }
 }
 
 // Gets the Edge::Kind from one function to another by looking at the function's
diff --git a/llvm/lib/Transforms/IPO/Inliner.cpp b/llvm/lib/Transforms/IPO/Inliner.cpp
index a9747aebf67bb..1a7b9bc8e3e77 100644
--- a/llvm/lib/Transforms/IPO/Inliner.cpp
+++ b/llvm/lib/Transforms/IPO/Inliner.cpp
@@ -197,6 +197,14 @@ InlinerPass::getAdvisor(const ModuleAnalysisManagerCGSCCProxy::Result &MAM,
   return *IAA->getAdvisor();
 }
 
+void makeFunctionBodyUnreachable(Function &F) {
+  F.dropAllReferences();
+  for (BasicBlock &BB : make_early_inc_range(F))
+    BB.eraseFromParent();
+  BasicBlock *BB = BasicBlock::Create(F.getContext(), "", &F);
+  new UnreachableInst(F.getContext(), BB);
+}
+
 PreservedAnalyses InlinerPass::run(LazyCallGraph::SCC &InitialC,
                                    CGSCCAnalysisManager &AM, LazyCallGraph &CG,
                                    CGSCCUpdateResult &UR) {
@@ -448,11 +456,9 @@ PreservedAnalyses InlinerPass::run(LazyCallGraph::SCC &InitialC,
                              }),
               Calls.end());
 
-          // Clear the body and queue the function itself for deletion when we
-          // finish inlining and call graph updates.
-          // Note that after this point, it is an error to do anything other
-          // than use the callee's address or delete it.
-          Callee.dropAllReferences();
+          // Clear the body and queue the function itself for call graph
+          // updating when we finish inlining.
+          makeFunctionBodyUnreachable(Callee);
           assert(!is_contained(DeadFunctions, &Callee) &&
                  "Cannot put cause a function to become dead twice!");
           DeadFunctions.push_back(&Callee);
@@ -530,7 +536,7 @@ PreservedAnalyses InlinerPass::run(LazyCallGraph::SCC &InitialC,
   if (!DeadFunctionsInComdats.empty()) {
     filterDeadComdatFunctions(DeadFunctionsInComdats);
     for (auto *Callee : DeadFunctionsInComdats)
-      Callee->dropAllReferences();
+      makeFunctionBodyUnreachable(*Callee);
     DeadFunctions.append(DeadFunctionsInComdats);
   }
 
@@ -542,25 +548,18 @@ PreservedAnalyses InlinerPass::run(LazyCallGraph::SCC &InitialC,
   // that is OK as all we do is delete things and add pointers to unordered
   // sets.
   for (Function *DeadF : DeadFunctions) {
+    CG.markDeadFunction(*DeadF);
     // Get the necessary information out of the call graph and nuke the
     // function there. Also, clear out any cached analyses.
     auto &DeadC = *CG.lookupSCC(*CG.lookup(*DeadF));
     FAM.clear(*DeadF, DeadF->getName());
     AM.clear(DeadC, DeadC.getName());
-    auto &DeadRC = DeadC.getOuterRefSCC();
-    CG.removeDeadFunction(*DeadF);
 
     // Mark the relevant parts of the call graph as invalid so we don't visit
     // them.
     UR.InvalidatedSCCs.insert(&DeadC);
-    UR.InvalidatedRefSCCs.insert(&DeadRC);
-
-    // If the updated SCC was the one containing the deleted function, clear it.
-    if (&DeadC == UR.UpdatedC)
-      UR.UpdatedC = nullptr;
 
-    // And delete the actual function from the module.
-    M.getFunctionList().erase(DeadF);
+    UR.DeadFunctions.push_back(DeadF);
 
     ++NumDeleted;
   }
diff --git a/llvm/lib/Transforms/Utils/CallGraphUpdater.cpp b/llvm/lib/Transforms/Utils/CallGraphUpdater.cpp
index d0b9884aa9099..e9f37d4044cb0 100644
--- a/llvm/lib/Transforms/Utils/CallGraphUpdater.cpp
+++ b/llvm/lib/Transforms/Utils/CallGraphUpdater.cpp
@@ -67,16 +67,20 @@ bool CallGraphUpdater::finalize() {
 
         FAM.clear(*DeadFn, DeadFn->getName());
         AM->clear(*DeadSCC, DeadSCC->getName());
-        LCG->removeDeadFunction(*DeadFn);
+        LCG->markDeadFunction(*DeadFn);
 
         // Mark the relevant parts of the call graph as invalid so we don't
         // visit them.
         UR->InvalidatedSCCs.insert(DeadSCC);
         UR->InvalidatedRefSCCs.insert(&DeadRC);
+        UR->DeadFunctions.push_back(DeadFn);
+      } else {
+        // The CGSCC infrastructure batch deletes functions at the end of the
+        // call graph walk, so only erase the function if we're not using that
+        // infrastructure.
+        // The function is now really dead and de-attached from everything.
+        DeadFn->eraseFromParent();
       }
-
-      // The function is now really dead and de-attached from everything.
-      DeadFn->eraseFromParent();
     }
   }
 
diff --git a/llvm/test/Other/cgscc-refscc-mutation-order.ll b/llvm/test/Other/cgscc-refscc-mutation-order.ll
index 13a46503c1f4c..aa20735715763 100644
--- a/llvm/test/Other/cgscc-refscc-mutation-order.ll
+++ b/llvm/test/Other/cgscc-refscc-mutation-order.ll
@@ -15,8 +15,6 @@
 ; CHECK-NOT: InstCombinePass
 ; CHECK: Running pass: InstCombinePass on f4
 ; CHECK-NOT: InstCombinePass
-; CHECK: Running pass: InstCombinePass on f1
-; CHECK-NOT: InstCombinePass
 
 @a1 = alias void (), ptr @f1
 @a2 = alias void (), ptr @f2
diff --git a/llvm/test/Other/devirt-invalidated.ll b/llvm/test/Other/devirt-invalidated.ll
index c3ed5e53b3b04..7926641dda97b 100644
--- a/llvm/test/Other/devirt-invalidated.ll
+++ b/llvm/test/Other/devirt-invalidated.ll
@@ -1,8 +1,6 @@
 ; RUN: opt -passes='devirt<0>(inline)' < %s -S | FileCheck %s
 
-; CHECK-NOT: internal
 ; CHECK: define void @e()
-; CHECK-NOT: internal
 
 define void @e() {
 entry:
diff --git a/llvm/test/Transforms/Attributor/ArgumentPromotion/live_called_from_dead.ll b/llvm/test/Transforms/Attributor/ArgumentPromotion/live_called_from_dead.ll
index 2df81d6cb1832..1c34fff8dd755 100644
--- a/llvm/test/Transforms/Attributor/ArgumentPromotion/live_called_from_dead.ll
+++ b/llvm/test/Transforms/Attributor/ArgumentPromotion/live_called_from_dead.ll
@@ -37,7 +37,6 @@ define internal i32 @caller(ptr %B) {
 ; CGSCC-LABEL: define {{[^@]+}}@caller
 ; CGSCC-SAME: () #[[ATTR0]] {
 ; CGSCC-NEXT:    [[A:%.*]] = alloca i32, align 4
-; CGSCC-NEXT:    [[A2:%.*]] = alloca i8, i32 0, align 4
 ; CGSCC-NEXT:    [[A1:%.*]] = alloca i8, i32 0, align 4
 ; CGSCC-NEXT:    ret i32 0
 ;
diff --git a/llvm/test/Transforms/Attributor/ArgumentPromotion/live_called_from_dead_2.ll b/llvm/test/Transforms/Attributor/ArgumentPromotion/live_called_from_dead_2.ll
index 7c28de24beea2..b42647840f7cf 100644
--- a/llvm/test/Transforms/Attributor/ArgumentPromotion/live_called_from_dead_2.ll
+++ b/llvm/test/Transforms/Attributor/ArgumentPromotion/live_called_from_dead_2.ll
@@ -54,7 +54,6 @@ define internal i32 @caller(ptr %B) {
 ; CGSCC-LABEL: define {{[^@]+}}@caller
 ; CGSCC-SAME: (ptr noalias nocapture nofree noundef nonnull writeonly align 4 dereferenceable(4) [[B:%.*]]) #[[ATTR0]] {
 ; CGSCC-NEXT:    [[A:%.*]] = alloca i32, align 4
-; CGSCC-NEXT:    [[A2:%.*]] = alloca i8, i32 0, align 4
 ; CGSCC-NEXT:    [[A1:%.*]] = alloca i8, i32 0, align 4
 ; CGSCC-NEXT:    [[C:%.*]] = call i32 @test(ptr noalias nocapture nofree noundef nonnull writeonly align 4 dereferenceable(4) [[B]]) #[[ATTR2:[0-9]+]]
 ; CGSCC-NEXT:    ret i32 0
diff --git a/llvm/test/Transforms/Inline/cgscc-cycle-debug.ll b/llvm/test/Transforms/Inline/cgscc-cycle-debug.ll
index 40a6b0577e7dd..e79700e8dac62 100644
--- a/llvm/test/Transforms/Inline/cgscc-cycle-debug.ll
+++ b/llvm/test/Transforms/Inline/cgscc-cycle-debug.ll
@@ -13,7 +13,6 @@
 ; CHECK: Running an SCC pass across the RefSCC: [(test1_a, test1_b, test1_c)]
 ; CHECK: Enqueuing the existing SCC in the worklist:(test1_b)
 ; CHECK: Enqueuing a newly formed SCC:(test1_c)
-; CHECK: Enqueuing a new RefSCC in the update worklist: [(test1_b)]
 ; CHECK: Switch an internal ref edge to a call edge from 'test1_a' to 'test1_c'
 ; CHECK: Switch an internal ref edge to a call edge from 'test1_a' to 'test1_a'
 ; CHECK: Re-running SCC passes after a refinement of the current SCC: (test1_c, test1_a)
diff --git a/llvm/unittests/Analysis/CGSCCPassManagerTest.cpp b/llvm/unittests/Analysis/CGSCCPassManagerTest.cpp
index b33567dd602b0..aab148c12c416 100644
--- a/llvm/unittests/Analysis/CGSCCPassManagerTest.cpp
+++ b/llvm/unittests/Analysis/CGSCCPassManagerTest.cpp
@@ -1659,18 +1659,16 @@ TEST_F(CGSCCPassManagerTest, TestUpdateCGAndAnalysisM...
[truncated]

nikic

This looks reasonable to me, but I'm not really familiar with the LazyCallGraph infrastructure.

alinas · 2024-06-10T21:28:36Z

llvm/lib/Analysis/LazyCallGraph.cpp

+  for (auto [RC, DeadNs] : RCs) {
+    SmallVector<std::pair<Node *, Node *>> InternalEdgesToRemove;
+    for (Node *DeadN : DeadNs) {
+      if (lookupRefSCC(*DeadN) != RC)


Should this assert lookupRefSCC(*DeadN) == RC?
I'm missing when this can happen.

ah yes you're right, I've removed the check

In some modules, e.g. Kotlin-generated IR, we end up with a huge RefSCC and the call graph updates done as a result of the inliner take a long time. This is due to RefSCC::removeInternalRefEdges() getting called many times, each time removing one function from the RefSCC, but each call to removeInternalRefEdges() is proportional to the size of the RefSCC. There are two places that call removeInternalRefEdges(), in updateCGAndAnalysisManagerForPass() and LazyCallGraph::removeDeadFunction(). 1) Since LazyCallGraph can deal with spurious (edges that exist in the graph but not in the IR) ref edges, we can simply not call removeInternalRefEdges() in updateCGAndAnalysisManagerForPass(). 2) LazyCallGraph::removeDeadFunction() still ends up taking the brunt of compile time with the above change for the original reason. So instead we batch all the dead function removals so we can call removeInternalRefEdges() just once. This requires some changes to callers of removeDeadFunction() to not actually erase the function from the module, but defer it to when we batch delete dead functions at the end of the CGSCC run, leaving the function body as "unreachable" in the meantime. We still need to ensure that call edges are accurate. I had also tried deleting dead functions after visiting a RefSCC, but deleting them all at once at the end was simpler. Many test changes are due to not performing unnecessary revisits of an SCC (the CGSCC infrastructure deems ref edge refinements as unimportant when it comes to revisiting SCCs, although that seems to not be consistently true given these changes) because we don't remove some ref edges. Specifically for devirt-invalidated.ll this seems to expose an inlining order issue with the inliner. Probably unimportant for this type of intentionally weird call graph. Compile time: https://llvm-compile-time-tracker.com/compare.php?from=6f2c61071c274a1b5e212e6ad4114641ec7c7fc3&to=b08c90d05e290dd065755ea776ceaf1420680224&stat=instructions:u (cherry picked from commit 71497cc)

After llvm#94815, this is only used within ModuleToPostOrderCGSCCPassAdaptor::run(), so keep it local to that function.

With llvm#94815, the nodes belonging to dead functions are no longer invalidated, but kept around to batch delete at the end of the call graph walk. The ML inliner needs to be updated to handle this. This fixes some asserts getting hit, e.g. https://crbug.com/348376263.

SvyatoslavScherbina · 2024-06-24T09:09:32Z

Hi folks! Thank you, @aeubanks, for this improvement that fixed one of the blockers (KT-68417) for updating LLVM in Kotlin (KT-49279).
Thanks to @alinas as well, of course.

After #94815, this is only used within ModuleToPostOrderCGSCCPassAdaptor::run(), so keep it local to that function.

As opposed to using Node::isDead(), which is no longer accurate after llvm#94815. This is only used in diagnostics.

As opposed to using Node::isDead(), which is no longer accurate after #94815. This is only used in diagnostics.

wlei-llvm · 2024-07-02T20:03:15Z

Hi @aeubanks , some of our services is broken with below assertion and we bisected to this commit, could you take a look? I think you probably needs a reduced repro from me, I can work on that, but just post here first in case you find some thing obvious to fix. Thanks in advance!

clang++: /home/wlei/local/upstream/llvm-project/llvm/lib/Analysis/CGSCCPassManager.cpp:1175: LazyCallGraph::SCC &updateCGAndAnalysisManagerForPass(llvm::LazyCallGraph &, LazyCallGraph::SCC &, LazyCallGraph::Node &, llvm::CGSCCAnalysisManager &, llvm::CGSCCUpdateResult &, llvm::FunctionAnalysisManager &, bool): Assertion `!UR.InvalidatedRefSCCs.count(RC) && "Invalidated the current RefSCC!"' failed.
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace, preprocessed source, and associated run script.
Stack dump:
0.	Program arguments: /home/wlei/local/upstream/llvm-build/bin/clang++ @/tmp/fbcc.89nih15w/compiler.argsfile
1.	<eof> parser at end of file
2.	Optimizer
 #0 0x00007f650c9fe948 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) /home/wlei/local/upstream/llvm-project/llvm/lib/Support/Unix/Signals.inc:723:13
 #1 0x00007f650c9fca20 llvm::sys::RunSignalHandlers() /home/wlei/local/upstream/llvm-project/llvm/lib/Support/Signals.cpp:106:18
 #2 0x00007f650c943dd8 (anonymous namespace)::CrashRecoveryContextImpl::HandleCrash(int, unsigned long) /home/wlei/local/upstream/llvm-project/llvm/lib/Support/CrashRecoveryContext.cpp:73:5
 #3 0x00007f650c943dd8 CrashRecoverySignalHandler(int) /home/wlei/local/upstream/llvm-project/llvm/lib/Support/CrashRecoveryContext.cpp:390:51
 #4 0x00007f650c03e6f0 __restore_rt (/lib64/libc.so.6+0x3e6f0)
 #5 0x00007f650c08b94c __pthread_kill_implementation (/lib64/libc.so.6+0x8b94c)
 #6 0x00007f650c03e646 gsignal (/lib64/libc.so.6+0x3e646)
 #7 0x00007f650c0287f3 abort (/lib64/libc.so.6+0x287f3)
 #8 0x00007f650c02871b _nl_load_domain.cold (/lib64/libc.so.6+0x2871b)
 #9 0x00007f650c037386 (/lib64/libc.so.6+0x37386)
#10 0x00007f650d50202e updateCGAndAnalysisManagerForPass(llvm::LazyCallGraph&, llvm::LazyCallGraph::SCC&, llvm::LazyCallGraph::Node&, llvm::AnalysisManager<llvm::LazyCallGraph::SCC, llvm::LazyCallGraph&>&, llvm::CGSCCUpdateResult&, llvm::AnalysisManager<llvm::Function>&, bool) /home/wlei/local/upstream/llvm-project/llvm/lib/Analysis/CGSCCPassManager.cpp:1176:3
#11 0x00007f650d4fdef4 llvm::updateCGAndAnalysisManagerForFunctionPass(llvm::LazyCallGraph&, llvm::LazyCallGraph::SCC&, llvm::LazyCallGraph::Node&, llvm::AnalysisManager<llvm::LazyCallGraph::SCC, llvm::LazyCallGraph&>&, llvm::CGSCCUpdateResult&, llvm::AnalysisManager<llvm::Function>&) /home/wlei/local/upstream/llvm-project/llvm/lib/Analysis/CGSCCPassManager.cpp:1190:10
#12 0x00007f650d4fdef4 llvm::CGSCCToFunctionPassAdaptor::run(llvm::LazyCallGraph::SCC&, llvm::AnalysisManager<llvm::LazyCallGraph::SCC, llvm::LazyCallGraph&>&, llvm::LazyCallGraph&, llvm::CGSCCUpdateResult&) /home/wlei/local/upstream/llvm-project/llvm/lib/Analysis/CGSCCPassManager.cpp:568:19
#13 0x00007f650c37bc4d llvm::detail::PassModel<llvm::LazyCallGraph::SCC, llvm::CGSCCToFunctionPassAdaptor, llvm::AnalysisManager<llvm::LazyCallGraph::SCC, llvm::LazyCallGraph&>, llvm::LazyCallGraph&, llvm::CGSCCUpdateResult&>::run(llvm::LazyCallGraph::SCC&, llvm::AnalysisManager<llvm::LazyCallGraph::SCC, llvm::LazyCallGraph&>&, llvm::LazyCallGraph&, llvm::CGSCCUpdateResult&) /home/wlei/local/upstream/llvm-project/llvm/include/llvm/IR/PassManagerInternal.h:90:5
#14 0x00007f650d4f970b llvm::PassManager<llvm::LazyCallGraph::SCC, llvm::AnalysisManager<llvm::LazyCallGraph::SCC, llvm::LazyCallGraph&>, llvm::LazyCallGraph&, llvm::CGSCCUpdateResult&>::run(llvm::LazyCallGraph::SCC&, llvm::AnalysisManager<llvm::LazyCallGraph::SCC, llvm::LazyCallGraph&>&, llvm::LazyCallGraph&, llvm::CGSCCUpdateResult&) /home/wlei/local/upstream/llvm-project/llvm/lib/Analysis/CGSCCPassManager.cpp:90:12
...

aeubanks · 2024-07-02T21:27:50Z

@wlei-llvm yes a reduced repro would be helpful, I can't figure out what's wrong with just the assert

With #94815, the nodes belonging to dead functions are no longer invalidated, but kept around to batch delete at the end of the call graph walk. The ML inliner needs to be updated to handle this. This fixes some asserts getting hit, e.g. https://crbug.com/348376263.

As opposed to using Node::isDead(), which is no longer accurate after llvm#94815. This is only used in diagnostics.

With llvm#94815, the nodes belonging to dead functions are no longer invalidated, but kept around to batch delete at the end of the call graph walk. The ML inliner needs to be updated to handle this. This fixes some asserts getting hit, e.g. https://crbug.com/348376263.

wlei-llvm · 2024-07-08T21:57:33Z

@wlei-llvm yes a reduced repro would be helpful, I can't figure out what's wrong with just the assert

Hi @aeubanks , here is the reduced repro:

cmd:

opt -disable-output -passes="cgscc(devirt<4>(inline,openmp-opt-cgscc,function<>(simplifycfg<>)))" < reduced.ll

reduced.ll:

target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-i128:128-f80:128-n8:16:32:64-S128"
target triple = "x86_64-redhat-linux-gnu"
define internal fastcc void @foo() #0 personality ptr null {
  br label %1
1:                                                ; preds = %2, %0
  ret void
2:                                                ; No predecessors!
  call fastcc void @bar()
  br label %1
}
define fastcc void @baz() personality ptr null {
  switch i64 0, label %3 [
    i64 0, label %3
    i64 1, label %3
    i64 2, label %3
    i64 3, label %4
    i64 4, label %3
    i64 5, label %3
    i64 6, label %3
    i64 7, label %3
    i64 8, label %3
    i64 9, label %3
    i64 10, label %3
    i64 11, label %3
    i64 12, label %3
    i64 13, label %3
    i64 14, label %3
    i64 15, label %3
    i64 16, label %3
  ]
1:                                                ; No predecessors!
  %2 = landingpad { ptr, i32 }
          cleanup
  call fastcc void @foo()
  resume { ptr, i32 } zeroinitializer
3:                                                ; preds = %4, %0, %0, %0, %0, %0, %0, %0, %0, %0, %0, %0, %0, %0, %0, %0, %0, %0
  ret void
4:                                                ; preds = %0
  call fastcc void @baz()
  br label %3
}
define fastcc void @bar() personality ptr null {
  call fastcc void @baz()
  ret void
}
; uselistorder directives
uselistorder ptr null, { 3, 4, 0, 5, 6, 1, 7, 8, 2 }
attributes #0 = { "target-cpu"="haswell" }
!llvm.module.flags = !{!0}
!0 = !{i32 7, !"openmp", i32 50}

Please take a look, thanks!

aeubanks · 2024-07-09T16:17:39Z

I have a fix for the assert incoming

After llvm#94815, this is only used within ModuleToPostOrderCGSCCPassAdaptor::run(), so keep it local to that function.

The RefSCC that a function marked dead is in may still contain valid SCCs that we want to visit. We rely on InvalidatedSCCs to skip SCCs containing dead functions. The addition of RefSCCs in CallGraphUpdater to InvalidatedRefSCCs was causing asserts as reported in #94815. Fix some more CallGraphUpdater function deletion methods as well.

aeubanks · 2024-07-10T16:55:39Z

#98213 should fix the assert, let me know if you are still seeing issues. sorry for the trouble (we really need to remove CallGraphUpdater...)

wlei-llvm · 2024-07-10T17:17:11Z

#98213 should fix the assert, let me know if you are still seeing issues. sorry for the trouble (we really need to remove CallGraphUpdater...)

Thanks for the quick fix, I will test it on our services shortly.

The RefSCC that a function marked dead is in may still contain valid SCCs that we want to visit. We rely on InvalidatedSCCs to skip SCCs containing dead functions. The addition of RefSCCs in CallGraphUpdater to InvalidatedRefSCCs was causing asserts as reported in llvm#94815. Fix some more CallGraphUpdater function deletion methods as well.

wlei-llvm · 2024-07-16T16:43:22Z

#98213 should fix the assert, let me know if you are still seeing issues. sorry for the trouble (we really need to remove CallGraphUpdater...)

Thanks for the quick fix, I will test it on our services shortly.

I confirm it fixes the assertion errors on our services, thanks again!

After llvm#94815, this is only used within ModuleToPostOrderCGSCCPassAdaptor::run(), so keep it local to that function. (cherry picked from commit b312cbf)

With llvm#94815, the nodes belonging to dead functions are no longer invalidated, but kept around to batch delete at the end of the call graph walk. The ML inliner needs to be updated to handle this. This fixes some asserts getting hit, e.g. https://crbug.com/348376263. (cherry picked from commit 94471e6)

The RefSCC that a function marked dead is in may still contain valid SCCs that we want to visit. We rely on InvalidatedSCCs to skip SCCs containing dead functions. The addition of RefSCCs in CallGraphUpdater to InvalidatedRefSCCs was causing asserts as reported in llvm#94815. Fix some more CallGraphUpdater function deletion methods as well. (cherry picked from commit 8d800e6)