Skip to content

Commit 176f3d5

Browse files
aeubanksSvyatoslavScherbina
authored andcommitted
[CGSCC] Fix compile time blowup with large RefSCCs (llvm#94815)
In some modules, e.g. Kotlin-generated IR, we end up with a huge RefSCC and the call graph updates done as a result of the inliner take a long time. This is due to RefSCC::removeInternalRefEdges() getting called many times, each time removing one function from the RefSCC, but each call to removeInternalRefEdges() is proportional to the size of the RefSCC. There are two places that call removeInternalRefEdges(), in updateCGAndAnalysisManagerForPass() and LazyCallGraph::removeDeadFunction(). 1) Since LazyCallGraph can deal with spurious (edges that exist in the graph but not in the IR) ref edges, we can simply not call removeInternalRefEdges() in updateCGAndAnalysisManagerForPass(). 2) LazyCallGraph::removeDeadFunction() still ends up taking the brunt of compile time with the above change for the original reason. So instead we batch all the dead function removals so we can call removeInternalRefEdges() just once. This requires some changes to callers of removeDeadFunction() to not actually erase the function from the module, but defer it to when we batch delete dead functions at the end of the CGSCC run, leaving the function body as "unreachable" in the meantime. We still need to ensure that call edges are accurate. I had also tried deleting dead functions after visiting a RefSCC, but deleting them all at once at the end was simpler. Many test changes are due to not performing unnecessary revisits of an SCC (the CGSCC infrastructure deems ref edge refinements as unimportant when it comes to revisiting SCCs, although that seems to not be consistently true given these changes) because we don't remove some ref edges. Specifically for devirt-invalidated.ll this seems to expose an inlining order issue with the inliner. Probably unimportant for this type of intentionally weird call graph. Compile time: https://llvm-compile-time-tracker.com/compare.php?from=6f2c61071c274a1b5e212e6ad4114641ec7c7fc3&to=b08c90d05e290dd065755ea776ceaf1420680224&stat=instructions:u (cherry picked from commit 71497cc)
1 parent d675d6a commit 176f3d5

File tree

11 files changed

+132
-146
lines changed

11 files changed

+132
-146
lines changed

llvm/include/llvm/Analysis/CGSCCPassManager.h

Lines changed: 4 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -306,6 +306,10 @@ struct CGSCCUpdateResult {
306306
SmallDenseSet<std::pair<LazyCallGraph::Node *, LazyCallGraph::SCC *>, 4>
307307
&InlinedInternalEdges;
308308

309+
/// Functions that a pass has considered to be dead to be removed at the end
310+
/// of the call graph walk in batch.
311+
SmallVector<Function *, 4> &DeadFunctions;
312+
309313
/// Weak VHs to keep track of indirect calls for the purposes of detecting
310314
/// devirtualization.
311315
///

llvm/include/llvm/Analysis/LazyCallGraph.h

Lines changed: 11 additions & 11 deletions
Original file line numberDiff line numberDiff line change
@@ -832,7 +832,7 @@ class LazyCallGraph {
832832
/// self-edges and edge removals which result in a spanning tree with no
833833
/// more cycles.
834834
[[nodiscard]] SmallVector<RefSCC *, 1>
835-
removeInternalRefEdge(Node &SourceN, ArrayRef<Node *> TargetNs);
835+
removeInternalRefEdges(ArrayRef<std::pair<Node *, Node *>> Edges);
836836

837837
/// A convenience wrapper around the above to handle trivial cases of
838838
/// inserting a new call edge.
@@ -1056,18 +1056,18 @@ class LazyCallGraph {
10561056
/// once SCCs have started to be formed. These routines have strict contracts
10571057
/// but may be called at any point.
10581058

1059-
/// Remove a dead function from the call graph (typically to delete it).
1059+
/// Remove dead functions from the call graph.
10601060
///
1061-
/// Note that the function must have an empty use list, and the call graph
1062-
/// must be up-to-date prior to calling this. That means it is by itself in
1063-
/// a maximal SCC which is by itself in a maximal RefSCC, etc. No structural
1064-
/// changes result from calling this routine other than potentially removing
1065-
/// entry points into the call graph.
1061+
/// These functions should have already been passed to markDeadFunction().
1062+
/// This is done as a batch to prevent compile time blowup as a result of
1063+
/// handling a single function at a time.
1064+
void removeDeadFunctions(ArrayRef<Function *> DeadFs);
1065+
1066+
/// Mark a function as dead to be removed later by removeDeadFunctions().
10661067
///
1067-
/// If SCC formation has begun, this function must not be part of the current
1068-
/// DFS in order to call this safely. Typically, the function will have been
1069-
/// fully visited by the DFS prior to calling this routine.
1070-
void removeDeadFunction(Function &F);
1068+
/// The function body should have no incoming or outgoing call or ref edges.
1069+
/// For example, a function with a single "unreachable" instruction.
1070+
void markDeadFunction(Function &F);
10711071

10721072
/// Add a new function split/outlined from an existing function.
10731073
///

llvm/lib/Analysis/CGSCCPassManager.cpp

Lines changed: 9 additions & 33 deletions
Original file line numberDiff line numberDiff line change
@@ -160,10 +160,12 @@ ModuleToPostOrderCGSCCPassAdaptor::run(Module &M, ModuleAnalysisManager &AM) {
160160
SmallDenseSet<std::pair<LazyCallGraph::Node *, LazyCallGraph::SCC *>, 4>
161161
InlinedInternalEdges;
162162

163+
SmallVector<Function *, 4> DeadFunctions;
164+
163165
CGSCCUpdateResult UR = {
164-
RCWorklist, CWorklist, InvalidRefSCCSet,
165-
InvalidSCCSet, nullptr, PreservedAnalyses::all(),
166-
InlinedInternalEdges, {}};
166+
RCWorklist, CWorklist, InvalidRefSCCSet,
167+
InvalidSCCSet, nullptr, PreservedAnalyses::all(),
168+
InlinedInternalEdges, DeadFunctions, {}};
167169

168170
// Request PassInstrumentation from analysis manager, will use it to run
169171
// instrumenting callbacks for the passes later.
@@ -344,6 +346,10 @@ ModuleToPostOrderCGSCCPassAdaptor::run(Module &M, ModuleAnalysisManager &AM) {
344346
} while (!RCWorklist.empty());
345347
}
346348

349+
CG.removeDeadFunctions(DeadFunctions);
350+
for (Function *DeadF : DeadFunctions)
351+
DeadF->eraseFromParent();
352+
347353
#if defined(EXPENSIVE_CHECKS)
348354
// Verify that the call graph is still valid.
349355
CG.verify();
@@ -1037,36 +1043,6 @@ static LazyCallGraph::SCC &updateCGAndAnalysisManagerForPass(
10371043
return true;
10381044
});
10391045

1040-
// Now do a batch removal of the internal ref edges left.
1041-
auto NewRefSCCs = RC->removeInternalRefEdge(N, DeadTargets);
1042-
if (!NewRefSCCs.empty()) {
1043-
// The old RefSCC is dead, mark it as such.
1044-
UR.InvalidatedRefSCCs.insert(RC);
1045-
1046-
// Note that we don't bother to invalidate analyses as ref-edge
1047-
// connectivity is not really observable in any way and is intended
1048-
// exclusively to be used for ordering of transforms rather than for
1049-
// analysis conclusions.
1050-
1051-
// Update RC to the "bottom".
1052-
assert(G.lookupSCC(N) == C && "Changed the SCC when splitting RefSCCs!");
1053-
RC = &C->getOuterRefSCC();
1054-
assert(G.lookupRefSCC(N) == RC && "Failed to update current RefSCC!");
1055-
1056-
// The RC worklist is in reverse postorder, so we enqueue the new ones in
1057-
// RPO except for the one which contains the source node as that is the
1058-
// "bottom" we will continue processing in the bottom-up walk.
1059-
assert(NewRefSCCs.front() == RC &&
1060-
"New current RefSCC not first in the returned list!");
1061-
for (RefSCC *NewRC : llvm::reverse(llvm::drop_begin(NewRefSCCs))) {
1062-
assert(NewRC != RC && "Should not encounter the current RefSCC further "
1063-
"in the postorder list of new RefSCCs.");
1064-
UR.RCWorklist.insert(NewRC);
1065-
LLVM_DEBUG(dbgs() << "Enqueuing a new RefSCC in the update worklist: "
1066-
<< *NewRC << "\n");
1067-
}
1068-
}
1069-
10701046
// Next demote all the call edges that are now ref edges. This helps make
10711047
// the SCCs small which should minimize the work below as we don't want to
10721048
// form cycles that this would break.

llvm/lib/Analysis/LazyCallGraph.cpp

Lines changed: 64 additions & 59 deletions
Original file line numberDiff line numberDiff line change
@@ -1160,8 +1160,8 @@ void LazyCallGraph::RefSCC::removeOutgoingEdge(Node &SourceN, Node &TargetN) {
11601160
}
11611161

11621162
SmallVector<LazyCallGraph::RefSCC *, 1>
1163-
LazyCallGraph::RefSCC::removeInternalRefEdge(Node &SourceN,
1164-
ArrayRef<Node *> TargetNs) {
1163+
LazyCallGraph::RefSCC::removeInternalRefEdges(
1164+
ArrayRef<std::pair<Node *, Node *>> Edges) {
11651165
// We return a list of the resulting *new* RefSCCs in post-order.
11661166
SmallVector<RefSCC *, 1> Result;
11671167

@@ -1179,25 +1179,21 @@ LazyCallGraph::RefSCC::removeInternalRefEdge(Node &SourceN,
11791179
#endif
11801180

11811181
// First remove the actual edges.
1182-
for (Node *TargetN : TargetNs) {
1183-
assert(!(*SourceN)[*TargetN].isCall() &&
1182+
for (auto [SourceN, TargetN] : Edges) {
1183+
assert(!(**SourceN)[*TargetN].isCall() &&
11841184
"Cannot remove a call edge, it must first be made a ref edge");
11851185

1186-
bool Removed = SourceN->removeEdgeInternal(*TargetN);
1186+
bool Removed = (*SourceN)->removeEdgeInternal(*TargetN);
11871187
(void)Removed;
11881188
assert(Removed && "Target not in the edge set for this caller?");
11891189
}
11901190

11911191
// Direct self references don't impact the ref graph at all.
1192-
if (llvm::all_of(TargetNs,
1193-
[&](Node *TargetN) { return &SourceN == TargetN; }))
1194-
return Result;
1195-
11961192
// If all targets are in the same SCC as the source, because no call edges
11971193
// were removed there is no RefSCC structure change.
1198-
SCC &SourceC = *G->lookupSCC(SourceN);
1199-
if (llvm::all_of(TargetNs, [&](Node *TargetN) {
1200-
return G->lookupSCC(*TargetN) == &SourceC;
1194+
if (llvm::all_of(Edges, [&](std::pair<Node *, Node *> E) {
1195+
return E.first == E.second ||
1196+
G->lookupSCC(*E.first) == G->lookupSCC(*E.second);
12011197
}))
12021198
return Result;
12031199

@@ -1499,7 +1495,7 @@ void LazyCallGraph::removeEdge(Node &SourceN, Node &TargetN) {
14991495
assert(Removed && "Target not in the edge set for this caller?");
15001496
}
15011497

1502-
void LazyCallGraph::removeDeadFunction(Function &F) {
1498+
void LazyCallGraph::markDeadFunction(Function &F) {
15031499
// FIXME: This is unnecessarily restrictive. We should be able to remove
15041500
// functions which recursively call themselves.
15051501
assert(F.hasZeroLiveUses() &&
@@ -1517,57 +1513,66 @@ void LazyCallGraph::removeDeadFunction(Function &F) {
15171513

15181514
Node &N = *NI->second;
15191515

1520-
// Cannot remove a function which has yet to be visited in the DFS walk, so
1521-
// if we have a node at all then we must have an SCC and RefSCC.
1522-
auto CI = SCCMap.find(&N);
1523-
assert(CI != SCCMap.end() &&
1524-
"Tried to remove a node without an SCC after DFS walk started!");
1525-
SCC &C = *CI->second;
1526-
RefSCC *RC = &C.getOuterRefSCC();
1527-
1528-
// In extremely rare cases, we can delete a dead function which is still in a
1529-
// non-trivial RefSCC. This can happen due to spurious ref edges sticking
1530-
// around after an IR function reference is removed.
1531-
if (RC->size() != 1) {
1532-
SmallVector<Node *, 0> NodesInRC;
1533-
for (SCC &OtherC : *RC) {
1534-
for (Node &OtherN : OtherC)
1535-
NodesInRC.push_back(&OtherN);
1516+
// Remove all call edges out of dead function.
1517+
for (Edge E : *N) {
1518+
if (E.isCall())
1519+
N->setEdgeKind(E.getNode(), Edge::Ref);
1520+
}
1521+
}
1522+
1523+
void LazyCallGraph::removeDeadFunctions(ArrayRef<Function *> DeadFs) {
1524+
if (DeadFs.empty())
1525+
return;
1526+
1527+
// Group dead functions by the RefSCC they're in.
1528+
DenseMap<RefSCC *, SmallVector<Node *, 1>> RCs;
1529+
for (Function *DeadF : DeadFs) {
1530+
Node *N = lookup(*DeadF);
1531+
#ifndef NDEBUG
1532+
for (Edge &E : **N) {
1533+
assert(!E.isCall() &&
1534+
"dead function shouldn't have any outgoing call edges");
15361535
}
1537-
for (Node *OtherN : NodesInRC) {
1538-
if ((*OtherN)->lookup(N)) {
1539-
auto NewRefSCCs =
1540-
RC->removeInternalRefEdge(*OtherN, ArrayRef<Node *>(&N));
1541-
// If we've split into multiple RefSCCs, RC is now invalid and the
1542-
// RefSCC containing C will be different.
1543-
if (!NewRefSCCs.empty())
1544-
RC = &C.getOuterRefSCC();
1536+
#endif
1537+
RefSCC *RC = lookupRefSCC(*N);
1538+
RCs[RC].push_back(N);
1539+
}
1540+
// Remove outgoing edges from all dead functions. Dead functions should
1541+
// already have had their call edges removed in markDeadFunction(), so we only
1542+
// need to worry about spurious ref edges.
1543+
for (auto [RC, DeadNs] : RCs) {
1544+
SmallVector<std::pair<Node *, Node *>> InternalEdgesToRemove;
1545+
for (Node *DeadN : DeadNs) {
1546+
for (Edge &E : **DeadN) {
1547+
if (lookupRefSCC(E.getNode()) == RC)
1548+
InternalEdgesToRemove.push_back({DeadN, &E.getNode()});
1549+
else
1550+
RC->removeOutgoingEdge(*DeadN, E.getNode());
15451551
}
15461552
}
1553+
// We ignore the returned RefSCCs since at this point we're done with CGSCC
1554+
// iteration and don't need to add it to any worklists.
1555+
(void)RC->removeInternalRefEdges(InternalEdgesToRemove);
1556+
for (Node *DeadN : DeadNs) {
1557+
RefSCC *DeadRC = lookupRefSCC(*DeadN);
1558+
assert(DeadRC->size() == 1);
1559+
assert(DeadRC->begin()->size() == 1);
1560+
DeadRC->clear();
1561+
DeadRC->G = nullptr;
1562+
}
15471563
}
1564+
// Clean up data structures.
1565+
for (Function *DeadF : DeadFs) {
1566+
Node &N = *lookup(*DeadF);
1567+
1568+
EntryEdges.removeEdgeInternal(N);
1569+
SCCMap.erase(SCCMap.find(&N));
1570+
NodeMap.erase(NodeMap.find(DeadF));
15481571

1549-
NodeMap.erase(NI);
1550-
EntryEdges.removeEdgeInternal(N);
1551-
SCCMap.erase(CI);
1552-
1553-
// This node must be the only member of its SCC as it has no callers, and
1554-
// that SCC must be the only member of a RefSCC as it has no references.
1555-
// Validate these properties first.
1556-
assert(C.size() == 1 && "Dead functions must be in a singular SCC");
1557-
assert(RC->size() == 1 && "Dead functions must be in a singular RefSCC");
1558-
1559-
// Finally clear out all the data structures from the node down through the
1560-
// components. postorder_ref_scc_iterator will skip empty RefSCCs, so no need
1561-
// to adjust LazyCallGraph data structures.
1562-
N.clear();
1563-
N.G = nullptr;
1564-
N.F = nullptr;
1565-
C.clear();
1566-
RC->clear();
1567-
RC->G = nullptr;
1568-
1569-
// Nothing to delete as all the objects are allocated in stable bump pointer
1570-
// allocators.
1572+
N.clear();
1573+
N.G = nullptr;
1574+
N.F = nullptr;
1575+
}
15711576
}
15721577

15731578
// Gets the Edge::Kind from one function to another by looking at the function's

llvm/lib/Transforms/IPO/Inliner.cpp

Lines changed: 14 additions & 15 deletions
Original file line numberDiff line numberDiff line change
@@ -740,6 +740,14 @@ InlinerPass::getAdvisor(const ModuleAnalysisManagerCGSCCProxy::Result &MAM,
740740
return *IAA->getAdvisor();
741741
}
742742

743+
void makeFunctionBodyUnreachable(Function &F) {
744+
F.dropAllReferences();
745+
for (BasicBlock &BB : make_early_inc_range(F))
746+
BB.eraseFromParent();
747+
BasicBlock *BB = BasicBlock::Create(F.getContext(), "", &F);
748+
new UnreachableInst(F.getContext(), BB);
749+
}
750+
743751
PreservedAnalyses InlinerPass::run(LazyCallGraph::SCC &InitialC,
744752
CGSCCAnalysisManager &AM, LazyCallGraph &CG,
745753
CGSCCUpdateResult &UR) {
@@ -988,11 +996,9 @@ PreservedAnalyses InlinerPass::run(LazyCallGraph::SCC &InitialC,
988996
}),
989997
Calls.end());
990998

991-
// Clear the body and queue the function itself for deletion when we
992-
// finish inlining and call graph updates.
993-
// Note that after this point, it is an error to do anything other
994-
// than use the callee's address or delete it.
995-
Callee.dropAllReferences();
999+
// Clear the body and queue the function itself for call graph
1000+
// updating when we finish inlining.
1001+
makeFunctionBodyUnreachable(Callee);
9961002
assert(!is_contained(DeadFunctions, &Callee) &&
9971003
"Cannot put cause a function to become dead twice!");
9981004
DeadFunctions.push_back(&Callee);
@@ -1070,7 +1076,7 @@ PreservedAnalyses InlinerPass::run(LazyCallGraph::SCC &InitialC,
10701076
if (!DeadFunctionsInComdats.empty()) {
10711077
filterDeadComdatFunctions(DeadFunctionsInComdats);
10721078
for (auto *Callee : DeadFunctionsInComdats)
1073-
Callee->dropAllReferences();
1079+
makeFunctionBodyUnreachable(*Callee);
10741080
DeadFunctions.append(DeadFunctionsInComdats);
10751081
}
10761082

@@ -1082,25 +1088,18 @@ PreservedAnalyses InlinerPass::run(LazyCallGraph::SCC &InitialC,
10821088
// that is OK as all we do is delete things and add pointers to unordered
10831089
// sets.
10841090
for (Function *DeadF : DeadFunctions) {
1091+
CG.markDeadFunction(*DeadF);
10851092
// Get the necessary information out of the call graph and nuke the
10861093
// function there. Also, clear out any cached analyses.
10871094
auto &DeadC = *CG.lookupSCC(*CG.lookup(*DeadF));
10881095
FAM.clear(*DeadF, DeadF->getName());
10891096
AM.clear(DeadC, DeadC.getName());
1090-
auto &DeadRC = DeadC.getOuterRefSCC();
1091-
CG.removeDeadFunction(*DeadF);
10921097

10931098
// Mark the relevant parts of the call graph as invalid so we don't visit
10941099
// them.
10951100
UR.InvalidatedSCCs.insert(&DeadC);
1096-
UR.InvalidatedRefSCCs.insert(&DeadRC);
1097-
1098-
// If the updated SCC was the one containing the deleted function, clear it.
1099-
if (&DeadC == UR.UpdatedC)
1100-
UR.UpdatedC = nullptr;
11011101

1102-
// And delete the actual function from the module.
1103-
M.getFunctionList().erase(DeadF);
1102+
UR.DeadFunctions.push_back(DeadF);
11041103

11051104
++NumDeleted;
11061105
}

llvm/lib/Transforms/Utils/CallGraphUpdater.cpp

Lines changed: 8 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -67,16 +67,20 @@ bool CallGraphUpdater::finalize() {
6767

6868
FAM.clear(*DeadFn, DeadFn->getName());
6969
AM->clear(*DeadSCC, DeadSCC->getName());
70-
LCG->removeDeadFunction(*DeadFn);
70+
LCG->markDeadFunction(*DeadFn);
7171

7272
// Mark the relevant parts of the call graph as invalid so we don't
7373
// visit them.
7474
UR->InvalidatedSCCs.insert(DeadSCC);
7575
UR->InvalidatedRefSCCs.insert(&DeadRC);
76+
UR->DeadFunctions.push_back(DeadFn);
77+
} else {
78+
// The CGSCC infrastructure batch deletes functions at the end of the
79+
// call graph walk, so only erase the function if we're not using that
80+
// infrastructure.
81+
// The function is now really dead and de-attached from everything.
82+
DeadFn->eraseFromParent();
7683
}
77-
78-
// The function is now really dead and de-attached from everything.
79-
DeadFn->eraseFromParent();
8084
}
8185
}
8286

llvm/test/Other/cgscc-refscc-mutation-order.ll

Lines changed: 0 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -15,8 +15,6 @@
1515
; CHECK-NOT: InstCombinePass
1616
; CHECK: Running pass: InstCombinePass on f4
1717
; CHECK-NOT: InstCombinePass
18-
; CHECK: Running pass: InstCombinePass on f1
19-
; CHECK-NOT: InstCombinePass
2018

2119
@a1 = alias void (), ptr @f1
2220
@a2 = alias void (), ptr @f2

llvm/test/Other/devirt-invalidated.ll

Lines changed: 0 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -1,8 +1,6 @@
11
; RUN: opt -passes='devirt<0>(inline)' < %s -S | FileCheck %s
22

3-
; CHECK-NOT: internal
43
; CHECK: define void @e()
5-
; CHECK-NOT: internal
64

75
define void @e() {
86
entry:

llvm/test/Transforms/Inline/cgscc-cycle-debug.ll

Lines changed: 0 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -13,7 +13,6 @@
1313
; CHECK: Running an SCC pass across the RefSCC: [(test1_a, test1_b, test1_c)]
1414
; CHECK: Enqueuing the existing SCC in the worklist:(test1_b)
1515
; CHECK: Enqueuing a newly formed SCC:(test1_c)
16-
; CHECK: Enqueuing a new RefSCC in the update worklist: [(test1_b)]
1716
; CHECK: Switch an internal ref edge to a call edge from 'test1_a' to 'test1_c'
1817
; CHECK: Switch an internal ref edge to a call edge from 'test1_a' to 'test1_a'
1918
; CHECK: Re-running SCC passes after a refinement of the current SCC: (test1_c, test1_a)

0 commit comments

Comments
 (0)