Skip to content

Commit 71497cc

Browse files
authored
[CGSCC] Fix compile time blowup with large RefSCCs (#94815)
In some modules, e.g. Kotlin-generated IR, we end up with a huge RefSCC and the call graph updates done as a result of the inliner take a long time. This is due to RefSCC::removeInternalRefEdges() getting called many times, each time removing one function from the RefSCC, but each call to removeInternalRefEdges() is proportional to the size of the RefSCC. There are two places that call removeInternalRefEdges(), in updateCGAndAnalysisManagerForPass() and LazyCallGraph::removeDeadFunction(). 1) Since LazyCallGraph can deal with spurious (edges that exist in the graph but not in the IR) ref edges, we can simply not call removeInternalRefEdges() in updateCGAndAnalysisManagerForPass(). 2) LazyCallGraph::removeDeadFunction() still ends up taking the brunt of compile time with the above change for the original reason. So instead we batch all the dead function removals so we can call removeInternalRefEdges() just once. This requires some changes to callers of removeDeadFunction() to not actually erase the function from the module, but defer it to when we batch delete dead functions at the end of the CGSCC run, leaving the function body as "unreachable" in the meantime. We still need to ensure that call edges are accurate. I had also tried deleting dead functions after visiting a RefSCC, but deleting them all at once at the end was simpler. Many test changes are due to not performing unnecessary revisits of an SCC (the CGSCC infrastructure deems ref edge refinements as unimportant when it comes to revisiting SCCs, although that seems to not be consistently true given these changes) because we don't remove some ref edges. Specifically for devirt-invalidated.ll this seems to expose an inlining order issue with the inliner. Probably unimportant for this type of intentionally weird call graph. Compile time: https://llvm-compile-time-tracker.com/compare.php?from=6f2c61071c274a1b5e212e6ad4114641ec7c7fc3&to=b08c90d05e290dd065755ea776ceaf1420680224&stat=instructions:u
1 parent a03e93e commit 71497cc

File tree

13 files changed

+132
-148
lines changed

13 files changed

+132
-148
lines changed

llvm/include/llvm/Analysis/CGSCCPassManager.h

Lines changed: 4 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -306,6 +306,10 @@ struct CGSCCUpdateResult {
306306
SmallDenseSet<std::pair<LazyCallGraph::Node *, LazyCallGraph::SCC *>, 4>
307307
&InlinedInternalEdges;
308308

309+
/// Functions that a pass has considered to be dead to be removed at the end
310+
/// of the call graph walk in batch.
311+
SmallVector<Function *, 4> &DeadFunctions;
312+
309313
/// Weak VHs to keep track of indirect calls for the purposes of detecting
310314
/// devirtualization.
311315
///

llvm/include/llvm/Analysis/LazyCallGraph.h

Lines changed: 11 additions & 11 deletions
Original file line numberDiff line numberDiff line change
@@ -832,7 +832,7 @@ class LazyCallGraph {
832832
/// self-edges and edge removals which result in a spanning tree with no
833833
/// more cycles.
834834
[[nodiscard]] SmallVector<RefSCC *, 1>
835-
removeInternalRefEdge(Node &SourceN, ArrayRef<Node *> TargetNs);
835+
removeInternalRefEdges(ArrayRef<std::pair<Node *, Node *>> Edges);
836836

837837
/// A convenience wrapper around the above to handle trivial cases of
838838
/// inserting a new call edge.
@@ -1056,18 +1056,18 @@ class LazyCallGraph {
10561056
/// once SCCs have started to be formed. These routines have strict contracts
10571057
/// but may be called at any point.
10581058

1059-
/// Remove a dead function from the call graph (typically to delete it).
1059+
/// Remove dead functions from the call graph.
10601060
///
1061-
/// Note that the function must have an empty use list, and the call graph
1062-
/// must be up-to-date prior to calling this. That means it is by itself in
1063-
/// a maximal SCC which is by itself in a maximal RefSCC, etc. No structural
1064-
/// changes result from calling this routine other than potentially removing
1065-
/// entry points into the call graph.
1061+
/// These functions should have already been passed to markDeadFunction().
1062+
/// This is done as a batch to prevent compile time blowup as a result of
1063+
/// handling a single function at a time.
1064+
void removeDeadFunctions(ArrayRef<Function *> DeadFs);
1065+
1066+
/// Mark a function as dead to be removed later by removeDeadFunctions().
10661067
///
1067-
/// If SCC formation has begun, this function must not be part of the current
1068-
/// DFS in order to call this safely. Typically, the function will have been
1069-
/// fully visited by the DFS prior to calling this routine.
1070-
void removeDeadFunction(Function &F);
1068+
/// The function body should have no incoming or outgoing call or ref edges.
1069+
/// For example, a function with a single "unreachable" instruction.
1070+
void markDeadFunction(Function &F);
10711071

10721072
/// Add a new function split/outlined from an existing function.
10731073
///

llvm/lib/Analysis/CGSCCPassManager.cpp

Lines changed: 9 additions & 33 deletions
Original file line numberDiff line numberDiff line change
@@ -158,10 +158,12 @@ ModuleToPostOrderCGSCCPassAdaptor::run(Module &M, ModuleAnalysisManager &AM) {
158158
SmallDenseSet<std::pair<LazyCallGraph::Node *, LazyCallGraph::SCC *>, 4>
159159
InlinedInternalEdges;
160160

161+
SmallVector<Function *, 4> DeadFunctions;
162+
161163
CGSCCUpdateResult UR = {
162-
RCWorklist, CWorklist, InvalidRefSCCSet,
163-
InvalidSCCSet, nullptr, PreservedAnalyses::all(),
164-
InlinedInternalEdges, {}};
164+
RCWorklist, CWorklist, InvalidRefSCCSet,
165+
InvalidSCCSet, nullptr, PreservedAnalyses::all(),
166+
InlinedInternalEdges, DeadFunctions, {}};
165167

166168
// Request PassInstrumentation from analysis manager, will use it to run
167169
// instrumenting callbacks for the passes later.
@@ -340,6 +342,10 @@ ModuleToPostOrderCGSCCPassAdaptor::run(Module &M, ModuleAnalysisManager &AM) {
340342
} while (!RCWorklist.empty());
341343
}
342344

345+
CG.removeDeadFunctions(DeadFunctions);
346+
for (Function *DeadF : DeadFunctions)
347+
DeadF->eraseFromParent();
348+
343349
#if defined(EXPENSIVE_CHECKS)
344350
// Verify that the call graph is still valid.
345351
CG.verify();
@@ -1030,36 +1036,6 @@ static LazyCallGraph::SCC &updateCGAndAnalysisManagerForPass(
10301036
return true;
10311037
});
10321038

1033-
// Now do a batch removal of the internal ref edges left.
1034-
auto NewRefSCCs = RC->removeInternalRefEdge(N, DeadTargets);
1035-
if (!NewRefSCCs.empty()) {
1036-
// The old RefSCC is dead, mark it as such.
1037-
UR.InvalidatedRefSCCs.insert(RC);
1038-
1039-
// Note that we don't bother to invalidate analyses as ref-edge
1040-
// connectivity is not really observable in any way and is intended
1041-
// exclusively to be used for ordering of transforms rather than for
1042-
// analysis conclusions.
1043-
1044-
// Update RC to the "bottom".
1045-
assert(G.lookupSCC(N) == C && "Changed the SCC when splitting RefSCCs!");
1046-
RC = &C->getOuterRefSCC();
1047-
assert(G.lookupRefSCC(N) == RC && "Failed to update current RefSCC!");
1048-
1049-
// The RC worklist is in reverse postorder, so we enqueue the new ones in
1050-
// RPO except for the one which contains the source node as that is the
1051-
// "bottom" we will continue processing in the bottom-up walk.
1052-
assert(NewRefSCCs.front() == RC &&
1053-
"New current RefSCC not first in the returned list!");
1054-
for (RefSCC *NewRC : llvm::reverse(llvm::drop_begin(NewRefSCCs))) {
1055-
assert(NewRC != RC && "Should not encounter the current RefSCC further "
1056-
"in the postorder list of new RefSCCs.");
1057-
UR.RCWorklist.insert(NewRC);
1058-
LLVM_DEBUG(dbgs() << "Enqueuing a new RefSCC in the update worklist: "
1059-
<< *NewRC << "\n");
1060-
}
1061-
}
1062-
10631039
// Next demote all the call edges that are now ref edges. This helps make
10641040
// the SCCs small which should minimize the work below as we don't want to
10651041
// form cycles that this would break.

llvm/lib/Analysis/LazyCallGraph.cpp

Lines changed: 64 additions & 59 deletions
Original file line numberDiff line numberDiff line change
@@ -1160,8 +1160,8 @@ void LazyCallGraph::RefSCC::removeOutgoingEdge(Node &SourceN, Node &TargetN) {
11601160
}
11611161

11621162
SmallVector<LazyCallGraph::RefSCC *, 1>
1163-
LazyCallGraph::RefSCC::removeInternalRefEdge(Node &SourceN,
1164-
ArrayRef<Node *> TargetNs) {
1163+
LazyCallGraph::RefSCC::removeInternalRefEdges(
1164+
ArrayRef<std::pair<Node *, Node *>> Edges) {
11651165
// We return a list of the resulting *new* RefSCCs in post-order.
11661166
SmallVector<RefSCC *, 1> Result;
11671167

@@ -1179,25 +1179,21 @@ LazyCallGraph::RefSCC::removeInternalRefEdge(Node &SourceN,
11791179
#endif
11801180

11811181
// First remove the actual edges.
1182-
for (Node *TargetN : TargetNs) {
1183-
assert(!(*SourceN)[*TargetN].isCall() &&
1182+
for (auto [SourceN, TargetN] : Edges) {
1183+
assert(!(**SourceN)[*TargetN].isCall() &&
11841184
"Cannot remove a call edge, it must first be made a ref edge");
11851185

1186-
bool Removed = SourceN->removeEdgeInternal(*TargetN);
1186+
bool Removed = (*SourceN)->removeEdgeInternal(*TargetN);
11871187
(void)Removed;
11881188
assert(Removed && "Target not in the edge set for this caller?");
11891189
}
11901190

11911191
// Direct self references don't impact the ref graph at all.
1192-
if (llvm::all_of(TargetNs,
1193-
[&](Node *TargetN) { return &SourceN == TargetN; }))
1194-
return Result;
1195-
11961192
// If all targets are in the same SCC as the source, because no call edges
11971193
// were removed there is no RefSCC structure change.
1198-
SCC &SourceC = *G->lookupSCC(SourceN);
1199-
if (llvm::all_of(TargetNs, [&](Node *TargetN) {
1200-
return G->lookupSCC(*TargetN) == &SourceC;
1194+
if (llvm::all_of(Edges, [&](std::pair<Node *, Node *> E) {
1195+
return E.first == E.second ||
1196+
G->lookupSCC(*E.first) == G->lookupSCC(*E.second);
12011197
}))
12021198
return Result;
12031199

@@ -1499,7 +1495,7 @@ void LazyCallGraph::removeEdge(Node &SourceN, Node &TargetN) {
14991495
assert(Removed && "Target not in the edge set for this caller?");
15001496
}
15011497

1502-
void LazyCallGraph::removeDeadFunction(Function &F) {
1498+
void LazyCallGraph::markDeadFunction(Function &F) {
15031499
// FIXME: This is unnecessarily restrictive. We should be able to remove
15041500
// functions which recursively call themselves.
15051501
assert(F.hasZeroLiveUses() &&
@@ -1515,57 +1511,66 @@ void LazyCallGraph::removeDeadFunction(Function &F) {
15151511

15161512
Node &N = *NI->second;
15171513

1518-
// Cannot remove a function which has yet to be visited in the DFS walk, so
1519-
// if we have a node at all then we must have an SCC and RefSCC.
1520-
auto CI = SCCMap.find(&N);
1521-
assert(CI != SCCMap.end() &&
1522-
"Tried to remove a node without an SCC after DFS walk started!");
1523-
SCC &C = *CI->second;
1524-
RefSCC *RC = &C.getOuterRefSCC();
1525-
1526-
// In extremely rare cases, we can delete a dead function which is still in a
1527-
// non-trivial RefSCC. This can happen due to spurious ref edges sticking
1528-
// around after an IR function reference is removed.
1529-
if (RC->size() != 1) {
1530-
SmallVector<Node *, 0> NodesInRC;
1531-
for (SCC &OtherC : *RC) {
1532-
for (Node &OtherN : OtherC)
1533-
NodesInRC.push_back(&OtherN);
1514+
// Remove all call edges out of dead function.
1515+
for (Edge E : *N) {
1516+
if (E.isCall())
1517+
N->setEdgeKind(E.getNode(), Edge::Ref);
1518+
}
1519+
}
1520+
1521+
void LazyCallGraph::removeDeadFunctions(ArrayRef<Function *> DeadFs) {
1522+
if (DeadFs.empty())
1523+
return;
1524+
1525+
// Group dead functions by the RefSCC they're in.
1526+
DenseMap<RefSCC *, SmallVector<Node *, 1>> RCs;
1527+
for (Function *DeadF : DeadFs) {
1528+
Node *N = lookup(*DeadF);
1529+
#ifndef NDEBUG
1530+
for (Edge &E : **N) {
1531+
assert(!E.isCall() &&
1532+
"dead function shouldn't have any outgoing call edges");
15341533
}
1535-
for (Node *OtherN : NodesInRC) {
1536-
if ((*OtherN)->lookup(N)) {
1537-
auto NewRefSCCs =
1538-
RC->removeInternalRefEdge(*OtherN, ArrayRef<Node *>(&N));
1539-
// If we've split into multiple RefSCCs, RC is now invalid and the
1540-
// RefSCC containing C will be different.
1541-
if (!NewRefSCCs.empty())
1542-
RC = &C.getOuterRefSCC();
1534+
#endif
1535+
RefSCC *RC = lookupRefSCC(*N);
1536+
RCs[RC].push_back(N);
1537+
}
1538+
// Remove outgoing edges from all dead functions. Dead functions should
1539+
// already have had their call edges removed in markDeadFunction(), so we only
1540+
// need to worry about spurious ref edges.
1541+
for (auto [RC, DeadNs] : RCs) {
1542+
SmallVector<std::pair<Node *, Node *>> InternalEdgesToRemove;
1543+
for (Node *DeadN : DeadNs) {
1544+
for (Edge &E : **DeadN) {
1545+
if (lookupRefSCC(E.getNode()) == RC)
1546+
InternalEdgesToRemove.push_back({DeadN, &E.getNode()});
1547+
else
1548+
RC->removeOutgoingEdge(*DeadN, E.getNode());
15431549
}
15441550
}
1551+
// We ignore the returned RefSCCs since at this point we're done with CGSCC
1552+
// iteration and don't need to add it to any worklists.
1553+
(void)RC->removeInternalRefEdges(InternalEdgesToRemove);
1554+
for (Node *DeadN : DeadNs) {
1555+
RefSCC *DeadRC = lookupRefSCC(*DeadN);
1556+
assert(DeadRC->size() == 1);
1557+
assert(DeadRC->begin()->size() == 1);
1558+
DeadRC->clear();
1559+
DeadRC->G = nullptr;
1560+
}
15451561
}
1562+
// Clean up data structures.
1563+
for (Function *DeadF : DeadFs) {
1564+
Node &N = *lookup(*DeadF);
1565+
1566+
EntryEdges.removeEdgeInternal(N);
1567+
SCCMap.erase(SCCMap.find(&N));
1568+
NodeMap.erase(NodeMap.find(DeadF));
15461569

1547-
NodeMap.erase(NI);
1548-
EntryEdges.removeEdgeInternal(N);
1549-
SCCMap.erase(CI);
1550-
1551-
// This node must be the only member of its SCC as it has no callers, and
1552-
// that SCC must be the only member of a RefSCC as it has no references.
1553-
// Validate these properties first.
1554-
assert(C.size() == 1 && "Dead functions must be in a singular SCC");
1555-
assert(RC->size() == 1 && "Dead functions must be in a singular RefSCC");
1556-
1557-
// Finally clear out all the data structures from the node down through the
1558-
// components. postorder_ref_scc_iterator will skip empty RefSCCs, so no need
1559-
// to adjust LazyCallGraph data structures.
1560-
N.clear();
1561-
N.G = nullptr;
1562-
N.F = nullptr;
1563-
C.clear();
1564-
RC->clear();
1565-
RC->G = nullptr;
1566-
1567-
// Nothing to delete as all the objects are allocated in stable bump pointer
1568-
// allocators.
1570+
N.clear();
1571+
N.G = nullptr;
1572+
N.F = nullptr;
1573+
}
15691574
}
15701575

15711576
// Gets the Edge::Kind from one function to another by looking at the function's

llvm/lib/Transforms/IPO/Inliner.cpp

Lines changed: 14 additions & 15 deletions
Original file line numberDiff line numberDiff line change
@@ -197,6 +197,14 @@ InlinerPass::getAdvisor(const ModuleAnalysisManagerCGSCCProxy::Result &MAM,
197197
return *IAA->getAdvisor();
198198
}
199199

200+
void makeFunctionBodyUnreachable(Function &F) {
201+
F.dropAllReferences();
202+
for (BasicBlock &BB : make_early_inc_range(F))
203+
BB.eraseFromParent();
204+
BasicBlock *BB = BasicBlock::Create(F.getContext(), "", &F);
205+
new UnreachableInst(F.getContext(), BB);
206+
}
207+
200208
PreservedAnalyses InlinerPass::run(LazyCallGraph::SCC &InitialC,
201209
CGSCCAnalysisManager &AM, LazyCallGraph &CG,
202210
CGSCCUpdateResult &UR) {
@@ -448,11 +456,9 @@ PreservedAnalyses InlinerPass::run(LazyCallGraph::SCC &InitialC,
448456
}),
449457
Calls.end());
450458

451-
// Clear the body and queue the function itself for deletion when we
452-
// finish inlining and call graph updates.
453-
// Note that after this point, it is an error to do anything other
454-
// than use the callee's address or delete it.
455-
Callee.dropAllReferences();
459+
// Clear the body and queue the function itself for call graph
460+
// updating when we finish inlining.
461+
makeFunctionBodyUnreachable(Callee);
456462
assert(!is_contained(DeadFunctions, &Callee) &&
457463
"Cannot put cause a function to become dead twice!");
458464
DeadFunctions.push_back(&Callee);
@@ -530,7 +536,7 @@ PreservedAnalyses InlinerPass::run(LazyCallGraph::SCC &InitialC,
530536
if (!DeadFunctionsInComdats.empty()) {
531537
filterDeadComdatFunctions(DeadFunctionsInComdats);
532538
for (auto *Callee : DeadFunctionsInComdats)
533-
Callee->dropAllReferences();
539+
makeFunctionBodyUnreachable(*Callee);
534540
DeadFunctions.append(DeadFunctionsInComdats);
535541
}
536542

@@ -542,25 +548,18 @@ PreservedAnalyses InlinerPass::run(LazyCallGraph::SCC &InitialC,
542548
// that is OK as all we do is delete things and add pointers to unordered
543549
// sets.
544550
for (Function *DeadF : DeadFunctions) {
551+
CG.markDeadFunction(*DeadF);
545552
// Get the necessary information out of the call graph and nuke the
546553
// function there. Also, clear out any cached analyses.
547554
auto &DeadC = *CG.lookupSCC(*CG.lookup(*DeadF));
548555
FAM.clear(*DeadF, DeadF->getName());
549556
AM.clear(DeadC, DeadC.getName());
550-
auto &DeadRC = DeadC.getOuterRefSCC();
551-
CG.removeDeadFunction(*DeadF);
552557

553558
// Mark the relevant parts of the call graph as invalid so we don't visit
554559
// them.
555560
UR.InvalidatedSCCs.insert(&DeadC);
556-
UR.InvalidatedRefSCCs.insert(&DeadRC);
557-
558-
// If the updated SCC was the one containing the deleted function, clear it.
559-
if (&DeadC == UR.UpdatedC)
560-
UR.UpdatedC = nullptr;
561561

562-
// And delete the actual function from the module.
563-
M.getFunctionList().erase(DeadF);
562+
UR.DeadFunctions.push_back(DeadF);
564563

565564
++NumDeleted;
566565
}

llvm/lib/Transforms/Utils/CallGraphUpdater.cpp

Lines changed: 8 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -67,16 +67,20 @@ bool CallGraphUpdater::finalize() {
6767

6868
FAM.clear(*DeadFn, DeadFn->getName());
6969
AM->clear(*DeadSCC, DeadSCC->getName());
70-
LCG->removeDeadFunction(*DeadFn);
70+
LCG->markDeadFunction(*DeadFn);
7171

7272
// Mark the relevant parts of the call graph as invalid so we don't
7373
// visit them.
7474
UR->InvalidatedSCCs.insert(DeadSCC);
7575
UR->InvalidatedRefSCCs.insert(&DeadRC);
76+
UR->DeadFunctions.push_back(DeadFn);
77+
} else {
78+
// The CGSCC infrastructure batch deletes functions at the end of the
79+
// call graph walk, so only erase the function if we're not using that
80+
// infrastructure.
81+
// The function is now really dead and de-attached from everything.
82+
DeadFn->eraseFromParent();
7683
}
77-
78-
// The function is now really dead and de-attached from everything.
79-
DeadFn->eraseFromParent();
8084
}
8185
}
8286

llvm/test/Other/cgscc-refscc-mutation-order.ll

Lines changed: 0 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -15,8 +15,6 @@
1515
; CHECK-NOT: InstCombinePass
1616
; CHECK: Running pass: InstCombinePass on f4
1717
; CHECK-NOT: InstCombinePass
18-
; CHECK: Running pass: InstCombinePass on f1
19-
; CHECK-NOT: InstCombinePass
2018

2119
@a1 = alias void (), ptr @f1
2220
@a2 = alias void (), ptr @f2

llvm/test/Other/devirt-invalidated.ll

Lines changed: 0 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -1,8 +1,6 @@
11
; RUN: opt -passes='devirt<0>(inline)' < %s -S | FileCheck %s
22

3-
; CHECK-NOT: internal
43
; CHECK: define void @e()
5-
; CHECK-NOT: internal
64

75
define void @e() {
86
entry:

llvm/test/Transforms/Attributor/ArgumentPromotion/live_called_from_dead.ll

Lines changed: 0 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -37,7 +37,6 @@ define internal i32 @caller(ptr %B) {
3737
; CGSCC-LABEL: define {{[^@]+}}@caller
3838
; CGSCC-SAME: () #[[ATTR0]] {
3939
; CGSCC-NEXT: [[A:%.*]] = alloca i32, align 4
40-
; CGSCC-NEXT: [[A2:%.*]] = alloca i8, i32 0, align 4
4140
; CGSCC-NEXT: [[A1:%.*]] = alloca i8, i32 0, align 4
4241
; CGSCC-NEXT: ret i32 0
4342
;

0 commit comments

Comments
 (0)