Skip to content

Commit 2c7e348

Browse files
committed
EscapeAnalysis: rework graph update and merge algorithms
The two major take aways from this patch are: (1) Impose graph structure and reduce superfluous nodes and edges. Incrementally make the connection graph and the APIs used to construct it more structured. _This allows node properties based on the SILValue to be reliably added to nodes_ Although that was the initial motiviation, there are other benefits. Non-content nodes now have verifiable SILValues. Content nodes now have meaningful SILValues even though they can't be guaranteed due to merging. As a result it is *much* easier to debug the graph and correlate it with the SIL. Rather than a web of connection graph nodes with no identity and edges that don't correspond to anything in SIL, the graph nodes now have value number that correspond to the instruction used to dereference the node. The edges also exhibit structure now. A pointsTo edge now (in practice) always corresponds to a real pointer deference in the SIL. Doing this required more than just adding some helpers, it was also necessary to rewrite the graph merge and update algorithms. (2) Split up underlying functionality into more explicit steps Breaks apart the most complex parts of the graph algorithms into small self-contained, self-checked steps. The purpose of each step is clear and it's possible to reason about correctness from basic invariants. Each merge step can now run full graph verification. This was also done to move toward an invariant that the graph is never mutated during a query. But to finish that goal, we need to add a use-point query. With that, there will be no node creation, use point propagation, new defer edges, etc. after graph building. At the very least, this will make it sane to debug the output of the analysis. --- Here is a change-by-change description in diff order: Replace `updatePointsTo` with `initializePointsTo` and `mergePointsTo`. Merging is very simple on its own. Initialization requires some extra consideration for graph invariants. This separation makes it possible to write stong asserts and to independently reason about the correctness of each step based on static invariants. Replace `getContentNode` with `createContentNode`, and add two higher level APIs `createMergedContent`, and `getFieldContent`. This makes explicit the important cases of merging nodes and creating a special nodes for class fields. This slightly simplifies adding properties to content nodes and helps understand the structure of the graph. Factor out an `escapeContentsOf` helper for use elsewhere... Add a `getValueContent` helper. This is where we can tie the properties of content nodes to the address values that are used to address that content. This now also ensures that a Value node's value field is consistent with all SILValues that map to it. Add -escapes-internal-verify to check that the graph is in a valid state after every merge or update step. This verification drove the partial rewrite of mergeAllScheduledNodes. ConnectionGraph::defer implementation: explictly handle the three possible cases of pointsTo initialization or pointsTo merging at the top level, so that those underlying implementations do not need to dynamically handle weirdly different scenarios. ConnectionGraph::initializePointsTo implementation: this simplified implementation is possible by relying on invariants that can be checked at each merge/update step. The major functional difference is that it avoids creating unnecessary pointsTo edges. The previous implementation often created pointsTo edges when adding defer edges just to be conservative. Fixing this saved my sanity during debugging because the pointsTo edges now always correspond to a SIL operations that dereference the pointer. I'm also arguing without evidence that this should be much more efficient. ConnectionGraph::mergeAllScheduledNodes implementation: Add verification to each step so that we can prove the other utilities that are used while merging aren't making incorrect assumptions about the graph state. Remove checks for merged nodes now that the graph is consistently valid. Also remove a loop at the end that didn't seem to do anything. The diff is impossible to review, but the idea is basically the same. As long as it's still possible to scan through the steps in the new code without getting totally lost, then the goal was achieved. ConnectionGraph::mergePointsTo: This is extremely simple now. In all the places where we used to call updatePointsTo, and now call mergePointsTo, it's a lot easier for someone debugging the code to reason about what could possibly happen at that point. `createMergedContent` is a placeholder for transferring node properties. The `getFieldContent` helper may seem silly, but I find it helpful to see all the important ways that content can be created in one place next to the createContentNode, and I like the way that the creation of the special "field content" node is more explicit in the source. ConnectionGraph::mergeFrom implementation: this is only a minor cleanup to remove some control flow nesting and use the CGNodeWorklist abstraction. In AnalyzeInstruction, add EscapeAnalysis::getValueContent helper. It eliminates an extra step of going through the value node to get at its content node. This is where we can derive content node properties from the SILValue that dereferences the content. We can update the content node's associated value 'V' if it's useful. It's also a place to put assertions specific to the first level of content. In AnalyzeInstruction, Array semantic calls: add support for getValueContent so we can derive node properties. This is also nice because it's explicit about which nodes are value content vs. field content. In AnalyzeInstruction, cleanup Release handling: use the explicit APIs: getValueContent, getFieldContent, and escapeContentsOf. In AnalyzeInstruction, assert that load-like things can't produce addresses. In AnalyzeInstruction, add comments to clarify object projection handling. In AnalyzeInstruction, add comments to explain store handling. In AnalyzeInstruction, drop the assumption that all partial applies hold pointers. In AnalyzeInstruction, handle aggregates differently so that Value nodes are always consistent with their SILValue and can be verified. Aggregates nodes are still coalesced if they only have a single pointer-type subelement. If we arbitrarily coalesced an aggregate with just one of its subelements then there would be no consistent way to identify the value that corresponds to a connection graph node.
1 parent cd3ada5 commit 2c7e348

File tree

3 files changed

+700
-468
lines changed

3 files changed

+700
-468
lines changed

include/swift/SILOptimizer/Analysis/EscapeAnalysis.h

Lines changed: 67 additions & 46 deletions
Original file line numberDiff line numberDiff line change
@@ -403,17 +403,7 @@ class EscapeAnalysis : public BottomUpIPAnalysis {
403403

404404
/// Checks an invariant of the connection graph: The points-to nodes of
405405
/// the defer-successors must match with the points-to of this node.
406-
bool matchPointToOfDefers() const {
407-
for (CGNode *Def : defersTo) {
408-
if (pointsTo != Def->pointsTo)
409-
return false;
410-
}
411-
/// A defer-path in the graph must not end without the specified points-to
412-
/// node.
413-
if (pointsTo && !pointsToIsEdge && defersTo.empty())
414-
return false;
415-
return true;
416-
}
406+
bool matchPointToOfDefers(bool allowMerge = false) const;
417407

418408
friend class CGNodeMap;
419409
friend class ConnectionGraph;
@@ -480,7 +470,7 @@ class EscapeAnalysis : public BottomUpIPAnalysis {
480470
llvm_unreachable("Unhandled EscapeState in switch.");
481471
}
482472

483-
/// Returns the content node if of this node if it exists in the graph.
473+
/// Returns the content node of this node if it exists in the graph.
484474
CGNode *getContentNodeOrNull() const {
485475
return pointsTo;
486476
}
@@ -603,14 +593,25 @@ class EscapeAnalysis : public BottomUpIPAnalysis {
603593
}
604594
}
605595

596+
/// Initialize the 'pointsTo' fields of all nodes in the defer web of \p
597+
/// initialiNode.
598+
///
599+
/// If \p createEdge is true, a proper pointsTo edge will be created from \p
600+
/// initialNode to \p pointsTo.
601+
void initializePointsTo(CGNode *initialNode, CGNode *newPointsTo,
602+
bool createEdge = false);
603+
604+
void initializePointsToEdge(CGNode *initialNode, CGNode *newPointsTo) {
605+
initializePointsTo(initialNode, newPointsTo, true);
606+
}
607+
606608
/// Merges all nodes which are added to the ToMerge list.
607609
void mergeAllScheduledNodes();
608610

609-
/// Transitively updates pointsTo of all nodes in the defer-edge web,
610-
/// starting at \p InitialNode.
611-
/// If a node in the web already points to another content node, the other
612-
/// content node is scheduled to be merged with \p pointsTo.
613-
void updatePointsTo(CGNode *InitialNode, CGNode *pointsTo);
611+
/// Transitively update pointsTo of all nodes in the defer-edge web,
612+
/// reaching and reachable from \p initialNode. All nodes in this defer web
613+
/// must already have an initialized `pointsTo`.
614+
void mergePointsTo(CGNode *initialNode, CGNode *pointsTo);
614615

615616
/// Utility function to clear the isInWorkList flags of all nodes in
616617
/// \p WorkList.
@@ -627,12 +628,20 @@ class EscapeAnalysis : public BottomUpIPAnalysis {
627628
/// Returns null, if V is not a "pointer".
628629
CGNode *getNode(ValueBase *V, bool createIfNeeded = true);
629630

630-
/// Gets or creates a content node to which \a AddrNode points to during
631-
/// initial graph construction. This may not be called after defer edges
632-
/// have been created. Doing so would break the invariant that all
633-
/// non-content nodes ultimately have a pointsTo edge to a single content
634-
/// node.
635-
CGNode *getContentNode(CGNode *AddrNode);
631+
/// Helper to create a content node and update the pointsTo graph. \p
632+
/// addrNode will point to the new content node. The new content node is
633+
/// directly initialized with the remaining function arguments.
634+
CGNode *createContentNode(CGNode *addrNode, SILValue addrVal);
635+
636+
/// Create a new content node based on an existing content node to support
637+
/// graph merging.
638+
///
639+
/// \p destAddrNode will point to to new content. The content's initial
640+
/// state will be initialized based on the \p srcContent node.
641+
CGNode *createMergedContent(CGNode *destAddrNode, CGNode *srcContent);
642+
643+
/// Get a node represnting the field data within the given RC node.
644+
CGNode *getFieldContent(CGNode *rcNode);
636645

637646
/// Get or creates a pseudo node for the function return value.
638647
CGNode *getReturnNode() {
@@ -679,31 +688,21 @@ class EscapeAnalysis : public BottomUpIPAnalysis {
679688
return Idx;
680689
}
681690

682-
/// Specifies that the node's value escapes to global or unidentified
683-
/// memory.
684-
void setEscapesGlobal(CGNode *Node) {
685-
Node->mergeEscapeState(EscapeState::Global);
686-
687-
// Make sure to have a content node. Otherwise we may end up not merging
688-
// the global-escape state into a caller graph (only content nodes are
689-
// merged). Either the node itself is a content node or we let the node
690-
// point to one.
691-
if (Node->Type != NodeType::Content)
692-
getContentNode(Node);
691+
void escapeContentsOf(CGNode *Node) {
692+
CGNode *escapedContent = Node->getContentNodeOrNull();
693+
if (!escapedContent) {
694+
escapedContent = createContentNode(Node, Node->V);
695+
}
696+
escapedContent->markEscaping();
693697
}
694698

695699
/// Creates a defer-edge between \p From and \p To.
696700
/// This may trigger node merges to keep the graph invariance 4).
697701
/// Returns the \p From node or its merge-target in case \p From was merged
698702
/// during adding the edge.
699-
/// The \p EdgeAdded is set to true if there was no defer-edge between
700-
/// \p From and \p To, yet.
701-
CGNode *defer(CGNode *From, CGNode *To, bool &EdgeAdded) {
702-
if (addDeferEdge(From, To))
703-
EdgeAdded = true;
704-
mergeAllScheduledNodes();
705-
return From->getMergeTarget();
706-
}
703+
/// \p Changed is set to true if a defer edge was added or any nodes were
704+
/// merged.
705+
CGNode *defer(CGNode *From, CGNode *To, bool &Changed);
707706

708707
/// Creates a defer-edge between \p From and \p To.
709708
/// This may trigger node merges to keep the graph invariance 4).
@@ -802,7 +801,7 @@ class EscapeAnalysis : public BottomUpIPAnalysis {
802801
void dumpCG() const;
803802

804803
/// Checks if the graph is OK.
805-
void verify() const;
804+
void verify(bool allowMerge = false) const;
806805

807806
/// Just verifies the graph structure. This function can also be called
808807
/// during the graph is modified, e.g. in mergeAllScheduledNodes().
@@ -889,9 +888,22 @@ class EscapeAnalysis : public BottomUpIPAnalysis {
889888
SILValue getPointerRoot(SILValue value) const;
890889

891890
/// If \p pointer is a pointer, set it to global escaping.
892-
void setEscapesGlobal(ConnectionGraph *ConGraph, ValueBase *pointer) {
893-
if (CGNode *Node = ConGraph->getNode(pointer))
894-
ConGraph->setEscapesGlobal(Node);
891+
void setEscapesGlobal(ConnectionGraph *conGraph, ValueBase *pointer) {
892+
CGNode *Node = conGraph->getNode(pointer);
893+
if (!Node)
894+
return;
895+
896+
if (Node->isContent()) {
897+
Node->markEscaping();
898+
return;
899+
}
900+
Node->mergeEscapeState(EscapeState::Global);
901+
902+
// Make sure to have a content node. Otherwise we may end up not merging
903+
// the global-escape state into a caller graph (only content nodes are
904+
// merged). Either the node itself is a content node or we let the node
905+
// point to one.
906+
conGraph->escapeContentsOf(Node);
895907
}
896908

897909
/// Gets or creates FunctionEffects for \p F.
@@ -902,6 +914,15 @@ class EscapeAnalysis : public BottomUpIPAnalysis {
902914
return FInfo;
903915
}
904916

917+
/// Get or create the node representing the memory pointed to by \p
918+
/// addrVal. If \p addrVal is an address, then return the content node for the
919+
/// variable's memory. Otherwise, \p addrVal may contain a reference, so
920+
/// return the content node for the referenced heap object.
921+
///
922+
/// Note that \p addrVal cannot be an address within a heap object, such as
923+
/// an address from ref_element_addr or project_box.
924+
CGNode *getValueContent(ConnectionGraph *conGraph, SILValue addrVal);
925+
905926
/// Build a connection graph for reach callee from the callee list.
906927
bool buildConnectionGraphForCallees(SILInstruction *Caller,
907928
CalleeList Callees,

0 commit comments

Comments
 (0)