-
Notifications
You must be signed in to change notification settings - Fork 10.5k
SILCloner and SILInliner rewrite. #19786
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Mostly functionally neutral: - may fix latent bugs. - may reduce useless basic blocks after inlining. This rewrite encapsulates the cloner's internal state, providing a clean API for the CRTP subclasses. The subclasses are rewritten to use the exposed API and extension points. This makes it much easier to understand, work with, and extend SIL cloners, which are central to many optimization passes. Basic SIL invariants are now clearly expressed and enforced. There is no longer a intricate dance between multiple levels of subclasses operating on underlying low-level data structures. All of the logic needed to keep the original SIL in a consistent state is contained within the SILCloner itself. Subclasses only need to be responsible for their own modifications. The immediate motiviation is to make CFG updates self-contained so that SIL remains in a valid state. This will allow the removal of critical edge splitting hacks and will allow general SIL utilities to take advantage of the fact that we don't allow critical edges. This rewrite establishes a simple principal that should be followed everywhere: aside from the primitive mutation APIs on SIL data types, each SIL utility is responsibile for leaving SIL in a valid state and the logic for doing so should exist in one central location. This includes, for example: - Generating a valid CFG, splitting edges if needed. - Returning a valid instruction iterator if any instructions are removed. - Updating dominance. - Updating SSA (block arguments). (Dominance info and SSA properties are fundamental to SIL verification). LoopInfo is also somewhat fundamental to SIL, and should generally be updated, but it isn't required. This also fixes some latent bugs related to iterator invalidation in recursivelyDeleteTriviallyDeadInstructions and SILInliner. Note that the SILModule deletion callback should be avoided. It can be useful as a simple cache invalidation mechanism, but it is otherwise bug prone, too limited to be very useful, and basically bad design. Utilities that mutate should return a valid instruction iterator and provide their own deletion callbacks.
More lines were added in this PR as a result of many more comments, And in various SILCLoner extensions: |
@swift-ci test. |
@swift-ci test source compatibility. |
@swift-ci benchmark. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the core approach of putting the value map directly in the cloner and making SILInliner
not directly a cloner. A few minor comments.
ValueMap[DestPHIArg] = SILValue(BlockArg); | ||
// Since we don't call any CFG cloning entry point, we can call | ||
// `foldValue` immediately as if cloning has already started. This simply | ||
// avoids handling AvailVals during `remap` or defining a custome |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typo: custome
And you've copy-pasted this comment a few times.
/// | ||
/// *NOTE*: This attempts to perform inlining unconditionally and thus asserts | ||
/// if inlining will fail. All users /must/ check that a function is allowed | ||
/// to be inlined using SILInliner::canInlineFunction before calling this | ||
/// to be inlined using SILInliner::canInlineApplyAite before calling this |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typo
/// whenever the visitor that clones an instruction skips doPostProcess(). | ||
void foldValue(SILValue origValue, SILValue mappedValue) { | ||
ValueMap.insert({origValue, mappedValue}); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't seem like the most intuitive name for this method; whenever I see it, I expect it to be a more involved operation, not just updating some internal state. setMappedValue
, maybe?
Should this assert that the insertion took effect? If origValue
is already mapped, this call is implicitly a no-op, which sounds like a source of bugs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I added an assert that values aren't remapped multiple times.
I'm not attached to the name, but...
- the SILCloner public API has no notion of a "map" at all.
- during cloning, the visitors have to call one of these two APIs for each original value:
doPostProcess
orfoldValue
.foldValue
should only be called when there is no new cloned value-producer on which to calldoPostProcess
. This happens whenever the original value-producer is folded during cloning, typically because of type/constant propagation. foldValue
is definitely more meaningful thandoPostProcess
but I'd be happy to rename both APIs if I had a better idea.
I added some comments:
81fa786
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, so it's a more meaningful name when called within the cloner.
I'm not trying to insist that you rename this, but maybe setValueInClone
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The point I'm getting at is that it's important for the cloner implementation not to directly map values unless it has folded away an instruction in the cloned code. Otherwise, it must go through the usual doPostProcess
API, which I'm now calling recordClonedInstruction
. Hopefully this makes things more clear:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, that works for me.
include/swift/SIL/SILCloner.h
Outdated
newF.getBlocks().splice(SILFunction::iterator(insertBeforeBB), | ||
newF.getBlocks(), | ||
SILFunction::iterator(MappedBB)); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's a SILFunction::createBasicBlockBefore
function now.
@@ -53,45 +53,42 @@ unwind: | |||
// CHECK: [[TEMP:%.*]] = alloc_stack $Indirect<SomeSubclass> | |||
// CHECK: [[MK_IND:%.*]] = function_ref @make_indirect | |||
// CHECK: apply [[MK_IND]]<SomeSubclass>([[TEMP]]) | |||
// CHECK: br bb3 | |||
// CHECK: destroy_addr [[TEMP]] : $*Indirect<SomeSubclass> | |||
// CHECK: cond_br %0, bb3, bb5 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These changes to test cases are just about visitation order?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, I tried to ensure the visitation order was the same as before. This is a result of generalizing the fall-through to ReturnBB optimization. So, the inliner does a bit more optimization on the fly now, which just fell out of simplifying the code.
F.getBlocks().splice(F.getBlocks().end(), F.getBlocks(), | ||
SILFunction::iterator(ReturnToBB)); | ||
ReturnToBB = | ||
callerBB->split(std::next(Apply.getInstruction()->getIterator())); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this an intentional decision to not put the return-to block in the same relative position to the inlined code?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This line just splits callerBB in place. SILBasicBlock::split()
places ReturnBB right after callerBB. The call to SILCloner::cloneFunctionBody(getCalleeFunction(), callerBB, entryArgs);
will insert all inlined code immediately after callerBB and before ReturnBB. So I'm not sure what you're getting at.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, that makes sense.
Build comment file:Performance: -O
Code size: -O
Performance: -Osize
Code size: -Osize
Performance: -Onone
How to read the dataThe tables contain differences in performance which are larger than 8% and differences in code size which are larger than 1%.If you see any unexpected regressions, you should consider fixing the regressions before you merge the PR. Noise: Sometimes the performance results (not code size!) contain false alarms. Unexpected regressions which are marked with '(?)' are probably noise. If you see regressions which you cannot explain you can try to run the benchmarks again. If regressions still show up, please consult with the performance team (@eeckstein). Hardware Overview
|
FYI the debug source compatibility failure in ‘fluent’ is a known declaration checker but not related to this PR |
include/swift/SIL/SILCloner.h
Outdated
// terminator. | ||
void visitInstructionsInBlock(SILBasicBlock *BB); | ||
|
||
// Visit a block's terminator. This is called with each block in DFS predorder |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typo: s/predorder/preorder/
include/swift/SIL/SILCloner.h
Outdated
clonePhiArgs(startBB); | ||
|
||
// Premap exit blocks to terminate so that visitBlocksDepthFirst terminates | ||
// after discovering the cloned region. Mappint an exit block to itself |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/Mappint/Mapping/
friend class SILCloner<SILInliner>; | ||
using SuperTy = TypeSubstCloner<SILInliner, SILOptFunctionBuilder>; | ||
|
||
class SILInliner { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe the #include "...TypeSubstCloner.h" is no longer needed in this file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah thanks. I did some #include minimization here:
c781d78
@swift-ci test. |
1 similar comment
@swift-ci test. |
A recent SILCloner rewrite removed a special case hack for single basic block callee functions: commit c6865c0 Merge: 76e6c41 9e440d1 Author: Andrew Trick <[email protected]> Date: Thu Oct 11 14:23:32 2018 Merge pull request swiftlang#19786 from atrick/silcloner-cleanup SILCloner and SILInliner rewrite. Instead, the new inliner simply merges trivial unconditional branches after inlining the return block. This way, the CFG is always in canonical state after inlining. This is more robust, and avoids interfering with subsequent SIL passes when non-single-block callees are inlined. The problem is that inlining a series of calls within a large block could result in interleaved block splitting and merging operations, which is quadratic in the block size. This showed up when inlining the tens of thousands of array subscript calls emitted for a large array initialization. The first half of the fix is to simply defer block merging until all calls are inlined. We can't expect SimplifyCFG to run immediately after inlining, nor would we want to do that, *especially* for mandatory inlining. This fix instead exposes block merging as a trivial utility. Note: by eliminating some unconditional branches, this change could reduce the number of debug locations emitted. This does not fundamentally change any debug information guarantee, and I was unable to observe any behavior difference in the debugger.
This can only be reviewed by first reading the new version of
SILCloner.h, SILInliner.h as a whole, along with the SILInlineCloner
class definition in SILInliner.cpp. The diff points out the code that
needs review but can't meaningfully be reviewed by itself.
This is not a from-scratch rewrite. I attempted to make minimal
changes to most existing SILCloner clients/subclasses--they can be
cleaned up separately. Note that I did need to rewrite SILInliner in
this PR because it so deeply violated SILCloner encapsulation.
This is a large PR, but it was already extracted from a much broader
set of changes and I don't see a meaningful way to further subdivide it.
Mostly functionally neutral:
This rewrite encapsulates the cloner's internal state, providing a
clean API for the CRTP subclasses. The subclasses are rewritten to use
the exposed API and extension points. This makes it much easier to
understand, work with, and extend SIL cloners, which are central to
many optimization passes. Basic SIL invariants are now clearly
expressed and enforced. There is no longer a intricate dance between
multiple levels of subclasses operating on underlying low-level data
structures. All of the logic needed to keep the original SIL in a
consistent state is contained within the SILCloner itself. Subclasses
only need to be responsible for their own modifications.
The immediate motiviation is to make CFG updates self-contained so
that SIL remains in a valid state. This will allow the removal of
critical edge splitting hacks and will allow general SIL utilities to
take advantage of the fact that we don't allow critical edges.
This rewrite establishes a simple principal that should be followed
everywhere: aside from the primitive mutation APIs on SIL data types,
each SIL utility is responsibile for leaving SIL in a valid state and
the logic for doing so should exist in one central location.
This includes, for example:
(Dominance info and SSA properties are fundamental to SIL verification).
LoopInfo is also somewhat fundamental to SIL, and should generally be
updated, but it isn't required.
This also fixes some latent bugs related to iterator invalidation in
recursivelyDeleteTriviallyDeadInstructions and SILInliner. Note that
the SILModule deletion callback should be avoided. It can be useful as
a simple cache invalidation mechanism, but it is otherwise bug prone,
too limited to be very useful, and basically bad design. Utilities
that mutate should return a valid instruction iterator and provide
their own deletion callbacks.