-
Notifications
You must be signed in to change notification settings - Fork 10.5k
SILOptimizer: A new "TempLValueOpt" optimization for copy_addr #32428
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@swift-ci test |
@swift-ci benchmark |
Build failed |
Build failed |
Performance: -O
Code size: -O
Performance: -Osize
Code size: -Osize
Performance: -Onone
Code size: -swiftlibs
How to read the dataThe tables contain differences in performance which are larger than 8% and differences in code size which are larger than 1%.If you see any unexpected regressions, you should consider fixing the Noise: Sometimes the performance results (not code size!) contain false Hardware Overview
|
@swift-ci test |
1 similar comment
@swift-ci test |
On Alias Analysis support for enums. This is something alias analysis should definitely handle. I'm not surprised is wasn't handled, since the code is neglected. But it's definitely worth fixing. With something so fundamental, I think it's good to have negative tests for any complex cases, particularly in aliasAddressProjection, which I'm not sure I fully understand. The way aliasAddressProjection and aliasInner are mutually recursive makes it hard to reason about. We obviously need to be very careful about introducing false positives here. For example, we should test that While we're at it: More intersting: What about the fact that existential storage can actually alias even though the existential values are from different storage addresses? I think you at least need to come up with a proof that this it's generally safe to treat them as non-aliasing. I don't think it's as simple as "it makes sense for alias analysis to look through them". How do we know it's always correct to "look through them"? |
On |
On SILCombiner::combineCopyAndDestroy... In principle, I don't like scanning the instruction stream within a SILCombineVisitor. SILCombine should only be for SSA-based peepholes. (I have the same criticism of This particular case might be justifiable in practice, since it bails at the first instruction that
CopyForwarding::hoistDestroy can easily be run as a separate pass. (The implementation is mostly in CopyForwarding::hoistDestroy is not limited to a single basic block and not limited to a single destroy. Any case that would be handled by CopyForwarding::hoistDestroy currently assumes a "known-use" model. As with most our analyses, we really need both a "known-use" and "known-instruction-window" mode. The known-use mode is great because it's very efficient, its cost does not scale with the size of the function, and it handles arbitrarily complex code. Of course, it doesn't handle any SSA def-use pattern that we haven't specifically taught it to recognize. A known-instruction-window mode is good for handling many patterns where we can't analyze the a value's source and can't find all aliases in the known-use mode. It's expensive, because it needs Alias and Escape Analysis, but important at -O. (CopyForwarding was originally supposed to be a -Onone peephole). So, in short, we can just add an AliasAnalysis query to the existing destroy hoisting and easily separate it from Note that the AliasAnalysis query needs to bail on anything that can either read, write, or release memory, so it's fairly limited, but could still handle cases where the destroy is close to the copy, but we can't analysis the copy's source address for some reason. Somehow, your current implementation seems to be ignoring cases like this:
|
From what I can tell so far, tempLValueOpt is an attempt to add AliasAnalysis to CopyForwarding. I think it's a very bad idea for SILCombine to depend on Alias/Escape Analysis. SILCombine should be SSA peepholes that can rerun frequently after other transforms without recomputing interprocedural analysis. (We really need to do Scanning the instructions in SILCombine also doesn't make sense to me. That's quadratic even if SILCombine does nothing, and cubic or worse of SILCombine makes changes--multiplied by the cost of the alias analysis queries. I can't even reason about what happens in the presence of other SILCombines that invalidate alias and escape analysis or adds new instructions that are unrecognized by the previous analysis. It would be fine to just add AliasAnalysis to CopyForwarding, since we're not using it as the -Onone peephole that it was meant to be. I haven't looked yet at why CopyForwarding doesn't handle these patterns without your change, but there are two reasonable approaches:
OR
|
Ok, TempLValue is actually a different algorithm than backwardPropagateCopy. This is really just handling the case of a true "temporary":
...with no other uses of the temp outside the range of instructions from So this could be a separate TempLValue pass. It's easy to extract destroy hoisting out of CopyForwarding since this new pass also requires it. Should I do that? I'm still not sure why backwardPropagateCopy wouldn't handle these cases if it used AliasAnalysis, but it may make sense to separate passes that depend on AliasAnalysis from those that don't. |
@atrick Thanks for reviewing!
|
@swift-ci test |
1 similar comment
@swift-ci test |
Build failed |
Build failed |
This is mostly for the record: The problem with CopyForwarding is that it's analyzing copy's destination just so it doesn't need to check AliasAnalysis later. One problem is that findAddressRootAndUsers only looks through one level of But the more basic problem is that |
It would be super helpful to describe this analysis and the transformation in a way that someone can reason about. It's simple enough to specify in a comment:
Also, explain why this does not handle reads from %dest:
Are you just assuming there needs to be a write to %dest before the initialization? I'm not sure where that's guaranteed. I'm not even sure what a "write" has to do with a "destroy". You could either somehow check that %dest isn't be destroyed, or just check for both reads and writes--then you don't care if the copy_addr is an initialization. Actually, I'm not even sure why there needs to be a destroy at all before the [initialization] for concrete types. The original problem with CopyForwarding is that it wanted SILGen to produce straightforward initialize-destroy patterns. That was a deliberate assumption for something that was supposed to run right after SILGen, but not verified in canonical SIL. Eventually we wanted to rerun CopyForwarding later and decided passes shouldn't make unverified SIL assumptions (unless the pass happens to be ARC). |
@swift-ci test |
1 similar comment
@swift-ci test |
You are right, destroy_addr is not modeled as "write". I have to check for reads and writes. |
Build failed |
Build failed |
@swift-ci test |
@swift-ci test |
Looks like we need to model a take (e.g. load [take], copy_addr [take]) as a write to memory. I added a commit which fixes this. It can cause memory lifetime failures in OSSA. |
destroy->eraseFromParent(); | ||
return true; | ||
} | ||
if (inst->mayWriteToMemory()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here's where destroy hoisting does not check for reads. Maybe you didn't push the latest change yet.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I forgot to change this. I pushed a new version with a fix
@swift-ci smoke test |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM except for the two comment blocks in destroy hoisting.
…ntial_addr. Look "through" those instructions when trying to find the underlying objects.
Currently we only have load [take] in OSSA which needed to be changed. (copy_addr is not handled in MemBehavior at all, yet) Even if the memory is physically not modified, conceptually it's "destroyed" when the value is taken. Optimizations, like TempRValueOpt rely on this behavior when the check for may-writes. This fixes a MemoryLifetime failure in TempRValueOpt.
Only let copy_addr have side effects if the source or destination really aliases with the address in question.
Optimizes copies from a temporary (an "l-value") to a destination. %temp = alloc_stack $Ty instructions_which_store_to %temp copy_addr [take] %temp to %destination dealloc_stack %temp is optimized to destroy_addr %destination instructions_which_store_to %destination The name TempLValueOpt refers to the TempRValueOpt pass, which performs a related transformation, just with the temporary on the "right" side. The TempLValueOpt is similar to CopyForwarding::backwardPropagateCopy. It's more restricted (e.g. the copy-source must be an alloc_stack). That enables other patterns to be optimized, which backwardPropagateCopy cannot handle. This pass also performs a small peephole optimization which simplifies copy_addr - destroy sequences. copy_addr %source to %destination destroy_addr %source is replace with copy_addr [take] %source to %destination
@swift-ci smoke test |
Optimizes copies from a temporary (an "l-value") to a destination.
is optimized to
The name tempLValueOpt refers to the TempRValueOpt pass, which performs a related transformation, just with the temporary on the "right" side.
The tempLValueOpt is similar to CopyForwarding::backwardPropagateCopy.
It's more restricted (e.g. the copy-source must be an alloc_stack, the copy-destination must be initialized).
That enables other patterns to be optimized, which backwardPropagateCopy cannot handle.
This PR also contains some other small improvements related to copy_addr and other address instructions. For details see the commit messages.
The effect of this optimizations is that code with address-only types (e.g. generic functions) has significantly less copy instructions, especially initializations of address-only enums (e.g. optionals).
This is a benefit for performance and code size and can also reduce the amount of generic metadata which is generated at runtime.