-
Notifications
You must be signed in to change notification settings - Fork 10.5k
⚠️Add a new TempRValue optimization to the CopyForwarding pass. #11328
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
This is a separate optimization that detects short-lived temporaries that can be eliminated. This is necessary now that SILGen no longer performs basic RValue forwarding in some cases. SR-5508: Performance regression in benchmarks caused by removing SILGen peephole for LoadExpr in +0 context
@swift-ci test. |
@swift-ci benchmark. |
Build comment file:Optimized (O)Regression (5)
Improvement (4)
No Changes (318)
Unoptimized (Onone)Regression (2)
Improvement (11)
No Changes (314)
Hardware Overview
|
// copy_addr [initialization] and destroy_addr. | ||
TempRValue rvalue; | ||
rvalue.copy = copyInst; | ||
for (auto *useOper : copyDest->getUses()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Checking if the user is in the same block as the copy_addr here might save some compile time.
Consider a very large basic block with lots of copy_addrs where this is not the case. You would iterate over all instructions of the basic block for all copy_addrs (which is quadratic).
@@ -1291,6 +1410,11 @@ class CopyForwardingPass : public SILFunctionTransform | |||
} | |||
} | |||
|
|||
// Eliminate short-lived temporaries. | |||
for (auto &rvalue : RValueCopies) { | |||
removeTempRValue(rvalue); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it always guaranteed that the instructions which are deleted here are not referenced in the subsequent code?
E.g. if there are 2 copy_addrs referencing the same alloc_stack
Maybe it's saver that removeTempRValue just does a replaceAllUsesWith and let the alloc/dealloc/destroy/copy_addr be deleted by a subsequent cleanup pass.
This would also simplify the code a bit, because you would not need the TempRValue struct.
rvalue.dealloc = cast<DeallocStackInst>(user); | ||
continue; | ||
} | ||
if (user->mayWriteToMemory()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure if this check is needed here, but it does not harm.
If all uses have to be in the same block you would catch wring instructions in the code below.
/// writes to memory. This is sufficient to prove the copy is unnecesary. | ||
/// | ||
/// Return the single dealloc_stack corresponding to the temporary storage for | ||
/// teh copy's destination. Return null if the copy does not initialize an |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typo: teh
Removing these obvious copies causes as many new regressions as it resolves. We should wait until we understand what really needs to be fixed here before merging this. |
…ization. This is a much more aggressive attempt at removing temporary "rvalues" inside the CopyForwarding pass. It is now handles basic switch CFGs and does some memory disambiguation. This is still insufficient to handle the case that caused the worst benchmark regressions, which are caused by UnfoldSequence. The problem is, that executes an escaping closure during the temporary rvalue's lifetime. The only way around that is to fully analyze both sides of the copy down through all uses to prove they are completely nonescaping. Note, the original intention of the CopyForwarding pass was a very quick Onone peephole that did not require escape analysis or alias analysis.
This version successfully optimizes UnfoldSequence, which is the holy grail. Unfortunately, there's a problem building the stdlib which seems to be a use-after-free. Note that we need to remove copied from CopiedDefs when we eliminate them, but it doesn't seem to be working properly. Once that's fixed, we need to run the benchmarks.
Thanks Andy! |
I can reproduce the crash. I am investigating. |
|
I think this is the test case that is causing the failure:
It is something related to the CopiedDef list having a dangling pointer in it. I need to spend a little more time with this. |
I found the problem. The problem is that CopiedDefs is removing the copy instruction instead of the copyDest. When I fix that, I hit a different issue later in the pipeline. I am looking at it now. |
My patch:
|
@eeckstein I am going to close this and open one with my own branch. That way I can push things. I will reference this PR. |
This is a separate optimization that detects short-lived temporaries that can be
eliminated. This is necessary now that SILGen no longer performs basic RValue
forwarding in some cases.
SR-5508: Performance regression in benchmarks caused by removing SILGen peephole
for LoadExpr in +0 context