-
Notifications
You must be signed in to change notification settings - Fork 10.5k
SILOptimizer: Add a new TempRValue optimization pass #11361
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@swift-ci Please test |
Reviewing now. |
@swift-ci smoke benchmark |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a quick look through of some small things. I want to work through some test cases real quick.
} | ||
|
||
// Increment the instruction iterator now in case the CopyInst is deleted. | ||
++II; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not just increment at line 1382?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because it would crash if the next instruction after the copy_addr is one that is deleted in the optimization, e.g. destroy_addr.
Very unlikely but it could happen.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And yes, I'll add a comment :-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SGTM
for (auto &BB : *getFunction()) { | ||
auto II = BB.begin(); | ||
while (II != BB.end()) { | ||
CopyAddrInst *CopyInst = dyn_cast<CopyAddrInst>(&*II); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You wrote the type twice. Use auto?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is what I always do. But I admit, I have no idea if it is against any of our coding styles.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The LLVM style guide basically says if the type is already on the line, use auto. So in this case, CopyAddrInst is the template parameter of dyn_cast. So putting CopyAddrInst on the same line is redundant. I.e. this gives the same info to the reader:
auto *CopyInst = dyn_cast<CopyAddrInst>(&*II);
Here is the link:
http://llvm.org/docs/CodingStandards.html#use-auto-type-deduction-to-make-code-more-readable
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you want to make this change, feel free to do it in a follow on commit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, thanks
// Increment the instruction iterator now in case the CopyInst is deleted. | ||
++II; | ||
|
||
if (CopyInst && CopyInst->getSrc() == CopyInst->getDest()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you incremented earlier, you could combine the if statements at line 1393 and line 1384
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've been tempted to represent deleted copies as identity copies, but while working on this PR I had a vague recollection that we had some rule against identity copies. I actually think this is a good way to do it as long we can guarantee any identity copies are removed before IRGen.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this safe to do unconditionally with an unrelated identity copy_addr [take] or copy_addr [init]? (Not sure). By unrelated I mean a copy_addr that was not optimized by the algorithm.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As Andy said, there should not be any identity copies, except coming from this optimization.
But it is a good question. I think even if there would be identity copies, we would be safe:
copy_addr %a to %a
and
copy_addr [take] %a to [initialize] %a
are no-ops and are safe to be removed.
copy_addr %a to [initialize] %a
is illegal because we would initialize over an already initialized location.
copy_addr [take] %a to %a
would effectively be a destroy of %a with letting the object be stored to %a. Kind of illegal. I'm sure we are not generating such a thing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It depends on how the copy_addr is implemented behind the scenes. For instance, if you look at how SIL.rst defines the copy_addr [take]
's algorithm in its loadable type form, it is essentially loading a value, releasing, and storing the value again. In the case of a protocol witness or a class, this would not necessarily be incorrect.
That being said, using copy_addr for such a thing is a stretch. We should just ban it via an assertion in copy_addr's constructor and maybe in the verifier?
// TODO: handle non-destructive projections of enums | ||
// (unchecked_take_enum_data_addr of Optional is nondestructive.) | ||
switch (user->getKind()) { | ||
default: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like your indenting is off here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, it may be good to handle unchecked_take_enum_data_addr of Optional (no other enums). Optional is pervasive enough, it would be good to handle it as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a good point. I think we should do this as a follow up only on master - just to keep this patch as simple as possible
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SGTM.
case ValueKind::CopyAddrInst: { | ||
// copy_addr which read from the temporary are like loads. | ||
// TODO: Handle copy_addr [take]. But this doesn't seem to be important. | ||
CopyAddrInst *copyFromTmp = cast<CopyAddrInst>(user); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
auto *
?
if (copyInst->isTakeOfSrc() || !copyInst->isInitializationOfDest()) | ||
return false; | ||
|
||
AllocStackInst *tempObj = dyn_cast<AllocStackInst>(copyInst->getDest()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
auto *
?
while (!tempObj->use_empty()) { | ||
Operand *use = *tempObj->use_begin(); | ||
SILInstruction *user = use->getUser(); | ||
switch (user->getKind()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you use a switch here to be ultra conservative b/c these are the only users that you support (rather than just RAUWing).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was really dead-set on not rewriting replace-all-uses, but this is really a better approach.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok.
@@ -477,6 +477,9 @@ SILPassPipelinePlan::getPerformancePassPipeline(const SILOptions &Options) { | |||
// stdlib. | |||
addPerfEarlyModulePassPipeline(P); | |||
|
|||
// Cleanup after SILGen: remove trivial copies to temporaries. | |||
P.addTempRValueOpt(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Move P.addTempRValueOpt();
into addPerfEarlyModulePassPipeline
or addHighLevelEarlyLoopOptPipeline
? We don't seem to add any passes explicitly inside getPerformancePassPipeline and usually do it inside helper functions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok. If I have to make some other changes I'll do it. Otherwise in a follow-up commit
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moving this to a separate pass is safer and more likely to catch what SILGen used to handle.
I don't see anything that needs to be fixed before committing.
/// only written by the initializing copy. | ||
/// | ||
/// 2. No instructions between the copy and the destroy of its destination | ||
/// writes to the source. This is sufficient to prove the copy is unnecesary. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These comments are wrong now, but i'll just fix them in a follow up PR.
// Increment the instruction iterator now in case the CopyInst is deleted. | ||
++II; | ||
|
||
if (CopyInst && CopyInst->getSrc() == CopyInst->getDest()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've been tempted to represent deleted copies as identity copies, but while working on this PR I had a vague recollection that we had some rule against identity copies. I actually think this is a good way to do it as long we can guarantee any identity copies are removed before IRGen.
while (!tempObj->use_empty()) { | ||
Operand *use = *tempObj->use_begin(); | ||
SILInstruction *user = use->getUser(); | ||
switch (user->getKind()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was really dead-set on not rewriting replace-all-uses, but this is really a better approach.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A question, an ask for comments, and some nits.
NumLoadsFound++; | ||
|
||
// If this is the last use of the temp and modifies the source it is ok. | ||
if (NumLoadsFound == useInsts.size()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add a comment here, explaining why this is safe?
To me it was non-obvious and I had to read the rest of the file to understand why. From what I can tell, this is only safe since you need to know that the only "load" (from collect loads) that can use the temp and modify the source is a full copy_addr that is an identity transformation of the memory (i.e. releasing the original and storing the copy into the original's location). Or am I missing something?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure what you mean. But this is very simple: it just checks if there is no (potential) modification of the source before the last use of the temp.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, I see. The comment is totally misleading. I'll fix it.
case ValueKind::TupleElementAddrInst: | ||
case ValueKind::LoadInst: | ||
case ValueKind::LoadBorrowInst: | ||
use->set(copyInst->getSrc()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have not 100% thought about this, but is it safe to do this unconditionally on a copy_addr without considering its flags?
I forgot to delete this note to myself at the bottom of the file. Please ignore ; ).
// Increment the instruction iterator now in case the CopyInst is deleted. | ||
++II; | ||
|
||
if (CopyInst && CopyInst->getSrc() == CopyInst->getDest()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this safe to do unconditionally with an unrelated identity copy_addr [take] or copy_addr [init]? (Not sure). By unrelated I mean a copy_addr that was not optimized by the algorithm.
for (auto &BB : *getFunction()) { | ||
auto II = BB.begin(); | ||
while (II != BB.end()) { | ||
CopyAddrInst *CopyInst = dyn_cast<CopyAddrInst>(&*II); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The LLVM style guide basically says if the type is already on the line, use auto. So in this case, CopyAddrInst is the template parameter of dyn_cast. So putting CopyAddrInst on the same line is redundant. I.e. this gives the same info to the reader:
auto *CopyInst = dyn_cast<CopyAddrInst>(&*II);
Here is the link:
http://llvm.org/docs/CodingStandards.html#use-auto-type-deduction-to-make-code-more-readable
This is a separate optimization that detects short-lived temporaries that can be eliminated. This is necessary now that SILGen no longer performs basic RValue forwarding in some cases. SR-5508: Performance regression in benchmarks caused by removing SILGen peephole for LoadExpr in +0 context
@swift-ci Please test |
@swift-ci Please smoke benchmark |
@swift-ci Please test |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
} | ||
// For some reason, not all normal uses have been seen between the copy and | ||
// the end of the initialization block. We should never reach here. | ||
return false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not use an assert or unreachable here so in debug builds, we catch this. I am fine with this in a follow-on commit.
@swift-ci Please test |
1 similar comment
@swift-ci Please test |
NumLoadsFound++; | ||
|
||
// If this is the last use of the temp we are ok. After this point, | ||
// modifications to the source don't matter anymore. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Regarding the discussion on these comments. It isn't really that simple. The order that we check for last uses and memory writes is subtle and important. It needs to be ordered this way to handle copy_addr uses, which could possibly write-back copyDest
into copySrc
. This won't work though if we ever handle an apply
that takes copyDest
as a read-only argument but may write to copySrc
.
@swift-ci test |
1 similar comment
@swift-ci test |
Hmmm... I guess CI isn't back yet. |
CI will be back later tonight. |
@swift-ci smoke test |
@swift-ci Please smoke benchmark |
PR testing is not supported after PR has been merged. |
This is a separate optimization that detects short-lived temporaries that can be eliminated.
This is necessary now that SILGen no longer performs basic RValue forwarding in some cases.
SR-5508: Performance regression in benchmarks caused by removing SILGen peephole for LoadExpr in +0 context
This is another attempt for #11328
and #11350
I did run the benchmarks locally and it recovers all regressions from #11026