Skip to content

SILOptimizer: Add a new TempRValue optimization pass #11361

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Aug 7, 2017

Conversation

eeckstein
Copy link
Contributor

@eeckstein eeckstein commented Aug 4, 2017

This is a separate optimization that detects short-lived temporaries that can be eliminated.
This is necessary now that SILGen no longer performs basic RValue forwarding in some cases.

SR-5508: Performance regression in benchmarks caused by removing SILGen peephole for LoadExpr in +0 context

This is another attempt for #11328
and #11350

I did run the benchmarks locally and it recovers all regressions from #11026

@eeckstein eeckstein changed the title [do not merge!] SILOptimizer: Add a new TempRValue optimization pass SILOptimizer: Add a new TempRValue optimization pass Aug 5, 2017
@eeckstein
Copy link
Contributor Author

@swift-ci Please test

@eeckstein eeckstein requested review from gottesmm and atrick August 5, 2017 00:46
@gottesmm
Copy link
Contributor

gottesmm commented Aug 5, 2017

Reviewing now.

@gottesmm
Copy link
Contributor

gottesmm commented Aug 5, 2017

@swift-ci smoke benchmark

Copy link
Contributor

@gottesmm gottesmm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a quick look through of some small things. I want to work through some test cases real quick.

}

// Increment the instruction iterator now in case the CopyInst is deleted.
++II;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not just increment at line 1382?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because it would crash if the next instruction after the copy_addr is one that is deleted in the optimization, e.g. destroy_addr.
Very unlikely but it could happen.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And yes, I'll add a comment :-)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SGTM

for (auto &BB : *getFunction()) {
auto II = BB.begin();
while (II != BB.end()) {
CopyAddrInst *CopyInst = dyn_cast<CopyAddrInst>(&*II);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You wrote the type twice. Use auto?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is what I always do. But I admit, I have no idea if it is against any of our coding styles.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The LLVM style guide basically says if the type is already on the line, use auto. So in this case, CopyAddrInst is the template parameter of dyn_cast. So putting CopyAddrInst on the same line is redundant. I.e. this gives the same info to the reader:

auto *CopyInst = dyn_cast<CopyAddrInst>(&*II);

Here is the link:

http://llvm.org/docs/CodingStandards.html#use-auto-type-deduction-to-make-code-more-readable

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you want to make this change, feel free to do it in a follow on commit.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, thanks

// Increment the instruction iterator now in case the CopyInst is deleted.
++II;

if (CopyInst && CopyInst->getSrc() == CopyInst->getDest()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you incremented earlier, you could combine the if statements at line 1393 and line 1384

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've been tempted to represent deleted copies as identity copies, but while working on this PR I had a vague recollection that we had some rule against identity copies. I actually think this is a good way to do it as long we can guarantee any identity copies are removed before IRGen.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this safe to do unconditionally with an unrelated identity copy_addr [take] or copy_addr [init]? (Not sure). By unrelated I mean a copy_addr that was not optimized by the algorithm.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As Andy said, there should not be any identity copies, except coming from this optimization.
But it is a good question. I think even if there would be identity copies, we would be safe:
copy_addr %a to %a
and
copy_addr [take] %a to [initialize] %a
are no-ops and are safe to be removed.

copy_addr %a to [initialize] %a
is illegal because we would initialize over an already initialized location.

copy_addr [take] %a to %a
would effectively be a destroy of %a with letting the object be stored to %a. Kind of illegal. I'm sure we are not generating such a thing.

Copy link
Contributor

@gottesmm gottesmm Aug 6, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It depends on how the copy_addr is implemented behind the scenes. For instance, if you look at how SIL.rst defines the copy_addr [take]'s algorithm in its loadable type form, it is essentially loading a value, releasing, and storing the value again. In the case of a protocol witness or a class, this would not necessarily be incorrect.

That being said, using copy_addr for such a thing is a stretch. We should just ban it via an assertion in copy_addr's constructor and maybe in the verifier?

// TODO: handle non-destructive projections of enums
// (unchecked_take_enum_data_addr of Optional is nondestructive.)
switch (user->getKind()) {
default:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like your indenting is off here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, it may be good to handle unchecked_take_enum_data_addr of Optional (no other enums). Optional is pervasive enough, it would be good to handle it as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a good point. I think we should do this as a follow up only on master - just to keep this patch as simple as possible

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SGTM.

case ValueKind::CopyAddrInst: {
// copy_addr which read from the temporary are like loads.
// TODO: Handle copy_addr [take]. But this doesn't seem to be important.
CopyAddrInst *copyFromTmp = cast<CopyAddrInst>(user);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

auto *?

if (copyInst->isTakeOfSrc() || !copyInst->isInitializationOfDest())
return false;

AllocStackInst *tempObj = dyn_cast<AllocStackInst>(copyInst->getDest());
Copy link
Contributor

@gottesmm gottesmm Aug 5, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

auto *?

while (!tempObj->use_empty()) {
Operand *use = *tempObj->use_begin();
SILInstruction *user = use->getUser();
switch (user->getKind()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you use a switch here to be ultra conservative b/c these are the only users that you support (rather than just RAUWing).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was really dead-set on not rewriting replace-all-uses, but this is really a better approach.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok.

@@ -477,6 +477,9 @@ SILPassPipelinePlan::getPerformancePassPipeline(const SILOptions &Options) {
// stdlib.
addPerfEarlyModulePassPipeline(P);

// Cleanup after SILGen: remove trivial copies to temporaries.
P.addTempRValueOpt();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Move P.addTempRValueOpt(); into addPerfEarlyModulePassPipeline or addHighLevelEarlyLoopOptPipeline ? We don't seem to add any passes explicitly inside getPerformancePassPipeline and usually do it inside helper functions.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok. If I have to make some other changes I'll do it. Otherwise in a follow-up commit

Copy link
Contributor

@atrick atrick left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moving this to a separate pass is safer and more likely to catch what SILGen used to handle.

I don't see anything that needs to be fixed before committing.

/// only written by the initializing copy.
///
/// 2. No instructions between the copy and the destroy of its destination
/// writes to the source. This is sufficient to prove the copy is unnecesary.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These comments are wrong now, but i'll just fix them in a follow up PR.

// Increment the instruction iterator now in case the CopyInst is deleted.
++II;

if (CopyInst && CopyInst->getSrc() == CopyInst->getDest()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've been tempted to represent deleted copies as identity copies, but while working on this PR I had a vague recollection that we had some rule against identity copies. I actually think this is a good way to do it as long we can guarantee any identity copies are removed before IRGen.

while (!tempObj->use_empty()) {
Operand *use = *tempObj->use_begin();
SILInstruction *user = use->getUser();
switch (user->getKind()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was really dead-set on not rewriting replace-all-uses, but this is really a better approach.

Copy link
Contributor

@gottesmm gottesmm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A question, an ask for comments, and some nits.

NumLoadsFound++;

// If this is the last use of the temp and modifies the source it is ok.
if (NumLoadsFound == useInsts.size())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a comment here, explaining why this is safe?

To me it was non-obvious and I had to read the rest of the file to understand why. From what I can tell, this is only safe since you need to know that the only "load" (from collect loads) that can use the temp and modify the source is a full copy_addr that is an identity transformation of the memory (i.e. releasing the original and storing the copy into the original's location). Or am I missing something?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure what you mean. But this is very simple: it just checks if there is no (potential) modification of the source before the last use of the temp.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I see. The comment is totally misleading. I'll fix it.

case ValueKind::TupleElementAddrInst:
case ValueKind::LoadInst:
case ValueKind::LoadBorrowInst:
use->set(copyInst->getSrc());
Copy link
Contributor

@gottesmm gottesmm Aug 5, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have not 100% thought about this, but is it safe to do this unconditionally on a copy_addr without considering its flags?

I forgot to delete this note to myself at the bottom of the file. Please ignore ; ).

// Increment the instruction iterator now in case the CopyInst is deleted.
++II;

if (CopyInst && CopyInst->getSrc() == CopyInst->getDest()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this safe to do unconditionally with an unrelated identity copy_addr [take] or copy_addr [init]? (Not sure). By unrelated I mean a copy_addr that was not optimized by the algorithm.

for (auto &BB : *getFunction()) {
auto II = BB.begin();
while (II != BB.end()) {
CopyAddrInst *CopyInst = dyn_cast<CopyAddrInst>(&*II);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The LLVM style guide basically says if the type is already on the line, use auto. So in this case, CopyAddrInst is the template parameter of dyn_cast. So putting CopyAddrInst on the same line is redundant. I.e. this gives the same info to the reader:

auto *CopyInst = dyn_cast<CopyAddrInst>(&*II);

Here is the link:

http://llvm.org/docs/CodingStandards.html#use-auto-type-deduction-to-make-code-more-readable

This is a separate optimization that detects short-lived temporaries that can be eliminated.
This is necessary now that SILGen no longer performs basic RValue forwarding in some cases.

SR-5508: Performance regression in benchmarks caused by removing SILGen peephole for LoadExpr in +0 context
@eeckstein
Copy link
Contributor Author

@gottesmm @atrick @swiftix Thanks for the review. As CI is still down anyway, I updated the PR with a new version

@eeckstein
Copy link
Contributor Author

@swift-ci Please test

@eeckstein
Copy link
Contributor Author

@swift-ci Please smoke benchmark

@eeckstein
Copy link
Contributor Author

@swift-ci Please test

Copy link
Contributor

@gottesmm gottesmm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

}
// For some reason, not all normal uses have been seen between the copy and
// the end of the initialization block. We should never reach here.
return false;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not use an assert or unreachable here so in debug builds, we catch this. I am fine with this in a follow-on commit.

@gottesmm
Copy link
Contributor

gottesmm commented Aug 6, 2017

@swift-ci Please test

1 similar comment
@gottesmm
Copy link
Contributor

gottesmm commented Aug 6, 2017

@swift-ci Please test

NumLoadsFound++;

// If this is the last use of the temp we are ok. After this point,
// modifications to the source don't matter anymore.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regarding the discussion on these comments. It isn't really that simple. The order that we check for last uses and memory writes is subtle and important. It needs to be ordered this way to handle copy_addr uses, which could possibly write-back copyDest into copySrc. This won't work though if we ever handle an apply that takes copyDest as a read-only argument but may write to copySrc.

@gottesmm
Copy link
Contributor

gottesmm commented Aug 6, 2017

@swift-ci test

1 similar comment
@gottesmm
Copy link
Contributor

gottesmm commented Aug 6, 2017

@swift-ci test

@gottesmm
Copy link
Contributor

gottesmm commented Aug 6, 2017

Hmmm... I guess CI isn't back yet.

@shahmishal
Copy link
Member

CI will be back later tonight.

@shahmishal
Copy link
Member

@swift-ci smoke test

@eeckstein eeckstein merged commit 6258841 into swiftlang:master Aug 7, 2017
@eeckstein
Copy link
Contributor Author

@swift-ci Please smoke benchmark

@shahmishal
Copy link
Member

PR testing is not supported after PR has been merged.

@eeckstein eeckstein deleted the copyprop branch April 17, 2021 15:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants