Skip to content

[semantic-arc] Optimize more lifetime joining when a copy_value, destroy_value are in the same block. #34844

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

gottesmm
Copy link
Contributor

Specifically the optimization that is being performed here is the elimination of
lifetimes that hand off ownership in between two regions of code. Example:

%1 = copy_value %0
...
destroy_value %0
...
apply %consumingUser(%1)

We really want to handle this case since it is a natural thing that comes up in
programs. That being said when I originally implemented this optimization, I
started with the assumption that the copy_value, destroy_value had to be in
different blocks and for simplification that:

  1. The operand of the copy_value had only one destroy: the destroy_value that we
    are trying to eliminate.

  2. The copy_value only had one consuming user, the consuminer user that we are
    tracking.

This isn't much to work with unless one wants to do a heavier weight analysis.
So since this was meant to be a cheap optimization, I realized quickly that
beyond checking if the consuming user is a terminator/is in a terminator block,
I couldn't really do much more than optimize based off of the assumption that
the copy_value, destroy_value, and consumingUser were in the same block.

With that in mind, I am trying to eliminate now the emit*AndFold operations from
type lowering. It is surprising that APIs like type lowering/SILBuilder can
delete instructions. It is even worse and can introduce memory unsafety in the
compiler if the caller code needs to destroy instructions itself after updating
state. Thus I had a need to teach the slimmed down semantic-arc-opts that we run
at -Onone how to do that really simple optimization.

After some thought, I realized that all of the "emit*AndFold" operations are
only looking earlier in the block for the obvious pattern and that pattern is
the same thing I was doing in the previous implementation once I had gotten to
the point where I could only do anything interesting with the copy_value,
destroy_value in the same block.

So I split the previous "is this optimization safe" into two variants:

  1. A variant that does the terminator check that I mentioned previously and only
    does so if the copy_value has a single consuming use and the destroy_value is
    its operand's only consuming use.

  2. A variant that allows for our copy_value operand to have multiple
    destroy_value, but we require that the destroy_value and the copy_value we
    want to eliminate are in the same block and that the copy_value has a single
    consuming use and it is in the same block as the copy_value, destroy_value.

Then I changed the optimization to first check if a destroy_value is the only
consuming use of the copy's operand. In that case, we perform (1) and optimize
if it is safe.

If we didn't succeed while performing that optimization, then we visit all
destroy_value uses of the operand of the copy_value. Then if any of those
destroy_value are in the same block as the copy_value we want to optimize then:

a. If the copy_value had multiple consuming insts, then we know that those
destroy_value can not be in the same block as copy_value since we would have
a double lifetime ending use. This means that we can immediately eliminate
the copy_value, destory_value without further work.

b. If the copy_value had only a single consuming inst and the consuming inst is
not in the same block as the copy_value then we know it is after the
destroy_value, so we can optimize without further work.

c. Otherwise, we know that our copy_value has a single consuming inst and that
it is in the same block as the copy_value, destroy_value. In that case, we
perform the safety check above from (2).

Thus we are able to keep doing the same optimizations as before but also handle
in a much stronger way simple hand-off copies.

@gottesmm gottesmm requested review from atrick and meg-gupta November 20, 2020 09:15
@gottesmm
Copy link
Contributor Author

@swift-ci smoke test

@gottesmm
Copy link
Contributor Author

This is going to let me remove the infamous *AndFold functions from TypeLowering.

@gottesmm gottesmm force-pushed the pr-b83c2f04638cb94494351a5eb9221c6a343adc57 branch from 7e47c9f to 2599a49 Compare November 21, 2020 22:45
@gottesmm
Copy link
Contributor Author

@swift-ci test

@gottesmm
Copy link
Contributor Author

@swift-ci benchmark

@gottesmm
Copy link
Contributor Author

@swift-ci test source compatibility

@swift-ci
Copy link
Contributor

Performance: -O

Regression OLD NEW DELTA RATIO
FlattenListLoop 2277 2504 +10.0% 0.91x (?)
Data.append.Sequence.809B.Count.RE 103 111 +7.8% 0.93x (?)

Code size: -O

Performance: -Osize

Regression OLD NEW DELTA RATIO
UTF8Decode_InitFromData_ascii_as_ascii 642 761 +18.5% 0.84x (?)
DataAccessBytesMedium 94 102 +8.5% 0.92x (?)
 
Improvement OLD NEW DELTA RATIO
PrefixAnySequenceLazy 1867 1630 -12.7% 1.15x (?)
Data.append.Sequence.809B.Count.I 103 96 -6.8% 1.07x (?)

Code size: -Osize

Performance: -Onone

Regression OLD NEW DELTA RATIO
NSStringConversion.LongUTF8 3726 6270 +68.3% 0.59x
NSStringConversion.Long 13720 21607 +57.5% 0.63x
NSStringConversion.Medium 5207 8132 +56.2% 0.64x
NSStringConversion.MutableCopy.LongUTF8 2319 3525 +52.0% 0.66x
NSStringConversion.UTF8 4732 7180 +51.7% 0.66x (?)
NSStringConversion.MutableCopy.Long 4212 6030 +43.2% 0.70x
NSStringConversion.MutableCopy.Medium 3525 4888 +38.7% 0.72x
NSStringConversion.MutableCopy.UTF8 2079 2556 +22.9% 0.81x (?)
UTF8Decode_InitFromBytes_ascii_as_ascii 455 520 +14.3% 0.88x (?)
StringToDataEmpty 3900 4400 +12.8% 0.89x (?)
StringToDataLargeUnicode 9400 10200 +8.5% 0.92x (?)
 
Improvement OLD NEW DELTA RATIO
Integrate 4373 4038 -7.7% 1.08x (?)

Code size: -swiftlibs

How to read the data The tables contain differences in performance which are larger than 8% and differences in code size which are larger than 1%.

If you see any unexpected regressions, you should consider fixing the
regressions before you merge the PR.

Noise: Sometimes the performance results (not code size!) contain false
alarms. Unexpected regressions which are marked with '(?)' are probably noise.
If you see regressions which you cannot explain you can try to run the
benchmarks again. If regressions still show up, please consult with the
performance team (@eeckstein).

Hardware Overview
  Model Name: Mac Pro
  Model Identifier: MacPro6,1
  Processor Name: 12-Core Intel Xeon E5
  Processor Speed: 2.7 GHz
  Number of Processors: 1
  Total Number of Cores: 12
  L2 Cache (per Core): 256 KB
  L3 Cache: 30 MB
  Memory: 64 GB

@swift-ci
Copy link
Contributor

Build failed
Swift Test Linux Platform
Git Sha - 2599a49ec97539689aaafc345255f351dbe2f2a7

@gottesmm
Copy link
Contributor Author

@swift-ci clean test linux platform

@atrick
Copy link
Contributor

atrick commented Nov 22, 2020

Similar to
#34820
and
#34812

It seems completely redundant with CopyPropagation

@gottesmm
Copy link
Contributor Author

Took a look at the source compat failure. Found something interesting. The error has to do with the following situation:

  %8 = apply %7(%0) : $@convention(method) (@guaranteed TableAlias) -> @owned TableAlias
  %15 = begin_borrow %8
  // ...
  // different block
  %111 = copy_value %8 : $TableAlias              // user: %112
  %112 = enum $Optional<TableAlias>, #Optional.some!enumelt, %111 : $TableAlias // user: %123
  end_borrow %15
  destroy_value %8 : $TableAlias
  br bb15(%112) // Consumer of %112

The current patch would produce this:

  %8 = apply %7(%0) : $@convention(method) (@guaranteed TableAlias) -> @owned TableAlias
  %15 = begin_borrow %8
  // ...
  // different block
  %112 = enum $Optional<TableAlias>, #Optional.some!enumelt, %8 : $TableAlias // user: %123
  end_borrow %15
  br bb15(%112) // Consumer of %112

This creates a problem since at %123, the lifetime of %8 is over, but %15 is still depending on it. So what I need to do is if the consuming use (in this case br bb15) is in the same block, walk the block and replace all of the references with a new borrow. I imagine at -Onone we don't want to do more than just walk the block.

…pecific opts and use that to add new split up semantic-arc tests.

This split will ensure we are testing only what a specific optimization is doing
rather than all together.

NOTE: The way I split up the tests is I first split up each of the tests by
subject area by hand that I thought were specifically testing one pass. Then any
tests where the FileCheck tests started to fail due to us not running the other
passes, I put back a copy in the original semantic-arc-opts.sil to ensure we do
not regress.

Note, the tests that I copied for each of these passes
are originally from semantic-arc-opts.sil. I left them there as well so we could
see all of the opts together. I also only copied the ones that were testing pass
specific functionality.
…se}s(Operand *)

This originally trafficked in Instructions and for some reason the name was
never changed.

I also changed the result type to be a bool and added the ability for the passed
in closure to signal failure (and iteration stop) by returning false. This also
makes it possible to use visitLocalEndScopeUses in if statements which can be
useful.
…roy_value are in the same block.

Specifically the optimization that is being performed here is the elimination of
lifetimes that hand off ownership in between two regions of code. Example:

```
%1 = copy_value %0
...
destroy_value %0
...
apply %consumingUser(%1)
```

We really want to handle this case since it is a natural thing that comes up in
programs and will let me eliminate the *evil* usage of emitDestroyValueOperation
in TypeLowering without needing to modify huge amounts of tests.
@gottesmm gottesmm force-pushed the pr-b83c2f04638cb94494351a5eb9221c6a343adc57 branch from 2599a49 to 85965ed Compare November 26, 2020 06:38
@gottesmm
Copy link
Contributor Author

@swift-ci test

@gottesmm
Copy link
Contributor Author

@swift-ci test source compatibility

@gottesmm
Copy link
Contributor Author

All of the failures in the source combat suite are due to float16 issues, not this PR (unlike before).

@gottesmm
Copy link
Contributor Author

@atrick I am just committing some small peepholes so I can eliminate the and fold stuff. I don't want to invest the time in copy prop right now.

@gottesmm
Copy link
Contributor Author

Re testing.

@gottesmm
Copy link
Contributor Author

@swift-ci test

@swift-ci
Copy link
Contributor

Build failed
Swift Test OS X Platform
Git Sha - 85965ed

@gottesmm
Copy link
Contributor Author

@swift-ci smoke test OS X platform

@gottesmm gottesmm merged commit 3ef6fba into swiftlang:main Nov 30, 2020
@gottesmm gottesmm deleted the pr-b83c2f04638cb94494351a5eb9221c6a343adc57 branch November 30, 2020 12:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants