[semantic-arc] Optimize more lifetime joining when a copy_value, destroy_value are in the same block. #34844

gottesmm · 2020-11-20T09:15:06Z

Specifically the optimization that is being performed here is the elimination of
lifetimes that hand off ownership in between two regions of code. Example:

%1 = copy_value %0
...
destroy_value %0
...
apply %consumingUser(%1)

We really want to handle this case since it is a natural thing that comes up in
programs. That being said when I originally implemented this optimization, I
started with the assumption that the copy_value, destroy_value had to be in
different blocks and for simplification that:

The operand of the copy_value had only one destroy: the destroy_value that we
are trying to eliminate.
The copy_value only had one consuming user, the consuminer user that we are
tracking.

This isn't much to work with unless one wants to do a heavier weight analysis.
So since this was meant to be a cheap optimization, I realized quickly that
beyond checking if the consuming user is a terminator/is in a terminator block,
I couldn't really do much more than optimize based off of the assumption that
the copy_value, destroy_value, and consumingUser were in the same block.

With that in mind, I am trying to eliminate now the emit*AndFold operations from
type lowering. It is surprising that APIs like type lowering/SILBuilder can
delete instructions. It is even worse and can introduce memory unsafety in the
compiler if the caller code needs to destroy instructions itself after updating
state. Thus I had a need to teach the slimmed down semantic-arc-opts that we run
at -Onone how to do that really simple optimization.

After some thought, I realized that all of the "emit*AndFold" operations are
only looking earlier in the block for the obvious pattern and that pattern is
the same thing I was doing in the previous implementation once I had gotten to
the point where I could only do anything interesting with the copy_value,
destroy_value in the same block.

So I split the previous "is this optimization safe" into two variants:

A variant that does the terminator check that I mentioned previously and only
does so if the copy_value has a single consuming use and the destroy_value is
its operand's only consuming use.
A variant that allows for our copy_value operand to have multiple
destroy_value, but we require that the destroy_value and the copy_value we
want to eliminate are in the same block and that the copy_value has a single
consuming use and it is in the same block as the copy_value, destroy_value.

Then I changed the optimization to first check if a destroy_value is the only
consuming use of the copy's operand. In that case, we perform (1) and optimize
if it is safe.

If we didn't succeed while performing that optimization, then we visit all
destroy_value uses of the operand of the copy_value. Then if any of those
destroy_value are in the same block as the copy_value we want to optimize then:

a. If the copy_value had multiple consuming insts, then we know that those
destroy_value can not be in the same block as copy_value since we would have
a double lifetime ending use. This means that we can immediately eliminate
the copy_value, destory_value without further work.

b. If the copy_value had only a single consuming inst and the consuming inst is
not in the same block as the copy_value then we know it is after the
destroy_value, so we can optimize without further work.

c. Otherwise, we know that our copy_value has a single consuming inst and that
it is in the same block as the copy_value, destroy_value. In that case, we
perform the safety check above from (2).

Thus we are able to keep doing the same optimizations as before but also handle
in a much stronger way simple hand-off copies.

gottesmm · 2020-11-20T09:15:29Z

@swift-ci smoke test

gottesmm · 2020-11-21T21:31:12Z

This is going to let me remove the infamous *AndFold functions from TypeLowering.

gottesmm · 2020-11-21T22:45:21Z

@swift-ci test

gottesmm · 2020-11-21T22:58:44Z

@swift-ci benchmark

gottesmm · 2020-11-21T22:59:03Z

@swift-ci test source compatibility

swift-ci · 2020-11-21T23:35:49Z

Performance: -O

Regression	OLD	NEW	DELTA	RATIO
FlattenListLoop	2277	2504	+10.0%	0.91x (?)
Data.append.Sequence.809B.Count.RE	103	111	+7.8%	0.93x (?)

Code size: -O

Performance: -Osize

Regression	OLD	NEW	DELTA	RATIO
UTF8Decode_InitFromData_ascii_as_ascii	642	761	+18.5%	0.84x (?)
DataAccessBytesMedium	94	102	+8.5%	0.92x (?)

Improvement	OLD	NEW	DELTA	RATIO
PrefixAnySequenceLazy	1867	1630	-12.7%	1.15x (?)
Data.append.Sequence.809B.Count.I	103	96	-6.8%	1.07x (?)

Code size: -Osize

Performance: -Onone

Regression	OLD	NEW	DELTA	RATIO
NSStringConversion.LongUTF8	3726	6270	+68.3%	0.59x
NSStringConversion.Long	13720	21607	+57.5%	0.63x
NSStringConversion.Medium	5207	8132	+56.2%	0.64x
NSStringConversion.MutableCopy.LongUTF8	2319	3525	+52.0%	0.66x
NSStringConversion.UTF8	4732	7180	+51.7%	0.66x (?)
NSStringConversion.MutableCopy.Long	4212	6030	+43.2%	0.70x
NSStringConversion.MutableCopy.Medium	3525	4888	+38.7%	0.72x
NSStringConversion.MutableCopy.UTF8	2079	2556	+22.9%	0.81x (?)
UTF8Decode_InitFromBytes_ascii_as_ascii	455	520	+14.3%	0.88x (?)
StringToDataEmpty	3900	4400	+12.8%	0.89x (?)
StringToDataLargeUnicode	9400	10200	+8.5%	0.92x (?)

Improvement	OLD	NEW	DELTA	RATIO
Integrate	4373	4038	-7.7%	1.08x (?)

Code size: -swiftlibs

How to read the data

The tables contain differences in performance which are larger than 8% and differences in code size which are larger than 1%.

If you see any unexpected regressions, you should consider fixing the
regressions before you merge the PR.

Noise: Sometimes the performance results (not code size!) contain false
alarms. Unexpected regressions which are marked with '(?)' are probably noise.
If you see regressions which you cannot explain you can try to run the
benchmarks again. If regressions still show up, please consult with the
performance team (@eeckstein).

Hardware Overview

  Model Name: Mac Pro
  Model Identifier: MacPro6,1
  Processor Name: 12-Core Intel Xeon E5
  Processor Speed: 2.7 GHz
  Number of Processors: 1
  Total Number of Cores: 12
  L2 Cache (per Core): 256 KB
  L3 Cache: 30 MB
  Memory: 64 GB

swift-ci · 2020-11-22T00:49:47Z

Build failed
Swift Test Linux Platform
Git Sha - 2599a49ec97539689aaafc345255f351dbe2f2a7

gottesmm · 2020-11-22T04:31:03Z

@swift-ci clean test linux platform

atrick · 2020-11-22T17:56:57Z

Similar to
#34820
and
#34812

It seems completely redundant with CopyPropagation

gottesmm · 2020-11-23T02:10:20Z

Took a look at the source compat failure. Found something interesting. The error has to do with the following situation:

  %8 = apply %7(%0) : $@convention(method) (@guaranteed TableAlias) -> @owned TableAlias
  %15 = begin_borrow %8
  // ...
  // different block
  %111 = copy_value %8 : $TableAlias              // user: %112
  %112 = enum $Optional<TableAlias>, #Optional.some!enumelt, %111 : $TableAlias // user: %123
  end_borrow %15
  destroy_value %8 : $TableAlias
  br bb15(%112) // Consumer of %112

The current patch would produce this:

  %8 = apply %7(%0) : $@convention(method) (@guaranteed TableAlias) -> @owned TableAlias
  %15 = begin_borrow %8
  // ...
  // different block
  %112 = enum $Optional<TableAlias>, #Optional.some!enumelt, %8 : $TableAlias // user: %123
  end_borrow %15
  br bb15(%112) // Consumer of %112

This creates a problem since at %123, the lifetime of %8 is over, but %15 is still depending on it. So what I need to do is if the consuming use (in this case br bb15) is in the same block, walk the block and replace all of the references with a new borrow. I imagine at -Onone we don't want to do more than just walk the block.

…pecific opts and use that to add new split up semantic-arc tests. This split will ensure we are testing only what a specific optimization is doing rather than all together. NOTE: The way I split up the tests is I first split up each of the tests by subject area by hand that I thought were specifically testing one pass. Then any tests where the FileCheck tests started to fail due to us not running the other passes, I put back a copy in the original semantic-arc-opts.sil to ensure we do not regress. Note, the tests that I copied for each of these passes are originally from semantic-arc-opts.sil. I left them there as well so we could see all of the opts together. I also only copied the ones that were testing pass specific functionality.

…se}s(Operand *) This originally trafficked in Instructions and for some reason the name was never changed. I also changed the result type to be a bool and added the ability for the passed in closure to signal failure (and iteration stop) by returning false. This also makes it possible to use visitLocalEndScopeUses in if statements which can be useful.

…roy_value are in the same block. Specifically the optimization that is being performed here is the elimination of lifetimes that hand off ownership in between two regions of code. Example: ``` %1 = copy_value %0 ... destroy_value %0 ... apply %consumingUser(%1) ``` We really want to handle this case since it is a natural thing that comes up in programs and will let me eliminate the *evil* usage of emitDestroyValueOperation in TypeLowering without needing to modify huge amounts of tests.

gottesmm · 2020-11-26T06:38:21Z

@swift-ci test

gottesmm · 2020-11-26T06:38:31Z

@swift-ci test source compatibility

gottesmm · 2020-11-26T19:35:15Z

All of the failures in the source combat suite are due to float16 issues, not this PR (unlike before).

gottesmm · 2020-11-26T20:08:28Z

@atrick I am just committing some small peepholes so I can eliminate the and fold stuff. I don't want to invest the time in copy prop right now.

gottesmm · 2020-11-30T06:54:06Z

Re testing.

gottesmm · 2020-11-30T06:54:10Z

@swift-ci test

swift-ci · 2020-11-30T08:53:05Z

Build failed
Swift Test OS X Platform
Git Sha - 85965ed

gottesmm · 2020-11-30T09:26:06Z

@swift-ci smoke test OS X platform

gottesmm requested review from atrick and meg-gupta November 20, 2020 09:15

gottesmm force-pushed the pr-b83c2f04638cb94494351a5eb9221c6a343adc57 branch from 7e47c9f to 2599a49 Compare November 21, 2020 22:45

gottesmm mentioned this pull request Nov 21, 2020

[ome] Invoke simplifyInstruction after lowering ownership and use replaceAllSimplifiedUsesAndErase instead of a manual RAUW. #34813

Merged

gottesmm added 3 commits November 25, 2020 19:32

gottesmm force-pushed the pr-b83c2f04638cb94494351a5eb9221c6a343adc57 branch from 2599a49 to 85965ed Compare November 26, 2020 06:38

gottesmm merged commit 3ef6fba into swiftlang:main Nov 30, 2020

gottesmm deleted the pr-b83c2f04638cb94494351a5eb9221c6a343adc57 branch November 30, 2020 12:49

[semantic-arc] Optimize more lifetime joining when a copy_value, destroy_value are in the same block. #34844

[semantic-arc] Optimize more lifetime joining when a copy_value, destroy_value are in the same block. #34844

Uh oh!

Conversation

gottesmm commented Nov 20, 2020

Uh oh!

gottesmm commented Nov 20, 2020

Uh oh!

gottesmm commented Nov 21, 2020

Uh oh!

gottesmm commented Nov 21, 2020

Uh oh!

gottesmm commented Nov 21, 2020

Uh oh!

gottesmm commented Nov 21, 2020

Uh oh!

swift-ci commented Nov 21, 2020

Performance: -O

Code size: -O

Performance: -Osize

Code size: -Osize

Performance: -Onone

Code size: -swiftlibs

Uh oh!

swift-ci commented Nov 22, 2020

Uh oh!

gottesmm commented Nov 22, 2020

Uh oh!

atrick commented Nov 22, 2020

Uh oh!

gottesmm commented Nov 23, 2020

Uh oh!

gottesmm commented Nov 26, 2020

Uh oh!

gottesmm commented Nov 26, 2020

Uh oh!

gottesmm commented Nov 26, 2020

Uh oh!

gottesmm commented Nov 26, 2020

Uh oh!

gottesmm commented Nov 30, 2020

Uh oh!

gottesmm commented Nov 30, 2020

Uh oh!

swift-ci commented Nov 30, 2020

Uh oh!

gottesmm commented Nov 30, 2020

Uh oh!

Uh oh!