[ownership] Add simple support for concatenating together simple live ranges #30087

gottesmm · 2020-02-27T01:33:23Z

Specifically, I added support for handling cases where the first live range ends
in the same block as the second live range starts, but after the start of that
live range. As an example:

bb0:
   %0 = ...

bbN:
   %1 = copy_value %0
   ...
   destroy_value %0
   ...
   br bb*

bbM:
   destroy_value %1

==>

bb0:
   %0 = ...

bbN:
   br bb*

bbM:
   destroy_value %0

Replace this paragraph with a description of your changes and rationale. Provide links to external references/discussions if appropriate.

Resolves SR-NNNN.

gottesmm · 2020-02-27T01:33:31Z

@swift-ci benchmark

gottesmm · 2020-02-27T01:33:37Z

@swift-ci smoke test

atrick

I don't think this is correct...

atrick · 2020-02-27T06:36:49Z

lib/SILOptimizer/Transforms/SemanticARCOpts.cpp

+  SILValue operand = cvi->getOperand();
+  if (operand.getOwnershipKind() != ValueOwnershipKind::Owned) {
+    return false;
+  }


This looks like an intentional style change toward using braces for single-line expression followed by single line statement, because I've seen it in the last few PRs. Or is it just a recurring accident?
(incidentally, I don't think the style needs to dictate this, just wondering if I should stop telling people how to do it in reviews)

I prefer to always have braces in this case because I think I made a mistake b/c of that once when refactoring. So I just do it to avoid having to think about the problem.

I just told @meg-gupta not to do this because I was trying to be a good reviewer, so I have to take it back now. Frankly, I think it's fine to use braces whenever you want to.

atrick · 2020-02-27T06:42:12Z

lib/SILOptimizer/Transforms/SemanticARCOpts.cpp

+  // DISCUSSION: The reason why we do not need to check if the destroy_value is
+  // after the copy_value in the block is that if the destroy_value of the
+  // copy_value's operand was before the copy_value, the copy_value would be a
+  // use-after-free of the operand.


Good discussion, but the implementation looks incorrect because it appears to be ignoring non-consuming uses of the copy's source after the copy's destroy.

Yes. The ownership verifier caught it. I just tracked it down.

gottesmm · 2020-02-27T21:31:40Z

@swift-ci smoke test

gottesmm · 2020-02-27T21:31:44Z

@swift-ci benchmark

gottesmm · 2020-02-27T21:32:12Z

@swift-ci smoke test

gottesmm · 2020-02-27T21:32:18Z

@swift-ci benchmark

swift-ci · 2020-02-27T22:17:38Z

Performance: -O

Regression	OLD	NEW	DELTA	RATIO
SortIntPyramid	360	455	+26.4%	0.79x
RemoveWhereSwapInts	33	38	+15.2%	0.87x (?)
ArrayPlusEqualFiveElementCollection	4366	4884	+11.9%	0.89x (?)
RemoveWhereMoveInts	17	19	+11.8%	0.89x (?)
NSStringConversion.Long	491	543	+10.6%	0.90x (?)
NormalizedIterator_fastPrenormal	570	630	+10.5%	0.90x
ArraySetElement	262	283	+8.0%	0.93x (?)
PrefixWhileSequence	164	177	+7.9%	0.93x (?)
NormalizedIterator_latin1	206	222	+7.8%	0.93x (?)
SuffixSequence	233	251	+7.7%	0.93x (?)

Improvement	OLD	NEW	DELTA	RATIO
DistinctClassFieldAccesses	198	173	-12.6%	1.14x (?)
ArrayAppendLazyMap	3640	3300	-9.3%	1.10x (?)

Code size: -O

Performance: -Osize

Regression	OLD	NEW	DELTA	RATIO
ArraySetElement	285	326	+14.4%	0.87x (?)
RandomShuffleLCG2	416	464	+11.5%	0.90x (?)
Set.isSubset.Seq.Int.Empty	107	118	+10.3%	0.91x (?)
ArrayPlusEqualFiveElementCollection	4440	4884	+10.0%	0.91x (?)
DataCreateEmptyArray	1650	1800	+9.1%	0.92x (?)
MapReduceAnyCollection	220	238	+8.2%	0.92x (?)
Chars2	3100	3350	+8.1%	0.93x (?)
PrefixWhileSequence	178	192	+7.9%	0.93x (?)

Improvement	OLD	NEW	DELTA	RATIO
IterateData	917	833	-9.2%	1.10x (?)
ArrayAppendLazyMap	4340	3990	-8.1%	1.09x (?)
StringHasPrefixAscii	1220	1130	-7.4%	1.08x (?)
Set.isDisjoint.Int.Empty	55	51	-7.3%	1.08x (?)

Code size: -Osize

Performance: -Onone

Improvement	OLD	NEW	DELTA	RATIO
DataSubscriptMedium	61	56	-8.2%	1.09x (?)

Code size: -swiftlibs

How to read the data

The tables contain differences in performance which are larger than 8% and differences in code size which are larger than 1%.

If you see any unexpected regressions, you should consider fixing the
regressions before you merge the PR.

Noise: Sometimes the performance results (not code size!) contain false
alarms. Unexpected regressions which are marked with '(?)' are probably noise.
If you see regressions which you cannot explain you can try to run the
benchmarks again. If regressions still show up, please consult with the
performance team (@eeckstein).

Hardware Overview

  Model Name: Mac mini
  Model Identifier: Macmini8,1
  Processor Name: Intel Core i7
  Processor Speed: 3.2 GHz
  Number of Processors: 1
  Total Number of Cores: 6
  L2 Cache (per Core): 256 KB
  L3 Cache: 12 MB
  Memory: 64 GB

atrick

LGTM

I was gonna suggest handling

apply(dest)
destroy(src)

But the apply could also have a guaranteed use of src

gottesmm · 2020-02-27T23:01:22Z

@swift-ci benchmark

gottesmm · 2020-02-27T23:01:54Z

And also I forgot about fixing one other thing to make this work a little better.

swift-ci · 2020-02-27T23:52:20Z

Performance: -O

Regression	OLD	NEW	DELTA	RATIO
RandomShuffleLCG2	704	768	+9.1%	0.92x (?)
FlattenListLoop	4086	4445	+8.8%	0.92x (?)
RemoveWhereSwapInts	59	64	+8.5%	0.92x (?)
Array2D	6944	7520	+8.3%	0.92x (?)
ArrayPlusEqualFiveElementCollection	7807	8436	+8.1%	0.93x (?)
MapReduceAnyCollection	368	397	+7.9%	0.93x (?)
MapReduce	370	399	+7.8%	0.93x (?)

Improvement	OLD	NEW	DELTA	RATIO
LazilyFilteredArrayContains	35700	30300	-15.1%	1.18x
ObjectiveCBridgeStringHash	79	68	-13.9%	1.16x
ArrayAppendLazyMap	6980	6070	-13.0%	1.15x (?)
ObjectiveCBridgeStringGetASCIIContents	327	305	-6.7%	1.07x (?)

Code size: -O

Performance: -Osize

Regression	OLD	NEW	DELTA	RATIO
DataAccessBytesMedium	94	103	+9.6%	0.91x (?)
RemoveWhereMoveInts	34	37	+8.8%	0.92x (?)
ArrayPlusEqualFiveElementCollection	7770	8436	+8.6%	0.92x (?)
Array2D	6944	7520	+8.3%	0.92x (?)
RemoveWhereSwapInts	62	67	+8.1%	0.93x

Improvement	OLD	NEW	DELTA	RATIO
ObjectiveCBridgeStringHash	78	68	-12.8%	1.15x
ArrayAppendLazyMap	7890	6990	-11.4%	1.13x (?)
String.data.Medium	93	84	-9.7%	1.11x (?)
String.data.LargeUnicode	96	88	-8.3%	1.09x (?)
UTF8Decode_InitFromBytes_ascii_as_ascii	503	466	-7.4%	1.08x (?)
ObjectiveCBridgeStringGetASCIIContents	327	305	-6.7%	1.07x (?)

Code size: -Osize

Performance: -Onone

Regression	OLD	NEW	DELTA	RATIO
StringMatch	45800	50400	+10.0%	0.91x (?)
EqualSubstringString	48	52	+8.3%	0.92x (?)
FindString.Rec3.Substring	850	918	+8.0%	0.93x (?)

Improvement	OLD	NEW	DELTA	RATIO
ObjectiveCBridgeStringHash	79	68	-13.9%	1.16x (?)

Code size: -swiftlibs

How to read the data

The tables contain differences in performance which are larger than 8% and differences in code size which are larger than 1%.

If you see any unexpected regressions, you should consider fixing the
regressions before you merge the PR.

Noise: Sometimes the performance results (not code size!) contain false
alarms. Unexpected regressions which are marked with '(?)' are probably noise.
If you see regressions which you cannot explain you can try to run the
benchmarks again. If regressions still show up, please consult with the
performance team (@eeckstein).

Hardware Overview

  Model Name: Mac Pro
  Model Identifier: MacPro6,1
  Processor Name: 12-Core Intel Xeon E5
  Processor Speed: 2.7 GHz
  Number of Processors: 1
  Total Number of Cores: 12
  L2 Cache (per Core): 256 KB
  L3 Cache: 30 MB
  Memory: 64 GB

gottesmm · 2020-03-02T00:02:20Z

@swift-ci smoke test

gottesmm · 2020-03-02T00:02:27Z

@swift-ci benchmark

gottesmm · 2020-03-02T00:04:12Z

@atrick I added two more cases here that I think will be useful:

If the consuming user is a return inst. I have seen this in cases where a guaranteed value
If the consuming user is not a return inst, but is in the return block and the destroy_value is not.

I have seen variations on that in various frameworks.

swift-ci · 2020-03-02T00:28:30Z

Performance: -O

Regression	OLD	NEW	DELTA	RATIO
SortIntPyramid	365	450	+23.3%	0.81x
IterateData	796	902	+13.3%	0.88x (?)
StringHasPrefixAscii	1170	1260	+7.7%	0.93x (?)

Improvement	OLD	NEW	DELTA	RATIO
EqualSubstringSubstring	28	22	-21.4%	1.27x
LessSubstringSubstring	28	22	-21.4%	1.27x
EqualSubstringSubstringGenericEquatable	28	22	-21.4%	1.27x (?)
EqualSubstringString	28	22	-21.4%	1.27x (?)
LessSubstringSubstringGenericComparable	28	22	-21.4%	1.27x
EqualStringSubstring	29	23	-20.7%	1.26x
NSStringConversion.Long	568	494	-13.0%	1.15x
Set.subtracting.Empty.Box	8	7	-12.5%	1.14x (?)
StringComparison_longSharedPrefix	349	314	-10.0%	1.11x (?)
NSStringConversion.Medium	264	242	-8.3%	1.09x (?)
StringBuilderWithLongSubstring	1100	1020	-7.3%	1.08x (?)

Code size: -O

Performance: -Osize

Regression	OLD	NEW	DELTA	RATIO
IterateData	850	935	+10.0%	0.91x (?)
DataAccessBytesMedium	54	59	+9.3%	0.92x (?)
SortLettersInPlace	360	391	+8.6%	0.92x (?)

Improvement	OLD	NEW	DELTA	RATIO
LessSubstringSubstring	28	22	-21.4%	1.27x
EqualSubstringString	28	22	-21.4%	1.27x
LessSubstringSubstringGenericComparable	28	22	-21.4%	1.27x
EqualSubstringSubstring	29	23	-20.7%	1.26x
EqualSubstringSubstringGenericEquatable	29	23	-20.7%	1.26x (?)
EqualStringSubstring	28	23	-17.9%	1.22x
NSStringConversion.Long	576	500	-13.2%	1.15x
FlattenListLoop	3165	2792	-11.8%	1.13x (?)
StringComparison_longSharedPrefix	349	314	-10.0%	1.11x (?)
ArrayPlusEqualSingleElementCollection	470	423	-10.0%	1.11x (?)

Code size: -Osize

Performance: -Onone

Regression	OLD	NEW	DELTA	RATIO
ArrayOfPOD	649	726	+11.9%	0.89x (?)
ObjectiveCBridgeStubFromNSStringRef	120	130	+8.3%	0.92x (?)

Improvement	OLD	NEW	DELTA	RATIO
EqualSubstringSubstringGenericEquatable	32	26	-18.7%	1.23x
LessSubstringSubstringGenericComparable	32	26	-18.7%	1.23x
LessSubstringSubstring	33	27	-18.2%	1.22x
EqualSubstringSubstring	32	27	-15.6%	1.19x (?)
EqualStringSubstring	33	28	-15.2%	1.18x (?)
EqualSubstringString	33	28	-15.2%	1.18x

Code size: -swiftlibs

How to read the data

The tables contain differences in performance which are larger than 8% and differences in code size which are larger than 1%.

If you see any unexpected regressions, you should consider fixing the
regressions before you merge the PR.

Noise: Sometimes the performance results (not code size!) contain false
alarms. Unexpected regressions which are marked with '(?)' are probably noise.
If you see regressions which you cannot explain you can try to run the
benchmarks again. If regressions still show up, please consult with the
performance team (@eeckstein).

Hardware Overview

  Model Name: Mac mini
  Model Identifier: Macmini8,1
  Processor Name: Intel Core i7
  Processor Speed: 3.2 GHz
  Number of Processors: 1
  Total Number of Cores: 6
  L2 Cache (per Core): 256 KB
  L3 Cache: 12 MB
  Memory: 64 GB

gottesmm · 2020-03-02T00:47:20Z

@swift-ci benchmark

gottesmm · 2020-03-02T00:47:36Z

@Catfish-Man some juice maybe for string comparison.

gottesmm · 2020-03-02T00:55:34Z

... opaque values opt is failing. Optimizing it too much.

swift-ci · 2020-03-02T01:09:15Z

Performance: -O

Regression	OLD	NEW	DELTA	RATIO
SortIntPyramid	360	450	+25.0%	0.80x
IterateData	796	920	+15.6%	0.87x
NSStringConversion.LongUTF8	307	344	+12.1%	0.89x (?)

Improvement	OLD	NEW	DELTA	RATIO
EqualSubstringSubstring	28	22	-21.4%	1.27x (?)
LessSubstringSubstring	28	22	-21.4%	1.27x
EqualSubstringSubstringGenericEquatable	28	22	-21.4%	1.27x (?)
EqualSubstringString	28	22	-21.4%	1.27x
LessSubstringSubstringGenericComparable	28	22	-21.4%	1.27x (?)
NSStringConversion.Long	568	494	-13.0%	1.15x (?)
ArrayPlusEqualSingleElementCollection	470	423	-10.0%	1.11x (?)
String.replaceSubrange.Substring.Small	48	44	-8.3%	1.09x (?)
NSStringConversion.Medium	264	243	-8.0%	1.09x (?)
DropFirstArrayLazy	14	13	-7.1%	1.08x (?)

Code size: -O

Performance: -Osize

Improvement	OLD	NEW	DELTA	RATIO
LessSubstringSubstring	28	22	-21.4%	1.27x
EqualSubstringString	28	22	-21.4%	1.27x (?)
LessSubstringSubstringGenericComparable	28	22	-21.4%	1.27x (?)
EqualSubstringSubstring	28	23	-17.9%	1.22x
EqualStringSubstring	28	23	-17.9%	1.22x (?)
EqualSubstringSubstringGenericEquatable	28	23	-17.9%	1.22x
NSStringConversion.Long	576	508	-11.8%	1.13x (?)
StringComparison_longSharedPrefix	349	313	-10.3%	1.12x (?)
NSStringConversion.Medium	266	243	-8.6%	1.09x (?)

Code size: -Osize

Performance: -Onone

Regression	OLD	NEW	DELTA	RATIO
DataCountMedium	50	55	+10.0%	0.91x (?)
UTF8Decode_InitFromBytes	131	141	+7.6%	0.93x (?)

Improvement	OLD	NEW	DELTA	RATIO
EqualSubstringSubstringGenericEquatable	32	26	-18.7%	1.23x (?)
LessSubstringSubstringGenericComparable	32	26	-18.7%	1.23x
EqualSubstringSubstring	33	27	-18.2%	1.22x (?)
LessSubstringSubstring	33	27	-18.2%	1.22x
EqualStringSubstring	33	28	-15.2%	1.18x
EqualSubstringString	33	28	-15.2%	1.18x
String.replaceSubrange.Substring.Small	50	46	-8.0%	1.09x (?)
DataCreateEmpty	800	740	-7.5%	1.08x (?)

Code size: -swiftlibs

How to read the data

The tables contain differences in performance which are larger than 8% and differences in code size which are larger than 1%.

If you see any unexpected regressions, you should consider fixing the
regressions before you merge the PR.

Noise: Sometimes the performance results (not code size!) contain false
alarms. Unexpected regressions which are marked with '(?)' are probably noise.
If you see regressions which you cannot explain you can try to run the
benchmarks again. If regressions still show up, please consult with the
performance team (@eeckstein).

Hardware Overview

  Model Name: Mac mini
  Model Identifier: Macmini8,1
  Processor Name: Intel Core i7
  Processor Speed: 3.2 GHz
  Number of Processors: 1
  Total Number of Cores: 6
  L2 Cache (per Core): 256 KB
  L3 Cache: 12 MB
  Memory: 64 GB

gottesmm · 2020-03-02T01:17:07Z

Going to do a check that the few -O regressions are real or not. I have a feeling they are not, but I am going to check. (Pretty small).

gottesmm · 2020-03-02T02:51:31Z

Iterate data is real, but it is due to an inlining change. Specifically, we inline sequence data in one case and not in the other.

gottesmm · 2020-03-02T02:53:12Z

Out of curiosity, seeing if the improvements are real or not.

… ranges that do not need LiveRange analysis. Specifically, this PR adds support for optimizing simple cases where we do not need to compute LiveRanges with the idea of first doing simple transforms that involve small numbers of instructions first. With that in mind, we only optimize cases where our copy_value has a single consuming user and our owned value has a single destroy_value. To understand the transform here, consider the following SIL: ``` %0 = ... %1 = copy_value %0 (1) apply %guaranteedUser(%0) (2) destroy_value %0 (3) apply %cviConsumer(%1) (4) ``` We want to eliminate (2) and (3), effectively joining the lifetimes of %0 and %1, transforming the code to: ``` %0 = ... apply %guaranteedUser(%0) (2) apply %cviConsumer(%0) (4) ``` Easily, we can always do this transform in this case since we know that %0's lifetime ends strictly before the end of %1's due to (3) being before (4). This means that any uses that require liveness of %0 must be before (4) and thus no use-after-frees can result from removing (3) since we are not shrinking the underlying object's lifetime. Lets consider a different case where (3) and (4) are swapped. ``` %0 = ... %1 = copy_value %0 (1) apply %guaranteedUser(%0) (2) apply %cviConsumer(%1) (4) destroy_value %0 (3) ``` In this case, since there aren't any liveness requiring uses of %0 in between (4) and (3), we can still perform our transform. But what if there was a liveness requiring user in between (4) and (3). To analyze this, lets swap (2) and (4), yielding: ``` %0 = ... %1 = copy_value %0 (1) apply %cviConsumer(%1) (4) apply %guaranteedUser(%0) (2) destroy_value %0 (3) ``` In this case, if we were to perform our transform, we would get a use-after-free due do the transform shrinking the lifetime of the underlying object here from ending at (3) to ending at (4): ``` %0 = ... apply %cviConsumer(%1) (4) apply %guaranteedUser(%0) (2) // *kaboom* ``` So clearly, if (3) is after (4), clearly, we need to know that there aren't any liveness requiring uses in between them to be able to perform this optimization. But is this enough? Turns out no. There are two further issues that we must consider: 1. If (4) is forwards owned ownership, it is not truly "consuming" the underlying value in the sense of actually destroying the underlying value. This can be worked with by using the LiveRange abstraction. That being said, this PR is meant to contain simple transforms that do not need to use LiveRange. So, we bail if we see a forwarding instruction. 2. At the current time, we may not be able to find all normal uses since all of the instructions that are interior pointer constructs (e.x.: project_box) have not been required yet to always be guarded by borrows (the eventual end state). Thus we can not shrink lifetimes in general safely until that piece of work is done. Given all of those constraints, we only handle cases here where (3) is strictly before (4) so we know 100% we are not shrinking any lifetimes. This effectively is causing our correctness to rely on SILGen properly scoping lifetimes. Once all interior pointers are properly guarded, we will be able to be more aggressive here. With that in mind, we perform this transform given the following conditions noting that this pattern often times comes up around return values: 1. If the consuming user is a return inst. In such a case, we know that the destroy_value must be before the actual return inst. 2. If the consuming user is in the exit block and the destroy_value is not. 3. If the consuming user and destroy_value are in the same block and the consuming user is strictly later in that block than the destroy_value. In all of these cases, we are able to optimize without the need for LiveRanges. I am going to add support for this in a subsequent commit.

gottesmm · 2020-03-02T04:00:25Z

@swift-ci smoke test

gottesmm · 2020-03-02T04:00:33Z

@swift-ci smoke test

gottesmm · 2020-03-02T04:01:02Z

@swift-ci smoke test

gottesmm requested a review from atrick February 27, 2020 01:33

atrick requested changes Feb 27, 2020

View reviewed changes

gottesmm force-pushed the pr-7d5f0d45e4677ebbd92aeb1e261ab388a4ca5756 branch from 3d13b7a to c43991b Compare February 27, 2020 21:31

atrick approved these changes Feb 27, 2020

View reviewed changes

gottesmm force-pushed the pr-7d5f0d45e4677ebbd92aeb1e261ab388a4ca5756 branch from c43991b to fc348e2 Compare March 2, 2020 00:02

gottesmm force-pushed the pr-7d5f0d45e4677ebbd92aeb1e261ab388a4ca5756 branch from fc348e2 to bdf6143 Compare March 2, 2020 04:00

gottesmm merged commit 926f105 into swiftlang:master Mar 2, 2020

gottesmm deleted the pr-7d5f0d45e4677ebbd92aeb1e261ab388a4ca5756 branch March 2, 2020 08:28

[ownership] Add simple support for concatenating together simple live ranges #30087

[ownership] Add simple support for concatenating together simple live ranges #30087

Uh oh!

Conversation

gottesmm commented Feb 27, 2020

Uh oh!

gottesmm commented Feb 27, 2020

Uh oh!

gottesmm commented Feb 27, 2020

Uh oh!

atrick left a comment

Choose a reason for hiding this comment

Uh oh!

atrick Feb 27, 2020

Choose a reason for hiding this comment

Uh oh!

gottesmm Feb 27, 2020

Choose a reason for hiding this comment

Uh oh!

atrick Feb 27, 2020

Choose a reason for hiding this comment

Uh oh!

atrick Feb 27, 2020

Choose a reason for hiding this comment

Uh oh!

gottesmm Feb 27, 2020

Choose a reason for hiding this comment

Uh oh!

gottesmm commented Feb 27, 2020

Uh oh!

gottesmm commented Feb 27, 2020

Uh oh!

gottesmm commented Feb 27, 2020

Uh oh!

gottesmm commented Feb 27, 2020

Uh oh!

swift-ci commented Feb 27, 2020

Performance: -O

Code size: -O

Performance: -Osize

Code size: -Osize

Performance: -Onone

Code size: -swiftlibs

Uh oh!

atrick left a comment

Choose a reason for hiding this comment

Uh oh!

gottesmm commented Feb 27, 2020

Uh oh!

gottesmm commented Feb 27, 2020

Uh oh!

swift-ci commented Feb 27, 2020

Performance: -O

Code size: -O

Performance: -Osize

Code size: -Osize

Performance: -Onone

Code size: -swiftlibs

Uh oh!

gottesmm commented Mar 2, 2020

Uh oh!

gottesmm commented Mar 2, 2020

Uh oh!

gottesmm commented Mar 2, 2020

Uh oh!

swift-ci commented Mar 2, 2020

Performance: -O

Code size: -O

Performance: -Osize

Code size: -Osize

Performance: -Onone

Code size: -swiftlibs

Uh oh!

gottesmm commented Mar 2, 2020

Uh oh!

gottesmm commented Mar 2, 2020

Uh oh!

gottesmm commented Mar 2, 2020

Uh oh!

swift-ci commented Mar 2, 2020