Skip to content

[ownership] Add simple support for concatenating together simple live ranges #30087

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

gottesmm
Copy link
Contributor

Specifically, I added support for handling cases where the first live range ends
in the same block as the second live range starts, but after the start of that
live range. As an example:

bb0:
   %0 = ...

bbN:
   %1 = copy_value %0
   ...
   destroy_value %0
   ...
   br bb*

bbM:
   destroy_value %1

==>

bb0:
   %0 = ...

bbN:
   br bb*

bbM:
   destroy_value %0

Replace this paragraph with a description of your changes and rationale. Provide links to external references/discussions if appropriate.

Resolves SR-NNNN.

@gottesmm gottesmm requested a review from atrick February 27, 2020 01:33
@gottesmm
Copy link
Contributor Author

@swift-ci benchmark

@gottesmm
Copy link
Contributor Author

@swift-ci smoke test

Copy link
Contributor

@atrick atrick left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is correct...

SILValue operand = cvi->getOperand();
if (operand.getOwnershipKind() != ValueOwnershipKind::Owned) {
return false;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like an intentional style change toward using braces for single-line expression followed by single line statement, because I've seen it in the last few PRs. Or is it just a recurring accident?
(incidentally, I don't think the style needs to dictate this, just wondering if I should stop telling people how to do it in reviews)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer to always have braces in this case because I think I made a mistake b/c of that once when refactoring. So I just do it to avoid having to think about the problem.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just told @meg-gupta not to do this because I was trying to be a good reviewer, so I have to take it back now. Frankly, I think it's fine to use braces whenever you want to.

// DISCUSSION: The reason why we do not need to check if the destroy_value is
// after the copy_value in the block is that if the destroy_value of the
// copy_value's operand was before the copy_value, the copy_value would be a
// use-after-free of the operand.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good discussion, but the implementation looks incorrect because it appears to be ignoring non-consuming uses of the copy's source after the copy's destroy.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. The ownership verifier caught it. I just tracked it down.

@gottesmm gottesmm force-pushed the pr-7d5f0d45e4677ebbd92aeb1e261ab388a4ca5756 branch from 3d13b7a to c43991b Compare February 27, 2020 21:31
@gottesmm
Copy link
Contributor Author

@swift-ci smoke test

@gottesmm
Copy link
Contributor Author

@swift-ci benchmark

@gottesmm
Copy link
Contributor Author

@swift-ci smoke test

@gottesmm
Copy link
Contributor Author

@swift-ci benchmark

@swift-ci
Copy link
Contributor

Performance: -O

Regression OLD NEW DELTA RATIO
SortIntPyramid 360 455 +26.4% 0.79x
RemoveWhereSwapInts 33 38 +15.2% 0.87x (?)
ArrayPlusEqualFiveElementCollection 4366 4884 +11.9% 0.89x (?)
RemoveWhereMoveInts 17 19 +11.8% 0.89x (?)
NSStringConversion.Long 491 543 +10.6% 0.90x (?)
NormalizedIterator_fastPrenormal 570 630 +10.5% 0.90x
ArraySetElement 262 283 +8.0% 0.93x (?)
PrefixWhileSequence 164 177 +7.9% 0.93x (?)
NormalizedIterator_latin1 206 222 +7.8% 0.93x (?)
SuffixSequence 233 251 +7.7% 0.93x (?)
 
Improvement OLD NEW DELTA RATIO
DistinctClassFieldAccesses 198 173 -12.6% 1.14x (?)
ArrayAppendLazyMap 3640 3300 -9.3% 1.10x (?)

Code size: -O

Performance: -Osize

Regression OLD NEW DELTA RATIO
ArraySetElement 285 326 +14.4% 0.87x (?)
RandomShuffleLCG2 416 464 +11.5% 0.90x (?)
Set.isSubset.Seq.Int.Empty 107 118 +10.3% 0.91x (?)
ArrayPlusEqualFiveElementCollection 4440 4884 +10.0% 0.91x (?)
DataCreateEmptyArray 1650 1800 +9.1% 0.92x (?)
MapReduceAnyCollection 220 238 +8.2% 0.92x (?)
Chars2 3100 3350 +8.1% 0.93x (?)
PrefixWhileSequence 178 192 +7.9% 0.93x (?)
 
Improvement OLD NEW DELTA RATIO
IterateData 917 833 -9.2% 1.10x (?)
ArrayAppendLazyMap 4340 3990 -8.1% 1.09x (?)
StringHasPrefixAscii 1220 1130 -7.4% 1.08x (?)
Set.isDisjoint.Int.Empty 55 51 -7.3% 1.08x (?)

Code size: -Osize

Performance: -Onone

Improvement OLD NEW DELTA RATIO
DataSubscriptMedium 61 56 -8.2% 1.09x (?)

Code size: -swiftlibs

How to read the data The tables contain differences in performance which are larger than 8% and differences in code size which are larger than 1%.

If you see any unexpected regressions, you should consider fixing the
regressions before you merge the PR.

Noise: Sometimes the performance results (not code size!) contain false
alarms. Unexpected regressions which are marked with '(?)' are probably noise.
If you see regressions which you cannot explain you can try to run the
benchmarks again. If regressions still show up, please consult with the
performance team (@eeckstein).

Hardware Overview
  Model Name: Mac mini
  Model Identifier: Macmini8,1
  Processor Name: Intel Core i7
  Processor Speed: 3.2 GHz
  Number of Processors: 1
  Total Number of Cores: 6
  L2 Cache (per Core): 256 KB
  L3 Cache: 12 MB
  Memory: 64 GB

Copy link
Contributor

@atrick atrick left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

I was gonna suggest handling

apply(dest)
destroy(src)

But the apply could also have a guaranteed use of src

@gottesmm
Copy link
Contributor Author

@swift-ci benchmark

@gottesmm
Copy link
Contributor Author

And also I forgot about fixing one other thing to make this work a little better.

@swift-ci
Copy link
Contributor

Performance: -O

Regression OLD NEW DELTA RATIO
RandomShuffleLCG2 704 768 +9.1% 0.92x (?)
FlattenListLoop 4086 4445 +8.8% 0.92x (?)
RemoveWhereSwapInts 59 64 +8.5% 0.92x (?)
Array2D 6944 7520 +8.3% 0.92x (?)
ArrayPlusEqualFiveElementCollection 7807 8436 +8.1% 0.93x (?)
MapReduceAnyCollection 368 397 +7.9% 0.93x (?)
MapReduce 370 399 +7.8% 0.93x (?)
 
Improvement OLD NEW DELTA RATIO
LazilyFilteredArrayContains 35700 30300 -15.1% 1.18x
ObjectiveCBridgeStringHash 79 68 -13.9% 1.16x
ArrayAppendLazyMap 6980 6070 -13.0% 1.15x (?)
ObjectiveCBridgeStringGetASCIIContents 327 305 -6.7% 1.07x (?)

Code size: -O

Performance: -Osize

Regression OLD NEW DELTA RATIO
DataAccessBytesMedium 94 103 +9.6% 0.91x (?)
RemoveWhereMoveInts 34 37 +8.8% 0.92x (?)
ArrayPlusEqualFiveElementCollection 7770 8436 +8.6% 0.92x (?)
Array2D 6944 7520 +8.3% 0.92x (?)
RemoveWhereSwapInts 62 67 +8.1% 0.93x
 
Improvement OLD NEW DELTA RATIO
ObjectiveCBridgeStringHash 78 68 -12.8% 1.15x
ArrayAppendLazyMap 7890 6990 -11.4% 1.13x (?)
String.data.Medium 93 84 -9.7% 1.11x (?)
String.data.LargeUnicode 96 88 -8.3% 1.09x (?)
UTF8Decode_InitFromBytes_ascii_as_ascii 503 466 -7.4% 1.08x (?)
ObjectiveCBridgeStringGetASCIIContents 327 305 -6.7% 1.07x (?)

Code size: -Osize

Performance: -Onone

Regression OLD NEW DELTA RATIO
StringMatch 45800 50400 +10.0% 0.91x (?)
EqualSubstringString 48 52 +8.3% 0.92x (?)
FindString.Rec3.Substring 850 918 +8.0% 0.93x (?)
 
Improvement OLD NEW DELTA RATIO
ObjectiveCBridgeStringHash 79 68 -13.9% 1.16x (?)

Code size: -swiftlibs

How to read the data The tables contain differences in performance which are larger than 8% and differences in code size which are larger than 1%.

If you see any unexpected regressions, you should consider fixing the
regressions before you merge the PR.

Noise: Sometimes the performance results (not code size!) contain false
alarms. Unexpected regressions which are marked with '(?)' are probably noise.
If you see regressions which you cannot explain you can try to run the
benchmarks again. If regressions still show up, please consult with the
performance team (@eeckstein).

Hardware Overview
  Model Name: Mac Pro
  Model Identifier: MacPro6,1
  Processor Name: 12-Core Intel Xeon E5
  Processor Speed: 2.7 GHz
  Number of Processors: 1
  Total Number of Cores: 12
  L2 Cache (per Core): 256 KB
  L3 Cache: 30 MB
  Memory: 64 GB

@gottesmm gottesmm force-pushed the pr-7d5f0d45e4677ebbd92aeb1e261ab388a4ca5756 branch from c43991b to fc348e2 Compare March 2, 2020 00:02
@gottesmm
Copy link
Contributor Author

gottesmm commented Mar 2, 2020

@swift-ci smoke test

@gottesmm
Copy link
Contributor Author

gottesmm commented Mar 2, 2020

@swift-ci benchmark

@gottesmm
Copy link
Contributor Author

gottesmm commented Mar 2, 2020

@atrick I added two more cases here that I think will be useful:

  1. If the consuming user is a return inst. I have seen this in cases where a guaranteed value
  2. If the consuming user is not a return inst, but is in the return block and the destroy_value is not.

I have seen variations on that in various frameworks.

@swift-ci
Copy link
Contributor

swift-ci commented Mar 2, 2020

Performance: -O

Regression OLD NEW DELTA RATIO
SortIntPyramid 365 450 +23.3% 0.81x
IterateData 796 902 +13.3% 0.88x (?)
StringHasPrefixAscii 1170 1260 +7.7% 0.93x (?)
 
Improvement OLD NEW DELTA RATIO
EqualSubstringSubstring 28 22 -21.4% 1.27x
LessSubstringSubstring 28 22 -21.4% 1.27x
EqualSubstringSubstringGenericEquatable 28 22 -21.4% 1.27x (?)
EqualSubstringString 28 22 -21.4% 1.27x (?)
LessSubstringSubstringGenericComparable 28 22 -21.4% 1.27x
EqualStringSubstring 29 23 -20.7% 1.26x
NSStringConversion.Long 568 494 -13.0% 1.15x
Set.subtracting.Empty.Box 8 7 -12.5% 1.14x (?)
StringComparison_longSharedPrefix 349 314 -10.0% 1.11x (?)
NSStringConversion.Medium 264 242 -8.3% 1.09x (?)
StringBuilderWithLongSubstring 1100 1020 -7.3% 1.08x (?)

Code size: -O

Performance: -Osize

Regression OLD NEW DELTA RATIO
IterateData 850 935 +10.0% 0.91x (?)
DataAccessBytesMedium 54 59 +9.3% 0.92x (?)
SortLettersInPlace 360 391 +8.6% 0.92x (?)
 
Improvement OLD NEW DELTA RATIO
LessSubstringSubstring 28 22 -21.4% 1.27x
EqualSubstringString 28 22 -21.4% 1.27x
LessSubstringSubstringGenericComparable 28 22 -21.4% 1.27x
EqualSubstringSubstring 29 23 -20.7% 1.26x
EqualSubstringSubstringGenericEquatable 29 23 -20.7% 1.26x (?)
EqualStringSubstring 28 23 -17.9% 1.22x
NSStringConversion.Long 576 500 -13.2% 1.15x
FlattenListLoop 3165 2792 -11.8% 1.13x (?)
StringComparison_longSharedPrefix 349 314 -10.0% 1.11x (?)
ArrayPlusEqualSingleElementCollection 470 423 -10.0% 1.11x (?)

Code size: -Osize

Performance: -Onone

Regression OLD NEW DELTA RATIO
ArrayOfPOD 649 726 +11.9% 0.89x (?)
ObjectiveCBridgeStubFromNSStringRef 120 130 +8.3% 0.92x (?)
 
Improvement OLD NEW DELTA RATIO
EqualSubstringSubstringGenericEquatable 32 26 -18.7% 1.23x
LessSubstringSubstringGenericComparable 32 26 -18.7% 1.23x
LessSubstringSubstring 33 27 -18.2% 1.22x
EqualSubstringSubstring 32 27 -15.6% 1.19x (?)
EqualStringSubstring 33 28 -15.2% 1.18x (?)
EqualSubstringString 33 28 -15.2% 1.18x

Code size: -swiftlibs

How to read the data The tables contain differences in performance which are larger than 8% and differences in code size which are larger than 1%.

If you see any unexpected regressions, you should consider fixing the
regressions before you merge the PR.

Noise: Sometimes the performance results (not code size!) contain false
alarms. Unexpected regressions which are marked with '(?)' are probably noise.
If you see regressions which you cannot explain you can try to run the
benchmarks again. If regressions still show up, please consult with the
performance team (@eeckstein).

Hardware Overview
  Model Name: Mac mini
  Model Identifier: Macmini8,1
  Processor Name: Intel Core i7
  Processor Speed: 3.2 GHz
  Number of Processors: 1
  Total Number of Cores: 6
  L2 Cache (per Core): 256 KB
  L3 Cache: 12 MB
  Memory: 64 GB

@gottesmm
Copy link
Contributor Author

gottesmm commented Mar 2, 2020

@swift-ci benchmark

@gottesmm
Copy link
Contributor Author

gottesmm commented Mar 2, 2020

@Catfish-Man some juice maybe for string comparison.

@gottesmm
Copy link
Contributor Author

gottesmm commented Mar 2, 2020

... opaque values opt is failing. Optimizing it too much.

@swift-ci
Copy link
Contributor

swift-ci commented Mar 2, 2020

Performance: -O

Regression OLD NEW DELTA RATIO
SortIntPyramid 360 450 +25.0% 0.80x
IterateData 796 920 +15.6% 0.87x
NSStringConversion.LongUTF8 307 344 +12.1% 0.89x (?)
 
Improvement OLD NEW DELTA RATIO
EqualSubstringSubstring 28 22 -21.4% 1.27x (?)
LessSubstringSubstring 28 22 -21.4% 1.27x
EqualSubstringSubstringGenericEquatable 28 22 -21.4% 1.27x (?)
EqualSubstringString 28 22 -21.4% 1.27x
LessSubstringSubstringGenericComparable 28 22 -21.4% 1.27x (?)
NSStringConversion.Long 568 494 -13.0% 1.15x (?)
ArrayPlusEqualSingleElementCollection 470 423 -10.0% 1.11x (?)
String.replaceSubrange.Substring.Small 48 44 -8.3% 1.09x (?)
NSStringConversion.Medium 264 243 -8.0% 1.09x (?)
DropFirstArrayLazy 14 13 -7.1% 1.08x (?)

Code size: -O

Performance: -Osize

Improvement OLD NEW DELTA RATIO
LessSubstringSubstring 28 22 -21.4% 1.27x
EqualSubstringString 28 22 -21.4% 1.27x (?)
LessSubstringSubstringGenericComparable 28 22 -21.4% 1.27x (?)
EqualSubstringSubstring 28 23 -17.9% 1.22x
EqualStringSubstring 28 23 -17.9% 1.22x (?)
EqualSubstringSubstringGenericEquatable 28 23 -17.9% 1.22x
NSStringConversion.Long 576 508 -11.8% 1.13x (?)
StringComparison_longSharedPrefix 349 313 -10.3% 1.12x (?)
NSStringConversion.Medium 266 243 -8.6% 1.09x (?)

Code size: -Osize

Performance: -Onone

Regression OLD NEW DELTA RATIO
DataCountMedium 50 55 +10.0% 0.91x (?)
UTF8Decode_InitFromBytes 131 141 +7.6% 0.93x (?)
 
Improvement OLD NEW DELTA RATIO
EqualSubstringSubstringGenericEquatable 32 26 -18.7% 1.23x (?)
LessSubstringSubstringGenericComparable 32 26 -18.7% 1.23x
EqualSubstringSubstring 33 27 -18.2% 1.22x (?)
LessSubstringSubstring 33 27 -18.2% 1.22x
EqualStringSubstring 33 28 -15.2% 1.18x
EqualSubstringString 33 28 -15.2% 1.18x
String.replaceSubrange.Substring.Small 50 46 -8.0% 1.09x (?)
DataCreateEmpty 800 740 -7.5% 1.08x (?)

Code size: -swiftlibs

How to read the data The tables contain differences in performance which are larger than 8% and differences in code size which are larger than 1%.

If you see any unexpected regressions, you should consider fixing the
regressions before you merge the PR.

Noise: Sometimes the performance results (not code size!) contain false
alarms. Unexpected regressions which are marked with '(?)' are probably noise.
If you see regressions which you cannot explain you can try to run the
benchmarks again. If regressions still show up, please consult with the
performance team (@eeckstein).

Hardware Overview
  Model Name: Mac mini
  Model Identifier: Macmini8,1
  Processor Name: Intel Core i7
  Processor Speed: 3.2 GHz
  Number of Processors: 1
  Total Number of Cores: 6
  L2 Cache (per Core): 256 KB
  L3 Cache: 12 MB
  Memory: 64 GB

@gottesmm
Copy link
Contributor Author

gottesmm commented Mar 2, 2020

Going to do a check that the few -O regressions are real or not. I have a feeling they are not, but I am going to check. (Pretty small).

@gottesmm
Copy link
Contributor Author

gottesmm commented Mar 2, 2020

Iterate data is real, but it is due to an inlining change. Specifically, we inline sequence data in one case and not in the other.

@gottesmm
Copy link
Contributor Author

gottesmm commented Mar 2, 2020

Out of curiosity, seeing if the improvements are real or not.

… ranges that do not need LiveRange analysis.

Specifically, this PR adds support for optimizing simple cases where we do not
need to compute LiveRanges with the idea of first doing simple transforms that
involve small numbers of instructions first. With that in mind, we only optimize
cases where our copy_value has a single consuming user and our owned value has a
single destroy_value. To understand the transform here, consider the following
SIL:

```
  %0 = ...
  %1 = copy_value %0                 (1)
  apply %guaranteedUser(%0)          (2)
  destroy_value %0                   (3)
  apply %cviConsumer(%1)             (4)
```

We want to eliminate (2) and (3), effectively joining the lifetimes of %0 and
%1, transforming the code to:

```
  %0 = ...
  apply %guaranteedUser(%0)          (2)
  apply %cviConsumer(%0)             (4)
```

Easily, we can always do this transform in this case since we know that %0's
lifetime ends strictly before the end of %1's due to (3) being before (4). This
means that any uses that require liveness of %0 must be before (4) and thus no
use-after-frees can result from removing (3) since we are not shrinking the
underlying object's lifetime. Lets consider a different case where (3) and (4)
are swapped.

```
  %0 = ...
  %1 = copy_value %0                 (1)
  apply %guaranteedUser(%0)          (2)
  apply %cviConsumer(%1)             (4)
  destroy_value %0                   (3)
```

In this case, since there aren't any liveness requiring uses of %0 in between
(4) and (3), we can still perform our transform. But what if there was a
liveness requiring user in between (4) and (3). To analyze this, lets swap (2)
and (4), yielding:

```
  %0 = ...
  %1 = copy_value %0                 (1)
  apply %cviConsumer(%1)             (4)
  apply %guaranteedUser(%0)          (2)
  destroy_value %0                   (3)
```

In this case, if we were to perform our transform, we would get a use-after-free
due do the transform shrinking the lifetime of the underlying object here from
ending at (3) to ending at (4):

```
  %0 = ...
  apply %cviConsumer(%1)             (4)
  apply %guaranteedUser(%0)          (2) // *kaboom*
```

So clearly, if (3) is after (4), clearly, we need to know that there aren't any
liveness requiring uses in between them to be able to perform this
optimization. But is this enough? Turns out no. There are two further issues
that we must consider:

1. If (4) is forwards owned ownership, it is not truly "consuming" the
underlying value in the sense of actually destroying the underlying value. This
can be worked with by using the LiveRange abstraction. That being said, this PR
is meant to contain simple transforms that do not need to use LiveRange. So, we
bail if we see a forwarding instruction.

2. At the current time, we may not be able to find all normal uses since all of
the instructions that are interior pointer constructs (e.x.: project_box) have
not been required yet to always be guarded by borrows (the eventual end
state). Thus we can not shrink lifetimes in general safely until that piece of
work is done.

Given all of those constraints, we only handle cases here where (3) is strictly
before (4) so we know 100% we are not shrinking any lifetimes. This effectively
is causing our correctness to rely on SILGen properly scoping lifetimes. Once
all interior pointers are properly guarded, we will be able to be more
aggressive here.

With that in mind, we perform this transform given the following conditions
noting that this pattern often times comes up around return values:

1. If the consuming user is a return inst. In such a case, we know that the
   destroy_value must be before the actual return inst.

2. If the consuming user is in the exit block and the destroy_value is not.

3. If the consuming user and destroy_value are in the same block and the
   consuming user is strictly later in that block than the destroy_value.

In all of these cases, we are able to optimize without the need for LiveRanges.
I am going to add support for this in a subsequent commit.
@gottesmm gottesmm force-pushed the pr-7d5f0d45e4677ebbd92aeb1e261ab388a4ca5756 branch from fc348e2 to bdf6143 Compare March 2, 2020 04:00
@gottesmm
Copy link
Contributor Author

gottesmm commented Mar 2, 2020

@swift-ci smoke test

1 similar comment
@gottesmm
Copy link
Contributor Author

gottesmm commented Mar 2, 2020

@swift-ci smoke test

@gottesmm
Copy link
Contributor Author

gottesmm commented Mar 2, 2020

@swift-ci smoke test

@gottesmm gottesmm merged commit 926f105 into swiftlang:master Mar 2, 2020
@gottesmm gottesmm deleted the pr-7d5f0d45e4677ebbd92aeb1e261ab388a4ca5756 branch March 2, 2020 08:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants