Skip to content

[semantic-arc-opts] Let the simple lifetime join optimization handle certain copies with forwarding insts. #34820

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Nov 20, 2020

Conversation

gottesmm
Copy link
Contributor

When I originally landed this optimization, I was trying to handle simple cases
without forwarding instructions in my head. In my head, handling this with
forwarding instructions meant using OwnershipLiveRange. Sadly, I didn't realize
at the time that if I didn't use OwnershipLiveRange and just treated forwarding
insts as normal lifetime ending instructions for owned values, everything just
worked (since our check was where %5 in the following was any lifetime ending
instruction for %4c.

%4 = copy_value %3 : $Builtin.NativeObject
%4c = copy_value %4 : $Builtin.NativeObject
destroy_value %4 : $Builtin.NativeObject
%5 = enum $FakeOptional<Builtin.NativeObject>, #FakeOptional.some!enumelt, %4c : $Builtin.NativeObject

The optimization causes the above to simplified to:

%4 = copy_value %3 : $Builtin.NativeObject
%5 = enum $FakeOptional<Builtin.NativeObject>, #FakeOptional.some!enumelt, %4 : $Builtin.NativeObject

Notice importantly that we are not shrinking the lifetime of %4, so we can do
this safely without interior pointers being guarded with borrow scopes
everywhere yet.

@gottesmm gottesmm requested review from atrick and meg-gupta November 19, 2020 02:08
@gottesmm
Copy link
Contributor Author

@swift-ci test

@gottesmm
Copy link
Contributor Author

@swift-ci smoke benchmark

@gottesmm
Copy link
Contributor Author

@swift-ci test source compatibility

@swift-ci
Copy link
Contributor

Performance: -O

Regression OLD NEW DELTA RATIO
StringFromLongWholeSubstring 4 5 +25.0% 0.80x
FlattenListFlatMap 6186 6973 +12.7% 0.89x (?)
LessSubstringSubstring 40 44 +10.0% 0.91x
EqualSubstringSubstring 40 44 +10.0% 0.91x (?)
EqualSubstringSubstringGenericEquatable 40 44 +10.0% 0.91x (?)
EqualSubstringString 40 44 +10.0% 0.91x (?)
LessSubstringSubstringGenericComparable 40 44 +10.0% 0.91x
EqualStringSubstring 41 45 +9.8% 0.91x (?)

Code size: -O

Performance: -Osize

Regression OLD NEW DELTA RATIO
StringFromLongWholeSubstring 4 5 +25.0% 0.80x
String.data.LargeUnicode 111 132 +18.9% 0.84x (?)
NSStringConversion.UTF8 925 1049 +13.4% 0.88x (?)
String.data.Medium 111 125 +12.6% 0.89x (?)
EqualSubstringSubstring 40 44 +10.0% 0.91x
EqualStringSubstring 40 44 +10.0% 0.91x (?)
EqualSubstringSubstringGenericEquatable 40 44 +10.0% 0.91x (?)
EqualSubstringString 40 44 +10.0% 0.91x
LessSubstringSubstring 41 45 +9.8% 0.91x (?)
LessSubstringSubstringGenericComparable 41 45 +9.8% 0.91x
 
Improvement OLD NEW DELTA RATIO
StringToDataLargeUnicode 3800 3450 -9.2% 1.10x (?)
Breadcrumbs.UTF16ToIdxRange.longMixed 160 149 -6.9% 1.07x (?)

Code size: -Osize

Performance: -Onone

Regression OLD NEW DELTA RATIO
Breadcrumbs.MutatedIdxToUTF16.ASCII 12 14 +16.7% 0.86x (?)
DataAppendArray 6200 6700 +8.1% 0.93x (?)
 
Improvement OLD NEW DELTA RATIO
DataToStringEmpty 3800 3300 -13.2% 1.15x (?)
StringToDataMedium 9800 8950 -8.7% 1.09x (?)
NSStringConversion.Rebridge.Mutable 2146 1967 -8.3% 1.09x (?)
String.data.Small 118 109 -7.6% 1.08x (?)
StringToDataLargeUnicode 10450 9700 -7.2% 1.08x (?)

Code size: -swiftlibs

How to read the data The tables contain differences in performance which are larger than 8% and differences in code size which are larger than 1%.

If you see any unexpected regressions, you should consider fixing the
regressions before you merge the PR.

Noise: Sometimes the performance results (not code size!) contain false
alarms. Unexpected regressions which are marked with '(?)' are probably noise.
If you see regressions which you cannot explain you can try to run the
benchmarks again. If regressions still show up, please consult with the
performance team (@eeckstein).

Hardware Overview
  Model Name: Mac Pro
  Model Identifier: MacPro6,1
  Processor Name: 12-Core Intel Xeon E5
  Processor Speed: 2.7 GHz
  Number of Processors: 1
  Total Number of Cores: 12
  L2 Cache (per Core): 256 KB
  L3 Cache: 30 MB
  Memory: 64 GB

@gottesmm
Copy link
Contributor Author

@swift-ci benchmark

@gottesmm
Copy link
Contributor Author

Thats some of the expected -Onone speedup since this optimization is really simple and also runs at -Onone.

@swift-ci
Copy link
Contributor

Performance: -O

Regression OLD NEW DELTA RATIO
FlattenListFlatMap 4411 6892 +56.2% 0.64x (?)
FlattenListLoop 1629 2512 +54.2% 0.65x (?)
StringFromLongWholeSubstring 4 5 +25.0% 0.80x
UTF8Decode_InitFromBytes_ascii_as_ascii 476 572 +20.2% 0.83x (?)
String.data.Medium 124 140 +12.9% 0.89x (?)
LessSubstringSubstring 40 44 +10.0% 0.91x (?)
EqualSubstringSubstring 40 44 +10.0% 0.91x (?)
EqualSubstringSubstringGenericEquatable 40 44 +10.0% 0.91x (?)
EqualSubstringString 40 44 +10.0% 0.91x (?)
LessSubstringSubstringGenericComparable 40 44 +10.0% 0.91x
EqualStringSubstring 41 45 +9.8% 0.91x (?)

Code size: -O

Performance: -Osize

Regression OLD NEW DELTA RATIO
StringFromLongWholeSubstring 4 5 +25.0% 0.80x
String.data.LargeUnicode 117 137 +17.1% 0.85x (?)
EqualSubstringSubstring 40 44 +10.0% 0.91x (?)
EqualStringSubstring 40 44 +10.0% 0.91x (?)
EqualSubstringSubstringGenericEquatable 40 44 +10.0% 0.91x (?)
EqualSubstringString 40 44 +10.0% 0.91x (?)
LessSubstringSubstringGenericComparable 41 45 +9.8% 0.91x (?)
FlattenListFlatMap 6069 6606 +8.8% 0.92x (?)
 
Improvement OLD NEW DELTA RATIO
Breadcrumbs.UTF16ToIdxRange.longMixed 160 149 -6.9% 1.07x

Code size: -Osize

Performance: -Onone

Regression OLD NEW DELTA RATIO
Breadcrumbs.MutatedIdxToUTF16.ASCII 12 14 +16.7% 0.86x (?)
 
Improvement OLD NEW DELTA RATIO
DataToStringSmall 6650 6200 -6.8% 1.07x (?)

Code size: -swiftlibs

How to read the data The tables contain differences in performance which are larger than 8% and differences in code size which are larger than 1%.

If you see any unexpected regressions, you should consider fixing the
regressions before you merge the PR.

Noise: Sometimes the performance results (not code size!) contain false
alarms. Unexpected regressions which are marked with '(?)' are probably noise.
If you see regressions which you cannot explain you can try to run the
benchmarks again. If regressions still show up, please consult with the
performance team (@eeckstein).

Hardware Overview
  Model Name: Mac Pro
  Model Identifier: MacPro6,1
  Processor Name: 12-Core Intel Xeon E5
  Processor Speed: 2.7 GHz
  Number of Processors: 1
  Total Number of Cores: 12
  L2 Cache (per Core): 256 KB
  L3 Cache: 30 MB
  Memory: 64 GB

@gottesmm
Copy link
Contributor Author

@swift-ci benchmark

@swift-ci
Copy link
Contributor

Performance: -O

Regression OLD NEW DELTA RATIO
StringFromLongWholeSubstring 4 5 +25.0% 0.80x
String.data.Medium 119 131 +10.1% 0.91x (?)
LessSubstringSubstring 40 44 +10.0% 0.91x
EqualSubstringSubstring 40 44 +10.0% 0.91x (?)
EqualSubstringSubstringGenericEquatable 40 44 +10.0% 0.91x (?)
EqualSubstringString 40 44 +10.0% 0.91x (?)
LessSubstringSubstringGenericComparable 40 44 +10.0% 0.91x
EqualStringSubstring 41 45 +9.8% 0.91x (?)

Code size: -O

Performance: -Osize

Regression OLD NEW DELTA RATIO
StringFromLongWholeSubstring 4 5 +25.0% 0.80x
String.data.Medium 108 122 +13.0% 0.89x (?)
String.data.LargeUnicode 120 134 +11.7% 0.90x (?)
EqualSubstringSubstring 40 44 +10.0% 0.91x (?)
EqualStringSubstring 40 44 +10.0% 0.91x (?)
EqualSubstringSubstringGenericEquatable 40 44 +10.0% 0.91x (?)
EqualSubstringString 40 44 +10.0% 0.91x (?)
LessSubstringSubstring 41 45 +9.8% 0.91x (?)
LessSubstringSubstringGenericComparable 41 45 +9.8% 0.91x (?)
FlattenListFlatMap 3861 4167 +7.9% 0.93x (?)
 
Improvement OLD NEW DELTA RATIO
Breadcrumbs.UTF16ToIdxRange.longMixed 160 149 -6.9% 1.07x (?)

Code size: -Osize

Performance: -Onone

Improvement OLD NEW DELTA RATIO
DataReset 5100 4400 -13.7% 1.16x (?)
StringBuilderLong 3250 2830 -12.9% 1.15x (?)
DictionaryBridgeToObjC_Access 1199 1065 -11.2% 1.13x (?)
DataReplaceSmall 5400 4800 -11.1% 1.12x (?)
StringBuilder 3823 3399 -11.1% 1.12x (?)
DataMutateBytesMedium 4900 4420 -9.8% 1.11x (?)
StringBuilderSmallReservingCapacity 3739 3414 -8.7% 1.10x (?)
NSStringConversion.Rebridge.LongUTF8 127 116 -8.7% 1.09x (?)
ArrayValueProp2 7537 6889 -8.6% 1.09x (?)
String.data.Empty 109 101 -7.3% 1.08x (?)
DictionaryLiteral 9360 8680 -7.3% 1.08x (?)
String.data.Small 112 104 -7.1% 1.08x (?)
StringUTF16Builder 3720 3460 -7.0% 1.08x (?)

Code size: -swiftlibs

How to read the data The tables contain differences in performance which are larger than 8% and differences in code size which are larger than 1%.

If you see any unexpected regressions, you should consider fixing the
regressions before you merge the PR.

Noise: Sometimes the performance results (not code size!) contain false
alarms. Unexpected regressions which are marked with '(?)' are probably noise.
If you see regressions which you cannot explain you can try to run the
benchmarks again. If regressions still show up, please consult with the
performance team (@eeckstein).

Hardware Overview
  Model Name: Mac Pro
  Model Identifier: MacPro6,1
  Processor Name: 12-Core Intel Xeon E5
  Processor Speed: 2.7 GHz
  Number of Processors: 1
  Total Number of Cores: 12
  L2 Cache (per Core): 256 KB
  L3 Cache: 30 MB
  Memory: 64 GB

@gottesmm
Copy link
Contributor Author

@swift-ci benchmark

@swift-ci
Copy link
Contributor

Performance: -O

Regression OLD NEW DELTA RATIO
StringFromLongWholeSubstring 4 5 +25.0% 0.80x
FlattenListFlatMap 3987 4493 +12.7% 0.89x (?)
LessSubstringSubstring 40 44 +10.0% 0.91x
EqualSubstringSubstring 40 44 +10.0% 0.91x (?)
EqualSubstringSubstringGenericEquatable 40 44 +10.0% 0.91x (?)
EqualSubstringString 40 44 +10.0% 0.91x (?)
LessSubstringSubstringGenericComparable 40 44 +10.0% 0.91x
EqualStringSubstring 41 45 +9.8% 0.91x (?)
 
Improvement OLD NEW DELTA RATIO
UTF8Decode_InitFromData_ascii_as_ascii 714 623 -12.7% 1.15x (?)
ObjectiveCBridgeStubFromArrayOfNSString2 2770 2520 -9.0% 1.10x (?)
NSStringConversion.UTF8 1016 927 -8.8% 1.10x (?)

Code size: -O

Performance: -Osize

Regression OLD NEW DELTA RATIO
FlattenListFlatMap 4236 5420 +28.0% 0.78x (?)
StringFromLongWholeSubstring 4 5 +25.0% 0.80x
EqualSubstringSubstring 40 44 +10.0% 0.91x
EqualStringSubstring 40 44 +10.0% 0.91x (?)
EqualSubstringSubstringGenericEquatable 40 44 +10.0% 0.91x (?)
EqualSubstringString 40 44 +10.0% 0.91x
LessSubstringSubstring 41 45 +9.8% 0.91x (?)
LessSubstringSubstringGenericComparable 41 45 +9.8% 0.91x (?)
 
Improvement OLD NEW DELTA RATIO
StringToDataLargeUnicode 3750 3400 -9.3% 1.10x (?)
Breadcrumbs.UTF16ToIdxRange.longMixed 160 149 -6.9% 1.07x (?)

Code size: -Osize

Performance: -Onone

Regression OLD NEW DELTA RATIO
Breadcrumbs.MutatedIdxToUTF16.ASCII 12 14 +16.7% 0.86x (?)
 
Improvement OLD NEW DELTA RATIO
StringBuilderSmallReservingCapacity 3830 3388 -11.5% 1.13x (?)
DataToStringSmall 6800 6100 -10.3% 1.11x (?)
StringToDataMedium 10200 9400 -7.8% 1.09x (?)
DictionaryBridgeToObjC_Access 1165 1083 -7.0% 1.08x (?)

Code size: -swiftlibs

How to read the data The tables contain differences in performance which are larger than 8% and differences in code size which are larger than 1%.

If you see any unexpected regressions, you should consider fixing the
regressions before you merge the PR.

Noise: Sometimes the performance results (not code size!) contain false
alarms. Unexpected regressions which are marked with '(?)' are probably noise.
If you see regressions which you cannot explain you can try to run the
benchmarks again. If regressions still show up, please consult with the
performance team (@eeckstein).

Hardware Overview
  Model Name: Mac Pro
  Model Identifier: MacPro6,1
  Processor Name: 12-Core Intel Xeon E5
  Processor Speed: 2.7 GHz
  Number of Processors: 1
  Total Number of Cores: 12
  L2 Cache (per Core): 256 KB
  L3 Cache: 30 MB
  Memory: 64 GB

Copy link
Contributor

@meg-gupta meg-gupta left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

…certain copies with forwarding insts.

When I originally landed this optimization, I was trying to handle simple cases
without forwarding instructions in my head. In my head, handling this with
forwarding instructions meant using OwnershipLiveRange. Sadly, I didn't realize
at the time that if I didn't use OwnershipLiveRange and just treated forwarding
insts as normal lifetime ending instructions for owned values, everything just
worked (since our check was where %5 in the following was any lifetime ending
instruction for %4c.

```
%4 = copy_value %3 : $Builtin.NativeObject
%4c = copy_value %4 : $Builtin.NativeObject
destroy_value %4 : $Builtin.NativeObject
%5 = enum $FakeOptional<Builtin.NativeObject>, #FakeOptional.some!enumelt, %4c : $Builtin.NativeObject
```

The optimization causes the above to simplified to:

```
%4 = copy_value %3 : $Builtin.NativeObject
%5 = enum $FakeOptional<Builtin.NativeObject>, #FakeOptional.some!enumelt, %4 : $Builtin.NativeObject
```

Notice importantly that we are not shrinking the lifetime of %4, so we can do
this safely without interior pointers being guarded with borrow scopes
everywhere yet.
@gottesmm gottesmm force-pushed the simple-lifetime-join branch from c0abd81 to 5a2d63d Compare November 19, 2020 21:27
@gottesmm
Copy link
Contributor Author

@swift-ci smoke test

@gottesmm
Copy link
Contributor Author

@swift-ci smoke test OS X platform

1 similar comment
@gottesmm
Copy link
Contributor Author

@swift-ci smoke test OS X platform

@gottesmm
Copy link
Contributor Author

@swift-ci smoke test linux platform

@gottesmm gottesmm merged commit 546039f into swiftlang:main Nov 20, 2020
@gottesmm gottesmm deleted the simple-lifetime-join branch November 20, 2020 06:05
@atrick
Copy link
Contributor

atrick commented Nov 22, 2020

Just like, SemanticARCOptVisitor::tryPerformOwnedCopyValueOptimization, canSafelyJoinSimpleRange is extra code that can be deleted just by calling CopyPropagation as a utility on any owned live ranges that need cleanup

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants