[semantic-arc-opts] Let the simple lifetime join optimization handle certain copies with forwarding insts. #34820

gottesmm · 2020-11-19T02:08:43Z

When I originally landed this optimization, I was trying to handle simple cases
without forwarding instructions in my head. In my head, handling this with
forwarding instructions meant using OwnershipLiveRange. Sadly, I didn't realize
at the time that if I didn't use OwnershipLiveRange and just treated forwarding
insts as normal lifetime ending instructions for owned values, everything just
worked (since our check was where %5 in the following was any lifetime ending
instruction for %4c.

%4 = copy_value %3 : $Builtin.NativeObject
%4c = copy_value %4 : $Builtin.NativeObject
destroy_value %4 : $Builtin.NativeObject
%5 = enum $FakeOptional<Builtin.NativeObject>, #FakeOptional.some!enumelt, %4c : $Builtin.NativeObject

The optimization causes the above to simplified to:

%4 = copy_value %3 : $Builtin.NativeObject
%5 = enum $FakeOptional<Builtin.NativeObject>, #FakeOptional.some!enumelt, %4 : $Builtin.NativeObject

Notice importantly that we are not shrinking the lifetime of %4, so we can do
this safely without interior pointers being guarded with borrow scopes
everywhere yet.

gottesmm · 2020-11-19T02:47:14Z

@swift-ci test

gottesmm · 2020-11-19T02:47:23Z

@swift-ci smoke benchmark

gottesmm · 2020-11-19T02:47:32Z

@swift-ci test source compatibility

swift-ci · 2020-11-19T04:01:21Z

Performance: -O

Regression	OLD	NEW	DELTA	RATIO
StringFromLongWholeSubstring	4	5	+25.0%	0.80x
FlattenListFlatMap	6186	6973	+12.7%	0.89x (?)
LessSubstringSubstring	40	44	+10.0%	0.91x
EqualSubstringSubstring	40	44	+10.0%	0.91x (?)
EqualSubstringSubstringGenericEquatable	40	44	+10.0%	0.91x (?)
EqualSubstringString	40	44	+10.0%	0.91x (?)
LessSubstringSubstringGenericComparable	40	44	+10.0%	0.91x
EqualStringSubstring	41	45	+9.8%	0.91x (?)

Code size: -O

Performance: -Osize

Regression	OLD	NEW	DELTA	RATIO
StringFromLongWholeSubstring	4	5	+25.0%	0.80x
String.data.LargeUnicode	111	132	+18.9%	0.84x (?)
NSStringConversion.UTF8	925	1049	+13.4%	0.88x (?)
String.data.Medium	111	125	+12.6%	0.89x (?)
EqualSubstringSubstring	40	44	+10.0%	0.91x
EqualStringSubstring	40	44	+10.0%	0.91x (?)
EqualSubstringSubstringGenericEquatable	40	44	+10.0%	0.91x (?)
EqualSubstringString	40	44	+10.0%	0.91x
LessSubstringSubstring	41	45	+9.8%	0.91x (?)
LessSubstringSubstringGenericComparable	41	45	+9.8%	0.91x

Improvement	OLD	NEW	DELTA	RATIO
StringToDataLargeUnicode	3800	3450	-9.2%	1.10x (?)
Breadcrumbs.UTF16ToIdxRange.longMixed	160	149	-6.9%	1.07x (?)

Code size: -Osize

Performance: -Onone

Regression	OLD	NEW	DELTA	RATIO
Breadcrumbs.MutatedIdxToUTF16.ASCII	12	14	+16.7%	0.86x (?)
DataAppendArray	6200	6700	+8.1%	0.93x (?)

Improvement	OLD	NEW	DELTA	RATIO
DataToStringEmpty	3800	3300	-13.2%	1.15x (?)
StringToDataMedium	9800	8950	-8.7%	1.09x (?)
NSStringConversion.Rebridge.Mutable	2146	1967	-8.3%	1.09x (?)
String.data.Small	118	109	-7.6%	1.08x (?)
StringToDataLargeUnicode	10450	9700	-7.2%	1.08x (?)

Code size: -swiftlibs

How to read the data

The tables contain differences in performance which are larger than 8% and differences in code size which are larger than 1%.

If you see any unexpected regressions, you should consider fixing the
regressions before you merge the PR.

Noise: Sometimes the performance results (not code size!) contain false
alarms. Unexpected regressions which are marked with '(?)' are probably noise.
If you see regressions which you cannot explain you can try to run the
benchmarks again. If regressions still show up, please consult with the
performance team (@eeckstein).

Hardware Overview

  Model Name: Mac Pro
  Model Identifier: MacPro6,1
  Processor Name: 12-Core Intel Xeon E5
  Processor Speed: 2.7 GHz
  Number of Processors: 1
  Total Number of Cores: 12
  L2 Cache (per Core): 256 KB
  L3 Cache: 30 MB
  Memory: 64 GB

gottesmm · 2020-11-19T04:06:30Z

@swift-ci benchmark

gottesmm · 2020-11-19T04:07:49Z

Thats some of the expected -Onone speedup since this optimization is really simple and also runs at -Onone.

swift-ci · 2020-11-19T05:12:29Z

Performance: -O

Regression	OLD	NEW	DELTA	RATIO
FlattenListFlatMap	4411	6892	+56.2%	0.64x (?)
FlattenListLoop	1629	2512	+54.2%	0.65x (?)
StringFromLongWholeSubstring	4	5	+25.0%	0.80x
UTF8Decode_InitFromBytes_ascii_as_ascii	476	572	+20.2%	0.83x (?)
String.data.Medium	124	140	+12.9%	0.89x (?)
LessSubstringSubstring	40	44	+10.0%	0.91x (?)
EqualSubstringSubstring	40	44	+10.0%	0.91x (?)
EqualSubstringSubstringGenericEquatable	40	44	+10.0%	0.91x (?)
EqualSubstringString	40	44	+10.0%	0.91x (?)
LessSubstringSubstringGenericComparable	40	44	+10.0%	0.91x
EqualStringSubstring	41	45	+9.8%	0.91x (?)

Code size: -O

Performance: -Osize

Regression	OLD	NEW	DELTA	RATIO
StringFromLongWholeSubstring	4	5	+25.0%	0.80x
String.data.LargeUnicode	117	137	+17.1%	0.85x (?)
EqualSubstringSubstring	40	44	+10.0%	0.91x (?)
EqualStringSubstring	40	44	+10.0%	0.91x (?)
EqualSubstringSubstringGenericEquatable	40	44	+10.0%	0.91x (?)
EqualSubstringString	40	44	+10.0%	0.91x (?)
LessSubstringSubstringGenericComparable	41	45	+9.8%	0.91x (?)
FlattenListFlatMap	6069	6606	+8.8%	0.92x (?)

Improvement	OLD	NEW	DELTA	RATIO
Breadcrumbs.UTF16ToIdxRange.longMixed	160	149	-6.9%	1.07x

Code size: -Osize

Performance: -Onone

Regression	OLD	NEW	DELTA	RATIO
Breadcrumbs.MutatedIdxToUTF16.ASCII	12	14	+16.7%	0.86x (?)

Improvement	OLD	NEW	DELTA	RATIO
DataToStringSmall	6650	6200	-6.8%	1.07x (?)

Code size: -swiftlibs

How to read the data

The tables contain differences in performance which are larger than 8% and differences in code size which are larger than 1%.

If you see any unexpected regressions, you should consider fixing the
regressions before you merge the PR.

Noise: Sometimes the performance results (not code size!) contain false
alarms. Unexpected regressions which are marked with '(?)' are probably noise.
If you see regressions which you cannot explain you can try to run the
benchmarks again. If regressions still show up, please consult with the
performance team (@eeckstein).

Hardware Overview

  Model Name: Mac Pro
  Model Identifier: MacPro6,1
  Processor Name: 12-Core Intel Xeon E5
  Processor Speed: 2.7 GHz
  Number of Processors: 1
  Total Number of Cores: 12
  L2 Cache (per Core): 256 KB
  L3 Cache: 30 MB
  Memory: 64 GB

gottesmm · 2020-11-19T05:32:44Z

@swift-ci benchmark

swift-ci · 2020-11-19T06:13:59Z

Performance: -O

Regression	OLD	NEW	DELTA	RATIO
StringFromLongWholeSubstring	4	5	+25.0%	0.80x
String.data.Medium	119	131	+10.1%	0.91x (?)
LessSubstringSubstring	40	44	+10.0%	0.91x
EqualSubstringSubstring	40	44	+10.0%	0.91x (?)
EqualSubstringSubstringGenericEquatable	40	44	+10.0%	0.91x (?)
EqualSubstringString	40	44	+10.0%	0.91x (?)
LessSubstringSubstringGenericComparable	40	44	+10.0%	0.91x
EqualStringSubstring	41	45	+9.8%	0.91x (?)

Code size: -O

Performance: -Osize

Regression	OLD	NEW	DELTA	RATIO
StringFromLongWholeSubstring	4	5	+25.0%	0.80x
String.data.Medium	108	122	+13.0%	0.89x (?)
String.data.LargeUnicode	120	134	+11.7%	0.90x (?)
EqualSubstringSubstring	40	44	+10.0%	0.91x (?)
EqualStringSubstring	40	44	+10.0%	0.91x (?)
EqualSubstringSubstringGenericEquatable	40	44	+10.0%	0.91x (?)
EqualSubstringString	40	44	+10.0%	0.91x (?)
LessSubstringSubstring	41	45	+9.8%	0.91x (?)
LessSubstringSubstringGenericComparable	41	45	+9.8%	0.91x (?)
FlattenListFlatMap	3861	4167	+7.9%	0.93x (?)

Improvement	OLD	NEW	DELTA	RATIO
Breadcrumbs.UTF16ToIdxRange.longMixed	160	149	-6.9%	1.07x (?)

Code size: -Osize

Performance: -Onone

Improvement	OLD	NEW	DELTA	RATIO
DataReset	5100	4400	-13.7%	1.16x (?)
StringBuilderLong	3250	2830	-12.9%	1.15x (?)
DictionaryBridgeToObjC_Access	1199	1065	-11.2%	1.13x (?)
DataReplaceSmall	5400	4800	-11.1%	1.12x (?)
StringBuilder	3823	3399	-11.1%	1.12x (?)
DataMutateBytesMedium	4900	4420	-9.8%	1.11x (?)
StringBuilderSmallReservingCapacity	3739	3414	-8.7%	1.10x (?)
NSStringConversion.Rebridge.LongUTF8	127	116	-8.7%	1.09x (?)
ArrayValueProp2	7537	6889	-8.6%	1.09x (?)
String.data.Empty	109	101	-7.3%	1.08x (?)
DictionaryLiteral	9360	8680	-7.3%	1.08x (?)
String.data.Small	112	104	-7.1%	1.08x (?)
StringUTF16Builder	3720	3460	-7.0%	1.08x (?)

Code size: -swiftlibs

How to read the data

The tables contain differences in performance which are larger than 8% and differences in code size which are larger than 1%.

If you see any unexpected regressions, you should consider fixing the
regressions before you merge the PR.

Noise: Sometimes the performance results (not code size!) contain false
alarms. Unexpected regressions which are marked with '(?)' are probably noise.
If you see regressions which you cannot explain you can try to run the
benchmarks again. If regressions still show up, please consult with the
performance team (@eeckstein).

Hardware Overview

  Model Name: Mac Pro
  Model Identifier: MacPro6,1
  Processor Name: 12-Core Intel Xeon E5
  Processor Speed: 2.7 GHz
  Number of Processors: 1
  Total Number of Cores: 12
  L2 Cache (per Core): 256 KB
  L3 Cache: 30 MB
  Memory: 64 GB

gottesmm · 2020-11-19T06:18:58Z

@swift-ci benchmark

swift-ci · 2020-11-19T07:26:59Z

Performance: -O

Regression	OLD	NEW	DELTA	RATIO
StringFromLongWholeSubstring	4	5	+25.0%	0.80x
FlattenListFlatMap	3987	4493	+12.7%	0.89x (?)
LessSubstringSubstring	40	44	+10.0%	0.91x
EqualSubstringSubstring	40	44	+10.0%	0.91x (?)
EqualSubstringSubstringGenericEquatable	40	44	+10.0%	0.91x (?)
EqualSubstringString	40	44	+10.0%	0.91x (?)
LessSubstringSubstringGenericComparable	40	44	+10.0%	0.91x
EqualStringSubstring	41	45	+9.8%	0.91x (?)

Improvement	OLD	NEW	DELTA	RATIO
UTF8Decode_InitFromData_ascii_as_ascii	714	623	-12.7%	1.15x (?)
ObjectiveCBridgeStubFromArrayOfNSString2	2770	2520	-9.0%	1.10x (?)
NSStringConversion.UTF8	1016	927	-8.8%	1.10x (?)

Code size: -O

Performance: -Osize

Regression	OLD	NEW	DELTA	RATIO
FlattenListFlatMap	4236	5420	+28.0%	0.78x (?)
StringFromLongWholeSubstring	4	5	+25.0%	0.80x
EqualSubstringSubstring	40	44	+10.0%	0.91x
EqualStringSubstring	40	44	+10.0%	0.91x (?)
EqualSubstringSubstringGenericEquatable	40	44	+10.0%	0.91x (?)
EqualSubstringString	40	44	+10.0%	0.91x
LessSubstringSubstring	41	45	+9.8%	0.91x (?)
LessSubstringSubstringGenericComparable	41	45	+9.8%	0.91x (?)

Improvement	OLD	NEW	DELTA	RATIO
StringToDataLargeUnicode	3750	3400	-9.3%	1.10x (?)
Breadcrumbs.UTF16ToIdxRange.longMixed	160	149	-6.9%	1.07x (?)

Code size: -Osize

Performance: -Onone

Regression	OLD	NEW	DELTA	RATIO
Breadcrumbs.MutatedIdxToUTF16.ASCII	12	14	+16.7%	0.86x (?)

Improvement	OLD	NEW	DELTA	RATIO
StringBuilderSmallReservingCapacity	3830	3388	-11.5%	1.13x (?)
DataToStringSmall	6800	6100	-10.3%	1.11x (?)
StringToDataMedium	10200	9400	-7.8%	1.09x (?)
DictionaryBridgeToObjC_Access	1165	1083	-7.0%	1.08x (?)

Code size: -swiftlibs

How to read the data

The tables contain differences in performance which are larger than 8% and differences in code size which are larger than 1%.

If you see any unexpected regressions, you should consider fixing the
regressions before you merge the PR.

Noise: Sometimes the performance results (not code size!) contain false
alarms. Unexpected regressions which are marked with '(?)' are probably noise.
If you see regressions which you cannot explain you can try to run the
benchmarks again. If regressions still show up, please consult with the
performance team (@eeckstein).

Hardware Overview

  Model Name: Mac Pro
  Model Identifier: MacPro6,1
  Processor Name: 12-Core Intel Xeon E5
  Processor Speed: 2.7 GHz
  Number of Processors: 1
  Total Number of Cores: 12
  L2 Cache (per Core): 256 KB
  L3 Cache: 30 MB
  Memory: 64 GB

meg-gupta

lgtm

…certain copies with forwarding insts. When I originally landed this optimization, I was trying to handle simple cases without forwarding instructions in my head. In my head, handling this with forwarding instructions meant using OwnershipLiveRange. Sadly, I didn't realize at the time that if I didn't use OwnershipLiveRange and just treated forwarding insts as normal lifetime ending instructions for owned values, everything just worked (since our check was where %5 in the following was any lifetime ending instruction for %4c. ``` %4 = copy_value %3 : $Builtin.NativeObject %4c = copy_value %4 : $Builtin.NativeObject destroy_value %4 : $Builtin.NativeObject %5 = enum $FakeOptional<Builtin.NativeObject>, #FakeOptional.some!enumelt, %4c : $Builtin.NativeObject ``` The optimization causes the above to simplified to: ``` %4 = copy_value %3 : $Builtin.NativeObject %5 = enum $FakeOptional<Builtin.NativeObject>, #FakeOptional.some!enumelt, %4 : $Builtin.NativeObject ``` Notice importantly that we are not shrinking the lifetime of %4, so we can do this safely without interior pointers being guarded with borrow scopes everywhere yet.

gottesmm · 2020-11-19T21:27:11Z

@swift-ci smoke test

gottesmm · 2020-11-19T22:55:10Z

@swift-ci smoke test OS X platform

gottesmm · 2020-11-19T23:07:33Z

@swift-ci smoke test OS X platform

gottesmm · 2020-11-20T01:59:25Z

@swift-ci smoke test linux platform

atrick · 2020-11-22T17:45:18Z

Just like, SemanticARCOptVisitor::tryPerformOwnedCopyValueOptimization, canSafelyJoinSimpleRange is extra code that can be deleted just by calling CopyPropagation as a utility on any owned live ranges that need cleanup

gottesmm requested review from atrick and meg-gupta November 19, 2020 02:08

meg-gupta reviewed Nov 19, 2020

View reviewed changes

gottesmm force-pushed the simple-lifetime-join branch from c0abd81 to 5a2d63d Compare November 19, 2020 21:27

gottesmm merged commit 546039f into swiftlang:main Nov 20, 2020

gottesmm deleted the simple-lifetime-join branch November 20, 2020 06:05

atrick mentioned this pull request Nov 22, 2020

[semantic-arc] Optimize more lifetime joining when a copy_value, destroy_value are in the same block. #34844

Merged

[semantic-arc-opts] Let the simple lifetime join optimization handle certain copies with forwarding insts. #34820

[semantic-arc-opts] Let the simple lifetime join optimization handle certain copies with forwarding insts. #34820

Uh oh!

Conversation

gottesmm commented Nov 19, 2020

Uh oh!

gottesmm commented Nov 19, 2020

Uh oh!

gottesmm commented Nov 19, 2020

Uh oh!

gottesmm commented Nov 19, 2020

Uh oh!

swift-ci commented Nov 19, 2020

Performance: -O

Code size: -O

Performance: -Osize

Code size: -Osize

Performance: -Onone

Code size: -swiftlibs

Uh oh!

gottesmm commented Nov 19, 2020

Uh oh!

gottesmm commented Nov 19, 2020

Uh oh!

swift-ci commented Nov 19, 2020

Performance: -O

Code size: -O

Performance: -Osize

Code size: -Osize

Performance: -Onone

Code size: -swiftlibs

Uh oh!

gottesmm commented Nov 19, 2020

Uh oh!

swift-ci commented Nov 19, 2020

Performance: -O

Code size: -O

Performance: -Osize

Code size: -Osize

Performance: -Onone

Code size: -swiftlibs

Uh oh!

gottesmm commented Nov 19, 2020

Uh oh!

swift-ci commented Nov 19, 2020

Performance: -O

Code size: -O

Performance: -Osize

Code size: -Osize

Performance: -Onone

Code size: -swiftlibs

Uh oh!

meg-gupta left a comment

Choose a reason for hiding this comment

Uh oh!

gottesmm commented Nov 19, 2020

Uh oh!

gottesmm commented Nov 19, 2020

Uh oh!

gottesmm commented Nov 19, 2020

Uh oh!

gottesmm commented Nov 20, 2020

Uh oh!

atrick commented Nov 22, 2020

Uh oh!

Uh oh!