[semantic-arc-opts] Change to always use the worklist. #27430

gottesmm · 2019-09-29T20:43:02Z

Previously we were:

Doing a linear scan, performing certain optimizations, and setting up a
worklist for future processing.
Draining the worklist of changed instructions until we reached a fix point.

The key thing here is that for (1) to be safe, we would need to not perform any
optimizations on block arguments since there was an ssumption that we would only
perform SSA-like optimizations.

I am going to be adding such optimizations soon, so it makes sense to just
convert the initial traversal to non-destructively seed the worklist and
eliminate that initial optimization pass.

This should be NFC.

gottesmm · 2019-09-29T20:43:38Z

@swift-ci smoke test

gottesmm · 2019-09-29T21:47:37Z

@swift-ci smoke test

gottesmm · 2019-09-29T21:47:39Z

@swift-ci smoke test

gottesmm · 2019-09-29T21:47:42Z

@swift-ci smoke test

gottesmm · 2019-09-29T21:47:46Z

@swift-ci smoke test

gottesmm · 2019-09-29T21:47:51Z

@swift-ci smoke test

gottesmm · 2019-10-01T06:16:33Z

I am fixing the outliner bug. The problem is that I am eliminating an extra release breaking the pattern matching.

My solution is to just teach the outliner that if it doesn't have that missing release that it should make an outlined function with a guaranteed parameter.

gottesmm · 2019-10-01T06:51:57Z

@swift-ci test

gottesmm · 2019-10-01T06:52:02Z

@swift-ci test

gottesmm · 2019-10-01T06:52:05Z

@swift-ci test

gottesmm · 2019-10-01T06:52:12Z

@swift-ci benchmark

gottesmm · 2019-10-01T06:52:16Z

@swift-ci benchmark

gottesmm · 2019-10-01T06:52:24Z

@swift-ci benchmark

swift-ci · 2019-10-01T06:54:10Z

Build failed
Swift Test OS X Platform
Git Sha - 9251a6231572cd697a054bbb1926f0d2c78f6add

swift-ci · 2019-10-01T07:13:42Z

Performance: -O

Regression	OLD	NEW	DELTA	RATIO
SuffixArrayLazy	4	5	+25.0%	0.80x (?)
UTF8Decode_InitDecoding	104	122	+17.3%	0.85x

Improvement	OLD	NEW	DELTA	RATIO
NSStringConversion.Rebridge.LongUTF8	41	34	-17.1%	1.21x
NSStringConversion.Rebridge.Medium	138	117	-15.2%	1.18x
NSStringConversion.Rebridge.UTF8	343	291	-15.2%	1.18x
NSStringConversion.Rebridge.Long	139	118	-15.1%	1.18x
DistinctClassFieldAccesses	196	177	-9.7%	1.11x (?)
DataCreateEmptyArray	1550	1400	-9.7%	1.11x (?)
MapReduce	217	196	-9.7%	1.11x (?)
ObjectiveCBridgeStubFromNSDate	2890	2650	-8.3%	1.09x (?)
Set.isDisjoint.Seq.Box.Empty	66	61	-7.6%	1.08x (?)
ObjectiveCBridgeStubFromNSStringRef	111	103	-7.2%	1.08x (?)
StringHasSuffixAscii	1450	1350	-6.9%	1.07x (?)
NSStringConversion.Long	545	508	-6.8%	1.07x (?)

Code size: -O

Performance: -Osize

Regression	OLD	NEW	DELTA	RATIO
UTF8Decode_InitDecoding	102	122	+19.6%	0.84x

Improvement	OLD	NEW	DELTA	RATIO
NSStringConversion.Rebridge.LongUTF8	41	34	-17.1%	1.21x
NSStringConversion.Rebridge.Medium	138	117	-15.2%	1.18x (?)
NSStringConversion.Rebridge.UTF8	342	291	-14.9%	1.18x
NSStringConversion.Rebridge.Long	138	118	-14.5%	1.17x
ObjectiveCBridgeStubDateAccess	152	130	-14.5%	1.17x
Set.isDisjoint.Seq.Box.Empty	93	80	-14.0%	1.16x (?)
Set.isSubset.Seq.Int.Empty	122	106	-13.1%	1.15x (?)
Set.subtracting.Empty.Box	8	7	-12.5%	1.14x
Set.isStrictSubset.Seq.Int.Empty	118	106	-10.2%	1.11x (?)
Set.isStrictSubset.Int.Empty	50	45	-10.0%	1.11x (?)
StringHasSuffixAscii	1390	1260	-9.4%	1.10x
RandomShuffleLCG2	400	368	-8.0%	1.09x (?)

Code size: -Osize

Performance: -Onone

Regression	OLD	NEW	DELTA	RATIO
ArrayOfGenericPOD2	713	833	+16.8%	0.86x (?)
UTF8Decode_InitDecoding	122	142	+16.4%	0.86x (?)

Improvement	OLD	NEW	DELTA	RATIO
NopDeinit	126600	114000	-10.0%	1.11x (?)
ArrayInClass	2505	2275	-9.2%	1.10x
ExclusivityGlobal	113	104	-8.0%	1.09x (?)

Code size: -swiftlibs

How to read the data

The tables contain differences in performance which are larger than 8% and differences in code size which are larger than 1%.

If you see any unexpected regressions, you should consider fixing the
regressions before you merge the PR.

Noise: Sometimes the performance results (not code size!) contain false
alarms. Unexpected regressions which are marked with '(?)' are probably noise.
If you see regressions which you cannot explain you can try to run the
benchmarks again. If regressions still show up, please consult with the
performance team (@eeckstein).

Hardware Overview

  Model Name: Mac mini
  Model Identifier: Macmini8,1
  Processor Name: Intel Core i7
  Processor Speed: 3.2 GHz
  Number of Processors: 1
  Total Number of Cores: 6
  L2 Cache (per Core): 256 KB
  L3 Cache: 12 MB
  Memory: 64 GB

Catfish-Man · 2019-10-01T07:16:33Z

20% improvement to NSString -> Swift String -> NSString conversion round trips :) nice!

gottesmm · 2019-10-01T07:39:58Z

Was just thinking about the outliner change here. Fundamentally without the release that we were peepholing before, outlining like this is not safe. That being said, I have a solution: outline with a coroutine. This will let me keep the original initial conversion in the same spot and not have to worry about lifetime extending inappropriately.

gottesmm · 2019-10-01T17:51:05Z

I talked with Arnold about the outliner thing. I am going to change the bridge argument code to handle this case with additional safety.

gottesmm · 2019-10-01T18:34:56Z

I am going to split out the outliner change into another PR.

gottesmm · 2019-10-01T20:59:37Z

Actually, I am going to put this down for a little bit.

gottesmm · 2019-11-21T19:39:57Z

Redoing the testing here.

gottesmm · 2019-11-21T19:40:01Z

@swift-ci test

gottesmm · 2019-11-21T19:40:07Z

@swift-ci benchmark

gottesmm · 2019-11-21T19:41:34Z

mmm... I forgot about that outliner issue.

swift-ci · 2019-11-21T19:43:38Z

Build failed
Swift Test OS X Platform
Git Sha - 90e9e0168f449a84d77ef5316c7a0a92816d319b

swift-ci · 2019-11-21T20:07:21Z

Performance: -O

Regression	OLD	NEW	DELTA	RATIO
ObjectiveCBridgeStubDateAccess	130	152	+16.9%	0.86x
NSStringConversion.Long	495	578	+16.8%	0.86x
NSStringConversion.Medium	239	261	+9.2%	0.92x (?)
DictionaryGroup	146	157	+7.5%	0.93x (?)

Improvement	OLD	NEW	DELTA	RATIO
ArrayAppendGenericStructs	1460	1080	-26.0%	1.35x (?)
EqualSubstringSubstring	27	21	-22.2%	1.29x
LessSubstringSubstring	27	21	-22.2%	1.29x (?)
EqualSubstringSubstringGenericEquatable	27	21	-22.2%	1.29x
EqualSubstringString	27	21	-22.2%	1.29x
LessSubstringSubstringGenericComparable	27	21	-22.2%	1.29x
EqualStringSubstring	28	22	-21.4%	1.27x
Dictionary4	190	157	-17.4%	1.21x
NSStringConversion.Rebridge.UTF8	343	292	-14.9%	1.17x
NSStringConversion.Rebridge.LongUTF8	41	35	-14.6%	1.17x
NSStringConversion.Rebridge.Long	138	118	-14.5%	1.17x
NSStringConversion.Rebridge.Medium	136	118	-13.2%	1.15x (?)
StringComparison_longSharedPrefix	355	321	-9.6%	1.11x (?)

Code size: -O

Performance: -Osize

Regression	OLD	NEW	DELTA	RATIO
ArrayAppendOptionals	590	1440	+144.1%	0.41x (?)
NSStringConversion.Long	503	567	+12.7%	0.89x
Calculator	141	154	+9.2%	0.92x (?)
NSStringConversion.Medium	239	259	+8.4%	0.92x (?)

Improvement	OLD	NEW	DELTA	RATIO
LessSubstringSubstring	27	21	-22.2%	1.29x
EqualSubstringString	27	21	-22.2%	1.29x (?)
LessSubstringSubstringGenericComparable	27	21	-22.2%	1.29x
EqualSubstringSubstring	28	22	-21.4%	1.27x (?)
EqualStringSubstring	28	22	-21.4%	1.27x (?)
EqualSubstringSubstringGenericEquatable	28	22	-21.4%	1.27x (?)
NSStringConversion.Rebridge.LongUTF8	41	34	-17.1%	1.21x
SortIntPyramid	530	450	-15.1%	1.18x
NSStringConversion.Rebridge.UTF8	343	293	-14.6%	1.17x
NSStringConversion.Rebridge.Medium	138	118	-14.5%	1.17x (?)
NSStringConversion.Rebridge.Long	138	118	-14.5%	1.17x (?)
ObjectiveCBridgeStubNSDateRefAccess	196	174	-11.2%	1.13x (?)
StringComparison_longSharedPrefix	356	321	-9.8%	1.11x (?)
UTF8Decode_InitDecoding	114	103	-9.6%	1.11x (?)

Code size: -Osize

Performance: -Onone

Improvement	OLD	NEW	DELTA	RATIO
EqualSubstringSubstringGenericEquatable	31	25	-19.4%	1.24x
LessSubstringSubstringGenericComparable	31	25	-19.4%	1.24x
EqualSubstringSubstring	32	26	-18.7%	1.23x (?)
LessSubstringSubstring	32	26	-18.7%	1.23x (?)
EqualStringSubstring	32	27	-15.6%	1.19x (?)
EqualSubstringString	32	27	-15.6%	1.19x (?)
UTF8Decode_InitDecoding	137	125	-8.8%	1.10x (?)

Code size: -swiftlibs

How to read the data

The tables contain differences in performance which are larger than 8% and differences in code size which are larger than 1%.

If you see any unexpected regressions, you should consider fixing the
regressions before you merge the PR.

Noise: Sometimes the performance results (not code size!) contain false
alarms. Unexpected regressions which are marked with '(?)' are probably noise.
If you see regressions which you cannot explain you can try to run the
benchmarks again. If regressions still show up, please consult with the
performance team (@eeckstein).

Hardware Overview

  Model Name: Mac mini
  Model Identifier: Macmini8,1
  Processor Name: Intel Core i7
  Processor Speed: 3.2 GHz
  Number of Processors: 1
  Total Number of Cores: 6
  L2 Cache (per Core): 256 KB
  L3 Cache: 12 MB
  Memory: 64 GB

swift-ci · 2019-11-21T20:21:47Z

Build failed
Swift Test Linux Platform
Git Sha - 90e9e0168f449a84d77ef5316c7a0a92816d319b

gottesmm · 2019-12-06T01:52:11Z

I took a bit of a closer look at this. The code pattern that we are looking at is:

%15 = function_ref @$SSS10FoundationE19_bridgeToObjectiveCSo8NSStringCyF
%16 = apply %15(%14) : $@convention(method) (@guaranteed String) -> @owned NSString
%17 = enum $Optional<NSString>, #Optional.some!enumelt.1, %16 : $NSString
release_value %14 : $String
...
apply %objcMethod(%17, ...)  : $@convention(objc_method) (Optional<NSString> ...) ->
release_value %17 : $Optional<NSString>

From what I am reading in the code, we do not find the release_value of %14 due to better optimization. In such a case since we are going to use %14 in a guaranteed way anyways, we can just pass %14 as a guaranteed parameter without worry.

gottesmm · 2019-12-06T01:52:55Z

@swift-ci test

gottesmm · 2019-12-06T01:52:58Z

@swift-ci test

gottesmm · 2019-12-06T01:53:03Z

@swift-ci test

gottesmm · 2019-12-06T01:53:22Z

@swift-ci test source compatibility

gottesmm · 2019-12-06T01:53:29Z

@swift-ci test source compatibility

gottesmm · 2019-12-06T01:53:34Z

@swift-ci benchmark

gottesmm · 2019-12-06T01:53:37Z

@swift-ci benchmark

swift-ci · 2019-12-06T01:55:09Z

Build failed
Swift Test Linux Platform
Git Sha - 90e9e0168f449a84d77ef5316c7a0a92816d319b

swift-ci · 2019-12-06T01:57:21Z

Build failed
Swift Test OS X Platform
Git Sha - 90e9e0168f449a84d77ef5316c7a0a92816d319b

swift-ci · 2019-12-06T02:44:51Z

Performance: -O

Regression	OLD	NEW	DELTA	RATIO
FlattenListLoop	4094	5299	+29.4%	0.77x (?)
LazilyFilteredArrayContains	29500	34900	+18.3%	0.85x
Data.hash.Empty	71	77	+8.5%	0.92x (?)
RemoveWhereSwapInts	60	65	+8.3%	0.92x (?)
MapReduceClass2	37	40	+8.1%	0.93x
MapReduce	371	400	+7.8%	0.93x (?)
MapReduceAnyCollection	369	397	+7.6%	0.93x (?)

Improvement	OLD	NEW	DELTA	RATIO
StaticArray	3	2	-33.3%	1.50x (?)
NSStringConversion.Rebridge.LongUTF8	54	46	-14.8%	1.17x (?)
NSStringConversion.Rebridge.UTF8	453	391	-13.7%	1.16x (?)
NSStringConversion.Rebridge.Medium	184	159	-13.6%	1.16x (?)
NSStringConversion.Rebridge.Long	184	159	-13.6%	1.16x (?)
CharacterLiteralsLarge	108	97	-10.2%	1.11x
FloatingPointPrinting_Float_description_small	5832	5400	-7.4%	1.08x (?)

Code size: -O

Performance: -Osize

Regression	OLD	NEW	DELTA	RATIO
FlattenListLoop	4086	4456	+9.1%	0.92x (?)
RemoveWhereMoveInts	34	37	+8.8%	0.92x (?)
Array2D	6928	7520	+8.5%	0.92x
Data.hash.Empty	71	77	+8.5%	0.92x (?)
RandomShuffleLCG2	768	832	+8.3%	0.92x
Set.isDisjoint.Int.Empty	99	107	+8.1%	0.93x (?)
RemoveWhereSwapInts	62	67	+8.1%	0.93x (?)
Set.isSubset.Int.Empty	88	95	+8.0%	0.93x

Improvement	OLD	NEW	DELTA	RATIO
NSStringConversion.Rebridge.Long	188	159	-15.4%	1.18x (?)
NSStringConversion.Rebridge.Medium	187	159	-15.0%	1.18x (?)
NSStringConversion.Rebridge.LongUTF8	55	47	-14.5%	1.17x (?)
NSStringConversion.Rebridge.UTF8	460	394	-14.3%	1.17x (?)

Code size: -Osize

Performance: -Onone

Regression	OLD	NEW	DELTA	RATIO
ObjectiveCBridgeFromNSArrayAnyObjectToString	44800	48600	+8.5%	0.92x (?)

Code size: -swiftlibs

How to read the data

The tables contain differences in performance which are larger than 8% and differences in code size which are larger than 1%.

If you see any unexpected regressions, you should consider fixing the
regressions before you merge the PR.

Noise: Sometimes the performance results (not code size!) contain false
alarms. Unexpected regressions which are marked with '(?)' are probably noise.
If you see regressions which you cannot explain you can try to run the
benchmarks again. If regressions still show up, please consult with the
performance team (@eeckstein).

Hardware Overview

  Model Name: Mac Pro
  Model Identifier: MacPro6,1
  Processor Name: 12-Core Intel Xeon E5
  Processor Speed: 2.7 GHz
  Number of Processors: 1
  Total Number of Cores: 12
  L2 Cache (per Core): 256 KB
  L3 Cache: 30 MB
  Memory: 64 GB

…hey can only match one function uniquely.

…do not find a release.

Previously we were: 1. Doing a linear scan, performing certain optimizations, and setting up a worklist for future processing. 2. Draining the worklist of changed instructions until we reached a fix point. The key thing here is that for (1) to be safe, we would need to not perform any optimizations on block arguments since there was an ssumption that we would only perform SSA-like optimizations. I am going to be adding such optimizations soon, so it makes sense to just convert the initial traversal to non-destructively seed the worklist and eliminate that initial optimization pass. This should be NFC.

gottesmm · 2019-12-06T19:54:44Z

@swift-ci test

gottesmm · 2019-12-06T19:54:46Z

@swift-ci test

swift-ci · 2019-12-06T19:57:01Z

Build failed
Swift Test Linux Platform
Git Sha - 934806d7913dcbaf1fa4c5f15f479a6e9b16abc2

swift-ci · 2019-12-06T19:59:10Z

Build failed
Swift Test OS X Platform
Git Sha - 934806d7913dcbaf1fa4c5f15f479a6e9b16abc2

gottesmm · 2019-12-06T23:26:23Z

The only regression I could reproduce is LazilyFilteredArrayContains. The rest are noise.

gottesmm · 2019-12-06T23:28:21Z

In terms of improvements, I am seeing that all are real improvements except for static array.

gottesmm · 2019-12-06T23:47:29Z

I think the difference in that lazy benchmark is a weird cache effect. The reason why I am pretty confidant of this is that:

In the benchmark everything except for retain/release calls have been inlined into the benchmark. So we know that we are not calling any stdlib code directly.
When I look at the assembly, the only differences are:

--- before.s	2019-12-06 15:44:03.000000000 -0800
+++ after.s	2019-12-06 15:44:20.000000000 -0800
@@ -113,14 +113,14 @@
 +0x1b2	    jmp                 "run_LazilyFilteredArrayContains(_:)+0x23"
 +0x1b7	ud2
 +0x1b9	ud2
-+0x1bb	leaq                1372206(%rip), %rdi
++0x1bb	leaq                1372214(%rip), %rdi
 +0x1c2	callq               "__swift_instantiateConcreteTypeFromMangledName"
 +0x1c7	movl                $64, %esi
 +0x1cc	movl                $7, %edx
 +0x1d1	movq                %rax, %rdi
 +0x1d4	callq               "DYLD-STUB$$swift_allocObject"
 +0x1d9	movq                %rax, %r15
-+0x1dc	movaps              1118573(%rip), %xmm0
++0x1dc	movaps              1118589(%rip), %xmm0
 +0x1e3	movups              %xmm0, 16(%rax)
 +0x1e7	movq                $0, -64(%rbp)
 +0x1ef	movabsq             $-2305843009213693952, %rax
@@ -128,7 +128,7 @@
 +0x1fd	leaq                -64(%rbp), %r13
 +0x201	movl                $29, %edi
 +0x206	callq               "DYLD-STUB$$_StringGuts.grow(_:)"
-+0x20b	leaq                1163982(%rip), %rax
++0x20b	leaq                1163998(%rip), %rax
 +0x212	movabsq             $-9223372036854775808, %rdx
 +0x21c	orq                 %rax, %rdx
 +0x21f	movq                1318170(%rip), %r13
@@ -139,7 +139,7 @@
 +0x23e	movq                %r13, %rcx
 +0x241	movq                %r12, %r8
 +0x244	callq               "DYLD-STUB$$String.write<A>(to:)"
-+0x249	leaq                1208176(%rip), %rdi
++0x249	leaq                1208192(%rip), %rdi
 +0x250	movl                $35, %esi
 +0x255	movl                $2, %edx
 +0x25a	callq               "DYLD-STUB$$StaticString.description.getter"
@@ -157,7 +157,7 @@
 +0x28d	movq                %r13, %rcx
 +0x290	movq                %r12, %r8
 +0x293	callq               "DYLD-STUB$$String.write<A>(to:)"
-+0x298	leaq                1207841(%rip), %rdi
++0x298	leaq                1207857(%rip), %rdi
 +0x29f	movl                $81, %esi
 +0x2a4	movl                $2, %edx
 +0x2a9	callq               "DYLD-STUB$$StaticString.description.getter"

I.e. some constant is offset by 16. I wonder if we need to do something to fix the alignment of globals. Not sure.

gottesmm requested a review from jckarter September 29, 2019 20:43

gottesmm force-pushed the pr-502b85e746b20821f0d6d3d0f28a16f56fc4f853 branch from 9251a62 to b2b663e Compare September 29, 2019 21:47

gottesmm force-pushed the pr-502b85e746b20821f0d6d3d0f28a16f56fc4f853 branch 4 times, most recently from 91deeb7 to 90e9e01 Compare October 1, 2019 06:51

gottesmm force-pushed the pr-502b85e746b20821f0d6d3d0f28a16f56fc4f853 branch from 90e9e01 to 934806d Compare December 6, 2019 01:52

gottesmm added 3 commits December 6, 2019 11:27

[outliner] Add end sil function patterns to FileCheck tests so that t…

343b400

…hey can only match one function uniquely.

[outliner] Add support for passing bridging args as guaranteed if we …

d6e30b5

…do not find a release.

gottesmm force-pushed the pr-502b85e746b20821f0d6d3d0f28a16f56fc4f853 branch from 934806d to 67ff2ae Compare December 6, 2019 19:51

gottesmm merged commit 84e23cc into swiftlang:master Dec 6, 2019

gottesmm deleted the pr-502b85e746b20821f0d6d3d0f28a16f56fc4f853 branch December 6, 2019 23:49

[semantic-arc-opts] Change to always use the worklist. #27430

[semantic-arc-opts] Change to always use the worklist. #27430

Uh oh!

Conversation

gottesmm commented Sep 29, 2019

Uh oh!

gottesmm commented Sep 29, 2019

Uh oh!

gottesmm commented Sep 29, 2019

Uh oh!

gottesmm commented Sep 29, 2019

Uh oh!

gottesmm commented Sep 29, 2019

Uh oh!

gottesmm commented Sep 29, 2019

Uh oh!

gottesmm commented Sep 29, 2019

Uh oh!

gottesmm commented Oct 1, 2019

Uh oh!

gottesmm commented Oct 1, 2019

Uh oh!

gottesmm commented Oct 1, 2019

Uh oh!

gottesmm commented Oct 1, 2019

Uh oh!

gottesmm commented Oct 1, 2019

Uh oh!

gottesmm commented Oct 1, 2019

Uh oh!

gottesmm commented Oct 1, 2019

Uh oh!

swift-ci commented Oct 1, 2019

Uh oh!

swift-ci commented Oct 1, 2019

Performance: -O

Code size: -O

Performance: -Osize

Code size: -Osize

Performance: -Onone

Code size: -swiftlibs

Uh oh!

Catfish-Man commented Oct 1, 2019

Uh oh!

gottesmm commented Oct 1, 2019

Uh oh!

gottesmm commented Oct 1, 2019

Uh oh!

gottesmm commented Oct 1, 2019

Uh oh!

gottesmm commented Oct 1, 2019

Uh oh!

gottesmm commented Nov 21, 2019

Uh oh!

gottesmm commented Nov 21, 2019

Uh oh!

gottesmm commented Nov 21, 2019

Uh oh!

gottesmm commented Nov 21, 2019

Uh oh!

swift-ci commented Nov 21, 2019

Uh oh!

swift-ci commented Nov 21, 2019

Performance: -O

Code size: -O

Performance: -Osize

Code size: -Osize

Performance: -Onone

Code size: -swiftlibs

Uh oh!

swift-ci commented Nov 21, 2019

Uh oh!

gottesmm commented Dec 6, 2019

Uh oh!

gottesmm commented Dec 6, 2019

Uh oh!

gottesmm commented Dec 6, 2019

Uh oh!

gottesmm commented Dec 6, 2019

Uh oh!