Skip to content

[semantic-arc-opts] Change to always use the worklist. #27430

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

gottesmm
Copy link
Contributor

Previously we were:

  1. Doing a linear scan, performing certain optimizations, and setting up a
    worklist for future processing.

  2. Draining the worklist of changed instructions until we reached a fix point.

The key thing here is that for (1) to be safe, we would need to not perform any
optimizations on block arguments since there was an ssumption that we would only
perform SSA-like optimizations.

I am going to be adding such optimizations soon, so it makes sense to just
convert the initial traversal to non-destructively seed the worklist and
eliminate that initial optimization pass.

This should be NFC.

@gottesmm gottesmm requested a review from jckarter September 29, 2019 20:43
@gottesmm
Copy link
Contributor Author

@swift-ci smoke test

@gottesmm gottesmm force-pushed the pr-502b85e746b20821f0d6d3d0f28a16f56fc4f853 branch from 9251a62 to b2b663e Compare September 29, 2019 21:47
@gottesmm
Copy link
Contributor Author

@swift-ci smoke test

4 similar comments
@gottesmm
Copy link
Contributor Author

@swift-ci smoke test

@gottesmm
Copy link
Contributor Author

@swift-ci smoke test

@gottesmm
Copy link
Contributor Author

@swift-ci smoke test

@gottesmm
Copy link
Contributor Author

@swift-ci smoke test

@gottesmm
Copy link
Contributor Author

gottesmm commented Oct 1, 2019

I am fixing the outliner bug. The problem is that I am eliminating an extra release breaking the pattern matching.

My solution is to just teach the outliner that if it doesn't have that missing release that it should make an outlined function with a guaranteed parameter.

@gottesmm gottesmm force-pushed the pr-502b85e746b20821f0d6d3d0f28a16f56fc4f853 branch 4 times, most recently from 91deeb7 to 90e9e01 Compare October 1, 2019 06:51
@gottesmm
Copy link
Contributor Author

gottesmm commented Oct 1, 2019

@swift-ci test

2 similar comments
@gottesmm
Copy link
Contributor Author

gottesmm commented Oct 1, 2019

@swift-ci test

@gottesmm
Copy link
Contributor Author

gottesmm commented Oct 1, 2019

@swift-ci test

@gottesmm
Copy link
Contributor Author

gottesmm commented Oct 1, 2019

@swift-ci benchmark

2 similar comments
@gottesmm
Copy link
Contributor Author

gottesmm commented Oct 1, 2019

@swift-ci benchmark

@gottesmm
Copy link
Contributor Author

gottesmm commented Oct 1, 2019

@swift-ci benchmark

@swift-ci
Copy link
Contributor

swift-ci commented Oct 1, 2019

Build failed
Swift Test OS X Platform
Git Sha - 9251a6231572cd697a054bbb1926f0d2c78f6add

@swift-ci
Copy link
Contributor

swift-ci commented Oct 1, 2019

Performance: -O

Regression OLD NEW DELTA RATIO
SuffixArrayLazy 4 5 +25.0% 0.80x (?)
UTF8Decode_InitDecoding 104 122 +17.3% 0.85x
 
Improvement OLD NEW DELTA RATIO
NSStringConversion.Rebridge.LongUTF8 41 34 -17.1% 1.21x
NSStringConversion.Rebridge.Medium 138 117 -15.2% 1.18x
NSStringConversion.Rebridge.UTF8 343 291 -15.2% 1.18x
NSStringConversion.Rebridge.Long 139 118 -15.1% 1.18x
DistinctClassFieldAccesses 196 177 -9.7% 1.11x (?)
DataCreateEmptyArray 1550 1400 -9.7% 1.11x (?)
MapReduce 217 196 -9.7% 1.11x (?)
ObjectiveCBridgeStubFromNSDate 2890 2650 -8.3% 1.09x (?)
Set.isDisjoint.Seq.Box.Empty 66 61 -7.6% 1.08x (?)
ObjectiveCBridgeStubFromNSStringRef 111 103 -7.2% 1.08x (?)
StringHasSuffixAscii 1450 1350 -6.9% 1.07x (?)
NSStringConversion.Long 545 508 -6.8% 1.07x (?)

Code size: -O

Performance: -Osize

Regression OLD NEW DELTA RATIO
UTF8Decode_InitDecoding 102 122 +19.6% 0.84x
 
Improvement OLD NEW DELTA RATIO
NSStringConversion.Rebridge.LongUTF8 41 34 -17.1% 1.21x
NSStringConversion.Rebridge.Medium 138 117 -15.2% 1.18x (?)
NSStringConversion.Rebridge.UTF8 342 291 -14.9% 1.18x
NSStringConversion.Rebridge.Long 138 118 -14.5% 1.17x
ObjectiveCBridgeStubDateAccess 152 130 -14.5% 1.17x
Set.isDisjoint.Seq.Box.Empty 93 80 -14.0% 1.16x (?)
Set.isSubset.Seq.Int.Empty 122 106 -13.1% 1.15x (?)
Set.subtracting.Empty.Box 8 7 -12.5% 1.14x
Set.isStrictSubset.Seq.Int.Empty 118 106 -10.2% 1.11x (?)
Set.isStrictSubset.Int.Empty 50 45 -10.0% 1.11x (?)
StringHasSuffixAscii 1390 1260 -9.4% 1.10x
RandomShuffleLCG2 400 368 -8.0% 1.09x (?)

Code size: -Osize

Performance: -Onone

Regression OLD NEW DELTA RATIO
ArrayOfGenericPOD2 713 833 +16.8% 0.86x (?)
UTF8Decode_InitDecoding 122 142 +16.4% 0.86x (?)
 
Improvement OLD NEW DELTA RATIO
NopDeinit 126600 114000 -10.0% 1.11x (?)
ArrayInClass 2505 2275 -9.2% 1.10x
ExclusivityGlobal 113 104 -8.0% 1.09x (?)

Code size: -swiftlibs

How to read the data The tables contain differences in performance which are larger than 8% and differences in code size which are larger than 1%.

If you see any unexpected regressions, you should consider fixing the
regressions before you merge the PR.

Noise: Sometimes the performance results (not code size!) contain false
alarms. Unexpected regressions which are marked with '(?)' are probably noise.
If you see regressions which you cannot explain you can try to run the
benchmarks again. If regressions still show up, please consult with the
performance team (@eeckstein).

Hardware Overview
  Model Name: Mac mini
  Model Identifier: Macmini8,1
  Processor Name: Intel Core i7
  Processor Speed: 3.2 GHz
  Number of Processors: 1
  Total Number of Cores: 6
  L2 Cache (per Core): 256 KB
  L3 Cache: 12 MB
  Memory: 64 GB

@Catfish-Man
Copy link
Contributor

20% improvement to NSString -> Swift String -> NSString conversion round trips :) nice!

@gottesmm
Copy link
Contributor Author

gottesmm commented Oct 1, 2019

Was just thinking about the outliner change here. Fundamentally without the release that we were peepholing before, outlining like this is not safe. That being said, I have a solution: outline with a coroutine. This will let me keep the original initial conversion in the same spot and not have to worry about lifetime extending inappropriately.

@gottesmm
Copy link
Contributor Author

gottesmm commented Oct 1, 2019

I talked with Arnold about the outliner thing. I am going to change the bridge argument code to handle this case with additional safety.

@gottesmm
Copy link
Contributor Author

gottesmm commented Oct 1, 2019

I am going to split out the outliner change into another PR.

@gottesmm
Copy link
Contributor Author

gottesmm commented Oct 1, 2019

Actually, I am going to put this down for a little bit.

@gottesmm
Copy link
Contributor Author

Redoing the testing here.

@gottesmm
Copy link
Contributor Author

@swift-ci test

@gottesmm
Copy link
Contributor Author

@swift-ci benchmark

@gottesmm
Copy link
Contributor Author

mmm... I forgot about that outliner issue.

@swift-ci
Copy link
Contributor

Build failed
Swift Test OS X Platform
Git Sha - 90e9e0168f449a84d77ef5316c7a0a92816d319b

@swift-ci
Copy link
Contributor

Performance: -O

Regression OLD NEW DELTA RATIO
ObjectiveCBridgeStubDateAccess 130 152 +16.9% 0.86x
NSStringConversion.Long 495 578 +16.8% 0.86x
NSStringConversion.Medium 239 261 +9.2% 0.92x (?)
DictionaryGroup 146 157 +7.5% 0.93x (?)
 
Improvement OLD NEW DELTA RATIO
ArrayAppendGenericStructs 1460 1080 -26.0% 1.35x (?)
EqualSubstringSubstring 27 21 -22.2% 1.29x
LessSubstringSubstring 27 21 -22.2% 1.29x (?)
EqualSubstringSubstringGenericEquatable 27 21 -22.2% 1.29x
EqualSubstringString 27 21 -22.2% 1.29x
LessSubstringSubstringGenericComparable 27 21 -22.2% 1.29x
EqualStringSubstring 28 22 -21.4% 1.27x
Dictionary4 190 157 -17.4% 1.21x
NSStringConversion.Rebridge.UTF8 343 292 -14.9% 1.17x
NSStringConversion.Rebridge.LongUTF8 41 35 -14.6% 1.17x
NSStringConversion.Rebridge.Long 138 118 -14.5% 1.17x
NSStringConversion.Rebridge.Medium 136 118 -13.2% 1.15x (?)
StringComparison_longSharedPrefix 355 321 -9.6% 1.11x (?)

Code size: -O

Performance: -Osize

Regression OLD NEW DELTA RATIO
ArrayAppendOptionals 590 1440 +144.1% 0.41x (?)
NSStringConversion.Long 503 567 +12.7% 0.89x
Calculator 141 154 +9.2% 0.92x (?)
NSStringConversion.Medium 239 259 +8.4% 0.92x (?)
 
Improvement OLD NEW DELTA RATIO
LessSubstringSubstring 27 21 -22.2% 1.29x
EqualSubstringString 27 21 -22.2% 1.29x (?)
LessSubstringSubstringGenericComparable 27 21 -22.2% 1.29x
EqualSubstringSubstring 28 22 -21.4% 1.27x (?)
EqualStringSubstring 28 22 -21.4% 1.27x (?)
EqualSubstringSubstringGenericEquatable 28 22 -21.4% 1.27x (?)
NSStringConversion.Rebridge.LongUTF8 41 34 -17.1% 1.21x
SortIntPyramid 530 450 -15.1% 1.18x
NSStringConversion.Rebridge.UTF8 343 293 -14.6% 1.17x
NSStringConversion.Rebridge.Medium 138 118 -14.5% 1.17x (?)
NSStringConversion.Rebridge.Long 138 118 -14.5% 1.17x (?)
ObjectiveCBridgeStubNSDateRefAccess 196 174 -11.2% 1.13x (?)
StringComparison_longSharedPrefix 356 321 -9.8% 1.11x (?)
UTF8Decode_InitDecoding 114 103 -9.6% 1.11x (?)

Code size: -Osize

Performance: -Onone

Improvement OLD NEW DELTA RATIO
EqualSubstringSubstringGenericEquatable 31 25 -19.4% 1.24x
LessSubstringSubstringGenericComparable 31 25 -19.4% 1.24x
EqualSubstringSubstring 32 26 -18.7% 1.23x (?)
LessSubstringSubstring 32 26 -18.7% 1.23x (?)
EqualStringSubstring 32 27 -15.6% 1.19x (?)
EqualSubstringString 32 27 -15.6% 1.19x (?)
UTF8Decode_InitDecoding 137 125 -8.8% 1.10x (?)

Code size: -swiftlibs

How to read the data The tables contain differences in performance which are larger than 8% and differences in code size which are larger than 1%.

If you see any unexpected regressions, you should consider fixing the
regressions before you merge the PR.

Noise: Sometimes the performance results (not code size!) contain false
alarms. Unexpected regressions which are marked with '(?)' are probably noise.
If you see regressions which you cannot explain you can try to run the
benchmarks again. If regressions still show up, please consult with the
performance team (@eeckstein).

Hardware Overview
  Model Name: Mac mini
  Model Identifier: Macmini8,1
  Processor Name: Intel Core i7
  Processor Speed: 3.2 GHz
  Number of Processors: 1
  Total Number of Cores: 6
  L2 Cache (per Core): 256 KB
  L3 Cache: 12 MB
  Memory: 64 GB

@swift-ci
Copy link
Contributor

Build failed
Swift Test Linux Platform
Git Sha - 90e9e0168f449a84d77ef5316c7a0a92816d319b

@gottesmm
Copy link
Contributor Author

gottesmm commented Dec 6, 2019

I took a bit of a closer look at this. The code pattern that we are looking at is:

%15 = function_ref @$SSS10FoundationE19_bridgeToObjectiveCSo8NSStringCyF
%16 = apply %15(%14) : $@convention(method) (@guaranteed String) -> @owned NSString
%17 = enum $Optional<NSString>, #Optional.some!enumelt.1, %16 : $NSString
release_value %14 : $String
...
apply %objcMethod(%17, ...)  : $@convention(objc_method) (Optional<NSString> ...) ->
release_value %17 : $Optional<NSString>

From what I am reading in the code, we do not find the release_value of %14 due to better optimization. In such a case since we are going to use %14 in a guaranteed way anyways, we can just pass %14 as a guaranteed parameter without worry.

@gottesmm gottesmm force-pushed the pr-502b85e746b20821f0d6d3d0f28a16f56fc4f853 branch from 90e9e01 to 934806d Compare December 6, 2019 01:52
@gottesmm
Copy link
Contributor Author

gottesmm commented Dec 6, 2019

@swift-ci test

2 similar comments
@gottesmm
Copy link
Contributor Author

gottesmm commented Dec 6, 2019

@swift-ci test

@gottesmm
Copy link
Contributor Author

gottesmm commented Dec 6, 2019

@swift-ci test

@gottesmm
Copy link
Contributor Author

gottesmm commented Dec 6, 2019

@swift-ci test source compatibility

1 similar comment
@gottesmm
Copy link
Contributor Author

gottesmm commented Dec 6, 2019

@swift-ci test source compatibility

@gottesmm
Copy link
Contributor Author

gottesmm commented Dec 6, 2019

@swift-ci benchmark

1 similar comment
@gottesmm
Copy link
Contributor Author

gottesmm commented Dec 6, 2019

@swift-ci benchmark

@swift-ci
Copy link
Contributor

swift-ci commented Dec 6, 2019

Build failed
Swift Test Linux Platform
Git Sha - 90e9e0168f449a84d77ef5316c7a0a92816d319b

@swift-ci
Copy link
Contributor

swift-ci commented Dec 6, 2019

Build failed
Swift Test OS X Platform
Git Sha - 90e9e0168f449a84d77ef5316c7a0a92816d319b

@swift-ci
Copy link
Contributor

swift-ci commented Dec 6, 2019

Performance: -O

Regression OLD NEW DELTA RATIO
FlattenListLoop 4094 5299 +29.4% 0.77x (?)
LazilyFilteredArrayContains 29500 34900 +18.3% 0.85x
Data.hash.Empty 71 77 +8.5% 0.92x (?)
RemoveWhereSwapInts 60 65 +8.3% 0.92x (?)
MapReduceClass2 37 40 +8.1% 0.93x
MapReduce 371 400 +7.8% 0.93x (?)
MapReduceAnyCollection 369 397 +7.6% 0.93x (?)
 
Improvement OLD NEW DELTA RATIO
StaticArray 3 2 -33.3% 1.50x (?)
NSStringConversion.Rebridge.LongUTF8 54 46 -14.8% 1.17x (?)
NSStringConversion.Rebridge.UTF8 453 391 -13.7% 1.16x (?)
NSStringConversion.Rebridge.Medium 184 159 -13.6% 1.16x (?)
NSStringConversion.Rebridge.Long 184 159 -13.6% 1.16x (?)
CharacterLiteralsLarge 108 97 -10.2% 1.11x
FloatingPointPrinting_Float_description_small 5832 5400 -7.4% 1.08x (?)

Code size: -O

Performance: -Osize

Regression OLD NEW DELTA RATIO
FlattenListLoop 4086 4456 +9.1% 0.92x (?)
RemoveWhereMoveInts 34 37 +8.8% 0.92x (?)
Array2D 6928 7520 +8.5% 0.92x
Data.hash.Empty 71 77 +8.5% 0.92x (?)
RandomShuffleLCG2 768 832 +8.3% 0.92x
Set.isDisjoint.Int.Empty 99 107 +8.1% 0.93x (?)
RemoveWhereSwapInts 62 67 +8.1% 0.93x (?)
Set.isSubset.Int.Empty 88 95 +8.0% 0.93x
 
Improvement OLD NEW DELTA RATIO
NSStringConversion.Rebridge.Long 188 159 -15.4% 1.18x (?)
NSStringConversion.Rebridge.Medium 187 159 -15.0% 1.18x (?)
NSStringConversion.Rebridge.LongUTF8 55 47 -14.5% 1.17x (?)
NSStringConversion.Rebridge.UTF8 460 394 -14.3% 1.17x (?)

Code size: -Osize

Performance: -Onone

Regression OLD NEW DELTA RATIO
ObjectiveCBridgeFromNSArrayAnyObjectToString 44800 48600 +8.5% 0.92x (?)

Code size: -swiftlibs

How to read the data The tables contain differences in performance which are larger than 8% and differences in code size which are larger than 1%.

If you see any unexpected regressions, you should consider fixing the
regressions before you merge the PR.

Noise: Sometimes the performance results (not code size!) contain false
alarms. Unexpected regressions which are marked with '(?)' are probably noise.
If you see regressions which you cannot explain you can try to run the
benchmarks again. If regressions still show up, please consult with the
performance team (@eeckstein).

Hardware Overview
  Model Name: Mac Pro
  Model Identifier: MacPro6,1
  Processor Name: 12-Core Intel Xeon E5
  Processor Speed: 2.7 GHz
  Number of Processors: 1
  Total Number of Cores: 12
  L2 Cache (per Core): 256 KB
  L3 Cache: 30 MB
  Memory: 64 GB

Previously we were:

1. Doing a linear scan, performing certain optimizations, and setting up a
worklist for future processing.

2. Draining the worklist of changed instructions until we reached a fix point.

The key thing here is that for (1) to be safe, we would need to not perform any
optimizations on block arguments since there was an ssumption that we would only
perform SSA-like optimizations.

I am going to be adding such optimizations soon, so it makes sense to just
convert the initial traversal to non-destructively seed the worklist and
eliminate that initial optimization pass.

This should be NFC.
@gottesmm gottesmm force-pushed the pr-502b85e746b20821f0d6d3d0f28a16f56fc4f853 branch from 934806d to 67ff2ae Compare December 6, 2019 19:51
@gottesmm
Copy link
Contributor Author

gottesmm commented Dec 6, 2019

@swift-ci test

1 similar comment
@gottesmm
Copy link
Contributor Author

gottesmm commented Dec 6, 2019

@swift-ci test

@swift-ci
Copy link
Contributor

swift-ci commented Dec 6, 2019

Build failed
Swift Test Linux Platform
Git Sha - 934806d7913dcbaf1fa4c5f15f479a6e9b16abc2

@swift-ci
Copy link
Contributor

swift-ci commented Dec 6, 2019

Build failed
Swift Test OS X Platform
Git Sha - 934806d7913dcbaf1fa4c5f15f479a6e9b16abc2

@gottesmm
Copy link
Contributor Author

gottesmm commented Dec 6, 2019

The only regression I could reproduce is LazilyFilteredArrayContains. The rest are noise.

@gottesmm
Copy link
Contributor Author

gottesmm commented Dec 6, 2019

In terms of improvements, I am seeing that all are real improvements except for static array.

@gottesmm
Copy link
Contributor Author

gottesmm commented Dec 6, 2019

I think the difference in that lazy benchmark is a weird cache effect. The reason why I am pretty confidant of this is that:

  1. In the benchmark everything except for retain/release calls have been inlined into the benchmark. So we know that we are not calling any stdlib code directly.
  2. When I look at the assembly, the only differences are:
--- before.s	2019-12-06 15:44:03.000000000 -0800
+++ after.s	2019-12-06 15:44:20.000000000 -0800
@@ -113,14 +113,14 @@
 +0x1b2	    jmp                 "run_LazilyFilteredArrayContains(_:)+0x23"
 +0x1b7	ud2
 +0x1b9	ud2
-+0x1bb	leaq                1372206(%rip), %rdi
++0x1bb	leaq                1372214(%rip), %rdi
 +0x1c2	callq               "__swift_instantiateConcreteTypeFromMangledName"
 +0x1c7	movl                $64, %esi
 +0x1cc	movl                $7, %edx
 +0x1d1	movq                %rax, %rdi
 +0x1d4	callq               "DYLD-STUB$$swift_allocObject"
 +0x1d9	movq                %rax, %r15
-+0x1dc	movaps              1118573(%rip), %xmm0
++0x1dc	movaps              1118589(%rip), %xmm0
 +0x1e3	movups              %xmm0, 16(%rax)
 +0x1e7	movq                $0, -64(%rbp)
 +0x1ef	movabsq             $-2305843009213693952, %rax
@@ -128,7 +128,7 @@
 +0x1fd	leaq                -64(%rbp), %r13
 +0x201	movl                $29, %edi
 +0x206	callq               "DYLD-STUB$$_StringGuts.grow(_:)"
-+0x20b	leaq                1163982(%rip), %rax
++0x20b	leaq                1163998(%rip), %rax
 +0x212	movabsq             $-9223372036854775808, %rdx
 +0x21c	orq                 %rax, %rdx
 +0x21f	movq                1318170(%rip), %r13
@@ -139,7 +139,7 @@
 +0x23e	movq                %r13, %rcx
 +0x241	movq                %r12, %r8
 +0x244	callq               "DYLD-STUB$$String.write<A>(to:)"
-+0x249	leaq                1208176(%rip), %rdi
++0x249	leaq                1208192(%rip), %rdi
 +0x250	movl                $35, %esi
 +0x255	movl                $2, %edx
 +0x25a	callq               "DYLD-STUB$$StaticString.description.getter"
@@ -157,7 +157,7 @@
 +0x28d	movq                %r13, %rcx
 +0x290	movq                %r12, %r8
 +0x293	callq               "DYLD-STUB$$String.write<A>(to:)"
-+0x298	leaq                1207841(%rip), %rdi
++0x298	leaq                1207857(%rip), %rdi
 +0x29f	movl                $81, %esi
 +0x2a4	movl                $2, %edx
 +0x2a9	callq               "DYLD-STUB$$StaticString.description.getter"

I.e. some constant is offset by 16. I wonder if we need to do something to fix the alignment of globals. Not sure.

@gottesmm gottesmm merged commit 84e23cc into swiftlang:master Dec 6, 2019
@gottesmm gottesmm deleted the pr-502b85e746b20821f0d6d3d0f28a16f56fc4f853 branch December 6, 2019 23:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants