[benchmark] Extract Setup from Benchmarks #20048

Merged
merged 16 commits into from
Nov 5, 2018

Contributor

@palimondo palimondo commented Oct 25, 2018

This PR should land in tandem with #19910, which changes the measurement method used in Benchmark_Driver to run Benchmark_O with --num-iters=1. That change would appear to increase the reported runtime for benchmarks that have measurable setup overhead, which would no longer be amortized across multiple iterations.
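To make the amortization effect concrete, here is a minimal sketch in Python. The function name and the numbers are hypothetical and chosen only for illustration; the sketch assumes the setup runs once inside the measured region and the total is divided by the iteration count.

```python
def reported_runtime(setup_us, work_us, num_iters):
    """Average time per iteration when one-time setup is inside the measured region."""
    total = setup_us + work_us * num_iters
    return total / num_iters

# Hypothetical benchmark: 100 us of setup, 500 us of real work per iteration.
amortized = reported_runtime(setup_us=100, work_us=500, num_iters=1000)
single = reported_runtime(setup_us=100, work_us=500, num_iters=1)

print(amortized)  # 500.1 -> setup is nearly invisible
print(single)     # 600.0 -> setup shows up as a 20% "regression"
```

With many iterations the setup cost all but disappears from the average; with a single iteration it is reported in full, which is why the setup has to move out of the run function.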

Where appropriate, benchmarks were modified to use setUpFunction for costlier workload initialization. In some cases, extracting parts of the workload into constants was a sufficient change that has only a minor impact on the first measured sample. MapReduceClass and MapReduceClassShort were the only two benchmarks where extracting the setup changed the runtimes so significantly that they no longer resemble the original values (which were measured with the setup inside the run function, amortized over multiple iterations).

Running the benchmarks locally, it looks like unmodified benchmark RecursiveOwnedParameter is reporting a change (20% improvement). Do we still have an issue with spurious performance changes when adding/removing benchmarks caused by code alignment?


The PR is structured as a series of small, incremental changes with detailed descriptions in the commit messages; it is therefore best reviewed sequentially, commit by commit.

ArrayAppendStrings had setup overhead of 10ms (42%). ArrayAppendLazyMap had setup overhead of 24 μs (1%).

ArrayAppendOptionals and ArrayAppendArrayOfInt also had a small, barely visible overhead of ~18 μs that was mostly hidden in measurement noise, but I’ve extracted the setup from all places that had 10 000-element array initializations, in preparation for more precise measurement in the future.
ArrayInClass had setup overhead of 88 μs (17%).
DataCount had setup overhead of 18 μs (20%).
DataSubscript had setup overhead of 18 μs (2%).
SetUpFunction wasn’t necessary, because of short initialization (18 μs for `sampleData(.medium)`), which will inflate only the initial measurement.

Runtimes of other benchmarks hide the sampleData initialization in their artificially high runtimes (most use an internal multiplier of 10 000 iterations), but they were changed to use the same constant data, since it was already available. The overhead will already be extracted if we go for more precise measurement with lower multipliers in the future.
Dictionary had setup overhead of 136 μs (6%).
DictionaryOfObjects had setup overhead of 616 μs (7%).
Also fixed variable naming convention (lowerCamelCase).
DistinctClassFieldAccesses had setup overhead of 4 μs (14%).
Plus cosmetic code formatting fix.
IterateData has setup overhead of 480 μs (10%).

There remained strange setup overhead after extracting the data into setUpFunction, because of an off-by-one error in the main loop. It should be either `for _ in 1...10*N` or `for _ in 0..<10*N`. It’s an error to use `0...m*N`, because this results in `m*N + 1` iterations that will be divided by N in the reported measurement. The extra iteration then manifests as a mysterious setup overhead!
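The arithmetic behind this can be sketched in Python, using `range` as a stand-in for the Swift ranges. The multiplier `M` and the per-pass cost `WORK_US` are hypothetical values, and the helper names are mine, not part of the benchmark suite.

```python
def iterations_closed(m, n):
    # Counterpart of Swift's closed range 0...m*N (both ends included).
    return len(range(0, m * n + 1))

def iterations_half_open(m, n):
    # Counterpart of Swift's half-open range 0..<m*N.
    return len(range(0, m * n))

def reported_us(iterations, n, work_us):
    # The harness divides the total measured time by N.
    return iterations * work_us / n

M = 10          # hypothetical inner multiplier
WORK_US = 48.0  # hypothetical cost of one inner-loop pass

for n in (1, 10, 1000):
    correct = reported_us(iterations_half_open(M, n), n, WORK_US)
    buggy = reported_us(iterations_closed(M, n), n, WORK_US)
    print(f"N={n}: correct={correct:.3f} buggy={buggy:.3f} extra={buggy - correct:.3f}")
```

The extra pass costs a fixed amount regardless of N, so after dividing by N it shrinks as N grows, which is exactly how amortized setup overhead behaves.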
Phonebook had setup overhead of 1266 μs (7%).
PolymorphicCalls has setup overhead of 4 μs (7%).
RandomShuffleLCG2 had setup overhead of 902 μs (17%) even though it already used the setUpFunction. Turns out that copying a 100k-element array is measurably costly.

The only way to eliminate this overhead from measurement I could think of is to let the numbersLCG array linger around (800 kB), because shuffling the IOU version had different performance.
SortSortedStrings had setup overhead of 914 μs (30%).

Renamed [String] constants to be shorter and more descriptive. Extracted the lazy initialization of all these constants into `setUpFunction`, for cleaner measurements.
SubstringComparable had setup overhead of 58 μs (26%).

This was a tricky modification: extracting `substrings` and `comparison` constants out of the run function surprisingly resulted in decreased performance. For some reason this configuration causes significant increase in retain/release traffic. Aliasing the constants in the run function somehow works around this deoptimization.

Also the initial split of the string into 8 substrings takes 44ms!!! (I’m suspecting some kind of one-time ICU initialization?)
Sequence benchmarks that test operations on Arrays have setup overhead of 14 μs. (Up from 4 μs a year ago!) That’s just the creation of an [Int] with 2k elements from a range… This array is now extracted into a constant.

This commit also removes the .unstable tag from some CountableRange benchmarks, restoring them back to commit set of the Swift Benchmark Suite.
MapReduceClass had setup overhead of 868 μs (7%).

Setup overhead of MapReduceClassShort was practically lost in the measurement noise from its artificially high base load, but it was there.

Extracting the decimal array initialization into `setUpFunction` also takes out the cost of releasing the [NSDecimalNumber], which turns out to be about half of the measured runtime in the case of the MapReduceClass benchmark. This significantly changes the reported runtimes (to about half), therefore the modified benchmarks get a new name with the suffix `2`.
@palimondo
Contributor Author

@eeckstein Please review 🙏

@palimondo palimondo changed the title from "Extract Setup from Benchmarks" to "[benchmark] Extract Setup from Benchmarks" on Oct 25, 2018
@palimondo
Contributor Author

palimondo commented Oct 25, 2018

Unrelated to this PR, I'm a bit concerned about the lower precision changes detected by run_smoke_bench.py. Granted, I ran it on a machine that wasn't super calm -- I had a text editor and Google Chrome open… but still, the second iteration re-examined some 59 tests and the last one focused on only 4 "changed" benchmarks:

 $ ./run_smoke_bench.py -O ${PREV_SBS} ${SBS}
Testing optimization level -O
    Iteration 1 for /Users/mondo/Developer/swift-source/build/20181025-96ff53d5c7: num samples = 3, running all tests
    Iteration 1 for /Users/mondo/Developer/swift-source/build/20181025-c71d8631dc: num samples = 3, running all tests
    Iteration 2 for /Users/mondo/Developer/swift-source/build/20181025-96ff53d5c7: num samples = 4, re-testing 59 tests
    Iteration 2 for /Users/mondo/Developer/swift-source/build/20181025-c71d8631dc: num samples = 4, re-testing 59 tests
    Iteration 3 for /Users/mondo/Developer/swift-source/build/20181025-96ff53d5c7: num samples = 5, re-testing 19 tests
    Iteration 3 for /Users/mondo/Developer/swift-source/build/20181025-c71d8631dc: num samples = 5, re-testing 19 tests
    Iteration 4 for /Users/mondo/Developer/swift-source/build/20181025-96ff53d5c7: num samples = 6, re-testing 12 tests
    Iteration 4 for /Users/mondo/Developer/swift-source/build/20181025-c71d8631dc: num samples = 6, re-testing 12 tests
    Iteration 5 for /Users/mondo/Developer/swift-source/build/20181025-96ff53d5c7: num samples = 7, re-testing 9 tests
    Iteration 5 for /Users/mondo/Developer/swift-source/build/20181025-c71d8631dc: num samples = 7, re-testing 9 tests
    Iteration 6 for /Users/mondo/Developer/swift-source/build/20181025-96ff53d5c7: num samples = 8, re-testing 7 tests
    Iteration 6 for /Users/mondo/Developer/swift-source/build/20181025-c71d8631dc: num samples = 8, re-testing 7 tests
    Iteration 7 for /Users/mondo/Developer/swift-source/build/20181025-96ff53d5c7: num samples = 9, re-testing 7 tests
    Iteration 7 for /Users/mondo/Developer/swift-source/build/20181025-c71d8631dc: num samples = 9, re-testing 7 tests
    Iteration 8 for /Users/mondo/Developer/swift-source/build/20181025-96ff53d5c7: num samples = 10, re-testing 7 tests
    Iteration 8 for /Users/mondo/Developer/swift-source/build/20181025-c71d8631dc: num samples = 10, re-testing 7 tests
    Iteration 9 for /Users/mondo/Developer/swift-source/build/20181025-96ff53d5c7: num samples = 10, re-testing 7 tests
    Iteration 9 for /Users/mondo/Developer/swift-source/build/20181025-c71d8631dc: num samples = 10, re-testing 7 tests
    Iteration 10 for /Users/mondo/Developer/swift-source/build/20181025-96ff53d5c7: num samples = 10, re-testing 7 tests
    Iteration 10 for /Users/mondo/Developer/swift-source/build/20181025-c71d8631dc: num samples = 10, re-testing 7 tests
Logfiles written to /Users/mondo/Developer/swift-source/build/20181025-96ff53d5c7/result_O and /Users/mondo/Developer/swift-source/build/20181025-c71d8631dc/result_O

Here's the comparison report:

$ ./compare_perf_tests.py --changes-only --single-table --old-file /Users/mondo/Developer/swift-source/build/20181025-c71d8631dc/result_O --new-file=/Users/mondo/Developer/swift-source/build/20181025-96ff53d5c7/result_O

TEST OLD NEW DELTA RATIO
Improvement
ArrayAppendStrings 24426 14806 -39.4% 1.65x
SortSortedStrings 3160 2270 -28.2% 1.39x
RandomShuffleLCG2 5136 4104 -20.1% 1.25x
RecursiveOwnedParameter 2341 1890 -19.3% 1.24x
ArrayInClass 534 446 -16.5% 1.20x
IterateData 4854 4164 -14.2% 1.17x
DictionaryOfObjects 9045 8286 -8.4% 1.09x (?)
DropLastAnySeqCRangeIter 21074 20050 -4.9% 1.05x
Added
DropLastArray 42 42 42
DropLastArrayLazy 52 52 52
DropLastCountableRange 17 17 17
DropLastCountableRangeLazy 17 21 18
DropWhileArray 202 202 202
MapReduceClass2 5676 5714 5691
MapReduceClassShort2 13150 13314 13220
PrefixWhileAnySeqCntRange 315 321 317
PrefixWhileAnySeqCntRangeLazy 198 199 198
PrefixWhileCountableRange 57 57 57
PrefixWhileCountableRangeLazy 55 55 55
SuffixArray 55 55 55
SuffixArrayLazy 52 52 52
SuffixCountableRange 18 18 18
SuffixCountableRangeLazy 18 18 18
Removed
MapReduceClass 12381 12580 12498
MapReduceClassShort 18956 19221 19054
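For reading the DELTA and RATIO columns: the values in the rows above are consistent with the following arithmetic. This is a sketch of that relationship with a hypothetical helper name, not the actual code of compare_perf_tests.py.

```python
def delta_and_ratio(old, new):
    # DELTA: relative change of NEW against OLD, in percent.
    delta_pct = (new / old - 1) * 100
    # RATIO: how many times faster NEW is than OLD (values below 1 mean slower).
    ratio = old / new
    return delta_pct, ratio

# ArrayAppendStrings row from the report above: OLD 24426, NEW 14806.
delta, ratio = delta_and_ratio(24426, 14806)
print(f"{delta:+.1f}% {ratio:.2f}x")  # -39.4% 1.65x
```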

I've also compared the logs from the modified (more robust) Benchmark_Driver that included #19910:

$ ./compare_perf_tests.py --single-table --changes-only --old-file /Users/mondo/Developer/swift-source/build/Ninja-ReleaseAssert/swift-macosx-x86_64/benchmark/logs/master/Benchmark_O-20181025212330-c71d8631dc.log --new-file /Users/mondo/Developer/swift-source/build/Ninja-ReleaseAssert/swift-macosx-x86_64/benchmark/logs/i-just-do-eyes/Benchmark_O-20181024114556-96ff53d5c7.log

TEST OLD NEW DELTA RATIO
Regression
DataAppendDataLargeToLarge 262302 290075 +10.6% 0.90x (?)
ObjectiveCBridgeToNSSet 57129 62631 +9.6% 0.91x (?)
DataAppendBytes 16815 18095 +7.6% 0.93x (?)
DataAccessBytes 4440 4708 +6.0% 0.94x (?)
Improvement
ArrayAppendStrings 24633 14679 -40.4% 1.68x
SortSortedStrings 3163 2315 -26.8% 1.37x
RecursiveOwnedParameter 2329 1889 -18.9% 1.23x
SubstringComparable 286 232 -18.9% 1.23x
DataCount 91 75 -17.6% 1.21x
RandomShuffleLCG2 4944 4104 -17.0% 1.20x
ArrayInClass 532 445 -16.4% 1.20x
IterateData 4544 4037 -11.2% 1.13x
PrefixArray 132 119 -9.8% 1.11x
Phonebook 20516 18758 -8.6% 1.09x
PrefixWhileArrayLazy 168 154 -8.3% 1.09x
DropFirstArrayLazy 168 154 -8.3% 1.09x
PrefixArrayLazy 168 154 -8.3% 1.09x
DropFirstArray 171 158 -7.6% 1.08x
DictionaryOfObjects 9164 8489 -7.4% 1.08x
Dictionary 2050 1904 -7.1% 1.08x
RandomIntegersLCG 536 502 -6.3% 1.07x (?)
CStringShortAscii 15778 14835 -6.0% 1.06x (?)
DictionaryFilter 295405 277985 -5.9% 1.06x
DictionaryCopy 391730 369231 -5.7% 1.06x
ObjectiveCBridgeStubToNSDate2 3660 3467 -5.3% 1.06x (?)
ObjectiveCBridgeFromNSArrayAnyObjectForced 29064 27564 -5.2% 1.05x (?)
StringEdits 1428698 1355276 -5.1% 1.05x (?)
Added
DropLastArray 42 42 42 204800
DropLastArrayLazy 52 52 52 200704
DropLastCountableRange 17 17 17 192512
DropLastCountableRangeLazy 17 17 17 196608
DropWhileArray 201 202 201 208896
MapReduceClass2 5565 5897 5697 266240
MapReduceClassShort2 13510 13762 13628 225280
PrefixWhileAnySeqCntRange 315 319 316 245760
PrefixWhileAnySeqCntRangeLazy 198 198 198 249856
PrefixWhileCountableRange 57 57 57 192512
PrefixWhileCountableRangeLazy 54 55 54 196608
SuffixArray 55 55 55 212992
SuffixArrayLazy 52 52 52 196608
SuffixCountableRange 18 18 18 192512
SuffixCountableRangeLazy 18 18 18 196608
Removed
MapReduceClass 12350 12899 12638 299008
MapReduceClassShort 19091 19852 19513 217088

Once these changes land, I'll look at improving the sensitivity of the smoke benchmark using the improvements from #19910.

EDIT: Thinking about this more, the smaller benchmarks could still be able to amortize the setup overhead in the 2500 μs sample-time used in smoke benchmarks, so it correctly didn't report the change. Nothing to see here — all is fine. ☺️
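A rough Python sketch of that reasoning, under the assumption (mine, not verified against the harness) that roughly `sample_time / runtime` iterations fit into one sample. The setup and runtime figures are taken from the commit messages and the local reports above; the function name is hypothetical.

```python
def amortized_setup(setup_us, runtime_us, sample_time_us=2500):
    # Assumption: the harness fits about sample_time / runtime iterations
    # into one sample, and always runs at least one.
    num_iters = max(1, int(sample_time_us // runtime_us))
    return num_iters, setup_us / num_iters

# DataCount: 18 us setup, about 75 us runtime.
print(amortized_setup(setup_us=18, runtime_us=75))

# ArrayAppendStrings: 10 ms setup, about 14806 us runtime.
print(amortized_setup(setup_us=10_000, runtime_us=14_806))
```

Under this assumption a small benchmark like DataCount runs dozens of iterations per sample and its setup shrinks to a fraction of a microsecond, while ArrayAppendStrings fits only one iteration and reports the full setup cost.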

@eeckstein
Contributor

@swift-ci benchmark

@swift-ci
Contributor

!!! Couldn't read commit file !!!

@eeckstein
Contributor

@swift-ci benchmark

@swift-ci
Contributor

Build comment file:

Performance: -O

TEST OLD NEW DELTA RATIO
Improvement
ArrayAppendStrings 8759 7042 -19.6% 1.24x
ArrayAppendAsciiSubstring 29708 25337 -14.7% 1.17x
SortSortedStrings 824 707 -14.2% 1.17x
IterateData 1905 1711 -10.2% 1.11x
DictionaryOfObjects 2248 2055 -8.6% 1.09x
Added
DropLastArray 11 11 11
DropLastArrayLazy 11 12 11
DropLastCountableRange 9 9 9
DropLastCountableRangeLazy 9 10 10
DropWhileArray 53 54 53
MapReduceClass2 1096 1097 1096
MapReduceClassShort2 2711 2769 2730
PrefixWhileAnySeqCntRange 141 145 142
PrefixWhileAnySeqCntRangeLazy 88 92 89
PrefixWhileCountableRange 30 31 30
PrefixWhileCountableRangeLazy 29 30 29
SuffixArray 11 12 11
SuffixArrayLazy 11 12 11
SuffixCountableRange 9 9 9
SuffixCountableRangeLazy 9 9 9
Removed
MapReduceClass 3199 3317 3239
MapReduceClassShort 4629 4722 4661

Code size: -O

TEST OLD NEW DELTA RATIO
Regression
IterateData.o 1469 1781 +21.2% 0.82x
ArrayInClass.o 1453 1653 +13.8% 0.88x
PolymorphicCalls.o 7833 8505 +8.6% 0.92x
Phonebook.o 18152 18728 +3.2% 0.97x
MapReduce.o 24653 25413 +3.1% 0.97x
ArrayAppend.o 38982 39862 +2.3% 0.98x
SortStrings.o 104630 106606 +1.9% 0.98x
DropWhile.o 23740 24180 +1.9% 0.98x
Prefix.o 24673 25113 +1.8% 0.98x
DropFirst.o 25228 25668 +1.7% 0.98x
PrefixWhile.o 24062 24470 +1.7% 0.98x
Suffix.o 26249 26689 +1.7% 0.98x
DropLast.o 25515 25939 +1.7% 0.98x
Substring.o 27833 28257 +1.5% 0.98x
Improvement
DictTest.o 51083 22547 -55.9% 2.27x
DataBenchmarks.o 35196 26044 -26.0% 1.35x
RandomShuffle.o 3512 3152 -10.3% 1.11x

Performance: -Osize

TEST OLD NEW DELTA RATIO
Regression
SuffixAnyCollection 59 65 +10.2% 0.91x
Improvement
DropFirstAnyCollection 15463 165 -98.9% 93.71x
PrefixAnyCollection 15565 182 -98.8% 85.52x
SubstringComparable 21 14 -33.3% 1.50x
ArrayAppendStrings 8736 7044 -19.4% 1.24x
SortSortedStrings 872 715 -18.0% 1.22x
IterateData 1823 1538 -15.6% 1.19x
RandomShuffleLCG2 1831 1582 -13.6% 1.16x
DropLastAnyCollection 65 59 -9.2% 1.10x
ArrayAppendAsciiSubstring 27001 24952 -7.6% 1.08x
Added
DropLastArray 11 12 11
DropLastArrayLazy 11 12 11
DropLastCountableRange 12 12 12
DropLastCountableRangeLazy 11 12 11
DropWhileArray 54 55 54
MapReduceClass2 2995 3037 3018
MapReduceClassShort2 4674 4738 4696
PrefixWhileAnySeqCntRange 234 238 235
PrefixWhileAnySeqCntRangeLazy 176 180 177
PrefixWhileCountableRange 53 55 54
PrefixWhileCountableRangeLazy 35 36 36
SuffixArray 11 12 11
SuffixArrayLazy 11 12 11
SuffixCountableRange 11 12 11
SuffixCountableRangeLazy 11 12 11
Removed
MapReduceClass 3217 3340 3259
MapReduceClassShort 4654 4712 4685

Code size: -Osize

TEST OLD NEW DELTA RATIO
Regression
IterateData.o 1693 1989 +17.5% 0.85x
ArrayInClass.o 1648 1848 +12.1% 0.89x
Phonebook.o 17692 18252 +3.2% 0.97x
Prefix.o 23777 24361 +2.5% 0.98x
Substring.o 20129 20569 +2.2% 0.98x
MapReduce.o 22381 22853 +2.1% 0.98x
DropFirst.o 23812 24308 +2.1% 0.98x
DropWhile.o 23420 23748 +1.4% 0.99x
Suffix.o 25937 26233 +1.1% 0.99x
ArrayAppend.o 37886 38318 +1.1% 0.99x
PolymorphicCalls.o 7353 7433 +1.1% 0.99x
PrefixWhile.o 24446 24710 +1.1% 0.99x
DropLast.o 25539 25803 +1.0% 0.99x
Improvement
DictTest.o 48437 18945 -60.9% 2.56x
DataBenchmarks.o 31565 21333 -32.4% 1.48x
RandomShuffle.o 3831 3311 -13.6% 1.16x

Performance: -Onone

TEST OLD NEW DELTA RATIO
Improvement
SortSortedStrings 1547 875 -43.4% 1.77x
ArrayAppendStrings 10967 6278 -42.8% 1.75x
IterateData 7651 4989 -34.8% 1.53x
PrefixArray 4844 3566 -26.4% 1.36x
DropFirstArray 4826 3564 -26.2% 1.35x
Phonebook 21625 18534 -14.3% 1.17x
DropWhileArrayLazy 14201 13095 -7.8% 1.08x
Added
DropLastArray 1163 1163 1163
DropLastArrayLazy 10086 10151 10108
DropLastCountableRange 120 122 121
DropLastCountableRangeLazy 12146 12213 12181
DropWhileArray 5760 5807 5777
MapReduceClass2 28542 28644 28609
MapReduceClassShort2 39071 39247 39147
PrefixWhileAnySeqCntRange 29778 29917 29829
PrefixWhileAnySeqCntRangeLazy 18500 18564 18535
PrefixWhileCountableRange 14320 14495 14429
PrefixWhileCountableRangeLazy 18177 18279 18217
SuffixArray 1157 1159 1158
SuffixArrayLazy 10099 10211 10148
SuffixCountableRange 120 123 121
SuffixCountableRangeLazy 12146 12238 12191
Removed
MapReduceClass 29045 29129 29086
MapReduceClassShort 40439 40658 40582
How to read the data

The tables contain differences in performance which are larger than 8% and differences in code size which are larger than 1%.

If you see any unexpected regressions, you should consider fixing the regressions before you merge the PR.

Noise: Sometimes the performance results (not code size!) contain false alarms. Unexpected regressions which are marked with '(?)' are probably noise. If you see regressions which you cannot explain you can try to run the benchmarks again. If regressions still show up, please consult with the performance team (@eeckstein).

Hardware Overview
  Model Name: Mac Pro
  Model Identifier: MacPro6,1
  Processor Name: 12-Core Intel Xeon E5
  Processor Speed: 2.7 GHz
  Number of Processors: 1
  Total Number of Cores: 12
  L2 Cache (per Core): 256 KB
  L3 Cache: 30 MB
  Memory: 64 GB

@eeckstein
Contributor

@airspeedswift Do those (re-)added benchmark add any value for stdlib benchmarking?

@palimondo
Contributor Author

palimondo commented Oct 27, 2018

As their original author, I’d like to point out that the goal of expanding the Sequence API benchmark coverage was to demonstrate its interaction with various concrete underlying Sequence and Collection types. It shows that the optimizer can(not) deal with all the combinations that can arise from stdlib types. Also note how it demonstrates the extremely poor performance of most parts of the subsystem in the Onone mode. (3 orders of magnitude slower in case of SuffixCountableRangeLazy!)

Benchmarks in question were removed because they were deemed unstable, which was a property of the old measurement method — not the benchmarks themselves! Adding them back only restores the completeness of the coverage for the whole family.
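As a quick check of the "3 orders of magnitude" remark, using the SuffixCountableRangeLazy mean values from the @swift-ci report above (9 at -O, 12191 at -Onone); the variable names are only illustrative:

```python
import math

o_runtime = 9          # SuffixCountableRangeLazy, Performance: -O
onone_runtime = 12191  # SuffixCountableRangeLazy, Performance: -Onone

slowdown = onone_runtime / o_runtime
orders = math.log10(slowdown)
print(f"{slowdown:.0f}x slower, about {orders:.1f} orders of magnitude")
# prints "1355x slower, about 3.1 orders of magnitude"
```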

@palimondo
Contributor Author

Note: The changes reported here by run_smoke_bench.py for ArrayAppendStrings, ArrayAppendAsciiSubstring, SortSortedStrings, IterateData, DictionaryOfObjects are caused by its use of a --sample-time that's too short to amortize their setup. I have verified locally that all these benchmarks should report the same runtimes pre/post this PR, if measured with the original benchmarking method with 1s sample time. Your internal script used for long-term performance tracking shouldn't see these as changes, if it is measuring with proper methodology.

Since this benchmark has been significantly modified and needs to be renamed, we can also lower the workload by a factor of 10, to keep up with the best practices.

The old benchmark that uses `NSDecimalNumber` as the tested class is renamed to `MapReduceNSDecimalNumber` and the renamed `MapReduceClass2` now measures a Swift class `Box` that wraps an `Int`. Short versions were modified analogously.
@palimondo
Contributor Author

Based on the comments by @airspeedswift in the forums, I've appropriated the MapReduceClass2 for measuring a Swift class and moved the old test to MapReduceNSDecimalNumber, while lowering the base workload by a factor of 10. @eeckstein, can you please re-run the benchmark?

@eeckstein
Contributor

@swift-ci benchmark

@eeckstein
Contributor

@swift-ci benchmark

@swift-ci
Contributor

Build comment file:

Performance: -O

TEST OLD NEW DELTA RATIO
Improvement
ArrayAppendStrings 8667 7037 -18.8% 1.23x
SortSortedStrings 842 710 -15.7% 1.19x
ArrayAppendAsciiSubstring 29775 25287 -15.1% 1.18x
IterateData 1791 1615 -9.8% 1.11x
DictionaryOfObjects 2220 2051 -7.6% 1.08x
Added
DropLastArray 11 12 11
DropLastArrayLazy 11 12 11
DropLastCountableRange 9 9 9
DropLastCountableRangeLazy 9 10 10
DropWhileArray 53 53 53
MapReduceClass2 37 37 37
MapReduceClassShort2 196 198 197
MapReduceNSDecimalNumber 109 111 110
MapReduceNSDecimalNumberShort 270 272 271
PrefixWhileAnySeqCntRange 141 145 142
PrefixWhileAnySeqCntRangeLazy 88 90 89
PrefixWhileCountableRange 30 33 31
PrefixWhileCountableRangeLazy 29 30 29
SuffixArray 11 12 12
SuffixArrayLazy 11 12 11
SuffixCountableRange 9 9 9
SuffixCountableRangeLazy 9 9 9
Removed
MapReduceClass 3489 3802 3593
MapReduceClassShort 4629 4820 4732

Code size: -O

TEST OLD NEW DELTA RATIO
Regression
MapReduce.o 24653 32128 +30.3% 0.77x
IterateData.o 1469 1781 +21.2% 0.82x
ArrayInClass.o 1453 1653 +13.8% 0.88x
PolymorphicCalls.o 7833 8505 +8.6% 0.92x
Phonebook.o 18152 18728 +3.2% 0.97x
ArrayAppend.o 38982 39862 +2.3% 0.98x
SortStrings.o 104630 106606 +1.9% 0.98x
DropWhile.o 23740 24180 +1.9% 0.98x
Prefix.o 24673 25113 +1.8% 0.98x
DropFirst.o 25228 25668 +1.7% 0.98x
PrefixWhile.o 24062 24470 +1.7% 0.98x
Suffix.o 26249 26689 +1.7% 0.98x
DropLast.o 25515 25939 +1.7% 0.98x
Substring.o 27833 28257 +1.5% 0.98x
Improvement
DictTest.o 51083 22547 -55.9% 2.27x
DataBenchmarks.o 35196 26044 -26.0% 1.35x
RandomShuffle.o 3512 3152 -10.3% 1.11x

Performance: -Osize

TEST OLD NEW DELTA RATIO
Regression
SuffixAnyCollection 60 67 +11.7% 0.90x
Improvement
DropFirstAnyCollection 15660 164 -99.0% 95.49x
PrefixAnyCollection 15757 182 -98.8% 86.58x
ArrayAppendStrings 7810 6154 -21.2% 1.27x
SortSortedStrings 876 733 -16.3% 1.20x
IterateData 1805 1541 -14.6% 1.17x
RandomShuffleLCG2 1830 1579 -13.7% 1.16x
DropLastAnyCollection 65 60 -7.7% 1.08x
DictionaryOfObjects 2484 2318 -6.7% 1.07x
Added
DropLastArray 11 12 11
DropLastArrayLazy 11 12 11
DropLastCountableRange 12 13 12
DropLastCountableRangeLazy 11 12 11
DropWhileArray 54 55 54
MapReduceClass2 54 55 54
MapReduceClassShort2 213 213 213
MapReduceNSDecimalNumber 301 331 313
MapReduceNSDecimalNumberShort 463 463 463
PrefixWhileAnySeqCntRange 234 236 235
PrefixWhileAnySeqCntRangeLazy 176 178 177
PrefixWhileCountableRange 53 54 53
PrefixWhileCountableRangeLazy 35 35 35
SuffixArray 11 12 11
SuffixArrayLazy 11 12 11
SuffixCountableRange 11 12 12
SuffixCountableRangeLazy 11 12 12
Removed
MapReduceClass 3226 3363 3273
MapReduceClassShort 4608 4679 4633

Code size: -Osize

TEST OLD NEW DELTA RATIO
Regression
MapReduce.o 22381 27216 +21.6% 0.82x
IterateData.o 1693 1989 +17.5% 0.85x
ArrayInClass.o 1648 1848 +12.1% 0.89x
Phonebook.o 17692 18252 +3.2% 0.97x
Prefix.o 23777 24361 +2.5% 0.98x
Substring.o 20129 20569 +2.2% 0.98x
DropFirst.o 23812 24308 +2.1% 0.98x
DropWhile.o 23420 23748 +1.4% 0.99x
Suffix.o 25937 26233 +1.1% 0.99x
ArrayAppend.o 37886 38318 +1.1% 0.99x
PolymorphicCalls.o 7353 7433 +1.1% 0.99x
PrefixWhile.o 24446 24710 +1.1% 0.99x
DropLast.o 25539 25803 +1.0% 0.99x
Improvement
DictTest.o 48437 18945 -60.9% 2.56x
DataBenchmarks.o 31565 21333 -32.4% 1.48x
RandomShuffle.o 3831 3311 -13.6% 1.16x

Performance: -Onone

TEST OLD NEW DELTA RATIO
Improvement
SortSortedStrings 1555 876 -43.7% 1.78x
ArrayAppendStrings 10374 6260 -39.7% 1.66x
IterateData 7722 5089 -34.1% 1.52x
PrefixArray 4802 3496 -27.2% 1.37x
DropFirstArray 4850 3763 -22.4% 1.29x
Phonebook 21485 18196 -15.3% 1.18x
PrefixWhileArray 11795 10400 -11.8% 1.13x
SubstringComparable 1586 1461 -7.9% 1.09x
Added
DropLastArray 1164 1164 1164
DropLastArrayLazy 10089 10296 10159
DropLastCountableRange 120 121 120
DropLastCountableRangeLazy 12073 12112 12099
DropWhileArray 5782 5788 5785
MapReduceClass2 2611 2663 2629
MapReduceClassShort2 3867 3943 3894
MapReduceNSDecimalNumber 2839 2852 2844
MapReduceNSDecimalNumberShort 4069 4149 4117
PrefixWhileAnySeqCntRange 31246 31390 31323
PrefixWhileAnySeqCntRangeLazy 19085 19405 19241
PrefixWhileCountableRange 15692 16154 15893
PrefixWhileCountableRangeLazy 18940 19436 19122
SuffixArray 1168 1169 1169
SuffixArrayLazy 10093 10332 10173
SuffixCountableRange 120 122 121
SuffixCountableRangeLazy 12024 12112 12067
Removed
MapReduceClass 29637 31884 30425
MapReduceClassShort 39111 39661 39346

@palimondo
Contributor Author

palimondo commented Oct 31, 2018

@eeckstein are you still waiting for @airspeedswift to chime in (re: re-added tests), or can we merge this as is?
My experiment with legacyFactor depends on some of these and I'd like to show you how promising that one looks. 🤓

@eeckstein
Contributor

The legacyFactor is a great idea. Does it make sense to implement that first and use that to compensate for these benchmark changes?

@palimondo
Contributor Author

palimondo commented Nov 1, 2018

No. As I said above, the changes reported here are just artifacts of the low sample-time in run_smoke_bench. Unless this setup overhead is removed from the measurement, my demonstration of legacyFactor reports bogus changes because of it.

@palimondo
Contributor Author

palimondo commented Nov 1, 2018

If you restored the old `Benchmark_Driver run`-based measurement as something we could ask @swift-ci to run for us (a full benchmark, as a temporary counterpoint to the smoke benchmark?), you would see that this PR doesn’t change any reported runtimes. Or you can verify that by running it locally ;-)

@eeckstein
Contributor

@swift-ci smoke test

@eeckstein
Contributor

For now it seems that the re-added benchmarks are not unstable. I'm merging this. If it turns out that some of those benchmarks are unstable, we can disable them again.

@eeckstein eeckstein merged commit 5f21c12 into swiftlang:master Nov 5, 2018