[benchmark] Extract Setup from Benchmarks #20048
Conversation
ArrayAppendStrings had setup overhead of 10 ms (42%). ArrayAppendLazyMap had setup overhead of 24 μs (1%). ArrayAppendOptionals and ArrayAppendArrayOfInt also had a barely visible overhead of ~18 μs, mostly hidden in measurement noise, but I’ve extracted the setup from all places that had 10 000-element array initializations, in preparation for more precise measurement in the future.
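For reference, the extraction pattern used throughout these commits looks roughly like the sketch below (hypothetical names, assuming the `BenchmarkInfo`/`setUpFunction`/`blackHole` API from the suite’s `TestsUtils`): the expensive input is a lazily initialized global constant, and the setup function forces it before measurement so its cost is not attributed to the first sample.

```swift
import TestsUtils

// Hypothetical benchmark; not the actual ArrayAppendStrings source.
public let ArrayAppendStringsSketch = BenchmarkInfo(
  name: "ArrayAppendStringsSketch",
  runFunction: run_ArrayAppendStringsSketch,
  tags: [.validation, .api, .Array],
  setUpFunction: { blackHole(appendedStrings) })

// 10 000-element input, built once outside the measured run function.
let appendedStrings: [String] = (0..<10_000).map { String($0) }

public func run_ArrayAppendStringsSketch(_ N: Int) {
  for _ in 1...N {
    var strings: [String] = []
    strings += appendedStrings
    blackHole(strings)
  }
}
```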
ArrayInClass had setup overhead of 88 μs (17%).
DataCount had setup overhead of 18 μs (20%). DataSubscript had setup overhead of 18 μs (2%). A `setUpFunction` wasn’t necessary, because the initialization is short (18 μs for `sampleData(.medium)`) and will inflate only the initial measurement. Other benchmarks hide the sampleData initialization in their artificially high runtimes (most use an internal multiplier of 10 000 iterations), but they were changed to use the same constant data, since it was already available. The overhead will thus already be extracted if we go for more precise measurement with lower multipliers in the future.
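A minimal sketch of the shared-constant variant described above (assumed names and a stand-in payload, not the actual DataBenchmarks code): the data is a global constant whose one-time initialization only inflates the very first sample, so no setup function is needed.

```swift
import Foundation

// Stand-in for the suite's sampleData(.medium); the real helper builds a
// specific ~1 KB byte pattern, this just produces some payload once.
let mediumData = Data(repeating: 0xFE, count: 1_033)

public func run_DataCountSketch(_ N: Int) {
  var total = 0
  for _ in 1...10_000 * N {
    total &+= mediumData.count   // reuses the shared constant on every iteration
  }
  precondition(total > 0)
}
```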
Dictionary had setup overhead of 136 μs (6%). DictionaryOfObjects had setup overhead of 616 μs (7%). Also fixed the variable naming convention (lowerCamelCase).
DistinctClassFieldAccesses had setup overhead of 4 μs (14%). Plus a cosmetic code formatting fix.
IterateData had setup overhead of 480 μs (10%). A strange setup overhead remained after extracting the data into the setUpFunction, caused by an off-by-one error in the main loop. It should be either `for _ in 1...10*N` or `for _ in 0..<10*N`. It’s an error to use `0...m*N`, because this results in `m*N + 1` iterations, which are then divided by N in the reported measurement. The extra iteration manifests as a mysterious setup overhead!
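A small sketch of the off-by-one (plain Swift, not the IterateData source): with a multiplier of 10 and N = 3, the closed range starting at 0 performs one extra iteration, and that extra work surfaces as phantom setup overhead once the total is divided by N.

```swift
let N = 3, multiplier = 10

var a = 0, b = 0, c = 0
for _ in 0...multiplier * N { a += 1 }   // 31 iterations: m*N + 1 (the bug)
for _ in 1...multiplier * N { b += 1 }   // 30 iterations: m*N
for _ in 0..<multiplier * N { c += 1 }   // 30 iterations: m*N

print(a, b, c)   // 31 30 30
```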
Phonebook had setup overhead of 1266 μs (7%).
PolymorphicCalls had setup overhead of 4 μs (7%).
RandomShuffleLCG2 had setup overhead of 902 μs (17%), even though it already used the setUpFunction. It turns out that copying a 100k-element array is measurably costly. The only way I could think of to eliminate this overhead from the measurement is to let the numbersLCG array linger around (800 kB), because shuffling the IOU version had different performance.
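A hedged sketch of the lingering-array approach (hypothetical names, plain `shuffle()` instead of the benchmark’s seeded LCG generator): the 100k-element array is created once in the setup function, kept alive between samples, and shuffled in place, so no 100k-element copy ends up inside the measured run.

```swift
// Lingers between samples (~800 kB for 100k Ints); never re-created per run.
var lingeringNumbers: [Int] = []

func setUpShuffleSketch() {
  lingeringNumbers = Array(0..<100_000)   // paid once, outside measurement
}

public func run_ShuffleSketch(_ N: Int) {
  for _ in 1...N {
    // Shuffling in place avoids copying the array on every sample,
    // which was the source of the ~900 μs of residual setup overhead.
    lingeringNumbers.shuffle()
  }
}
```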
SortSortedStrings had setup overhead of 914 μs (30%). Renamed the [String] constants to be shorter and more descriptive. Extracted the lazy initialization of all these constants into `setUpFunction`, for cleaner measurements.
SubstringComparable had setup overhead of 58 μs (26%). This was a tricky modification: extracting the `substrings` and `comparison` constants out of the run function surprisingly resulted in decreased performance. For some reason this configuration causes a significant increase in retain/release traffic. Aliasing the constants in the run function somehow works around this deoptimization. Also, the initial split of the string into 8 substrings takes 44 ms!!! (I’m suspecting some kind of one-time ICU initialization?)
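A sketch of the aliasing workaround (assumed shapes, not the actual SubstringComparable source): the inputs live in globals filled by the setup function, and the run function rebinds them to shadowing local constants before the loop, which in the real benchmark avoided the extra retain/release traffic.

```swift
var substringsSketch: [Substring] = []
var comparisonSketch: [Substring] = []

func setUpSubstringComparableSketch() {
  let s = String(repeating: "abcdefgh", count: 1_000)
  substringsSketch = s.split(separator: "a")   // the costly one-time split
  comparisonSketch = substringsSketch
}

public func run_SubstringComparableSketch(_ N: Int) {
  // Local aliases of the globals; this rebinding is what worked around
  // the deoptimization that increased retain/release traffic.
  let substrings = substringsSketch
  let comparison = comparisonSketch
  for _ in 1...N {
    precondition(substrings == comparison)
  }
}
```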
Sequence benchmarks that test operations on Arrays have setup overhead of 14 μs. (Up from 4 μs a year ago!) That’s just the creation of an [Int] with 2k elements from a range… This array is now extracted into a constant. This commit also removes the .unstable tag from some CountableRange benchmarks, restoring them to the commit set of the Swift Benchmark Suite.
MapReduceClass had setup overhead of 868 μs (7%). The setup overhead of MapReduceClassShort was practically lost in the measurement noise from its artificially high base load, but it was there. Extracting the decimal array initialization into the `setUpFunction` also takes out the cost of releasing the [NSDecimalNumber], which turns out to be about half of the measured runtime in the case of the MapReduceClass benchmark. This significantly changes the reported runtimes (to about half); therefore the modified benchmarks get a new name with the suffix `2`.
@eeckstein Please review 🙏
Unrelated to this PR, I'm a bit concerned about the lower precision changes detected by
Here's the comparison report:
I've also compared the logs from the modified (more robust) Benchmark_Driver that included #19910:
Once these changes land, I'll look at improving the sensitivity of the smoke benchmark using the improvements from #19910. EDIT: Thinking about this more, the smaller benchmarks could still be able to amortize the setup overhead in the 2500 μs
@swift-ci benchmark
!!! Couldn't read commit file !!!
@swift-ci benchmark
Build comment file: Performance: -O / Code size: -O / Performance: -Osize / Code size: -Osize / Performance: -Onone (report tables omitted).
How to read the data: the tables contain performance differences larger than 8% and code size differences larger than 1%. Performance results (not code size) can contain false alarms; unexpected regressions marked with '(?)' are probably noise. If unexplained regressions persist across re-runs, consult the performance team (@eeckstein). Hardware Overview omitted.
@airspeedswift Do those (re-)added benchmarks add any value for stdlib benchmarking?
As their original author, I’d like to point out that the goal of expanding the Sequence API benchmark coverage was to demonstrate its interaction with various concrete underlying Sequence and Collection types. It shows that the optimizer can (or cannot) deal with all the combinations that can arise from stdlib types. Also note how it demonstrates the extremely poor performance of most parts of the subsystem. The benchmarks in question were removed because they were deemed unstable.
Note: The changes reported here by
Since this benchmark has been significantly modified and needs to be renamed, we can also lower the workload by a factor of 10, to keep up with best practices. The old benchmark that uses `NSDecimalNumber` as the tested class is renamed to `MapReduceNSDecimalNumber`, and the renamed `MapReduceClass2` now measures a Swift class `Box` that wraps an `Int`. Short versions were modified analogously.
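A hedged sketch of what the reworked class benchmark might look like (the `Box` shape is inferred from the description above, the rest of the names are hypothetical): a trivial Swift class wrapping an `Int`, with the boxed array built once in the setup function and mapped/reduced over in the run function.

```swift
// Assumed shape of the Box class mentioned above; not the exact PR source.
final class Box {
  var value: Int
  init(_ value: Int) { self.value = value }
}

var boxedNumbers: [Box] = []

func setUpMapReduceClass2Sketch() {
  boxedNumbers = (0..<1_000).map(Box.init)   // built once, outside measurement
}

public func run_MapReduceClass2Sketch(_ N: Int) {
  var result = 0
  for _ in 1...N {
    result &+= boxedNumbers.map { $0.value &+ 1 }.reduce(0, &+)
  }
  precondition(result > 0)
}
```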
Based on the comments by @airspeedswift in the forums, I've appropriated the
@swift-ci benchmark
@swift-ci benchmark
Build comment file: Performance: -O / Code size: -O / Performance: -Osize / Code size: -Osize / Performance: -Onone (report tables omitted; same "How to read the data" guidance and Hardware Overview as above).
@eeckstein are you still waiting for @airspeedswift to chime in (re: re-added tests), or can we merge this as is?
The `legacyFactor` is a great idea. Does it make sense to implement that first and use that to compensate for these benchmark changes?
No. As I said above, the changes reported here are just artifacts of low
If you restored the old
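For readers unfamiliar with the legacyFactor idea discussed above, here is a hypothetical illustration (this is only a proposal at this point, not an existing harness API): the benchmark runs a workload reduced by some factor and the reported time is multiplied back up, so results stay comparable with the historical series.

```swift
// Hypothetical illustration of the legacyFactor idea; not actual harness code.
struct ScaledResult {
  let name: String
  let measuredMicroseconds: Double
  let legacyFactor: Int   // the factor by which the workload was reduced

  // Reporting on the old scale keeps existing baselines and thresholds valid.
  var reportedMicroseconds: Double { measuredMicroseconds * Double(legacyFactor) }
}

let sample = ScaledResult(name: "MapReduceClass2",
                          measuredMicroseconds: 42,
                          legacyFactor: 10)
print(sample.reportedMicroseconds)   // 420, comparable with the pre-change series
```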
@swift-ci smoke test
For now it seems that the re-added benchmarks are not unstable. I'm merging this. If it turns out that some of those benchmarks are unstable, we can disable them again.
This PR should land in tandem with #19910, which changes the measurement method used in `Benchmark_Driver` to run `Benchmark_O` with `--num-iters=1`. That change would appear to increase the reported runtime for benchmarks that have measurable setup overhead, which would no longer be amortized across multiple iterations.

Where appropriate, benchmarks were modified to use `setUpFunction` for costlier workload initialization. In some cases, extracting parts of the workload into constants was a sufficient change that has only a minor impact on the first measured sample. `MapReduceClass` and `MapReduceClassShort` were the only two benchmarks where the changes to extract setup resulted in significantly changed runtimes that do not resemble the original values (compared to the setup amortized across multiple iterations inside the run function).

Running the benchmarks locally, it looks like the unmodified benchmark `RecursiveOwnedParameter` is reporting a change (20% improvement). Do we still have an issue with spurious performance changes when adding/removing benchmarks, caused by code alignment?

The PR is structured as a series of small, incremental changes with detailed descriptions in the commit messages; it is therefore best reviewed sequentially, by individual commits.