[benchmark] Legacy Factor #20212

palimondo · 2018-11-01T05:37:36Z

This PR adds optional legacyFactor to the BenchmarkInfo, which allows for linear modification of constants that unnecessarily inflate the base workload of benchmarks, while maintaining the continuity of log-term benchmark tracking.

For example, if a benchmark uses for _ in N*10_000 in its run function, we could lower this to for _ in N*1_000 and adding a legacyFactor: 10 to its BenchmarkInfo.

Note that this doesn’t affect the real measurements gathered from the --verbose output. The BenchmarkDoctor has been slightly adjusted to work with these real samples, therefore Benchmark_Driver check will not flag these benchmarks for slow run time reported in the summary, if their real runtimes fall into the recommended range.

This PR also modifies benchmarks with long runtimes to use the legacyFactor and lowers their base workloads, so that they run in around the 1ms mark.

This adds optional `legacyFactor` to the `BenchmarkInfo`, which allows for linear modification of constants that unnecesarily inflate the base workload of benchmarks, while maintaining the continuity of log-term benchmark tracking. For example, if a benchmark uses `for _ in N*10_000` in its run function, we could lower this to `for _ in N*1_000` and adding a `legacyFactor: 10` to its `BenchmarkInfo`. Note that this doesn’t affect the real measurements gathered from the `--verbose` output. The `BenchmarkDoctor` has been slightly adjusted to work with these real samples, therefore `Benchmark_Driver check` will not flag these benchmarks for slow run time reported in the summary, if their real runtimes fall into the recommended range.

Lowered the base workload by a factor of 500

Lowered the base workload by a factor of 10

Lowered base workload by a factor of 10.

palimondo · 2018-11-01T05:40:01Z

@eeckstein Please run benchmark here. A have a pending commit for legacy factor of ArrayAppend*, but it depends in removing the setup overhead from PR #20048.

eeckstein · 2018-11-05T23:18:40Z

@swift-ci benchmark

swift-ci · 2018-11-06T00:03:43Z

Build comment file:

Performance: -O

TEST	OLD	NEW	DELTA	RATIO
Regression
Sim2DArray	311	399	+28.3%	0.78x
RangeIterationSigned	171	200	+17.0%	0.86x
RandomDoubleLCG	912	1057	+15.9%	0.86x
MapReduceLazyCollectionShort	31	34	+9.7%	0.91x
Improvement
BinaryFloatingPointPropertiesBinade	31	25	-19.4%	1.24x
DataCount	37	34	-8.1%	1.09x

Code size: -O

TEST	OLD	NEW	DELTA	RATIO
Regression
main.o	49053	53661	+9.4%	0.91x
DoubleWidthDivision.o	528	544	+3.0%	0.97x
ProtocolDispatch.o	765	781	+2.1%	0.98x
StringWalk.o	42055	42855	+1.9%	0.98x
CharacterLiteralsLarge.o	867	883	+1.8%	0.98x
CaptureProp.o	909	925	+1.8%	0.98x
NSStringConversion.o	953	969	+1.7%	0.98x
AnyHashableWithAClass.o	2917	2965	+1.6%	0.98x
ArrayLiteral.o	3037	3085	+1.6%	0.98x
AngryPhonebook.o	1019	1035	+1.6%	0.98x
SuperChars.o	1108	1124	+1.4%	0.99x
COWArrayGuaranteedParameterOverhead.o	1159	1175	+1.4%	0.99x
TestsUtils.o	21865	22147	+1.3%	0.99x
RandomValues.o	3760	3808	+1.3%	0.99x
Sim2DArray.o	1262	1278	+1.3%	0.99x
NSError.o	1263	1279	+1.3%	0.99x
CharacterLiteralsSmall.o	1309	1325	+1.2%	0.99x
DictionaryLiteral.o	1344	1360	+1.2%	0.99x
RecursiveOwnedParameter.o	1366	1382	+1.2%	0.99x
ExistentialPerformance.o	68717	69517	+1.2%	0.99x
FloatingPointPrinting.o	5831	5895	+1.1%	0.99x
Chars.o	1459	1475	+1.1%	0.99x
MonteCarloPi.o	1529	1545	+1.0%	0.99x
Fibonacci.o	1538	1554	+1.0%	0.99x
ChainedFilterMap.o	3085	3117	+1.0%	0.99x
SevenBoom.o	1568	1584	+1.0%	0.99x
RangeIteration.o	1578	1594	+1.0%	0.99x
ByteSwap.o	1578	1594	+1.0%	0.99x

Performance: -Osize

TEST	OLD	NEW	DELTA	RATIO
Regression
MapReduceLazyCollectionShort	41	85	+107.3%	0.48x
FlattenListLoop	4051	4427	+9.3%	0.92x (?)
SuffixCountableRangeLazy	11	12	+9.1%	0.92x
Array2D	6611	7207	+9.0%	0.92x
ReversedDictionary2	315	342	+8.6%	0.92x
Improvement
RangeIterationSigned	200	171	-14.5%	1.17x
RandomDoubleLCG	967	884	-8.6%	1.09x
SuffixCountableRange	12	11	-8.3%	1.09x

Code size: -Osize

TEST	OLD	NEW	DELTA	RATIO
Regression
main.o	45873	50385	+9.8%	0.91x
StringWalk.o	36440	37400	+2.6%	0.97x
DoubleWidthDivision.o	609	625	+2.6%	0.97x
COWArrayGuaranteedParameterOverhead.o	1355	1387	+2.4%	0.98x
ExistentialPerformance.o	60861	62285	+2.3%	0.98x
ArrayLiteral.o	2872	2936	+2.2%	0.98x
FloatingPointPrinting.o	5072	5184	+2.2%	0.98x
InsertCharacter.o	4719	4815	+2.0%	0.98x
ProtocolDispatch.o	846	862	+1.9%	0.98x
StringBuilder.o	7030	7158	+1.8%	0.98x
CharacterLiteralsLarge.o	932	948	+1.7%	0.98x
DataBenchmarks.o	21429	21781	+1.6%	0.98x
TestsUtils.o	18345	18627	+1.5%	0.98x
OpaqueConsumingUsers.o	2089	2121	+1.5%	0.98x
NSStringConversion.o	1050	1066	+1.5%	0.98x
ObjectiveCBridgingStubs.o	8797	8925	+1.5%	0.99x
ObjectiveCNoBridgingStubs.o	7807	7919	+1.4%	0.99x
RandomValues.o	3433	3481	+1.4%	0.99x
Exclusivity.o	3468	3516	+1.4%	0.99x
RecursiveOwnedParameter.o	1281	1297	+1.2%	0.99x
ArrayOfGenericPOD.o	2778	2810	+1.2%	0.99x
CharacterLiteralsSmall.o	1412	1428	+1.1%	0.99x
Substring.o	16962	17154	+1.1%	0.99x
DictionaryLiteral.o	1493	1509	+1.1%	0.99x
AnyHashableWithAClass.o	3149	3181	+1.0%	0.99x
Improvement
ArraySetElement.o	1183	1167	-1.4%	1.01x

How to read the data

The tables contain differences in performance which are larger than 8% and differences in code size which are larger than 1%.

If you see any unexpected regressions, you should consider fixing the regressions before you merge the PR.

Noise: Sometimes the performance results (not code size!) contain false alarms. Unexpected regressions which are marked with '(?)' are probably noise. If you see regressions which you cannot explain you can try to run the benchmarks again. If regressions still show up, please consult with the performance team (@eeckstein).

Hardware Overview

  Model Name: Mac Pro
  Model Identifier: MacPro6,1
  Processor Name: 12-Core Intel Xeon E5
  Processor Speed: 2.7 GHz
  Number of Processors: 1
  Total Number of Cores: 12
  L2 Cache (per Core): 256 KB
  L3 Cache: 30 MB
  Memory: 64 GB

palimondo · 2018-11-06T07:35:04Z

@eeckstein It looks like the demonstration of legacyFactor has been successful: the modified benchmarks reported no changes.

I'm a bit worried by the amount of other noise reported here. Any thoughts what's causing it? Why are there so many code size changes? Somethings smells 🐟🐠🐡 here...

palimondo · 2018-11-06T09:16:43Z

@eeckstein Let's land this as is and next I'll go through the whole Swift Benchmark Suite in batches and apply the legacyFactor where appropriate. This will give us much more robust and more precise measurements when using Benchmark_Driver. As a side effect, it will also make running smoke bench faster.

eeckstein · 2018-11-06T16:21:03Z

The code size changes are clear: you added a field to BenchmarkInfo. That adds to the initialization code in basically every benchmark source file.
I'm also not surprised by the reported performance differences. That can be code alignment issues again.

eeckstein · 2018-11-06T16:21:17Z

@swift-ci smoke test

palimondo · 2018-11-06T17:25:06Z

Is there something we can do to mitigate the code alignment vs. performance issue? (I have literally no idea about this…)

eeckstein · 2018-11-06T19:31:47Z

We already page-align all the benchmark modules. But that does not help if there are code changes inside the benchmark files.

palimondo added 4 commits November 1, 2018 06:24

[benchmark] Legacy factor AnyHashableWithAClass

04d1384

Lowered the base workload by a factor of 500

[benchmark] Legacy factor ArrayOf[Generic]Ref

435e55f

Lowered the base workload by a factor of 10

[benchmark] Legacy factor ArraySetElement

f121ee1

Lowered base workload by a factor of 10.

palimondo changed the title ~~WIP [benchmark] Legacy Factor~~ [benchmark] Legacy Factor Nov 6, 2018

eeckstein merged commit f64f02b into swiftlang:master Nov 6, 2018

This was referenced Nov 26, 2018

[WIP][benchmark] Flatten - Extended test family #20552

Closed

[benchmark] Janitor Duty, Legacy Factor: A-C #20861

Merged

This was referenced Dec 18, 2018

[benchmark] Janitor Duty, Legacy Factor: Return of the String #21413

Merged

[benchmark] Janitor Duty: Datalore Legacy #21516

Merged

palimondo mentioned this pull request Jan 11, 2019

[benchmark] Janitor Duty: Lexicon Legacy #21794

Merged

palimondo mentioned this pull request Jan 21, 2019

[benchmark] Janitor Duty: Thesaurus Heirloom #22026

Merged

palimondo mentioned this pull request Feb 1, 2019

[benchmark] Janitor Duty: Sisyphus Legacy #22296

Merged

palimondo mentioned this pull request Feb 11, 2019

[benchmark] Janitor Duty: Troublemaker Legacy #22508

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[benchmark] Legacy Factor #20212

[benchmark] Legacy Factor #20212

Uh oh!

palimondo commented Nov 1, 2018

Uh oh!

palimondo commented Nov 1, 2018

Uh oh!

eeckstein commented Nov 5, 2018

Uh oh!

swift-ci commented Nov 6, 2018

Uh oh!

palimondo commented Nov 6, 2018

Uh oh!

palimondo commented Nov 6, 2018 •

edited

Loading

Uh oh!

eeckstein commented Nov 6, 2018

Uh oh!

eeckstein commented Nov 6, 2018

Uh oh!

palimondo commented Nov 6, 2018

Uh oh!

eeckstein commented Nov 6, 2018

Uh oh!

Uh oh!

[benchmark] Legacy Factor #20212

[benchmark] Legacy Factor #20212

Uh oh!

Conversation

palimondo commented Nov 1, 2018

Uh oh!

palimondo commented Nov 1, 2018

Uh oh!

eeckstein commented Nov 5, 2018

Uh oh!

swift-ci commented Nov 6, 2018

Build comment file:

Performance: -O

Code size: -O

Performance: -Osize

Code size: -Osize

Uh oh!

palimondo commented Nov 6, 2018

Uh oh!

palimondo commented Nov 6, 2018 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

eeckstein commented Nov 6, 2018

Uh oh!

eeckstein commented Nov 6, 2018

Uh oh!

palimondo commented Nov 6, 2018

Uh oh!

eeckstein commented Nov 6, 2018

Uh oh!

Uh oh!

palimondo commented Nov 6, 2018 •

edited

Loading