Skip to content

[benchmark] Legacy Factor #20212

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Nov 6, 2018

Conversation

palimondo
Copy link
Contributor

This PR adds optional legacyFactor to the BenchmarkInfo, which allows for linear modification of constants that unnecessarily inflate the base workload of benchmarks, while maintaining the continuity of log-term benchmark tracking.

For example, if a benchmark uses for _ in N*10_000 in its run function, we could lower this to for _ in N*1_000 and adding a legacyFactor: 10 to its BenchmarkInfo.

Note that this doesn’t affect the real measurements gathered from the --verbose output. The BenchmarkDoctor has been slightly adjusted to work with these real samples, therefore Benchmark_Driver check will not flag these benchmarks for slow run time reported in the summary, if their real runtimes fall into the recommended range.

This PR also modifies benchmarks with long runtimes to use the legacyFactor and lowers their base workloads, so that they run in around the 1ms mark.

This adds optional `legacyFactor` to the `BenchmarkInfo`, which allows for linear modification of constants that unnecesarily inflate the base workload of benchmarks, while maintaining the continuity of log-term benchmark tracking.

For example, if a benchmark uses `for _ in N*10_000` in its run function, we could lower this to `for _ in N*1_000` and adding a `legacyFactor: 10` to its `BenchmarkInfo`.

Note that this doesn’t affect the real measurements gathered from the `--verbose` output. The `BenchmarkDoctor` has been slightly adjusted to work with these real samples, therefore `Benchmark_Driver check` will not flag these benchmarks for slow run time reported in the summary, if their real runtimes fall into the recommended range.
Lowered the base workload by a factor of 500
Lowered the base workload by a factor of 10
Lowered base workload by a factor of 10.
@palimondo
Copy link
Contributor Author

@eeckstein Please run benchmark here. A have a pending commit for legacy factor of ArrayAppend*, but it depends in removing the setup overhead from PR #20048.

@eeckstein
Copy link
Contributor

@swift-ci benchmark

@swift-ci
Copy link
Contributor

swift-ci commented Nov 6, 2018

Build comment file:

Performance: -O

TEST OLD NEW DELTA RATIO
Regression
Sim2DArray 311 399 +28.3% 0.78x
RangeIterationSigned 171 200 +17.0% 0.86x
RandomDoubleLCG 912 1057 +15.9% 0.86x
MapReduceLazyCollectionShort 31 34 +9.7% 0.91x
Improvement
BinaryFloatingPointPropertiesBinade 31 25 -19.4% 1.24x
DataCount 37 34 -8.1% 1.09x

Code size: -O

TEST OLD NEW DELTA RATIO
Regression
main.o 49053 53661 +9.4% 0.91x
DoubleWidthDivision.o 528 544 +3.0% 0.97x
ProtocolDispatch.o 765 781 +2.1% 0.98x
StringWalk.o 42055 42855 +1.9% 0.98x
CharacterLiteralsLarge.o 867 883 +1.8% 0.98x
CaptureProp.o 909 925 +1.8% 0.98x
NSStringConversion.o 953 969 +1.7% 0.98x
AnyHashableWithAClass.o 2917 2965 +1.6% 0.98x
ArrayLiteral.o 3037 3085 +1.6% 0.98x
AngryPhonebook.o 1019 1035 +1.6% 0.98x
SuperChars.o 1108 1124 +1.4% 0.99x
COWArrayGuaranteedParameterOverhead.o 1159 1175 +1.4% 0.99x
TestsUtils.o 21865 22147 +1.3% 0.99x
RandomValues.o 3760 3808 +1.3% 0.99x
Sim2DArray.o 1262 1278 +1.3% 0.99x
NSError.o 1263 1279 +1.3% 0.99x
CharacterLiteralsSmall.o 1309 1325 +1.2% 0.99x
DictionaryLiteral.o 1344 1360 +1.2% 0.99x
RecursiveOwnedParameter.o 1366 1382 +1.2% 0.99x
ExistentialPerformance.o 68717 69517 +1.2% 0.99x
FloatingPointPrinting.o 5831 5895 +1.1% 0.99x
Chars.o 1459 1475 +1.1% 0.99x
MonteCarloPi.o 1529 1545 +1.0% 0.99x
Fibonacci.o 1538 1554 +1.0% 0.99x
ChainedFilterMap.o 3085 3117 +1.0% 0.99x
SevenBoom.o 1568 1584 +1.0% 0.99x
RangeIteration.o 1578 1594 +1.0% 0.99x
ByteSwap.o 1578 1594 +1.0% 0.99x

Performance: -Osize

TEST OLD NEW DELTA RATIO
Regression
MapReduceLazyCollectionShort 41 85 +107.3% 0.48x
FlattenListLoop 4051 4427 +9.3% 0.92x (?)
SuffixCountableRangeLazy 11 12 +9.1% 0.92x
Array2D 6611 7207 +9.0% 0.92x
ReversedDictionary2 315 342 +8.6% 0.92x
Improvement
RangeIterationSigned 200 171 -14.5% 1.17x
RandomDoubleLCG 967 884 -8.6% 1.09x
SuffixCountableRange 12 11 -8.3% 1.09x

Code size: -Osize

TEST OLD NEW DELTA RATIO
Regression
main.o 45873 50385 +9.8% 0.91x
StringWalk.o 36440 37400 +2.6% 0.97x
DoubleWidthDivision.o 609 625 +2.6% 0.97x
COWArrayGuaranteedParameterOverhead.o 1355 1387 +2.4% 0.98x
ExistentialPerformance.o 60861 62285 +2.3% 0.98x
ArrayLiteral.o 2872 2936 +2.2% 0.98x
FloatingPointPrinting.o 5072 5184 +2.2% 0.98x
InsertCharacter.o 4719 4815 +2.0% 0.98x
ProtocolDispatch.o 846 862 +1.9% 0.98x
StringBuilder.o 7030 7158 +1.8% 0.98x
CharacterLiteralsLarge.o 932 948 +1.7% 0.98x
DataBenchmarks.o 21429 21781 +1.6% 0.98x
TestsUtils.o 18345 18627 +1.5% 0.98x
OpaqueConsumingUsers.o 2089 2121 +1.5% 0.98x
NSStringConversion.o 1050 1066 +1.5% 0.98x
ObjectiveCBridgingStubs.o 8797 8925 +1.5% 0.99x
ObjectiveCNoBridgingStubs.o 7807 7919 +1.4% 0.99x
RandomValues.o 3433 3481 +1.4% 0.99x
Exclusivity.o 3468 3516 +1.4% 0.99x
RecursiveOwnedParameter.o 1281 1297 +1.2% 0.99x
ArrayOfGenericPOD.o 2778 2810 +1.2% 0.99x
CharacterLiteralsSmall.o 1412 1428 +1.1% 0.99x
Substring.o 16962 17154 +1.1% 0.99x
DictionaryLiteral.o 1493 1509 +1.1% 0.99x
AnyHashableWithAClass.o 3149 3181 +1.0% 0.99x
Improvement
ArraySetElement.o 1183 1167 -1.4% 1.01x
How to read the data The tables contain differences in performance which are larger than 8% and differences in code size which are larger than 1%.

If you see any unexpected regressions, you should consider fixing the regressions before you merge the PR.

Noise: Sometimes the performance results (not code size!) contain false alarms. Unexpected regressions which are marked with '(?)' are probably noise. If you see regressions which you cannot explain you can try to run the benchmarks again. If regressions still show up, please consult with the performance team (@eeckstein).

Hardware Overview
  Model Name: Mac Pro
  Model Identifier: MacPro6,1
  Processor Name: 12-Core Intel Xeon E5
  Processor Speed: 2.7 GHz
  Number of Processors: 1
  Total Number of Cores: 12
  L2 Cache (per Core): 256 KB
  L3 Cache: 30 MB
  Memory: 64 GB

@palimondo
Copy link
Contributor Author

@eeckstein It looks like the demonstration of legacyFactor has been successful: the modified benchmarks reported no changes.

I'm a bit worried by the amount of other noise reported here. Any thoughts what's causing it? Why are there so many code size changes? Somethings smells 🐟🐠🐡 here...

@palimondo palimondo changed the title WIP [benchmark] Legacy Factor [benchmark] Legacy Factor Nov 6, 2018
@palimondo
Copy link
Contributor Author

palimondo commented Nov 6, 2018

@eeckstein Let's land this as is and next I'll go through the whole Swift Benchmark Suite in batches and apply the legacyFactor where appropriate. This will give us much more robust and more precise measurements when using Benchmark_Driver. As a side effect, it will also make running smoke bench faster.

@eeckstein
Copy link
Contributor

The code size changes are clear: you added a field to BenchmarkInfo. That adds to the initialization code in basically every benchmark source file.
I'm also not surprised by the reported performance differences. That can be code alignment issues again.

@eeckstein
Copy link
Contributor

@swift-ci smoke test

@palimondo
Copy link
Contributor Author

Is there something we can do to mitigate the code alignment vs. performance issue? (I have literally no idea about this…)

@eeckstein eeckstein merged commit f64f02b into swiftlang:master Nov 6, 2018
@eeckstein
Copy link
Contributor

We already page-align all the benchmark modules. But that does not help if there are code changes inside the benchmark files.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants