
[DO NOT MERGE][benchmark] Modernize benchmark definitions #39331


Draft · wants to merge 3 commits into base: main

Conversation

lorentey
Member

@lorentey lorentey commented Sep 15, 2021

This is an experiment in writing benchmark definitions in a way that's friendlier to actual Swift programmers.

  • Replace struct BenchmarkInfo with a more helpful protocol Benchmark
  • Stop Capitalizing Function And Parameter Names, That's Not How Swift Code Looks

The potential benefit is that it becomes easier for contributors to develop new benchmarks -- no more weird language dialect.

The risk, as I see it, is either

  • perturbing existing benchmarks (if they get adapted to use protocol Benchmark), or
  • diverging benchmarking definitions into an old and a new syntax, adding to confusion (if existing benchmarks are kept as is).

The basic idea with protocol Benchmark is that instead of creating a bunch of top-level functions and static lets / vars, then registering things using:

  BenchmarkInfo(
    name: "AngryPhonebook.ASCII2.Small",
    runFunction: { angryPhonebook($0*10, ascii) },
    tags: t,
    setUpFunction: { blackHole(ascii) }),
  BenchmarkInfo(
    name: "AngryPhonebook.Strasse.Small",
    runFunction: { angryPhonebook($0, strasse) },
    tags: t,
    setUpFunction: { blackHole(strasse) }),

we'd define a custom struct:

struct AngryPhonebook: Benchmark {
  let name: String
  let input: [String]
  let factor: Int

  init(name: String, input: [String], factor: Int = 1) {
    self.name = name
    self.input = input
    self.factor = factor
  }

  var tags: Tags { [.validation, .api, .String] }

  func run(iterations: Int) {
    ...
  }
}

and simply register instances of it as individual benchmarks:

  AngryPhonebook(
    name: "AngryPhonebook.ASCII2.Small",
    input: ascii,
    factor: 10),
  AngryPhonebook(
    name: "AngryPhonebook.Strasse.Small",
    input: strasse),
  AngryPhonebook(
    name: "AngryPhonebook.Armenian.Small",
    input: armenian),
  AngryPhonebook(
    name: "AngryPhonebook.Cyrillic.Small",
    input: cyrillic),

This gets rid of the closure stuff, and encourages encapsulating the preprocessed input data in the struct itself, preventing the preprocessing from leaking into the measurement. Currently we need to remember to do this by manually triggering swift_onces in the setUpFunction -- this is much too tricky.
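For concreteness, the protocol driving this could look something like the sketch below. Only `name`, `tags`, and `run(iterations:)` appear in the struct example above; everything else here (the doc comments, the exact requirement shapes) is an assumption, not the PR's actual definition.

```swift
// Hypothetical sketch of the proposed protocol; this is not taken
// verbatim from the PR and the real definition may differ.
protocol Benchmark {
  /// The benchmark's display name, e.g. "AngryPhonebook.ASCII2.Small".
  var name: String { get }

  /// Classification tags used to filter benchmark runs.
  var tags: Tags { get }

  /// Runs the measured workload `iterations` times.
  func run(iterations: Int)
}
```

Under this design, setup naturally happens in the conforming type's initializer: the preprocessed input is stored in the instance before measurement begins, so there is no separate `setUpFunction` to forget.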

@lorentey
Member Author

Cc @glessard @eeckstein @gottesmm What do you think?

@lorentey
Member Author

I'm particularly worried that this would disturb -Onone performance.

@swift-ci benchmark

@glessard
Contributor

I like it. I look forward to the first benchmark output.

@swift-ci
Contributor

Performance (x86_64): -O

Regression OLD NEW DELTA RATIO
Ackermann 299 992 +231.8% 0.30x
DictionaryOfAnyHashableStrings_insert 2940 5572 +89.5% 0.53x (?)
DictionaryKeysContainsNative 21 26 +23.8% 0.81x (?)
DictionaryKeysContainsCocoa 27 31 +14.8% 0.87x (?)
 
Improvement OLD NEW DELTA RATIO
FlattenListFlatMap 6871 4745 -30.9% 1.45x (?)
String.data.Medium 113 98 -13.3% 1.15x (?)
String.data.LargeUnicode 118 108 -8.5% 1.09x (?)
RandomShuffleLCG2 480 448 -6.7% 1.07x (?)
OpenClose 45 42 -6.7% 1.07x (?)

Code size: -O

Regression OLD NEW DELTA RATIO
Ackermann.o 1828 2603 +42.4% 0.70x
AngryPhonebook.o 8951 11941 +33.4% 0.75x
main.o 52761 60416 +14.5% 0.87x
DriverUtils.o 117639 120287 +2.3% 0.98x
 
Improvement OLD NEW DELTA RATIO
TestsUtils.o 25435 24789 -2.5% 1.03x

Performance (x86_64): -Osize

Regression OLD NEW DELTA RATIO
Ackermann 251 1166 +364.5% 0.22x
FlattenListLoop 1629 2552 +56.7% 0.64x (?)
FlattenListFlatMap 4355 6135 +40.9% 0.71x (?)
PointerArithmetics 31400 37100 +18.2% 0.85x
ReversedDictionary2 315 371 +17.8% 0.85x
ReversedArray2 200 228 +14.0% 0.88x
ConvertFloatingPoint.MockFloat64Exactly 8 9 +12.5% 0.89x
DataCountSmall 28 31 +10.7% 0.90x (?)
BinaryFloatingPointPropertiesBinade 28 31 +10.7% 0.90x (?)
DataCreateEmpty 200 220 +10.0% 0.91x
CreateObjects 20 22 +10.0% 0.91x
DataSubscriptSmall 31 34 +9.7% 0.91x
DataCountMedium 31 34 +9.7% 0.91x (?)
FloatingPointPrinting_Float_description_uniform 4800 5200 +8.3% 0.92x (?)
 
Improvement OLD NEW DELTA RATIO
ProtocolDispatch 371 342 -7.8% 1.08x (?)

Code size: -Osize

Regression OLD NEW DELTA RATIO
Ackermann.o 1662 2622 +57.8% 0.63x
AngryPhonebook.o 8227 11696 +42.2% 0.70x
DriverUtils.o 111849 115377 +3.2% 0.97x
 
Improvement OLD NEW DELTA RATIO
main.o 48064 10504 -78.1% 4.58x
TestsUtils.o 19338 18500 -4.3% 1.05x

Performance (x86_64): -Onone

Regression OLD NEW DELTA RATIO
Ackermann 2479 3570 +44.0% 0.69x
LineSink.scalars.alpha 321 348 +8.4% 0.92x (?)
ObjectiveCBridgeStubToNSDateRef 4700 5080 +8.1% 0.93x (?)
 
Improvement OLD NEW DELTA RATIO
StringToDataLargeUnicode 7200 6100 -15.3% 1.18x (?)
StringToDataMedium 6700 6100 -9.0% 1.10x (?)

Code size: -swiftlibs

How to read the data

The tables contain differences in performance which are larger than 8% and differences in code size which are larger than 1%.

If you see any unexpected regressions, you should consider fixing the
regressions before you merge the PR.

Noise: Sometimes the performance results (not code size!) contain false
alarms. Unexpected regressions which are marked with '(?)' are probably noise.
If you see regressions which you cannot explain you can try to run the
benchmarks again. If regressions still show up, please consult with the
performance team (@eeckstein).

Hardware Overview
  Model Name: Mac Pro
  Model Identifier: MacPro6,1
  Processor Name: 12-Core Intel Xeon E5
  Processor Speed: 2.7 GHz
  Number of Processors: 1
  Total Number of Cores: 12
  L2 Cache (per Core): 256 KB
  L3 Cache: 30 MB
  Memory: 64 GB

@lorentey
Member Author

lorentey commented Sep 16, 2021

Ah, too bad. AngryPhonebook results weren't perturbed at all, which is encouraging, but Ackermann was, which is bad. On top of it all, there are huge code size changes that likely prevent us from adapting existing benchmarks.

@lorentey
Member Author

lorentey commented Sep 16, 2021

(None of this would matter if we ran the same benchmark code across multiple compiler versions, but that's not how we do things yet.)

@lorentey
Member Author

Applying Swift naming conventions seems like a good idea whether or not we replace/augment BenchmarkInfo.

In #39336, I'm updating the entire benchmark suite to read a little bit more like regular Swift code.
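As an illustration of the kind of renaming involved (the identifiers below are invented for this sketch, not taken from #39336):

```swift
// Illustrative sketch only; `work`, `run_MyBenchmark`, and
// `runMyBenchmark` are made-up names, not actual suite symbols.

func work() -> Int { (0..<10).reduce(0, +) }

// Before: the legacy benchmark dialect, with an underscored function
// name and a single-letter, unlabeled parameter.
public func run_MyBenchmark(_ N: Int) {
  for _ in 0..<N { _ = work() }
}

// After: Swift API naming conventions, with a lowerCamelCase name
// and a descriptive parameter label.
public func runMyBenchmark(iterations: Int) {
  for _ in 0..<iterations { _ = work() }
}
```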

@eeckstein
Contributor

@lorentey That looks great!
I'm in favor of converting all benchmarks to this new scheme, so that we can get rid of the old BenchmarkInfo.
The one-time code size changes don't matter.
