[benchmark] Fix spurious benchmark regressions #39340

lorentey · 2021-09-17T00:04:43Z

The benchmark driver accidentally fails to set the SWIFT_DETERMINISTIC_HASHING environment variable for its subprocesses. Without it, each benchmark process uses a randomly seeded hash function, so hashed collections have far more run-to-run performance variance than expected.
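What the driver needs to do can be sketched in Python: build an environment that includes SWIFT_DETERMINISTIC_HASHING=1 and pass it explicitly to every benchmark subprocess. The helper names `benchmark_env` and `run_benchmark` are hypothetical and not taken from Benchmark_Driver. The child here is a small Python probe standing in for a real benchmark binary such as `Benchmark_O`.

```python
import os
import subprocess
import sys


def benchmark_env(base=None):
    """Return a copy of `base` (default: os.environ) with deterministic hashing on.

    The original mapping is not modified; the child gets everything it would
    normally inherit, plus SWIFT_DETERMINISTIC_HASHING=1.
    """
    env = dict(os.environ if base is None else base)
    env["SWIFT_DETERMINISTIC_HASHING"] = "1"
    return env


def run_benchmark(cmd):
    """Run one benchmark subprocess (hypothetical helper) and return its stdout."""
    result = subprocess.run(
        cmd,
        env=benchmark_env(),   # pass the augmented environment explicitly
        stdout=subprocess.PIPE,
        text=True,
        check=True,            # raise if the benchmark exits with an error
    )
    return result.stdout


if __name__ == "__main__":
    # Stand-in for something like ["Benchmark_O", "--num-iters=1", "XorLoop"]:
    # a child process that echoes the variable it actually sees.
    probe = [sys.executable, "-c",
             "import os; print(os.environ.get('SWIFT_DETERMINISTIC_HASHING'))"]
    print(run_benchmark(probe).strip())  # prints "1"
```

Copying the environment rather than mutating `os.environ` keeps the driver's own process unchanged while guaranteeing that every child sees the variable.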

I think the recent plague of flaky benchmark results (especially DictionaryOfAnyHashableStrings_insert) has been due to this issue.
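To illustrate why a random hash seed adds variance, here is a toy linear-probing hash table in Python. It is not Swift's Dictionary implementation; it only shows that the amount of collision work depends on the seed. With a fixed seed (analogous to SWIFT_DETERMINISTIC_HASHING=1) every run does identical work, while a fresh seed per "process" changes the bucket layout and thus the probe count. The function `total_probes` and the key/capacity values are illustrative assumptions.

```python
import random


def total_probes(keys, capacity, seed):
    """Insert `keys` into a toy linear-probing table and count extra probe steps.

    Each key's home bucket comes from hash((seed, key)). In CPython, ints and
    tuples of ints hash deterministically, so the seed alone decides the layout.
    Requires len(keys) <= capacity so that every insert finds a free bucket.
    """
    table = [None] * capacity
    probes = 0
    for key in keys:
        i = hash((seed, key)) % capacity
        while table[i] is not None:   # collision: step to the next bucket
            probes += 1
            i = (i + 1) % capacity
        table[i] = key
    return probes


keys = list(range(700))
capacity = 1024

# Fixed seed: the same amount of work on every run.
fixed = {total_probes(keys, capacity, seed=0) for _ in range(5)}

# A new random seed per run: the amount of probing work varies from run to run.
rng = random.Random()
varying = [total_probes(keys, capacity, seed=rng.getrandbits(64)) for _ in range(5)]

print(len(fixed))  # 1
print(varying)
```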

Additionally, update CMake to postprocess the benchmark executables with swift-darwin-postprocess.py, to work around a dyld issue on macOS Monterey that makes them fail with spurious unknown selector exceptions.

lorentey · 2021-09-17T00:05:03Z

@swift-ci benchmark

lorentey · 2021-09-17T00:05:57Z

@swift-ci smoke test

swift-ci · 2021-09-17T01:15:20Z

Performance (x86_64): -O

Improvement	OLD	NEW	DELTA	RATIO
FlattenListLoop	1581	1386	-12.3%	1.14x (?)
DataCreateMedium	1500	1400	-6.7%	1.07x (?)

Code size: -O

Regression	OLD	NEW	DELTA	RATIO
DriverUtils.o	117639	122081	+3.8%	0.96x

Performance (x86_64): -Osize

Regression	OLD	NEW	DELTA	RATIO
FlattenListFlatMap	2461	3944	+60.3%	0.62x (?)
FlattenListLoop	935	1386	+48.2%	0.67x (?)

Code size: -Osize

Regression	OLD	NEW	DELTA	RATIO
DriverUtils.o	111849	115911	+3.6%	0.96x

Performance (x86_64): -Onone

Regression	OLD	NEW	DELTA	RATIO
ObjectiveCBridgeStubToNSStringRef	161	178	+10.6%	0.90x (?)

Code size: -swiftlibs

How to read the data

The tables list performance differences larger than 8% and code size differences larger than 1%.
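The DELTA and RATIO columns can be reproduced from OLD and NEW with a short Python sketch: DELTA is the relative change (NEW - OLD) / OLD, and RATIO is OLD / NEW, so values above 1.00x mean faster or smaller. For example, FlattenListLoop 1581 → 1386 gives -12.3% and 1.14x, matching the table. The function names are hypothetical, and the real report script may differ in details such as rounding.

```python
def compare(old, new):
    """Return (delta_percent, ratio) in the style of the report tables."""
    delta = (new - old) / old * 100.0   # relative change in percent
    ratio = old / new                   # > 1.0 means the new build is better
    return delta, ratio


def is_reported(old, new, threshold_percent):
    """True if the change exceeds the threshold (8% for speed, 1% for size)."""
    delta, _ = compare(old, new)
    return abs(delta) > threshold_percent


rows = [
    ("FlattenListLoop", 1581, 1386, 8.0),   # performance row (threshold 8%)
    ("DriverUtils.o", 117639, 122081, 1.0), # code size row (threshold 1%)
]
for name, old, new, threshold in rows:
    if is_reported(old, new, threshold):
        delta, ratio = compare(old, new)
        print(f"{name}\t{old}\t{new}\t{delta:+.1f}%\t{ratio:.2f}x")
# FlattenListLoop	1581	1386	-12.3%	1.14x
# DriverUtils.o	117639	122081	+3.8%	0.96x
```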

If you see any unexpected regressions, you should consider fixing the
regressions before you merge the PR.

Noise: Sometimes the performance results (not code size!) contain false
alarms. Unexpected regressions marked with '(?)' are probably noise.
If you see regressions that you cannot explain, try running the
benchmarks again. If the regressions persist, please consult the
performance team (@eeckstein).
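One plausible way to decide that a change is "probably noise" is to check whether the sample ranges of the two runs overlap. The Python sketch below assumes the '(?)' marker is set when either run's minimum falls strictly inside the other run's (min, max) interval. This is an assumption about the comparison script, not something stated in this PR, and the sample lists in the usage lines are made up for illustration.

```python
def is_dubious(old_samples, new_samples):
    """Return True if the two runs' sample ranges overlap (assumed '(?)' rule).

    A change is flagged when the new minimum lies inside the old (min, max)
    interval, or the old minimum lies inside the new one.
    """
    old_min, old_max = min(old_samples), max(old_samples)
    new_min, new_max = min(new_samples), max(new_samples)
    return (old_min < new_min < old_max) or (new_min < old_min < new_max)


print(is_dubious([100, 103, 130], [110, 112]))  # True: ranges overlap
print(is_dubious([100, 105], [140, 150]))       # False: clearly separated
```

Comparing minimums rather than means fits benchmark timing, where outliers are almost always slower runs caused by interference.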

Hardware Overview

  Model Name: Mac mini
  Model Identifier: Macmini8,1
  Processor Name: 6-Core Intel Core i7
  Processor Speed: 3.2 GHz
  Number of Processors: 1
  Total Number of Cores: 6
  L2 Cache (per Core): 256 KB
  L3 Cache: 12 MB
  Memory: 64 GB

lorentey · 2021-09-17T02:40:54Z

Ah, it looks like the Linux bots run the benchmark executable directly in smoke tests, bypassing the driver, so the new check traps because SWIFT_DETERMINISTIC_HASHING isn't set:

21:31:01 + /home/buildnode/jenkins/workspace/swift-PR-Linux-smoke-test/branch-main/buildbot_linux/benchmarks-linux-x86_64/bin/Benchmark_Onone --num-iters=1 XorLoop
21:31:01 DriverUtils/DriverUtils.swift:706: Fatal error: Benchmark runs require deterministic hashing to be enabled.
21:31:01 
21:31:01 This prevents spurious regressions in hashed collection performance.
21:31:01 You can do this by setting the SWIFT_DETERMINISTIC_HASHING environment
21:31:01 variable to 1.
21:31:01 
21:31:01 If you know what you're doing, you can disable this check by passing
21:31:01 the option '--allow-nondeterministic-hashing to the benchmarking executable.
21:31:01 Current stack trace:
21:31:01 0    libswiftCore.so                    0x00007f69fee47dc0 swift_reportError + 50
21:31:01 1    libswiftCore.so                    0x00007f69feec2310 _swift_stdlib_reportFatalErrorInFile + 112
21:31:01 2    libswiftCore.so                    0x00007f69febcc802 <unavailable> + 1419266
21:31:01 3    libswiftCore.so                    0x00007f69febcc623 <unavailable> + 1418787
21:31:01 4    libswiftCore.so                    0x00007f69febcb170 _assertionFailure(_:_:file:line:flags:) + 414
21:31:01 5    Benchmark_Onone                    0x0000557c6a4c484f <unavailable> + 1194063
21:31:01 6    Benchmark_Onone                    0x0000557c6a6c7007 <unavailable> + 3301383
21:31:01 7    libc.so.6                          0x00007f69fd08b750 __libc_start_main + 240
21:31:01 8    Benchmark_Onone                    0x0000557c6a3d7029 <unavailable> + 221225
21:31:01 ('-- Warning: {}', 'Host toolchain could not locate a compiler to build swift-driver. (Try `--skip-early-swift-driver`)')
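The guard that produced this trap lives in DriverUtils.swift. Its observable behavior, as described by the message above, can be sketched in Python: refuse to run unless SWIFT_DETERMINISTIC_HASHING is set to 1, with `--allow-nondeterministic-hashing` as an explicit opt-out. The function names are hypothetical, and treating only the exact value "1" as enabled is an assumption.

```python
def deterministic_hashing_enabled(environ):
    """Assumption: only SWIFT_DETERMINISTIC_HASHING=1 counts as enabled."""
    return environ.get("SWIFT_DETERMINISTIC_HASHING") == "1"


def check_hashing(argv, environ):
    """Return an error message if the benchmark run should be refused, else None."""
    if "--allow-nondeterministic-hashing" in argv:
        return None                      # explicit opt-out: skip the check
    if deterministic_hashing_enabled(environ):
        return None
    return ("Benchmark runs require deterministic hashing to be enabled. "
            "Set the SWIFT_DETERMINISTIC_HASHING environment variable to 1.")


# Direct invocation without the variable (as on the Linux smoke-test bot): refused.
print(check_hashing(["--num-iters=1", "XorLoop"], {}))
# With the variable set, the run is allowed.
print(check_hashing(["XorLoop"], {"SWIFT_DETERMINISTIC_HASHING": "1"}))  # None
```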

lorentey · 2021-09-17T03:24:56Z

@swift-ci benchmark

lorentey · 2021-09-17T03:25:03Z

@swift-ci smoke test

swift-ci · 2021-09-17T03:56:00Z

Performance (x86_64): -O

Regression	OLD	NEW	DELTA	RATIO
FlattenListFlatMap	3910	4427	+13.2%	0.88x (?)

Improvement	OLD	NEW	DELTA	RATIO
FlattenListLoop	1565	1387	-11.4%	1.13x (?)

Code size: -O

Regression	OLD	NEW	DELTA	RATIO
DriverUtils.o	117639	122081	+3.8%	0.96x

Performance (x86_64): -Osize

Regression	OLD	NEW	DELTA	RATIO
FlattenListLoop	936	1386	+48.1%	0.68x (?)
FlattenListFlatMap	2592	3076	+18.7%	0.84x (?)

Code size: -Osize

Regression	OLD	NEW	DELTA	RATIO
DriverUtils.o	111849	115911	+3.6%	0.96x

Performance (x86_64): -Onone

Regression	OLD	NEW	DELTA	RATIO
ObjectiveCBridgeStubToNSStringRef	160	178	+11.2%	0.90x (?)

Code size: -swiftlibs

glessard

Nice find.

lorentey added 3 commits September 16, 2021 16:57

[benchmark] Trap if deterministic hashing isn't enabled

6cf798c

[benchmark] Benchmark_Driver: Correctly set SWIFT_DETERMINISTIC_HASHING

2fbf391

[benchmark] Use swift-darwin-postprocess.py to work around dyld 4 issue on macOS 12

ec0fbc0

lorentey requested review from glessard and eeckstein September 17, 2021 00:04

lorentey changed the title ~~[benchmark] Fix benchmark environment~~ [benchmark] Fix spurious benchmark regressions Sep 17, 2021

lorentey added 2 commits September 16, 2021 20:23

[benchmark] Set SWIFT_DETERMINISTIC_HASHING while running benchmarks in regular smoke tests

e73ef1f

[benchmark] Document deterministic hashing requirement

386ae58

glessard approved these changes Sep 17, 2021

lorentey merged commit 0b656b6 into swiftlang:main Sep 17, 2021

lorentey deleted the fix-benchmarks branch September 17, 2021 07:44
