[benchmark] Fix spurious benchmark regressions #39340

Merged 5 commits on Sep 17, 2021

Conversation

lorentey
Member

The benchmark driver accidentally fails to set the SWIFT_DETERMINISTIC_HASHING environment variable for its subprocesses, which means that hashed collections have far more performance variance than expected.

I think the recent plague of flaky benchmark results (especially DictionaryOfAnyHashableStrings_insert) has been due to this issue.

Additionally, this PR updates the CMake build to postprocess the benchmark executables with swift-darwin-postprocess.py. This works around a dyld issue on macOS Monterey that makes the benchmark executables fail with spurious unknown selector exceptions.
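
For illustration, the environment fix described in the first paragraph above could look roughly like the sketch below. This is not the actual driver code: the helper names `deterministic_hashing_env` and `run_with_deterministic_hashing` are hypothetical, and a small Python child process stands in for the real benchmark executable. The only detail taken from this PR is the variable name SWIFT_DETERMINISTIC_HASHING and the value 1.

```python
import os
import subprocess
import sys


def deterministic_hashing_env(base_env=None):
    """Return a copy of the environment with deterministic hashing enabled."""
    env = dict(os.environ if base_env is None else base_env)
    env["SWIFT_DETERMINISTIC_HASHING"] = "1"
    return env


def run_with_deterministic_hashing(command):
    """Run `command` in a subprocess that sees the variable; return its stdout."""
    return subprocess.check_output(
        command, env=deterministic_hashing_env(), text=True
    )


# Stand-in for a benchmark executable: a child process that reports
# what it sees in its own environment.
child = [
    sys.executable,
    "-c",
    "import os; print(os.environ.get('SWIFT_DETERMINISTIC_HASHING'))",
]
output = run_with_deterministic_hashing(child).strip()
```

The point of copying the environment rather than mutating `os.environ` is that the setting is applied explicitly at each subprocess launch, so it cannot be lost if the parent environment changes.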

@lorentey
Member Author

@swift-ci benchmark

@lorentey
Member Author

@swift-ci smoke test

@lorentey changed the title from "[benchmark] Fix benchmark environment" to "[benchmark] Fix spurious benchmark regressions" on Sep 17, 2021
@swift-ci
Contributor

Performance (x86_64): -O

Improvement       OLD   NEW   DELTA   RATIO
FlattenListLoop   1581  1386  -12.3%  1.14x (?)
DataCreateMedium  1500  1400  -6.7%   1.07x (?)

Code size: -O

Regression     OLD     NEW     DELTA  RATIO
DriverUtils.o  117639  122081  +3.8%  0.96x

Performance (x86_64): -Osize

Regression          OLD   NEW   DELTA   RATIO
FlattenListFlatMap  2461  3944  +60.3%  0.62x (?)
FlattenListLoop     935   1386  +48.2%  0.67x (?)

Code size: -Osize

Regression     OLD     NEW     DELTA  RATIO
DriverUtils.o  111849  115911  +3.6%  0.96x

Performance (x86_64): -Onone

Regression                         OLD  NEW  DELTA   RATIO
ObjectiveCBridgeStubToNSStringRef  161  178  +10.6%  0.90x (?)

Code size: -swiftlibs

How to read the data

The tables contain differences in performance which are larger than 8% and differences in code size which are larger than 1%.

If you see any unexpected regressions, you should consider fixing the
regressions before you merge the PR.

Noise: Sometimes the performance results (not code size!) contain false
alarms. Unexpected regressions which are marked with '(?)' are probably noise.
If you see regressions which you cannot explain you can try to run the
benchmarks again. If regressions still show up, please consult with the
performance team (@eeckstein).

Hardware Overview
  Model Name: Mac mini
  Model Identifier: Macmini8,1
  Processor Name: 6-Core Intel Core i7
  Processor Speed: 3.2 GHz
  Number of Processors: 1
  Total Number of Cores: 6
  L2 Cache (per Core): 256 KB
  L3 Cache: 12 MB
  Memory: 64 GB
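
The DELTA and RATIO columns in the report above appear to follow simple formulas: DELTA is the relative change from OLD to NEW, and RATIO is OLD divided by NEW. That reading is inferred from the reported rows, not taken from the benchmark tooling itself, and the function names below are hypothetical. A minimal sketch that reproduces two of the rows:

```python
def delta_and_ratio(old, new):
    """Return (percent change from old to new, old/new ratio)."""
    delta_pct = (new - old) / old * 100.0
    ratio = old / new
    return delta_pct, ratio


def format_row(old, new):
    """Format a comparison the way the report columns display it."""
    delta_pct, ratio = delta_and_ratio(old, new)
    return f"{delta_pct:+.1f}%", f"{ratio:.2f}x"


# FlattenListLoop at -O: 1581 -> 1386
flatten_list_loop = format_row(1581, 1386)   # ('-12.3%', '1.14x')

# DriverUtils.o code size at -O: 117639 -> 122081
driver_utils = format_row(117639, 122081)    # ('+3.8%', '0.96x')
```

Under this reading, a ratio above 1 means NEW is smaller (faster or more compact) than OLD, and a ratio below 1 means it is larger.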

@lorentey
Member Author

lorentey commented Sep 17, 2021

Ah, it looks like the Linux bots in smoke tests just run the benchmark executable directly, bypassing the benchmark driver.

21:31:01 + /home/buildnode/jenkins/workspace/swift-PR-Linux-smoke-test/branch-main/buildbot_linux/benchmarks-linux-x86_64/bin/Benchmark_Onone --num-iters=1 XorLoop
21:31:01 DriverUtils/DriverUtils.swift:706: Fatal error: Benchmark runs require deterministic hashing to be enabled.
21:31:01 
21:31:01 This prevents spurious regressions in hashed collection performance.
21:31:01 You can do this by setting the SWIFT_DETERMINISTIC_HASHING environment
21:31:01 variable to 1.
21:31:01 
21:31:01 If you know what you're doing, you can disable this check by passing
21:31:01 the option '--allow-nondeterministic-hashing to the benchmarking executable.
21:31:01 Current stack trace:
21:31:01 0    libswiftCore.so                    0x00007f69fee47dc0 swift_reportError + 50
21:31:01 1    libswiftCore.so                    0x00007f69feec2310 _swift_stdlib_reportFatalErrorInFile + 112
21:31:01 2    libswiftCore.so                    0x00007f69febcc802 <unavailable> + 1419266
21:31:01 3    libswiftCore.so                    0x00007f69febcc623 <unavailable> + 1418787
21:31:01 4    libswiftCore.so                    0x00007f69febcb170 _assertionFailure(_:_:file:line:flags:) + 414
21:31:01 5    Benchmark_Onone                    0x0000557c6a4c484f <unavailable> + 1194063
21:31:01 6    Benchmark_Onone                    0x0000557c6a6c7007 <unavailable> + 3301383
21:31:01 7    libc.so.6                          0x00007f69fd08b750 __libc_start_main + 240
21:31:01 8    Benchmark_Onone                    0x0000557c6a3d7029 <unavailable> + 221225
21:31:01 ('-- Warning: {}', 'Host toolchain could not locate a compiler to build swift-driver. (Try `--skip-early-swift-driver`)')
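
The guard that produced the fatal error in the log above can be pictured roughly as follows. The real check lives in DriverUtils.swift; this is only a Python analogue of the behavior the error message describes. The function and exception names are hypothetical, and treating only the value 1 as "enabled" is an assumption based on the wording of the message.

```python
ENV_VAR = "SWIFT_DETERMINISTIC_HASHING"
OVERRIDE_FLAG = "--allow-nondeterministic-hashing"


class NondeterministicHashingError(RuntimeError):
    """Raised when a benchmark run would use randomly seeded hashing."""


def require_deterministic_hashing(env, args):
    """Refuse to run unless hashing is deterministic or the check is overridden.

    env:  mapping of environment variables (e.g. os.environ)
    args: command-line arguments passed to the benchmark executable
    """
    if OVERRIDE_FLAG in args:
        return  # caller explicitly opted out of the check
    if env.get(ENV_VAR) == "1":  # assumption: only "1" counts as enabled
        return
    raise NondeterministicHashingError(
        "Benchmark runs require deterministic hashing to be enabled."
    )


# Direct invocation as in the smoke test log: no variable, no override flag.
try:
    require_deterministic_hashing({}, ["--num-iters=1", "XorLoop"])
    direct_run_failed = False
except NondeterministicHashingError:
    direct_run_failed = True
```

A direct invocation like the one in the smoke test passes neither the variable nor the override flag, so a guard of this shape rejects it.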

@lorentey
Member Author

@swift-ci benchmark

@lorentey
Member Author

@swift-ci smoke test

@swift-ci
Contributor

Performance (x86_64): -O

Regression          OLD   NEW   DELTA   RATIO
FlattenListFlatMap  3910  4427  +13.2%  0.88x (?)

Improvement      OLD   NEW   DELTA   RATIO
FlattenListLoop  1565  1387  -11.4%  1.13x (?)

Code size: -O

Regression     OLD     NEW     DELTA  RATIO
DriverUtils.o  117639  122081  +3.8%  0.96x

Performance (x86_64): -Osize

Regression          OLD   NEW   DELTA   RATIO
FlattenListLoop     936   1386  +48.1%  0.68x (?)
FlattenListFlatMap  2592  3076  +18.7%  0.84x (?)

Code size: -Osize

Regression     OLD     NEW     DELTA  RATIO
DriverUtils.o  111849  115911  +3.6%  0.96x

Performance (x86_64): -Onone

Regression                         OLD  NEW  DELTA   RATIO
ObjectiveCBridgeStubToNSStringRef  160  178  +11.2%  0.90x (?)

Code size: -swiftlibs


Contributor

@glessard glessard left a comment

Nice find.

@lorentey merged commit 0b656b6 into swiftlang:main on Sep 17, 2021
@lorentey deleted the fix-benchmarks branch on September 17, 2021 07:44