Skip to content

[benchmark] Add new benchmark for floating-point conversion #33801

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Sep 5, 2020

Conversation

xwu
Copy link
Collaborator

@xwu xwu commented Sep 4, 2020

This PR adds a new benchmark for floating-point conversion.

Eventually, it will test more combinations, but for now, it looks to quantify the difference between concrete conversions from Double to Double and generic conversions from Double to Double in anticipation of upcoming PRs.

@xwu
Copy link
Collaborator Author

xwu commented Sep 4, 2020

@swift-ci Please benchmark

@swift-ci
Copy link
Contributor

swift-ci commented Sep 4, 2020

Performance: -O

Regression OLD NEW DELTA RATIO
CharacterLiteralsLarge 97 108 +11.3% 0.90x (?)
ObjectiveCBridgeStubToNSStringRef 111 122 +9.9% 0.91x (?)
 
Improvement OLD NEW DELTA RATIO
FlattenListFlatMap 6666 5925 -11.1% 1.13x (?)
 
Added MIN MAX MEAN MAX_RSS
ConvertFloatingPoint.ConcreteDoubleToDouble 23 24 23
ConvertFloatingPoint.GenericDoubleToDouble 3104 3415 3209

Code size: -O

Performance: -Osize

Regression OLD NEW DELTA RATIO
ObjectiveCBridgeStubToNSStringRef 114 133 +16.7% 0.86x (?)
Breadcrumbs.UTF16ToIdxRange.longASCII 29 33 +13.8% 0.88x
ObjectiveCBridgeStubDateAccess 228 257 +12.7% 0.89x
 
Improvement OLD NEW DELTA RATIO
FlattenListLoop 2512 1641 -34.7% 1.53x (?)
FlattenListFlatMap 6683 5115 -23.5% 1.31x (?)
StringFromLongWholeSubstring 5 4 -20.0% 1.25x
 
Added MIN MAX MEAN MAX_RSS
ConvertFloatingPoint.ConcreteDoubleToDouble 23 24 23
ConvertFloatingPoint.GenericDoubleToDouble 3102 3211 3140

Code size: -Osize

Performance: -Onone

Regression OLD NEW DELTA RATIO
DictionaryBridgeToObjC_Access 943 1076 +14.1% 0.88x (?)
 
Improvement OLD NEW DELTA RATIO
String.data.Medium 122 104 -14.8% 1.17x (?)
NSStringConversion.MutableCopy.Rebridge 982 901 -8.2% 1.09x (?)
DictionaryKeysContainsCocoa 51 47 -7.8% 1.09x (?)
 
Added MIN MAX MEAN MAX_RSS
ConvertFloatingPoint.ConcreteDoubleToDouble 1357 1456 1390
ConvertFloatingPoint.GenericDoubleToDouble 2893 3013 2950

Code size: -swiftlibs

Benchmark Check Report
⚠️🔤 ConvertFloatingPoint.ConcreteDoubleToDouble name is composed of 6 words.
Split ConvertFloatingPoint.ConcreteDoubleToDouble name into dot-separated groups and variants. See http://bit.ly/BenchmarkNaming
⛔️🔤 ConvertFloatingPoint.ConcreteDoubleToDouble name is 43 characters long.
Benchmark name should not be longer than 40 characters.
⚠️🔤 ConvertFloatingPoint.GenericDoubleToDouble name is composed of 6 words.
Split ConvertFloatingPoint.GenericDoubleToDouble name into dot-separated groups and variants. See http://bit.ly/BenchmarkNaming
⛔️🔤 ConvertFloatingPoint.GenericDoubleToDouble name is 42 characters long.
Benchmark name should not be longer than 40 characters.
⚠️ ConvertFloatingPoint.GenericDoubleToDouble execution took at least 3104 μs.
Decrease the workload of ConvertFloatingPoint.GenericDoubleToDouble by a factor of 4 (10), to be less than 1000 μs.
How to read the data The tables contain differences in performance which are larger than 8% and differences in code size which are larger than 1%.

If you see any unexpected regressions, you should consider fixing the
regressions before you merge the PR.

Noise: Sometimes the performance results (not code size!) contain false
alarms. Unexpected regressions which are marked with '(?)' are probably noise.
If you see regressions which you cannot explain you can try to run the
benchmarks again. If regressions still show up, please consult with the
performance team (@eeckstein).

Hardware Overview
  Model Name: Mac Pro
  Model Identifier: MacPro6,1
  Processor Name: 12-Core Intel Xeon E5
  Processor Speed: 2.7 GHz
  Number of Processors: 1
  Total Number of Cores: 12
  L2 Cache (per Core): 256 KB
  L3 Cache: 30 MB
  Memory: 64 GB

@xwu xwu marked this pull request as ready for review September 4, 2020 19:05
@xwu
Copy link
Collaborator Author

xwu commented Sep 4, 2020

@swift-ci smoke test

@xwu
Copy link
Collaborator Author

xwu commented Sep 4, 2020

@swift-ci smoke test macOS platform

@troughton
Copy link
Contributor

troughton commented Sep 4, 2020

Thanks for picking this up. Just as a matter of naming: I think it would also make sense for this same benchmark to eventually test generic conversions between integer types, so something like GenericNumericConversions may be a better fit.

@xwu
Copy link
Collaborator Author

xwu commented Sep 4, 2020

@troughton That's a good thought. For a thorough set of tests, though, we'll need to define mock types here, so I expect it'll be a very long file even just with floating-point types. I think I'd rather leave integer benchmarks to a separate file.

@xwu
Copy link
Collaborator Author

xwu commented Sep 4, 2020

@swift-ci smoke test linux platform

@xwu xwu merged commit 3f84fe4 into swiftlang:master Sep 5, 2020
@varungandhi-apple
Copy link
Contributor

This benchmark (or at least, parts of it) should not be built for armv7. https://ci.swift.org/job/swift-PR-osx/23595/consoleText

FAILED: benchmark/Onone-armv7-apple-ios8.0/FloatingPointConversion.o 

/Users/buildnode/jenkins/workspace/swift-PR-osx/branch-master/swift/benchmark/single-source/FloatingPointConversion.swift:64:17: error: 'Float80' is unavailable: Float80 is not available on target platform.
  init(_ value: Float80) { self.init(_Value(value)) }
                ^~~~~~~
Swift.Float80:2:23: note: 'Float80' has been explicitly marked unavailable here
@frozen public struct Float80 {
                      ^

@eeckstein
Copy link
Contributor

@xwu for optimizations where we expect the generated code to be trivial it is better to add a lit test than a micro benchmark.
In this case we expect the compiler to eliminate everything for those Float->Float conversions.
A micro-benchmark is not really the best option for validating this: if the code generation regresses it can still be missed if the change is not that big. Also, benchmarks are not mandatory to run in PR tests.

So, I looked at #33803 (nice work, BTW!) and found that the compiler still produces unnecessary code.
I fixed that deficiency in #33839.
I also added a lit test for testing the floating point conversion (test/SILOptimizer/floating_point_conversion.swift), which is a better test for your optimization than these benchmarks.

@xwu
Copy link
Collaborator Author

xwu commented Sep 7, 2020

@eeckstein That's fantastic; thanks for your follow-up PR in #33839!

I have another optimization and refactoring in #33826 that I hope and expect will pass this new lit test as well. It emits code that's 3x faster, comparable to your results in #33839, despite not having the benefit of your changes in that PR. I expected the improvement to be possible given the disparity between concrete performance and generic performance after landing #33803--but now I'm curious as to how the improvement was possible without tweaking the compiler!

This particular PR here was just a start to the benchmarks; I created mock wrapper types in #33821 to test further optimizations with custom floating-point types, which I did not expect would result in code that could be optimized to the point of being trivial. It's looking like all of it might optimize pretty well, though, so I do wonder as to the future of these benchmarks.

@eeckstein
Copy link
Contributor

@xwu Sounds good. Feel free to add more tests in floating_point_conversion.swift.
The ConvertFloatingPoint.MockFloat64ToDouble benchmark makes sense.

I still recommend to remove the ConvertFloatingPoint.GenericDoubleToDouble/ConcreteDoubleToDouble benchmarks. They don't add much value, because the lit test already checks that the optimization is optimal.
In general, we should try to keep our set of benchmarks as small as possible to reduce the benchmark test time.
Alternatively you can add the ".skip" tag, so that the benchmarks are still available, but are excluded from the standard benchmark run.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants