[stdlib] Add another fast path for generic floating-point conversion #33826
Conversation
cc @troughton |
@swift-ci smoke test |
@swift-ci benchmark |
Performance (-O, -Osize, -Onone) and code size (-O, -Osize, -swiftlibs) benchmark results. How to read the data: the tables contain differences in performance which are larger than 8% and differences in code size which are larger than 1%. If you see any unexpected regressions, you should consider fixing the regressions. Noise: sometimes the performance results (not code size!) contain false regressions. |
Wow. I did not expect that. Wow. |
As written, the approach is sound. The main question I have is whether it makes sense to simply use an […]. Also, would that then allow us to get rid of the switch added in the previous change? |
I think the […]. I have already refactored to merge both switches into one that matches on encoding parameters only, which actually might be faster (it appeared, with the previous change, that there was some difference between concrete and generic […]). I've changed the fast path condition from […]. I'll run the benchmarks again to see where this all shakes out. |
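A minimal sketch of what a single switch keyed on encoding parameters could look like (hypothetical helper, not the PR's actual code), assuming Float and Double as the known intermediate formats:

// Hypothetical sketch: dispatch on the source type's encoding parameters
// and route the value through a known standard library format.
func convertViaKnownFormat<T: BinaryFloatingPoint, U: BinaryFloatingPoint>(
  _ value: T, to _: U.Type
) -> U? {
  switch (T.exponentBitCount, T.significandBitCount) {
  case (8, 23):   // same encoding parameters as Float
    return U(Float(sign: value.sign,
                   exponentBitPattern: UInt(value.exponentBitPattern),
                   significandBitPattern: UInt32(value.significandBitPattern)))
  case (11, 52):  // same encoding parameters as Double
    return U(Double(sign: value.sign,
                    exponentBitPattern: UInt(value.exponentBitPattern),
                    significandBitPattern: UInt64(value.significandBitPattern)))
  default:
    return nil    // no matching known format; caller takes the slow generic path
  }
}

Matching on (exponentBitCount, significandBitCount) rather than on concrete types is what allows a single switch to cover any conforming type with that layout.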
}
}
self = Self._convert(from: value).value
#endif
case (8, 7): |
e32aae2 to f560ecf
@swift-ci test |
@swift-ci benchmark |
Performance (-O, -Osize, -Onone) and code size (-O, -Osize, -swiftlibs) benchmark results. How to read the data: the tables contain differences in performance which are larger than 8% and differences in code size which are larger than 1%. If you see any unexpected regressions, you should consider fixing the regressions. Noise: sometimes the performance results (not code size!) contain false regressions. |
// input.swift
func convert<
  T: BinaryFloatingPoint, U: BinaryFloatingPoint
>(_ value: T, to: U.Type) -> U {
  U(value)
}

func testDoubleToDouble(_ x: Double) -> Double {
  return convert(x, to: Double.self)
}

func testDoubleToFloat(_ x: Double) -> Float {
  return convert(x, to: Float.self)
}

The SIL generated for conversion between known stdlib types is perfect: […] ...for […] |
Can we factor the !isNaN out of the switch? |
@stephentyrone Yeah, I'm not totally satisfied with the control flow here, so I'm going to try something. I don't think there's any harm--even theoretical--in preserving the encoding of any NaN values as-is between two types with the same exponent and significand bit width; there is the […]. It makes the whole implementation a lot more concise. I'll have to see if it affects code generation, though. |
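A minimal sketch of that idea (hypothetical helper, not the PR's code): when the two formats share the same exponent and significand bit widths, every encoding, NaN payloads included, can be carried over bit for bit, so no isNaN check is needed on this path.

// Hypothetical sketch: with identical encoding parameters, all values
// (finite, infinite, and NaN, payload included) transfer unchanged.
func rebitcast<T: BinaryFloatingPoint, U: BinaryFloatingPoint>(
  _ value: T, to _: U.Type
) -> U? {
  guard T.exponentBitCount == U.exponentBitCount,
        T.significandBitCount == U.significandBitCount
  else { return nil }
  return U(sign: value.sign,
           exponentBitPattern: U.RawExponent(value.exponentBitPattern),
           significandBitPattern: U.RawSignificand(value.significandBitPattern))
}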
f560ecf to 643834e
@swift-ci test |
@swift-ci benchmark |
Code generation for standard library types is good. @swift-ci is being a little recalcitrant today. |
@swift-ci benchmark |
@swift-ci test |
@xwu CI outage. Should be restored now. |
Performance (-O, -Osize, -Onone) and code size (-O, -Osize, -swiftlibs) benchmark results. How to read the data: the tables contain differences in performance which are larger than 8% and differences in code size which are larger than 1%. If you see any unexpected regressions, you should consider fixing the regressions. Noise: sometimes the performance results (not code size!) contain false regressions. |
@swift-ci test Windows platform |
@stephentyrone Thoughts on the latest (and simplest) iteration? |
Seems reasonable to me. I would like to have a cleaner solution, but this is an acceptable stopgap. We have to do due diligence on the regressions still. I'm fairly certain that they're just noise, but it would be good to disassemble the worst few and make sure there isn't a real problem somehow caused by this change (I can help with this). |
Sure, your help with that would be great. The String/Substring tests oscillate between 22 and 29 μs (see earlier iterations of the benchmarks), but it would be worth confirming that we're not missing anything. |
@stephentyrone Disassembly of […] |
@swift-ci test Windows platform |
Based on my reading of IEEE 754, for any given w (exponent bit width) and p (precision, or significand bit width + 1), there can be one and only one encoding of any finite value [edit: or ±∞] in a binary interchange format with those parameters.
Therefore, we can use init(sign:exponentBitPattern:significandBitPattern:) to convert from any binary floating-point type whose w and p parameters match a known standard library type's parameters to the latter type. From there, we can use the existing protocol requirements to convert to Self.
In a perfect world, since everything is inlinable, the compiler would be able to see that essentially the whole conversion involves bit-shifting values in opposite directions and converting between unsigned integer types that are type aliases of each other, turning everything into no-ops. But I would imagine that there is some overhead here that can't quite be eliminated.
There is a companion PR (#33821) with benchmarks, which [edit: has been merged], to test how well this fast path does the job. [edit: Either the benchmarks are too good to be true, or the compiler is really doing a fantastic job of inlining.]
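As a quick illustration of the uniqueness property described above (a sketch, not code from the PR): reassembling a value from its sign, exponent bit pattern, and significand bit pattern reproduces the identical encoding, so routing a generic value with matching w and p through this initializer cannot change which number is represented.

// Illustrative check on Float: decompose and reassemble a few values and
// confirm the bit pattern is unchanged before handing off to Double.init.
let samples: [Float] = [0, -0.0, 1.5, .pi, .ulpOfOne,
                        .leastNonzeroMagnitude, .greatestFiniteMagnitude, .infinity]
for x in samples {
  let y = Float(sign: x.sign,
                exponentBitPattern: x.exponentBitPattern,
                significandBitPattern: x.significandBitPattern)
  assert(y.bitPattern == x.bitPattern)
  _ = Double(y)  // the existing protocol requirements finish the conversion
}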