[stdlib] Add faster 32-bit and 64-bit binade, nextUp, ulp #15021
Conversation
@swift-ci Please smoke benchmark

guard _fastPath(isFinite) else { return .nan }
if _fastPath(isNormal) {
  let bitPattern_ = bitPattern & ${Self}.infinity.bitPattern
  return ${Self}(bitPattern: bitPattern_) / 0x1p${SignificandBitCount}
I would tend to write this as * 0x1p-${SignificandBitCount} because I think of it that way, but there shouldn't be any performance difference, since LLVM knows how to optimize a divide by a known power of two into a multiply. Probably worth checking that this optimization happens even at -Onone, to be safe.
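A quick sketch of the equivalence being discussed, using Float and its 23 significand bits as an illustrative case:

```swift
// Dividing by a power of two and multiplying by its reciprocal are
// both exact here, so the results are bit-identical.
let x: Float = 1.5
let divided = x / 0x1p23
let multiplied = x * 0x1p-23
assert(divided.bitPattern == multiplied.bitPattern)
```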
Good point. For greatest certainty, I'll just write it out. It's just as clear to the reader in any case.
// On arm, skip positive subnormal values.
if _slowPath(x.bitPattern & (-${Self}.infinity).bitPattern == 0) {
  return .leastNormalMagnitude
}
The way this is implemented for arm worries me. I would do:

if x == 0 {
  return .leastNonzeroMagnitude
}

There's no formal logic in IEEE 754 for flushing subnormals, so there's no real spec to go off of, but I prefer to think of them as non-canonical encodings of zero. Implementing them as I describe here does that, and it also has the happy side effect of working correctly for free if someone makes an arm32 platform where subnormals are supported.
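For reference, a sketch of what such a value looks like at the bit level (using Float; on flush-to-zero targets like arm32 VFP these patterns behave as zero, while on other targets they are distinct nonzero values):

```swift
// The smallest positive subnormal Float: zero exponent field,
// significand field of 1.
let sub = Float(bitPattern: 0x0000_0001)
assert(sub.isSubnormal)
assert(sub == Float.leastNonzeroMagnitude)
```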
I like that interpretation. It does change existing behavior though, as negative subnormal values currently have nextUp evaluating to -0 on arm32. But if you're fine with changing the interpretation, I do like the new way better.
Yeah, I think that it's the only semi-coherent way to make sense of these cases.
let increment = Int${bits}(bitPattern: x.bitPattern) &>> ${bits - 1} | 1
let bitPattern_ = x.bitPattern &+ UInt${bits}(bitPattern: increment)
#if arch(arm)
// On arm, flush negative subnormal values to -0.
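The increment trick in the snippet above can be sketched concretely for Float (an illustrative reimplementation, not the stdlib's exact code; the real implementation handles zeros, infinities, and NaN separately before reaching this step):

```swift
// Arithmetic-shifting the sign bit across the word yields 0 for
// nonnegative bit patterns and -1 for negative ones; OR-ing in 1
// then gives +1 or -1. Adding that to the bit pattern moves the
// value one step toward +infinity.
func nextUpSketch(_ x: Float) -> Float {
    let increment = Int32(bitPattern: x.bitPattern) &>> 31 | 1
    let bitPattern_ = x.bitPattern &+ UInt32(bitPattern: increment)
    return Float(bitPattern: bitPattern_)
}
assert(nextUpSketch(1.0) == Float(1).nextUp)
assert(nextUpSketch(-1.0) == Float(-1).nextUp)
```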
If you follow my suggestion above, you can eliminate this, since all subnormals will have already been handled.
The subnormal result of incrementing -.leastNormalMagnitude still needs to be flushed to -0, but that can be written differently and in a way that is future-proof.
It should fall out for free. 1 will be subtracted from the bit pattern, which will produce a negative subnormal, which will be treated as zero by whatever consumes it (as a "non-canonical encoding of zero"). Or am I missing something?
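On targets with subnormal support, this fallout is directly observable (a small check using Float):

```swift
// nextUp of -leastNormalMagnitude is the largest-magnitude negative
// subnormal: still negative, just below the normal range.
let x = -Float.leastNormalMagnitude
assert(x.nextUp.isSubnormal)
assert(x.nextUp < 0)
```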
Well, it's defensible since it's technically correct, but I think it'd be rather more elegant not to produce the negative subnormal in the first place; the end user may inspect its bit pattern and we might as well give a canonical encoding of negative zero because we're nice. The cost to doing so would be minimal and it can be future-proof too.
[Edit: ...and the symmetry of the conditional statements is so darn satisfying to look at.]
ok.
@@ -625,7 +625,20 @@ extension ${Self}: BinaryFloatingPoint {
  /// almost never is.
  @_inlineable // FIXME(sil-serialize-all)
  public var ulp: ${Self} {
    if !isFinite { return ${Self}.nan }
%if bits == 32 or bits == 64:
Let's use bits != 80 for all of these instead of == 32 or == 64, so that we get the fast implementation for hypothetical future Float16/Float128 automatically.
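The fast path under discussion can be sketched for the 64-bit case (an illustrative version assuming Double's 52 significand bits, not the stdlib's exact gyb output):

```swift
func ulpSketch(_ x: Double) -> Double {
    guard x.isFinite else { return .nan }
    if x.isNormal {
        // Mask off sign and significand, leaving the binade (2^e),
        // then scale down by the significand width.
        let binadeBits = x.bitPattern & Double.infinity.bitPattern
        return Double(bitPattern: binadeBits) * 0x1p-52
    }
    // Zeros and subnormals: the ulp is the smallest subnormal.
    return .leastNonzeroMagnitude
}
assert(ulpSketch(1.0) == 1.0.ulp)
assert(ulpSketch(-8.0) == (-8.0).ulp)
assert(ulpSketch(0.0) == 0.0.ulp)
```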
Build comment file:

Optimized (O): Regression (2), Improvement (58), No Changes (321)
Unoptimized (Onone): Regression (5), Improvement (3), No Changes (373)
// On arm, flush positive subnormal values to 0.
return 0
#else
return .leastNonzeroMagnitude
Based on your points below, I'm going to delete the compilation condition and return .leastNormalMagnitude * 0x1p-${SignificandBitCount} here.
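For the 64-bit case this identity can be checked directly (illustrative, assuming Double's 52 significand bits):

```swift
// leastNormalMagnitude is 2^-1022; scaling by 2^-52 gives 2^-1074,
// the smallest positive subnormal.
assert(Double.leastNormalMagnitude * 0x1p-52 == Double.leastNonzeroMagnitude)
```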
@swift-ci Please smoke test

There are a few comments that I'd like to tweak, but they're very minor and you've already kicked off testing, so I'll just do them in a separate commit after this lands. LGTM.
The performance of binade, nextUp, and ulp can be improved for concrete types, potentially by quite a bit.

Here, the bit pattern of 32-bit and 64-bit values is directly manipulated to produce the desired result. As Float80 values do not have a bitPattern property (and do not use an implicit leading bit), similar manipulations would require separate implementations not incorporated here; instead, the previous algorithms are preserved for that type.

Resolves SR-6960.
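The same bit-pattern approach covers binade for normal values (an illustrative sketch for Double, not the PR's exact code):

```swift
// Masking with (-infinity).bitPattern keeps the sign and exponent
// fields and clears the significand, yielding ±2^e.
func binadeSketch(_ x: Double) -> Double {
    guard x.isNormal else { return x.binade }  // defer edge cases to the stdlib
    return Double(bitPattern: x.bitPattern & (-Double.infinity).bitPattern)
}
assert(binadeSketch(6.5) == 4.0)
assert(binadeSketch(-6.5) == -4.0)
```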