Skip to content

[stdlib] Fix exact floating point to integer cast overflow detection #16960

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Jul 16, 2018

Conversation

mundaym
Copy link
Contributor

@mundaym mundaym commented Jun 2, 2018

The round trip approach for detecting exact conversions has two
problems. Firstly, rounding during the conversions might mean that
a round trip succeeds even if the floating point value and integer
value are different. For example Int32.max-1 cannot be represented
by a Float but the conversion succeeds because the rounding happens
to be symmetric. Secondly, LLVM specifies that the result of a
sitofp or uitofp instruction is undefined if it overflows. This
means that the result is platform-dependent and therefore whether or
not the round trip succeeds is also platform dependent.

This commit uses integer arithmetic to calculate whether the
conversion is exact or not in a platform independent manner.

This should resolve SR-6322.

This commit should probably add some additional tests. Any ideas
where best to add them?

@AnnaZaks AnnaZaks requested a review from natecook1000 June 27, 2018 05:03
@natecook1000
Copy link
Member

@mundaym Thanks for looking at this — do you have tests that show failures under the existing code? We have tests for these kinds of conversions in a few different places, plus parallel ones in swift-corelibs-foundation:

Copy link
Contributor

@stephentyrone stephentyrone left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a step in the right direction, but the detail about bit-widths should be addressed, and we should have some tests added to exercise these cases.

% if srcBits >= SignificandBitCount:
guard let roundTrip = ${That}(exactly: self),
roundTrip == value else {
% if srcBits > SignificandBitCount:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This check isn't quite as fine-grained as it should be, though it doesn't matter for Float or Double and any of the usual integer sizes. Binary floating-point can represent any integer with <= SignificandBitCount + 1 bits (+1 because of the implicit bit), and srcBits doesn't account for signed/unsigned.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. Should I also make the code resilient to short exponents that cannot encode the full range of the integer type?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it's useful for reference, I implemented some of the routines for generic conversion in the other direction (integer to floating-point) as well as floating-point to floating-point:

#13782

// The magnitude of the integer must be representable by the significand.
// Omit trailing zeros as they will be encoded by the exponent.
let absv = value.magnitude
let mask: ${That}.Magnitude = ~((1 << (${SignificandBitCount}+1)) - 1)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm mainly concerned about the point I called out above in light of this; specifically, I'm worried about the case where SignificandBitCount+1 == srcBits.

@mundaym mundaym force-pushed the s390x-overflow-fix branch from c6d1492 to 53a97e7 Compare July 12, 2018 15:39
@mundaym
Copy link
Contributor Author

mundaym commented Jul 12, 2018

Thanks for the comments. After reviewing this PR again I don't think it is related to SR-6322 like I originally thought. In fact, the example given in that issue produces the correct result on Ubuntu 16.04 on IBM Z and on a Skylake macOS machine, so it is possible that the issue as stated is already fixed in master (I haven't got an Ubuntu x86 machine set up to try it there yet).

Given that this PR just fixes test failures on IBM Z I don't think it needs any extra tests anymore (a test would need to check that we weren't relying on undefined behavior and I'm not sure if that's possible).

The float to int conversion change is enough to fix the issues we are seeing, but I've also included the int to float code anyway since it is slightly more robust than the original and no longer requires a round trip. I can drop those changes from this PR if you'd prefer though.

@mundaym mundaym changed the title [stdlib] Fix exact integer to floating point conversion overflow dete… [stdlib] Fix exact floating point to integer cast overflow detection Jul 12, 2018
Copy link
Collaborator

@xwu xwu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Excellent effort on this. It's not trivial to actually test for some of these issues because there aren't concrete builtin types that exhibit the necessary properties, but I think the algorithm you've written is incorrect in several cases. Please take a look and see what you think.

self._value = Builtin.fpto${u}i_FPIEEE${FloatBits}_${BuiltinName}(source._value)
if ${FloatType}(self) != source {
if _slowPath(source <= ${str(lower)}.0 || source >= ${str(upper)}.0) {
// out of bounds (including infinities)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: please capitalize and punctuate all comments.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ack

@@ -3296,10 +3296,15 @@ public struct ${Self}
/// - Parameter source: A floating-point value to convert to an integer.
@_transparent
public init?(exactly source: ${FloatType}) {
self._value = Builtin.fpto${u}i_FPIEEE${FloatBits}_${BuiltinName}(source._value)
if ${FloatType}(self) != source {
if _slowPath(source <= ${str(lower)}.0 || source >= ${str(upper)}.0) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Q: why is this <= and >= here, whereas it's < and > respectively in the preceding method? (In other words, why can't a value equal to ${str(lower)}.0 be converted exactly?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These bounds are exclusive so the bounds themselves are not valid integers. For example, the lower bound of an unsigned integer is -1. The code in the preceding method is performing the opposite check so the comparisons are inverted (i.e. > instead of <= and < instead of >=).

% # representable in floating point (assuming a large enough exponent)
% # because only one bit is set. All other possible magnitudes can be
% # represented using one bit less than the original source type.
% dstBits += 1
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we simplify this to clarify the logic here? It's not dstBits that gains one if the source type is signed; rather, the maximum representable positive source value uses one fewer bit if the source is signed. Also, the implicit leading one can remain implicit if we continue to use SignificandBitCount explicitly:

%  srcBits = self_ty.bits # No change.
%  srcBitsExcludingSign = srcBits - 1 if self_ty.is_signed else srcBits
%  # No need to define `dstBits`.

/* ... */

We can also tighten up the logic so as to check, rather than assume, a sufficient number of exponent bits:

%    if srcBitsExcludingSign < SignificandBitCount and srcBits < 2 ** (ExponentBitCount - 1):
  @available(*, message: "Converting ${That} to ${Self} will always succeed.")
/* ... */

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to keep the implicit bit in the count. It makes the mental juggling easier in my opinion. You're right that it shouldn't really be called dstBits though. I'll try renaming it to something like maxBitLength that makes the intent a bit clearer.

I'll add the exponent size check to the conditions.

@@ -1599,17 +1608,26 @@ extension ${Self} {
/// can't be represented exactly, the result is `nil`.
///
/// - Parameter value: The integer to represent as a floating-point value.
% if srcBits < SignificandBitCount:
% if srcBits <= dstBits:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change doesn't seem right; at least, it's not behavior preserving even for unsigned values. srcBits <= dstBits is equivalent to srcBits <= SignificandBitCount + 1, which is equivalent to srcBits - 1 <= SignificandBitCount, which isn't the same as srcBits < SignificandBitCount.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this is deliberate. The old code was doing unnecessary checks. By making this condition correct we can simplify the exactness checking code (by avoiding the possible overflow).

For example, the maximum value that can be represented with an unsigned 24-bit integer (or a signed 25-bit integer) is 16777215 (0xffffff). This value can be encoded exactly by a Float32 which has a 23-bit significand (24-bit including the implicit bit). Therefore we can represent any possible value of an unsigned integer exactly if the length of the integer is less than or equal to the number of significand bits plus one. Assuming the exponent is large enough of course.

// The magnitude of the integer must be representable by the significand.
// Omit trailing zeros that can be encoded by the exponent.
let absv = value.magnitude
let mask: ${That}.Magnitude = ~((1 << (${SignificandBitCount} + 1)) - 1)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm concerned as to what happens here if srcBits == SignificandBitCount (assuming you meant to preserve the original condition srcBits >= SignificandBitCount modulo refactoring). It appears this calculation would not be correct because it'd overflow. Rather, I think you'd want to construct the mask by right-shifting ${That}.Magnitude.max the appropriate number of bits. (Note that we now have the guarantee that Magnitude is unsigned as of a recent commit.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The check above excludes the srcBits == SignificandBitCount case, so overflow can't occur here.

% if srcBits > dstBits:
// The magnitude of the integer must be representable by the significand.
// Omit trailing zeros that can be encoded by the exponent.
let absv = value.magnitude
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: It doesn't really matter for local variables, but typical style is to avoid uncommon abbreviations and to use camelCase. Maybe absoluteValue or magnitude_?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ack

return nil
}
// Check that the required exponent is in range.
let maxexp = (1 << (${ExponentBitCount} - 1)) - 1
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It'd be nice if we could tighten the GYB condition not to assume that the maximum exponent is representable as an integer. If this overflows, we'll erroneously return nil when in fact the maximum exponent is more than enough to accommodate the integer.

}
// Check that the required exponent is in range.
let maxexp = (1 << (${ExponentBitCount} - 1)) - 1
if _slowPath(absv >> maxexp > 1) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't quite right, I don't think. If a value has fewer ones than SignificandBitCount, but exactly the number of trailing zeros so that absv >> maxexp == 1, then this algorithm will return an "exact" value that's inexact.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure what you mean here. absv hasn't been shifted by the number of trailing zeros, so this check should be enough to check that it is in range.

For example the largest finite value a 32-bit Float can represent is 0xffffff << 104. (0xffffff << 104) >> 127 == 1 and so passes the check. Higher values (e.g. 0x1000000 << 104) are equal to 2 or more when shifted right by 127 or do not have enough trailing zeros and will be caught by the earlier check.

That said I do think we need to hoist this out of its current position so we can do the exponent check separately.

@xwu
Copy link
Collaborator

xwu commented Jul 12, 2018

I think, alternatively, if you'd like to address the underlying bug only without essentially writing all of this code from scratch, it'd suffice to check after roundtripping that:

let m = value.magnitude
let requiredBits = ${That}.Magnitude.bitWidth
  &- m.leadingZeroBitCount &- m.trailingZeroBitCount
guard requiredBits == significandWidth + 1 else { return nil }

@mundaym
Copy link
Contributor Author

mundaym commented Jul 13, 2018

Thanks for the review, I'll make the adjustments.

Out of interest what is the preferred swift style with regards to guard vs if statements? I noticed in your proposal above (and in other parts of this codebase) you use a guard-else statement, rather than an if which coming from other languages seems strange to me, especially when combined with _slowPath or _fastPath. Do people generally use guard-else when returning from a function early?

@mundaym mundaym force-pushed the s390x-overflow-fix branch from 53a97e7 to 7ac96f7 Compare July 13, 2018 12:35
@xwu
Copy link
Collaborator

xwu commented Jul 13, 2018

Early return is more clearly spelled out with guard...else because the body must exit the scope. Therefore, if that (i.e., early exit) is the intent of the condition being tested, then one can make it explicit to the reader by opting for guard.

Incidentally, fptoui and fptosi can produce undefined values, but uitofp and sitofp cannot; the result is rounded using the default rounding mode if exact representation isn't possible. Therefore, we can rely on the result of the conversion already computed to determine if "exactness" was preserved where it helps readability or performance.

I'll give further comments later--other responsibilities for the moment :)

LLVM specifies that the result of fptosi and fptoui instructions are
undefined if the target type cannot represent the value exactly. On
IBM Z (s390x) these instructions currently saturate when overflow
occurs. This means that the round-trip used to detect overflow
succeeds in situations where the conversion is not actually exact.

For example, casting Int32.max to a Float32 via a sitofp instruction
results in Int32.max + 1. This is inexact. However if we then convert
back to an Int32 via a fptosi instruction the result is clamped to
Int32.max and so the round trip has resulted in the same value. We
therefore cannot rely on round trips alone to verify the exactness
of this cast portably.

This commit modifies the conversion routines so that they do not
rely on undefined behavior and avoid using round trips in general.
@mundaym mundaym force-pushed the s390x-overflow-fix branch from 7ac96f7 to 66b751a Compare July 16, 2018 12:13
@mundaym
Copy link
Contributor Author

mundaym commented Jul 16, 2018

Thanks @xwu. I've modified the PR to use guard statements where appropriate.

@stephentyrone
Copy link
Contributor

@swift-ci please smoke test

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants