-
Notifications
You must be signed in to change notification settings - Fork 14.3k
[locale][num_get] Improve Stage 2 of string to float conversion #65168
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
https://timsong-cpp.github.io/cppwp/n4140/facet.num.get.virtuals#3.2 "Stage 2" of num_get::do_get() depends on "a check ... to determine if c is allowed as the next character of an input field of the conversion specifier returned by Stage 1". Previously this was a very simple check whether the next character was in a set of allowed characters. This could lead to Stage 2 accumulating character sequences such as "1.2f" and passing them to strtold (Stage 3). https://timsong-cpp.github.io/cppwp/n4140/facet.num.get.virtuals#3.3.3 Stage 3 can fail, however, if the entire character sequence from Stage 2 is not used in the conversion. For example, the "f" in "1.2f" is not used. https://timsong-cpp.github.io/cppwp/n4140/facet.num.get.virtuals#3.3.4 As a result, parsing a sequence like "1.2f" would return value 0.0 with failbit set. This change improves the checks made in Stage 2, determining what is passed to Stage 3. * Hex digits are only considered valid if "0x" has been seen * INFINITY value is recognised * Characters in INFINITY and NAN are only valid in sequence. This is done by checking one character backwards, which has obvious limitations. * New tests are added. The old ones are preserved but refactored. Differential Revision: https://reviews.llvm.org/D99091
a69e54d
to
18800a8
Compare
It seems possible to ship this without breaking the ABI. And assuming the new behavior doesn't start to reject meaningful previously accepted code then I think we should ship it retroactively. |
LWG2381 (not implemented in libc++ yet) intendedly rejects Inf and NaN. Should we restrict the handling of Inf and NaN to some extension mode? |
I also don't see why this has to be an ABI break. What is the reason?
IMO we should just implement the LWG issue and not have an extension mode. |
I'm more than happy to drop this PR if it's not valid anymore. I was just trying to salvage it from https://reviews.llvm.org/D99091, but honestly I'm not very fluent in this part of the library so progress has been almost non-existent, and this is a bit low on my priority list. I'll drop this. |
…7948) This PR implements [LWG2381](https://cplusplus.github.io/LWG/issue2381) by rejecting `'i'`, `'I'`, `'n'`, `'N'` in FP parsing, as inf and NaN are intendedly rejected by that LWG issue. The source character array used for parsing is `"0123456789abcdefABCDEFxX+-pPiInN"`, whose first 26 or 28 characters are used for parsing integers or floating-point values respectively. Previously, libc++ used 32 characters, including `'i'`, `'I'`, `'n'`, `'N'`, for FP parsing, which was inconsistent with LWG2381. This PR also replaces magic numbers 26 and 28 (formerly 32) with named constants. Drive-by change: when the first character (possibly after the leading `'+'` or `'-'`) is not a decimal digit but an acceptable character (e.g., `'p'` or `'e'`), the character is not accumulated now (per Stage 2 in [facet.num.get.virtuals]/3). #65168 may be rendered invalid, see #65168 (comment). Apple back-deployment targets remain broken, likely due to dylib. XFAIL is marked in related tests. --------- Co-authored-by: Mark de Wever <[email protected]>
…7948) This PR implements [LWG2381](https://cplusplus.github.io/LWG/issue2381) by rejecting `'i'`, `'I'`, `'n'`, `'N'` in FP parsing, as inf and NaN are intendedly rejected by that LWG issue. The source character array used for parsing is `"0123456789abcdefABCDEFxX+-pPiInN"`, whose first 26 or 28 characters are used for parsing integers or floating-point values respectively. Previously, libc++ used 32 characters, including `'i'`, `'I'`, `'n'`, `'N'`, for FP parsing, which was inconsistent with LWG2381. This PR also replaces magic numbers 26 and 28 (formerly 32) with named constants. Drive-by change: when the first character (possibly after the leading `'+'` or `'-'`) is not a decimal digit but an acceptable character (e.g., `'p'` or `'e'`), the character is not accumulated now (per Stage 2 in [facet.num.get.virtuals]/3). #65168 may be rendered invalid, see llvm/llvm-project#65168 (comment). Apple back-deployment targets remain broken, likely due to dylib. XFAIL is marked in related tests. --------- Co-authored-by: Mark de Wever <[email protected]> NOKEYCHECK=True GitOrigin-RevId: 3e15c97fa3812993bdc319827a5c6d867b765ae8
https://timsong-cpp.github.io/cppwp/n4140/facet.num.get.virtuals#3.2
"Stage 2" of num_get::do_get() depends on "a check ... to determine if c is allowed as the next character of an input field of the conversion specifier returned by Stage 1". Previously this was a very simple check whether the next character was in a set of allowed characters. This could lead to Stage 2 accumulating character sequences such as "1.2f" and passing them to strtold (Stage 3).
https://timsong-cpp.github.io/cppwp/n4140/facet.num.get.virtuals#3.3.3
Stage 3 can fail, however, if the entire character sequence from Stage 2 is not used in the conversion. For example, the "f" in "1.2f" is not used. https://timsong-cpp.github.io/cppwp/n4140/facet.num.get.virtuals#3.3.4
As a result, parsing a sequence like "1.2f" would return value 0.0 with failbit set.
This change improves the checks made in Stage 2, determining what is passed to Stage 3.
Hex digits are only considered valid if "0x" has been seen
INFINITY value is recognised
Characters in INFINITY and NAN are only valid in sequence. This is done by checking one character backwards, which has obvious limitations.
New tests are added. The old ones are preserved but refactored.
Differential Revision: https://reviews.llvm.org/D99091