[NFC][libc] Clarifies underscores in n-char-sequence. #102193

mordante · 2024-08-06T18:29:38Z

The C standard specifies:

n-char-sequence:
    digit
    nondigit
    n-char-sequence digit
    n-char-sequence nondigit

nondigit is specified as one of:
    _ a b c d e f g h i j k l m
    n o p q r s t u v w x y z
    A B C D E F G H I J K L M
    N O P Q R S T U V W X Y Z

This means nondigit includes the underscore character. This patch updates the comment and the test to make clear that accepting underscores is required by the standard, not an unexplained quirk.
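As a minimal sketch, the grammar above can be written as a predicate. The names `is_digit`, `is_nondigit`, and `is_n_char_sequence` are hypothetical and not part of LLVM libc, and the character-range checks assume ASCII, where 'a' to 'z' and 'A' to 'Z' are contiguous. Because the recursive productions only append one more digit or nondigit, a simple loop over a non-empty string is equivalent.

```cpp
#include <string_view>

// One "nondigit" as listed in the C grammar: the underscore plus the
// 52 Latin letters (ASCII assumed, so the letter ranges are contiguous).
constexpr bool is_nondigit(char c) {
  return c == '_' || (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
}

constexpr bool is_digit(char c) { return c >= '0' && c <= '9'; }

// n-char-sequence: one or more characters, each a digit or a nondigit.
// The grammar has no empty production, so "" is rejected.
constexpr bool is_n_char_sequence(std::string_view s) {
  if (s.empty())
    return false;
  for (char c : s)
    if (!is_digit(c) && !is_nondigit(c))
      return false;
  return true;
}

static_assert(is_n_char_sequence("underscores_are_ok"));
static_assert(is_n_char_sequence("_"));
static_assert(!is_n_char_sequence(""));
static_assert(!is_n_char_sequence("1-2"));
```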

Note that C17 specifies the n-char-sequence in NAN(n-char-sequence) as optional, while an empty sequence is not itself a valid n-char-sequence. Taken together, these give the same result as the current comment's "0 or more" wording, so I left that part unchanged.
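The effect of the optional sequence can be illustrated with a hypothetical `nan_token_length` helper. It is a sketch modeled on the loop in the diff, not the actual libc code, and it assumes that when the '(' is not closed by a ')' only the three `NaN` characters are consumed (the `left_paren` bookkeeping in the diff suggests this, but the branch is not shown). The character test uses ASCII ranges, which match `isalnum(c) || c == '_'` in the "C" locale.

```cpp
#include <cstddef>
#include <string_view>

constexpr char ascii_lower(char c) {
  return (c >= 'A' && c <= 'Z') ? static_cast<char>(c - 'A' + 'a') : c;
}

// Same set as isalnum(c) || c == '_' in the "C" locale (ASCII assumed).
constexpr bool is_n_char(char c) {
  return c == '_' || (c >= '0' && c <= '9') || (c >= 'a' && c <= 'z') ||
         (c >= 'A' && c <= 'Z');
}

// Hypothetical: returns how many characters a NaN token consumes, or 0 if
// `s` does not start with "nan" (case-insensitive).
constexpr std::size_t nan_token_length(std::string_view s) {
  if (s.size() < 3 || ascii_lower(s[0]) != 'n' || ascii_lower(s[1]) != 'a' ||
      ascii_lower(s[2]) != 'n')
    return 0;
  std::size_t index = 3;
  if (index < s.size() && s[index] == '(') {
    std::size_t i = index + 1;
    // Zero or more characters: the optional n-char-sequence is what makes
    // "NaN()" valid even though an n-char-sequence itself is non-empty.
    while (i < s.size() && is_n_char(s[i]))
      ++i;
    if (i < s.size() && s[i] == ')')
      index = i + 1;  // consume through the closing parenthesis
    // Otherwise (assumed): fall back to consuming only "nan".
  }
  return index;
}

static_assert(nan_token_length("NaN(underscores_are_ok)") == 23);
static_assert(nan_token_length("NaN()") == 5);
static_assert(nan_token_length("NaN(1-2)") == 3);
```

Under these assumptions the lengths for `NaN(underscores_are_ok)` (23) and `NaN(1a)` (7) match the ones the test file expects.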


llvmbot · 2024-08-06T18:30:11Z

@llvm/pr-subscribers-libc

Author: Mark de Wever (mordante)

Changes

Full diff: https://github.com/llvm/llvm-project/pull/102193.diff

2 Files Affected:

(modified) libc/src/__support/str_to_float.h (+2-4)
(modified) libc/test/src/stdlib/strtof_test.cpp (+1-1)

diff --git a/libc/src/__support/str_to_float.h b/libc/src/__support/str_to_float.h
index c72bc1f957dc37..c76119021cc45b 100644
--- a/libc/src/__support/str_to_float.h
+++ b/libc/src/__support/str_to_float.h
@@ -1160,13 +1160,11 @@ LIBC_INLINE StrToNumResult<T> strtofloatingpoint(const char *__restrict src) {
       index += 3;
       StorageType nan_mantissa = 0;
       // this handles the case of `NaN(n-character-sequence)`, where the
-      // n-character-sequence is made of 0 or more letters and numbers in any
-      // order.
+      // n-character-sequence is made of 0 or more letters, numbers, or
+      // underscore characters in any order.
       if (src[index] == '(') {
         size_t left_paren = index;
         ++index;
-        // Apparently it's common for underscores to also be accepted. No idea
-        // why, but it's causing fuzz failures.
         while (isalnum(src[index]) || src[index] == '_')
           ++index;
         if (src[index] == ')') {
diff --git a/libc/test/src/stdlib/strtof_test.cpp b/libc/test/src/stdlib/strtof_test.cpp
index d7991745b69e6c..6a716c956291cc 100644
--- a/libc/test/src/stdlib/strtof_test.cpp
+++ b/libc/test/src/stdlib/strtof_test.cpp
@@ -200,7 +200,7 @@ TEST_F(LlvmLibcStrToFTest, NaNWithParenthesesValidSequenceInvalidNumberTests) {
   run_test("NaN(1a)", 7, 0x7fc00000);
   run_test("NaN(asdf)", 9, 0x7fc00000);
   run_test("NaN(1A1)", 8, 0x7fc00000);
-  run_test("NaN(why_does_this_work)", 23, 0x7fc00000);
+  run_test("NaN(underscores_are_ok)", 23, 0x7fc00000);
   run_test(
       "NaN(1234567890qwertyuiopasdfghjklzxcvbnmQWERTYUIOPASDFGHJKLZXCVBNM_)",
       68, 0x7fc00000);

michaelrj-google

LGTM, thanks for the patch

mordante requested a review from michaelrj-google August 6, 2024 18:29

llvmbot added the libc label Aug 6, 2024

michaelrj-google approved these changes Aug 6, 2024


mordante merged commit 1cbd25f into llvm:main Aug 12, 2024
8 checks passed

mordante deleted the review/underscores_in_n_char_sequence branch August 12, 2024 16:59
