Skip to content

[NFC][libc] Clarifies underscores in n-char-sequence. #102193

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Aug 12, 2024

Conversation

mordante
Copy link
Member

@mordante mordante commented Aug 6, 2024

The C standard specifies
n-char-sequence:
digit
nondigit
n-char-sequence digit
n-char-sequence nondigit

nondigit is specified as one of:
_ a b c d e f g h i j k l m
n o p q r s t u v w x y z
A B C D E F G H I J K L M
N O P Q R S T U V W X Y Z

This means nondigit includes the underscore character. This patch clarifies this status in the comments and the test.

Note C17 specifies n-char-sequence for NaN() as optional, and an empty sequence is not a valid n-char-sequence. However the current comment has the same effect as using the pedantic wording. So I left that part unchanged.

The C standard specifies
  n-char-sequence:
    digit
    nondigit
    n-char-sequence digit
    n-char-sequence nondigit

nondigit is specified as one of:
  _ a b c d e f g h i j k l m
    n o p q r s t u v w x y z
    A B C D E F G H I J K L M
    N O P Q R S T U V W X Y Z

This means nondigit includes the underscore character. This patch clarifies
this status in the comments and the test.

Note C17 specifies n-char-sequence for NaN() as optional, and an empty sequence
is not a valid n-char-sequence. However the current comment has the same effect
as using the pedantic wording. So I left that part unchanged.
@llvmbot
Copy link
Member

llvmbot commented Aug 6, 2024

@llvm/pr-subscribers-libc

Author: Mark de Wever (mordante)

Changes

The C standard specifies
n-char-sequence:
digit
nondigit
n-char-sequence digit
n-char-sequence nondigit

nondigit is specified as one of:
_ a b c d e f g h i j k l m
n o p q r s t u v w x y z
A B C D E F G H I J K L M
N O P Q R S T U V W X Y Z

This means nondigit includes the underscore character. This patch clarifies this status in the comments and the test.

Note C17 specifies n-char-sequence for NaN() as optional, and an empty sequence is not a valid n-char-sequence. However the current comment has the same effect as using the pedantic wording. So I left that part unchanged.


Full diff: https://github.com/llvm/llvm-project/pull/102193.diff

2 Files Affected:

  • (modified) libc/src/__support/str_to_float.h (+2-4)
  • (modified) libc/test/src/stdlib/strtof_test.cpp (+1-1)
diff --git a/libc/src/__support/str_to_float.h b/libc/src/__support/str_to_float.h
index c72bc1f957dc37..c76119021cc45b 100644
--- a/libc/src/__support/str_to_float.h
+++ b/libc/src/__support/str_to_float.h
@@ -1160,13 +1160,11 @@ LIBC_INLINE StrToNumResult<T> strtofloatingpoint(const char *__restrict src) {
       index += 3;
       StorageType nan_mantissa = 0;
       // this handles the case of `NaN(n-character-sequence)`, where the
-      // n-character-sequence is made of 0 or more letters and numbers in any
-      // order.
+      // n-character-sequence is made of 0 or more letters, numbers, or
+      // underscore characters in any order.
       if (src[index] == '(') {
         size_t left_paren = index;
         ++index;
-        // Apparently it's common for underscores to also be accepted. No idea
-        // why, but it's causing fuzz failures.
         while (isalnum(src[index]) || src[index] == '_')
           ++index;
         if (src[index] == ')') {
diff --git a/libc/test/src/stdlib/strtof_test.cpp b/libc/test/src/stdlib/strtof_test.cpp
index d7991745b69e6c..6a716c956291cc 100644
--- a/libc/test/src/stdlib/strtof_test.cpp
+++ b/libc/test/src/stdlib/strtof_test.cpp
@@ -200,7 +200,7 @@ TEST_F(LlvmLibcStrToFTest, NaNWithParenthesesValidSequenceInvalidNumberTests) {
   run_test("NaN(1a)", 7, 0x7fc00000);
   run_test("NaN(asdf)", 9, 0x7fc00000);
   run_test("NaN(1A1)", 8, 0x7fc00000);
-  run_test("NaN(why_does_this_work)", 23, 0x7fc00000);
+  run_test("NaN(underscores_are_ok)", 23, 0x7fc00000);
   run_test(
       "NaN(1234567890qwertyuiopasdfghjklzxcvbnmQWERTYUIOPASDFGHJKLZXCVBNM_)",
       68, 0x7fc00000);

Copy link
Contributor

@michaelrj-google michaelrj-google left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks for the patch

@mordante mordante merged commit 1cbd25f into llvm:main Aug 12, 2024
8 checks passed
@mordante mordante deleted the review/underscores_in_n_char_sequence branch August 12, 2024 16:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants