Skip to content

[libcxx] Work around picolibc argv handling in tests. #127662

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 6 commits into from
Feb 21, 2025

Conversation

statham-arm
Copy link
Collaborator

@statham-arm statham-arm commented Feb 18, 2025

This fixes some test failures when the libcxx tests are run against an up-to-date picolibc on embedded Arm, because those tests depend on an unsupported locale but the hasAnyLocale preliminary check wrongly concluded that the locale was supported.

hasAnyLocale passes a set of locale strings to a test program via the command line, and checks if the libc under test reports that any of the locales can be successfully set via setlocale(). In some invocations one of the locale names contains a space, e.g. the Windows-style locale name "English_United States.1252".

Unfortunately picolibc's crt0, when running under Arm semihosting, fetches the single command string from the host and then splits it up at spaces without implementing any kind of quoting. So it simply isn't possible to get a space into an argv word. As a result, we end up testing for the locale (in this example) "English_United". In up-to-date versions of picolibc, this is actually accepted, since it contains no objectionable character set specification (or indeed any at all). So the lit check wrongly concludes that libc supports that locale, and enables some locale tests, which fail.

This patch works around the issue entirely within hasAnyLocale(), by abandoning the use of argv completely, and instead encoding the list of locales to check as an array of strings inside the test program.

This fixes some test failures when the libcxx tests are run against an
up-to-date picolibc on embedded Arm, because those tests depend on an
unsupported locale but the `hasAnyLocale` preliminary check wrongly
concluded that the locale _was_ supported.

`hasAnyLocale` passes a set of locale strings to a test program via
the command line, and checks if the libc under test reports that any
of the locals can be successfully set via setlocale(). In some
invocations one of the locale names contains a space, e.g. the
Windows-style locale name "English_United States.1252".

Unfortunately picolibc's crt0, when running under Arm semihosting,
fetches the single command string from the host and then splits it up
at spaces without implementing any kind of quoting. So it simply isn't
possible to get a space into an argv word. As a result, we end up
testing for the locale (in this example) "English_United". In
up-to-date versions of picolibc, this is actually accepted, since it
contains no objectionable character set specification (or indeed any
at all). So the lit check wrongly concludes that libc supports that
locale, and enables some locale tests, which fail.

This patch works around the issue entirely within `hasAnyLocale()`, by
applying URL %-encoding to each locale name to remove spaces and any
likely command-line quote characters. Then the test program (included
as a string literal within the same Python function) undoes that
quoting. This should make no difference in cases where the argv
quoting worked anyway: the quoting and unquoting is unnecessary in
that situation, but should still be reliable. But in cases where argv
handling is extremely simplistic, this works around it.
@statham-arm statham-arm requested a review from a team as a code owner February 18, 2025 16:29
@llvmbot llvmbot added the libc++ libc++ C++ Standard Library. Not GNU libstdc++. Not libc++abi. label Feb 18, 2025
@llvmbot
Copy link
Member

llvmbot commented Feb 18, 2025

@llvm/pr-subscribers-libcxx

Author: Simon Tatham (statham-arm)

Changes

This fixes some test failures when the libcxx tests are run against an up-to-date picolibc on embedded Arm, because those tests depend on an unsupported locale but the hasAnyLocale preliminary check wrongly concluded that the locale was supported.

hasAnyLocale passes a set of locale strings to a test program via the command line, and checks if the libc under test reports that any of the locals can be successfully set via setlocale(). In some invocations one of the locale names contains a space, e.g. the Windows-style locale name "English_United States.1252".

Unfortunately picolibc's crt0, when running under Arm semihosting, fetches the single command string from the host and then splits it up at spaces without implementing any kind of quoting. So it simply isn't possible to get a space into an argv word. As a result, we end up testing for the locale (in this example) "English_United". In up-to-date versions of picolibc, this is actually accepted, since it contains no objectionable character set specification (or indeed any at all). So the lit check wrongly concludes that libc supports that locale, and enables some locale tests, which fail.

This patch works around the issue entirely within hasAnyLocale(), by applying URL %-encoding to each locale name to remove spaces and any likely command-line quote characters. Then the test program (included as a string literal within the same Python function) undoes that quoting. This should make no difference in cases where the argv quoting worked anyway: the quoting and unquoting is unnecessary in that situation, but should still be reliable. But in cases where argv handling is extremely simplistic, this works around it.


Full diff: https://github.com/llvm/llvm-project/pull/127662.diff

1 Files Affected:

  • (modified) libcxx/utils/libcxx/test/dsl.py (+26-1)
diff --git a/libcxx/utils/libcxx/test/dsl.py b/libcxx/utils/libcxx/test/dsl.py
index 7ff4be0ee7cf9..81db5d099e903 100644
--- a/libcxx/utils/libcxx/test/dsl.py
+++ b/libcxx/utils/libcxx/test/dsl.py
@@ -274,14 +274,38 @@ def hasAnyLocale(config, locales):
     %{exec} -- this means that the command may be executed on a remote host
     depending on the %{exec} substitution.
     """
+
+    # Some locale names contain spaces, and some embedded use cases
+    # (in particular picolibc under Arm semihosting) can't support
+    # spaces in argv words. Work around this by applying URL-style
+    # %-encoding to the locale names, and undoing it in our test
+    # program.
+    percent_encode = lambda s: ''.join(
+        "%{:02x}".format(c) if c in b' %\\"\'' or c >= 128 else chr(c)
+        for c in s.encode())
+
     program = """
     #include <stddef.h>
     #if defined(_LIBCPP_VERSION) && !_LIBCPP_HAS_LOCALIZATION
       int main(int, char**) { return 1; }
     #else
+      #include <stdlib.h>
       #include <locale.h>
       int main(int argc, char** argv) {
         for (int i = 1; i < argc; i++) {
+          // %-decode argv[i] in place
+          for (char *p = argv[i], *q = argv[i]; *p; p++) {
+            if (*p == '%') {
+              char hex[3];
+              hex[0] = *++p;
+              hex[1] = *++p;
+              hex[2] = '\\0';
+              *q++ = (char)strtoul(hex, NULL, 0);
+            } else {
+              *q++ = *p;
+            }
+          }
+          // Try to set the locale
           if (::setlocale(LC_ALL, argv[i]) != NULL) {
             return 0;
           }
@@ -290,7 +314,8 @@ def hasAnyLocale(config, locales):
       }
     #endif
   """
-    return programSucceeds(config, program, args=[shlex.quote(l) for l in locales])
+    return programSucceeds(config, program, args=[
+        shlex.quote(percent_encode(l)) for l in locales])
 
 
 @_memoizeExpensiveOperation(lambda c, flags="": (c.substitutions, c.environment, flags))

Copy link

github-actions bot commented Feb 18, 2025

✅ With the latest revision this PR passed the Python code formatter.

@philnik777
Copy link
Contributor

This rather sounds like a problem with the semihosting environment to me. Why is this worked around here instead?

@statham-arm
Copy link
Collaborator Author

The alternative would be to rewrite the Arm semihosting definition to mandate a specific quoting scheme for the command line string, and get everyone to sign up to it:

  • picolibc would have to dequote and split the semihosting command string according to the new rules
    • so would every other libc that compiles in a semihosting mode
  • everything that constructs a semihosting command line would have to implement the quoting side
    • qemu
    • other simulators, free and proprietary
    • anything that uses semihosting to debug on real hardware dev boards over a network or serial line
  • everyone who has already defined some other semantics for the semihosting command string would have to work around breakage introduced by this change, or introduce a mode flag

I appreciate that it's always worth asking "why can't we fix this properly rather than adding a workaround?", but in this case I really think the cure would be worse than the disease. Also, implementing that fix would require a lot of separate software projects to all agree to do it, most of which neither Arm nor LLVM has any control over.

This is currently affecting Arm's downstream embedded toolchain built on picolibc + libcxx, but we expect that LLVM's own picolibc-based testing will run into it once it updates picolibc past some recent locale support patches.

Copy link
Member

@ldionne ldionne left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the very clear description of the issue, this helps understand the (otherwise slightly obscure) patch. I have a suggestion for fixing this that might make the code less surprising.

program = """
#include <stddef.h>
#if defined(_LIBCPP_VERSION) && !_LIBCPP_HAS_LOCALIZATION
int main(int, char**) { return 1; }
#else
#include <stdlib.h>
#include <locale.h>
int main(int argc, char** argv) {
for (int i = 1; i < argc; i++) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead, perhaps we could hardcode the locales we're testing in the program itself, and then run that program without any argument?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair enough. I've done that, and updated the PR description to describe the alternative fix (which I hope means that the updated description will be used as the final commit message after squash-and-merge).

Copy link
Member

@ldionne ldionne left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM with comment addressed!

@ldionne ldionne added the pending-ci Merging the PR is only pending completion of CI label Feb 20, 2025
@statham-arm statham-arm merged commit 30ae485 into llvm:main Feb 21, 2025
80 checks passed
@statham-arm statham-arm deleted the libcxx-locale-argv-quoting branch February 21, 2025 09:08
simpal01 added a commit to simpal01/arm-toolchain that referenced this pull request Feb 28, 2025
…27096 and #127662

We need to incorporate the following upstreamed patches into the 20.x branch.
So applying these as patch files.

[compiler-rt] Add support for big endian for Arm's __negdf2vfp -
(cherry picked from llvm/llvm-project#127096)

[compiler-rt] Fix tests of _aeabi(idivmod|uidivmod|uldivmod) to support big endian -
(cherry picked from llvm/llvm-project#126277)

[libcxx] Work around picolibc argv handling in tests -
(cherry picked from llvm/llvm-project#127662)
simpal01 added a commit to arm/arm-toolchain that referenced this pull request Feb 28, 2025
…27096 and #127662 (#140)

We need to incorporate the following upstreamed patches into the 20.x
branch. So applying these as patch files.

[compiler-rt] Add support for big endian for Arm's __negdf2vfp - (cherry
picked from llvm/llvm-project#127096)

[compiler-rt] Fix tests of _aeabi(idivmod|uidivmod|uldivmod) to support
big endian - (cherry picked from
llvm/llvm-project#126277)

[libcxx] Work around picolibc argv handling in tests - (cherry picked
from llvm/llvm-project#127662)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
libc++ libc++ C++ Standard Library. Not GNU libstdc++. Not libc++abi. pending-ci Merging the PR is only pending completion of CI
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants