-
Notifications
You must be signed in to change notification settings - Fork 14.3k
[libcxx] Work around picolibc argv handling in tests. #127662
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[libcxx] Work around picolibc argv handling in tests. #127662
Conversation
This fixes some test failures when the libcxx tests are run against an up-to-date picolibc on embedded Arm, because those tests depend on an unsupported locale but the `hasAnyLocale` preliminary check wrongly concluded that the locale _was_ supported. `hasAnyLocale` passes a set of locale strings to a test program via the command line, and checks if the libc under test reports that any of the locals can be successfully set via setlocale(). In some invocations one of the locale names contains a space, e.g. the Windows-style locale name "English_United States.1252". Unfortunately picolibc's crt0, when running under Arm semihosting, fetches the single command string from the host and then splits it up at spaces without implementing any kind of quoting. So it simply isn't possible to get a space into an argv word. As a result, we end up testing for the locale (in this example) "English_United". In up-to-date versions of picolibc, this is actually accepted, since it contains no objectionable character set specification (or indeed any at all). So the lit check wrongly concludes that libc supports that locale, and enables some locale tests, which fail. This patch works around the issue entirely within `hasAnyLocale()`, by applying URL %-encoding to each locale name to remove spaces and any likely command-line quote characters. Then the test program (included as a string literal within the same Python function) undoes that quoting. This should make no difference in cases where the argv quoting worked anyway: the quoting and unquoting is unnecessary in that situation, but should still be reliable. But in cases where argv handling is extremely simplistic, this works around it.
@llvm/pr-subscribers-libcxx Author: Simon Tatham (statham-arm) ChangesThis fixes some test failures when the libcxx tests are run against an up-to-date picolibc on embedded Arm, because those tests depend on an unsupported locale but the
Unfortunately picolibc's crt0, when running under Arm semihosting, fetches the single command string from the host and then splits it up at spaces without implementing any kind of quoting. So it simply isn't possible to get a space into an argv word. As a result, we end up testing for the locale (in this example) "English_United". In up-to-date versions of picolibc, this is actually accepted, since it contains no objectionable character set specification (or indeed any at all). So the lit check wrongly concludes that libc supports that locale, and enables some locale tests, which fail. This patch works around the issue entirely within Full diff: https://github.com/llvm/llvm-project/pull/127662.diff 1 Files Affected:
diff --git a/libcxx/utils/libcxx/test/dsl.py b/libcxx/utils/libcxx/test/dsl.py
index 7ff4be0ee7cf9..81db5d099e903 100644
--- a/libcxx/utils/libcxx/test/dsl.py
+++ b/libcxx/utils/libcxx/test/dsl.py
@@ -274,14 +274,38 @@ def hasAnyLocale(config, locales):
%{exec} -- this means that the command may be executed on a remote host
depending on the %{exec} substitution.
"""
+
+ # Some locale names contain spaces, and some embedded use cases
+ # (in particular picolibc under Arm semihosting) can't support
+ # spaces in argv words. Work around this by applying URL-style
+ # %-encoding to the locale names, and undoing it in our test
+ # program.
+ percent_encode = lambda s: ''.join(
+ "%{:02x}".format(c) if c in b' %\\"\'' or c >= 128 else chr(c)
+ for c in s.encode())
+
program = """
#include <stddef.h>
#if defined(_LIBCPP_VERSION) && !_LIBCPP_HAS_LOCALIZATION
int main(int, char**) { return 1; }
#else
+ #include <stdlib.h>
#include <locale.h>
int main(int argc, char** argv) {
for (int i = 1; i < argc; i++) {
+ // %-decode argv[i] in place
+ for (char *p = argv[i], *q = argv[i]; *p; p++) {
+ if (*p == '%') {
+ char hex[3];
+ hex[0] = *++p;
+ hex[1] = *++p;
+ hex[2] = '\\0';
+ *q++ = (char)strtoul(hex, NULL, 0);
+ } else {
+ *q++ = *p;
+ }
+ }
+ // Try to set the locale
if (::setlocale(LC_ALL, argv[i]) != NULL) {
return 0;
}
@@ -290,7 +314,8 @@ def hasAnyLocale(config, locales):
}
#endif
"""
- return programSucceeds(config, program, args=[shlex.quote(l) for l in locales])
+ return programSucceeds(config, program, args=[
+ shlex.quote(percent_encode(l)) for l in locales])
@_memoizeExpensiveOperation(lambda c, flags="": (c.substitutions, c.environment, flags))
|
✅ With the latest revision this PR passed the Python code formatter. |
This rather sounds like a problem with the semihosting environment to me. Why is this worked around here instead? |
The alternative would be to rewrite the Arm semihosting definition to mandate a specific quoting scheme for the command line string, and get everyone to sign up to it:
I appreciate that it's always worth asking "why can't we fix this properly rather than adding a workaround?", but in this case I really think the cure would be worse than the disease. Also, implementing that fix would require a lot of separate software projects to all agree to do it, most of which neither Arm nor LLVM has any control over. This is currently affecting Arm's downstream embedded toolchain built on picolibc + libcxx, but we expect that LLVM's own picolibc-based testing will run into it once it updates picolibc past some recent locale support patches. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the very clear description of the issue, this helps understand the (otherwise slightly obscure) patch. I have a suggestion for fixing this that might make the code less surprising.
libcxx/utils/libcxx/test/dsl.py
Outdated
program = """ | ||
#include <stddef.h> | ||
#if defined(_LIBCPP_VERSION) && !_LIBCPP_HAS_LOCALIZATION | ||
int main(int, char**) { return 1; } | ||
#else | ||
#include <stdlib.h> | ||
#include <locale.h> | ||
int main(int argc, char** argv) { | ||
for (int i = 1; i < argc; i++) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead, perhaps we could hardcode the locales we're testing in the program itself, and then run that program without any argument?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fair enough. I've done that, and updated the PR description to describe the alternative fix (which I hope means that the updated description will be used as the final commit message after squash-and-merge).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM with comment addressed!
…27096 and #127662 We need to incorporate the following upstreamed patches into the 20.x branch. So applying these as patch files. [compiler-rt] Add support for big endian for Arm's __negdf2vfp - (cherry picked from llvm/llvm-project#127096) [compiler-rt] Fix tests of _aeabi(idivmod|uidivmod|uldivmod) to support big endian - (cherry picked from llvm/llvm-project#126277) [libcxx] Work around picolibc argv handling in tests - (cherry picked from llvm/llvm-project#127662)
…27096 and #127662 (#140) We need to incorporate the following upstreamed patches into the 20.x branch. So applying these as patch files. [compiler-rt] Add support for big endian for Arm's __negdf2vfp - (cherry picked from llvm/llvm-project#127096) [compiler-rt] Fix tests of _aeabi(idivmod|uidivmod|uldivmod) to support big endian - (cherry picked from llvm/llvm-project#126277) [libcxx] Work around picolibc argv handling in tests - (cherry picked from llvm/llvm-project#127662)
This fixes some test failures when the libcxx tests are run against an up-to-date picolibc on embedded Arm, because those tests depend on an unsupported locale but the
hasAnyLocale
preliminary check wrongly concluded that the locale was supported.hasAnyLocale
passes a set of locale strings to a test program via the command line, and checks if the libc under test reports that any of the locales can be successfully set via setlocale(). In some invocations one of the locale names contains a space, e.g. the Windows-style locale name "English_United States.1252".Unfortunately picolibc's crt0, when running under Arm semihosting, fetches the single command string from the host and then splits it up at spaces without implementing any kind of quoting. So it simply isn't possible to get a space into an argv word. As a result, we end up testing for the locale (in this example) "English_United". In up-to-date versions of picolibc, this is actually accepted, since it contains no objectionable character set specification (or indeed any at all). So the lit check wrongly concludes that libc supports that locale, and enables some locale tests, which fail.
This patch works around the issue entirely within
hasAnyLocale()
, by abandoning the use of argv completely, and instead encoding the list of locales to check as an array of strings inside the test program.