-
Notifications
You must be signed in to change notification settings - Fork 14.4k
[-Wunterminated-string-initialization] Handle C string literals ending with explicit '\0' #143487
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…g with explicit '\0' In C, a char array needs no "nonstring" attribute, if its initializer is a string literal that 1) explicitly ends with '\0' and 2) fits in the array after a possible truncation. For example `char a[4] = "ABC\0"; // fine, needs no "nonstring" attr` rdar://152506883
@llvm/pr-subscribers-clang Author: Ziqing Luo (ziqingluo-90) ChangesIn C, a char array needs no "nonstring" attribute, if its initializer is a string literal that 1) explicitly ends with '\0' and 2) fits in the array after a possible truncation. For example rdar://152506883 Full diff: https://github.com/llvm/llvm-project/pull/143487.diff 2 Files Affected:
diff --git a/clang/lib/Sema/SemaInit.cpp b/clang/lib/Sema/SemaInit.cpp
index da56225b2f926..f7592688e0327 100644
--- a/clang/lib/Sema/SemaInit.cpp
+++ b/clang/lib/Sema/SemaInit.cpp
@@ -260,6 +260,11 @@ static void CheckStringInit(Expr *Str, QualType &DeclT, const ArrayType *AT,
diag::ext_initializer_string_for_char_array_too_long)
<< Str->getSourceRange();
else if (StrLength - 1 == ArrayLen) {
+ // If the string literal is null-terminated explicitly, e.g., `char a[4] =
+ // "ABC\0"`, there should be no warn:
+ if (const auto *SL = dyn_cast<StringLiteral>(Str->IgnoreParens()))
+ if (SL->isOrdinary() && SL->getBytes().back() == 0)
+ return;
// If the entity being initialized has the nonstring attribute, then
// silence the "missing nonstring" diagnostic. If there's no entity,
// check whether we're initializing an array of arrays; if so, walk the
diff --git a/clang/test/Sema/attr-nonstring_safe.c b/clang/test/Sema/attr-nonstring_safe.c
new file mode 100644
index 0000000000000..3ea441e033dba
--- /dev/null
+++ b/clang/test/Sema/attr-nonstring_safe.c
@@ -0,0 +1,28 @@
+// RUN: %clang_cc1 -fsyntax-only -verify -Wunterminated-string-initialization %s -x c
+// RUN: %clang_cc1 -fsyntax-only -verify -Wunterminated-string-initialization %s -x c++
+
+
+// In C, the following examples are fine:
+#if __cplusplus
+char foo[3] = "fo\0"; // expected-error {{initializer-string for char array is too long, array size is 3 but initializer has size 4 (including the null terminating character)}}
+
+struct S {
+ char buf[3];
+ char fub[3];
+} s = { "ba\0", "bo\0" }; // expected-error 2{{initializer-string for char array is too long, array size is 3 but initializer has size 4 (including the null terminating character)}}
+
+signed char scfoo[3] = "fo\0"; // expected-error {{initializer-string for char array is too long, array size is 3 but initializer has size 4 (including the null terminating character)}}
+unsigned char ucfoo[3] = "fo\0"; // expected-error {{initializer-string for char array is too long, array size is 3 but initializer has size 4 (including the null terminating character)}}
+
+#else
+//expected-no-diagnostics
+char foo[3] = "fo\0";
+
+struct S {
+ char buf[3];
+ char fub[3];
+} s = { "ba\0", "bo\0" };
+
+signed char scfoo[3] = "fo\0";
+unsigned char ucfoo[3] = "fo\0";
+#endif
|
clang/lib/Sema/SemaInit.cpp
Outdated
@@ -260,6 +260,11 @@ static void CheckStringInit(Expr *Str, QualType &DeclT, const ArrayType *AT, | |||
diag::ext_initializer_string_for_char_array_too_long) | |||
<< Str->getSourceRange(); | |||
else if (StrLength - 1 == ArrayLen) { | |||
// If the string literal is null-terminated explicitly, e.g., `char a[4] = | |||
// "ABC\0"`, there should be no warn: | |||
if (const auto *SL = dyn_cast<StringLiteral>(Str->IgnoreParens())) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if (const auto *SL = dyn_cast<StringLiteral>(Str->IgnoreParens())) | |
if (const auto *SL = dyn_cast<StringLiteral>(Str->IgnoreParens()); SL && SL->isOrdinary() && SL->getBytes().back() == 0) |
|
||
|
||
// In C, the following examples are fine: | ||
#if __cplusplus |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#if __cplusplus | |
#ifdef __cplusplus |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of duplicating the tests and using #ifdef
, instead change the run line for -x c++
to e.g. -verify=cxx,expected
and then replace all instances of expected-error
w/ cxx-error
(but leave the existing expected-no-diagnostics
).
clang/lib/Sema/SemaInit.cpp
Outdated
@@ -260,6 +260,11 @@ static void CheckStringInit(Expr *Str, QualType &DeclT, const ArrayType *AT, | |||
diag::ext_initializer_string_for_char_array_too_long) | |||
<< Str->getSourceRange(); | |||
else if (StrLength - 1 == ArrayLen) { | |||
// If the string literal is null-terminated explicitly, e.g., `char a[4] = | |||
// "ABC\0"`, there should be no warn: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// "ABC\0"`, there should be no warn: | |
// "ABC\0"`, there should be no warning. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add tests for other encodings.
Also, needs a release note.
clang/lib/Sema/SemaInit.cpp
Outdated
// If the string literal is null-terminated explicitly, e.g., `char a[4] = | ||
// "ABC\0"`, there should be no warn: | ||
if (const auto *SL = dyn_cast<StringLiteral>(Str->IgnoreParens())) | ||
if (SL->isOrdinary() && SL->getBytes().back() == 0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this encoding-aware? what about if this is a 2nd byte in a multi-byte?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, this means the functionality only works for char
and not other string types. I would imagine we'd want to also support: wchar_t foo[4] = { L'f', L'o', L'o', L'\0' };
and other variants.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for working on this! In general, I agree that this is a case where we likely don't want the diagnostic or to require the attribute.
clang/lib/Sema/SemaInit.cpp
Outdated
// If the string literal is null-terminated explicitly, e.g., `char a[4] = | ||
// "ABC\0"`, there should be no warn: | ||
if (const auto *SL = dyn_cast<StringLiteral>(Str->IgnoreParens())) | ||
if (SL->isOrdinary() && SL->getBytes().back() == 0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, this means the functionality only works for char
and not other string types. I would imagine we'd want to also support: wchar_t foo[4] = { L'f', L'o', L'o', L'\0' };
and other variants.
Thank you all for valuable feedbacks. I have improved the PR by
|
@shafik, @erichkeane @AaronBallman @tbaederr @zwuis |
|
||
|
||
#ifdef __cplusplus | ||
// C++ is stricter so the following cases should be warned about: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we want the same test cases between C and C++, the only thing that should be different is the expected diagnostic comments. e.g.,
char foo3[3] = "fo\0"; // cxx-error {{initializer-string for char array is too long, array size is 3 but initializer has size 4 (including the null terminating character)}}
is fine in C because the diagnostic will only be checked in C++ mode.
The only code that should be in the #if
would be things like the typedefs for C.
clang/lib/Sema/SemaInit.cpp
Outdated
if (const auto *SL = dyn_cast<StringLiteral>(Str->IgnoreParens()); | ||
SL && SL->getLength() > 0 && | ||
SL->getCodeUnit(SL->getLength() - 1) == 0) | ||
return; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the early return suppresses warn_initializer_string_for_char_array_too_long_for_cpp warnings?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/144/builds/28591 Here is the relevant piece of the build log for the reference
|
@ziqingluo-90 Hello and greetings form the UK! Are you aware that this change caused the following build bot failure? : Are you looking into a fix? The bot has been red for some time now. If a quick fix is not possible please consider reverting the change until one can be found. Thanks in advance, |
…ls ending with explicit '\0' (#143487)" This reverts commit 9903c19. Caused the following buildbot failure: https://lab.llvm.org/buildbot/#/builders/144/builds/28591 Please fix before recommitting.
My apologies but I've had to revert this change to get the build bot green again. Reverted in 30de98c |
…g with explicit '\0' (llvm#143487) In C, a char array needs no "nonstring" attribute, if its initializer is a string literal that 1) explicitly ends with '\0' and 2) fits in the array after a possible truncation. For example `char a[4] = "ABC\0"; // fine, needs no "nonstring" attr` rdar://152506883
…ls ending with explicit '\0' (llvm#143487)" This reverts commit 9903c19. Caused the following buildbot failure: https://lab.llvm.org/buildbot/#/builders/144/builds/28591 Please fix before recommitting.
…als ending with explicit '\0' (#143487)" In C, a char array needs no "nonstring" attribute, if its initializer is a string literal that 1) explicitly ends with '\0' and 2) fits in the array after a possible truncation. For example `char a[4] = "ABC\0"; // fine, needs no "nonstring" attr` rdar://152506883 This reland disables the test for linux so that it will not block the buildbot: https://lab.llvm.org/buildbot/#/builders/144/builds/28591.
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/60/builds/31310 Here is the relevant piece of the build log for the reference
|
I don't think the failure is relevant. |
…g with explicit '\0' (llvm#143487) In C, a char array needs no "nonstring" attribute, if its initializer is a string literal that 1) explicitly ends with '\0' and 2) fits in the array after a possible truncation. For example `char a[4] = "ABC\0"; // fine, needs no "nonstring" attr` rdar://152506883
…ls ending with explicit '\0' (llvm#143487)" This reverts commit 9903c19. Caused the following buildbot failure: https://lab.llvm.org/buildbot/#/builders/144/builds/28591 Please fix before recommitting.
…als ending with explicit '\0' (llvm#143487)" In C, a char array needs no "nonstring" attribute, if its initializer is a string literal that 1) explicitly ends with '\0' and 2) fits in the array after a possible truncation. For example `char a[4] = "ABC\0"; // fine, needs no "nonstring" attr` rdar://152506883 This reland disables the test for linux so that it will not block the buildbot: https://lab.llvm.org/buildbot/#/builders/144/builds/28591.
In C, a char array needs no "nonstring" attribute, if its initializer is a string literal that 1) explicitly ends with '\0' and 2) fits in the array after a possible truncation.
For example
char a[4] = "ABC\0"; // fine, needs no "nonstring" attr
rdar://152506883