Skip to content

[-Wunterminated-string-initialization] Handle C string literals ending with explicit '\0' #143487

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 7 commits into from
Jun 26, 2025

Conversation

ziqingluo-90
Copy link
Contributor

In C, a char array needs no "nonstring" attribute, if its initializer is a string literal that 1) explicitly ends with '\0' and 2) fits in the array after a possible truncation.

For example
char a[4] = "ABC\0"; // fine, needs no "nonstring" attr

rdar://152506883

…g with explicit '\0'

In C, a char array needs no "nonstring" attribute, if its initializer
is a string literal that 1) explicitly ends with '\0' and 2) fits in
the array after a possible truncation.

For example
`char a[4] = "ABC\0"; // fine, needs no "nonstring" attr`

rdar://152506883
@llvmbot llvmbot added clang Clang issues not falling into any other category clang:frontend Language frontend issues, e.g. anything involving "Sema" labels Jun 10, 2025
@llvmbot
Copy link
Member

llvmbot commented Jun 10, 2025

@llvm/pr-subscribers-clang

Author: Ziqing Luo (ziqingluo-90)

Changes

In C, a char array needs no "nonstring" attribute, if its initializer is a string literal that 1) explicitly ends with '\0' and 2) fits in the array after a possible truncation.

For example
char a[4] = "ABC\0"; // fine, needs no "nonstring" attr

rdar://152506883


Full diff: https://github.com/llvm/llvm-project/pull/143487.diff

2 Files Affected:

  • (modified) clang/lib/Sema/SemaInit.cpp (+5)
  • (added) clang/test/Sema/attr-nonstring_safe.c (+28)
diff --git a/clang/lib/Sema/SemaInit.cpp b/clang/lib/Sema/SemaInit.cpp
index da56225b2f926..f7592688e0327 100644
--- a/clang/lib/Sema/SemaInit.cpp
+++ b/clang/lib/Sema/SemaInit.cpp
@@ -260,6 +260,11 @@ static void CheckStringInit(Expr *Str, QualType &DeclT, const ArrayType *AT,
              diag::ext_initializer_string_for_char_array_too_long)
           << Str->getSourceRange();
     else if (StrLength - 1 == ArrayLen) {
+      // If the string literal is null-terminated explicitly, e.g., `char a[4] =
+      // "ABC\0"`, there should be no warn:
+      if (const auto *SL = dyn_cast<StringLiteral>(Str->IgnoreParens()))
+        if (SL->isOrdinary() && SL->getBytes().back() == 0)
+          return;
       // If the entity being initialized has the nonstring attribute, then
       // silence the "missing nonstring" diagnostic. If there's no entity,
       // check whether we're initializing an array of arrays; if so, walk the
diff --git a/clang/test/Sema/attr-nonstring_safe.c b/clang/test/Sema/attr-nonstring_safe.c
new file mode 100644
index 0000000000000..3ea441e033dba
--- /dev/null
+++ b/clang/test/Sema/attr-nonstring_safe.c
@@ -0,0 +1,28 @@
+// RUN: %clang_cc1 -fsyntax-only -verify -Wunterminated-string-initialization %s -x c
+// RUN: %clang_cc1 -fsyntax-only -verify -Wunterminated-string-initialization %s -x c++
+
+
+// In C, the following examples are fine:
+#if __cplusplus
+char foo[3] = "fo\0"; // expected-error {{initializer-string for char array is too long, array size is 3 but initializer has size 4 (including the null terminating character)}}
+
+struct S {
+  char buf[3];
+  char fub[3];
+} s = { "ba\0", "bo\0" }; // expected-error 2{{initializer-string for char array is too long, array size is 3 but initializer has size 4 (including the null terminating character)}}
+
+signed char scfoo[3] = "fo\0"; // expected-error {{initializer-string for char array is too long, array size is 3 but initializer has size 4 (including the null terminating character)}}
+unsigned char ucfoo[3] = "fo\0"; // expected-error {{initializer-string for char array is too long, array size is 3 but initializer has size 4 (including the null terminating character)}}
+
+#else
+//expected-no-diagnostics
+char foo[3] = "fo\0";
+
+struct S {
+  char buf[3];
+  char fub[3];
+} s = { "ba\0", "bo\0" };
+
+signed char scfoo[3] = "fo\0";
+unsigned char ucfoo[3] = "fo\0";
+#endif

@@ -260,6 +260,11 @@ static void CheckStringInit(Expr *Str, QualType &DeclT, const ArrayType *AT,
diag::ext_initializer_string_for_char_array_too_long)
<< Str->getSourceRange();
else if (StrLength - 1 == ArrayLen) {
// If the string literal is null-terminated explicitly, e.g., `char a[4] =
// "ABC\0"`, there should be no warn:
if (const auto *SL = dyn_cast<StringLiteral>(Str->IgnoreParens()))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if (const auto *SL = dyn_cast<StringLiteral>(Str->IgnoreParens()))
if (const auto *SL = dyn_cast<StringLiteral>(Str->IgnoreParens()); SL && SL->isOrdinary() && SL->getBytes().back() == 0)



// In C, the following examples are fine:
#if __cplusplus
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
#if __cplusplus
#ifdef __cplusplus

Copy link
Member

@Sirraide Sirraide left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of duplicating the tests and using #ifdef, instead change the run line for -x c++ to e.g. -verify=cxx,expected and then replace all instances of expected-error w/ cxx-error (but leave the existing expected-no-diagnostics).

@@ -260,6 +260,11 @@ static void CheckStringInit(Expr *Str, QualType &DeclT, const ArrayType *AT,
diag::ext_initializer_string_for_char_array_too_long)
<< Str->getSourceRange();
else if (StrLength - 1 == ArrayLen) {
// If the string literal is null-terminated explicitly, e.g., `char a[4] =
// "ABC\0"`, there should be no warn:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// "ABC\0"`, there should be no warn:
// "ABC\0"`, there should be no warning.

Copy link
Collaborator

@erichkeane erichkeane left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add tests for other encodings.

Also, needs a release note.

// If the string literal is null-terminated explicitly, e.g., `char a[4] =
// "ABC\0"`, there should be no warn:
if (const auto *SL = dyn_cast<StringLiteral>(Str->IgnoreParens()))
if (SL->isOrdinary() && SL->getBytes().back() == 0)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this encoding-aware? what about if this is a 2nd byte in a multi-byte?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, this means the functionality only works for char and not other string types. I would imagine we'd want to also support: wchar_t foo[4] = { L'f', L'o', L'o', L'\0' }; and other variants.

Copy link
Collaborator

@AaronBallman AaronBallman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for working on this! In general, I agree that this is a case where we likely don't want the diagnostic or to require the attribute.

// If the string literal is null-terminated explicitly, e.g., `char a[4] =
// "ABC\0"`, there should be no warn:
if (const auto *SL = dyn_cast<StringLiteral>(Str->IgnoreParens()))
if (SL->isOrdinary() && SL->getBytes().back() == 0)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, this means the functionality only works for char and not other string types. I would imagine we'd want to also support: wchar_t foo[4] = { L'f', L'o', L'o', L'\0' }; and other variants.

@ziqingluo-90
Copy link
Contributor Author

Thank you all for valuable feedbacks. I have improved the PR by

  • supporting different character encodings
  • adding more test cases
  • updating the release notes

@ziqingluo-90 ziqingluo-90 requested a review from ahatanak June 14, 2025 01:10
@ziqingluo-90
Copy link
Contributor Author

@shafik, @erichkeane @AaronBallman @tbaederr @zwuis
I believe all feedback has been addressed. Could you give it another look when you are free?



#ifdef __cplusplus
// C++ is stricter so the following cases should be warned about:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we want the same test cases between C and C++, the only thing that should be different is the expected diagnostic comments. e.g.,

char foo3[3] = "fo\0"; // cxx-error {{initializer-string for char array is too long, array size is 3 but initializer has size 4 (including the null terminating character)}}

is fine in C because the diagnostic will only be checked in C++ mode.

The only code that should be in the #if would be things like the typedefs for C.

if (const auto *SL = dyn_cast<StringLiteral>(Str->IgnoreParens());
SL && SL->getLength() > 0 &&
SL->getCodeUnit(SL->getLength() - 1) == 0)
return;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the early return suppresses warn_initializer_string_for_char_array_too_long_for_cpp warnings?

Copy link
Collaborator

@AaronBallman AaronBallman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@ziqingluo-90 ziqingluo-90 merged commit 9903c19 into llvm:main Jun 26, 2025
8 checks passed
@llvm-ci
Copy link
Collaborator

llvm-ci commented Jun 26, 2025

LLVM Buildbot has detected a new failure on builder llvm-clang-x86_64-sie-ubuntu-fast running on sie-linux-worker while building clang at step 6 "test-build-unified-tree-check-all".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/144/builds/28591

Here is the relevant piece of the build log for the reference
Step 6 (test-build-unified-tree-check-all) failure: test (failure)
******************** TEST 'Clang :: Sema/attr-nonstring_safe.c' FAILED ********************
Exit Code: 1

Command Output (stderr):
--
/home/buildbot/buildbot-root/llvm-clang-x86_64-sie-ubuntu-fast/build/bin/clang -cc1 -internal-isystem /home/buildbot/buildbot-root/llvm-clang-x86_64-sie-ubuntu-fast/build/lib/clang/21/include -nostdsysteminc -fsyntax-only -verify=compat,expected -Wunterminated-string-initialization /home/buildbot/buildbot-root/llvm-clang-x86_64-sie-ubuntu-fast/llvm-project/clang/test/Sema/attr-nonstring_safe.c -x c # RUN: at line 1
+ /home/buildbot/buildbot-root/llvm-clang-x86_64-sie-ubuntu-fast/build/bin/clang -cc1 -internal-isystem /home/buildbot/buildbot-root/llvm-clang-x86_64-sie-ubuntu-fast/build/lib/clang/21/include -nostdsysteminc -fsyntax-only -verify=compat,expected -Wunterminated-string-initialization /home/buildbot/buildbot-root/llvm-clang-x86_64-sie-ubuntu-fast/llvm-project/clang/test/Sema/attr-nonstring_safe.c -x c
error: diagnostics with 'error' severity seen but not expected: 
  File /home/buildbot/buildbot-root/llvm-clang-x86_64-sie-ubuntu-fast/llvm-project/clang/test/Sema/attr-nonstring_safe.c Line 31: use of undeclared identifier 'u'
  File /home/buildbot/buildbot-root/llvm-clang-x86_64-sie-ubuntu-fast/llvm-project/clang/test/Sema/attr-nonstring_safe.c Line 34: use of undeclared identifier 'U'
  File /home/buildbot/buildbot-root/llvm-clang-x86_64-sie-ubuntu-fast/llvm-project/clang/test/Sema/attr-nonstring_safe.c Line 43: use of undeclared identifier 'u'
  File /home/buildbot/buildbot-root/llvm-clang-x86_64-sie-ubuntu-fast/llvm-project/clang/test/Sema/attr-nonstring_safe.c Line 44: use of undeclared identifier 'U'
error: diagnostics with 'warning' severity expected but not seen: 
  File /home/buildbot/buildbot-root/llvm-clang-x86_64-sie-ubuntu-fast/llvm-project/clang/test/Sema/attr-nonstring_safe.c Line 31 'compat-warning': initializer-string for character array is too long for C++, array size is 3 but initializer has size 4 (including the null terminating character)
  File /home/buildbot/buildbot-root/llvm-clang-x86_64-sie-ubuntu-fast/llvm-project/clang/test/Sema/attr-nonstring_safe.c Line 34 'compat-warning': initializer-string for character array is too long for C++, array size is 3 but initializer has size 4 (including the null terminating character)
6 errors generated.

--

********************


@TomWeaver18
Copy link
Contributor

@ziqingluo-90 Hello and greetings form the UK!

Are you aware that this change caused the following build bot failure? :
https://lab.llvm.org/buildbot/#/builders/144/builds/28591

Are you looking into a fix? The bot has been red for some time now.

If a quick fix is not possible please consider reverting the change until one can be found.

Thanks in advance,
Tom W

TomWeaver18 pushed a commit that referenced this pull request Jun 26, 2025
…ls ending with explicit '\0' (#143487)"

This reverts commit 9903c19.

Caused the following buildbot failure:
https://lab.llvm.org/buildbot/#/builders/144/builds/28591

Please fix before recommitting.
@TomWeaver18
Copy link
Contributor

My apologies but I've had to revert this change to get the build bot green again.

Reverted in 30de98c

anthonyhatran pushed a commit to anthonyhatran/llvm-project that referenced this pull request Jun 26, 2025
…g with explicit '\0' (llvm#143487)

In C, a char array needs no "nonstring" attribute, if its initializer is
a string literal that 1) explicitly ends with '\0' and 2) fits in the
array after a possible truncation.

For example
`char a[4] = "ABC\0"; // fine, needs no "nonstring" attr`

rdar://152506883
anthonyhatran pushed a commit to anthonyhatran/llvm-project that referenced this pull request Jun 26, 2025
…ls ending with explicit '\0' (llvm#143487)"

This reverts commit 9903c19.

Caused the following buildbot failure:
https://lab.llvm.org/buildbot/#/builders/144/builds/28591

Please fix before recommitting.
ziqingluo-90 added a commit that referenced this pull request Jun 27, 2025
…als ending with explicit '\0' (#143487)"

In C, a char array needs no "nonstring" attribute, if its initializer is
a string literal that 1) explicitly ends with '\0' and 2) fits in the
array after a possible truncation.

For example
`char a[4] = "ABC\0"; // fine, needs no "nonstring" attr`

rdar://152506883

This reland disables the test for linux so that it will not block the
buildbot: https://lab.llvm.org/buildbot/#/builders/144/builds/28591.
@llvm-ci
Copy link
Collaborator

llvm-ci commented Jun 27, 2025

LLVM Buildbot has detected a new failure on builder llvm-x86_64-debian-dylib running on gribozavr4 while building clang at step 7 "test-build-unified-tree-check-llvm".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/60/builds/31310

Here is the relevant piece of the build log for the reference
Step 7 (test-build-unified-tree-check-llvm) failure: test (failure)
******************** TEST 'LLVM :: tools/llvm-exegesis/RISCV/rvv/filter.test' FAILED ********************
Exit Code: 2

Command Output (stderr):
--
/b/1/llvm-x86_64-debian-dylib/build/bin/llvm-exegesis -mtriple=riscv64 -mcpu=sifive-x280 -benchmark-phase=assemble-measured-code --mode=inverse_throughput --opcode-name=PseudoVNCLIPU_WX_M1_MASK     --riscv-filter-config='vtype = {VXRM: rod, AVL: VLMAX, SEW: e(8|16), Policy: ta/mu}' --max-configs-per-opcode=1000 --min-instructions=10 | /b/1/llvm-x86_64-debian-dylib/build/bin/FileCheck /b/1/llvm-x86_64-debian-dylib/llvm-project/llvm/test/tools/llvm-exegesis/RISCV/rvv/filter.test # RUN: at line 1
+ /b/1/llvm-x86_64-debian-dylib/build/bin/llvm-exegesis -mtriple=riscv64 -mcpu=sifive-x280 -benchmark-phase=assemble-measured-code --mode=inverse_throughput --opcode-name=PseudoVNCLIPU_WX_M1_MASK '--riscv-filter-config=vtype = {VXRM: rod, AVL: VLMAX, SEW: e(8|16), Policy: ta/mu}' --max-configs-per-opcode=1000 --min-instructions=10
+ /b/1/llvm-x86_64-debian-dylib/build/bin/FileCheck /b/1/llvm-x86_64-debian-dylib/llvm-project/llvm/test/tools/llvm-exegesis/RISCV/rvv/filter.test
PseudoVNCLIPU_WX_M1_MASK: Failed to produce any snippet via: instruction has tied variables, avoiding Read-After-Write issue, picking random def and use registers not aliasing each other, for uses, one unique register for each position
FileCheck error: '<stdin>' is empty.
FileCheck command line:  /b/1/llvm-x86_64-debian-dylib/build/bin/FileCheck /b/1/llvm-x86_64-debian-dylib/llvm-project/llvm/test/tools/llvm-exegesis/RISCV/rvv/filter.test

--

********************


@ziqingluo-90
Copy link
Contributor Author

LLVM Buildbot has detected a new failure on builder llvm-x86_64-debian-dylib running on gribozavr4 while building clang at step 7 "test-build-unified-tree-check-llvm".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/60/builds/31310

Here is the relevant piece of the build log for the reference

I don't think the failure is relevant.

rlavaee pushed a commit to rlavaee/llvm-project that referenced this pull request Jul 1, 2025
…g with explicit '\0' (llvm#143487)

In C, a char array needs no "nonstring" attribute, if its initializer is
a string literal that 1) explicitly ends with '\0' and 2) fits in the
array after a possible truncation.

For example
`char a[4] = "ABC\0"; // fine, needs no "nonstring" attr`

rdar://152506883
rlavaee pushed a commit to rlavaee/llvm-project that referenced this pull request Jul 1, 2025
…ls ending with explicit '\0' (llvm#143487)"

This reverts commit 9903c19.

Caused the following buildbot failure:
https://lab.llvm.org/buildbot/#/builders/144/builds/28591

Please fix before recommitting.
rlavaee pushed a commit to rlavaee/llvm-project that referenced this pull request Jul 1, 2025
…als ending with explicit '\0' (llvm#143487)"

In C, a char array needs no "nonstring" attribute, if its initializer is
a string literal that 1) explicitly ends with '\0' and 2) fits in the
array after a possible truncation.

For example
`char a[4] = "ABC\0"; // fine, needs no "nonstring" attr`

rdar://152506883

This reland disables the test for linux so that it will not block the
buildbot: https://lab.llvm.org/buildbot/#/builders/144/builds/28591.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
clang:frontend Language frontend issues, e.g. anything involving "Sema" clang Clang issues not falling into any other category
Projects
None yet
Development

Successfully merging this pull request may close these issues.