Skip to content

[clang] Do not serialize function definitions without a body #121550

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

alejandro-alvarez-sonarsource
Copy link
Contributor

An instantiated templated function definition may not have a body due to parsing errors inside the templated function. When serializing, an assert is triggered inside ASTRecordWriter::AddFunctionDefinition.

The instantiation may happen on an intermediate module.

The test case was reduced from mp-units.

An instantiated templated function definition may not have a body due
to parsing errors inside the templated function. When serializing, an
assert is tripped inside `ASTRecordWriter::AddFunctionDefinition` when
building with assertions enabled.

The instantiation may happen on an intermediate module.

The test case was reduced from `mp-units`.
@llvmbot llvmbot added clang Clang issues not falling into any other category clang:modules C++20 modules and Clang Header Modules labels Jan 3, 2025
@llvmbot
Copy link
Member

llvmbot commented Jan 3, 2025

@llvm/pr-subscribers-clang-modules

Author: Alejandro Álvarez Ayllón (alejandro-alvarez-sonarsource)

Changes

An instantiated templated function definition may not have a body due to parsing errors inside the templated function. When serializing, an assert is triggered inside ASTRecordWriter::AddFunctionDefinition.

The instantiation may happen on an intermediate module.

The test case was reduced from mp-units.


Full diff: https://github.com/llvm/llvm-project/pull/121550.diff

2 Files Affected:

  • (modified) clang/lib/Serialization/ASTWriter.cpp (+4-2)
  • (added) clang/test/Modules/missing-body-in-import.cpp (+42)
diff --git a/clang/lib/Serialization/ASTWriter.cpp b/clang/lib/Serialization/ASTWriter.cpp
index 4a6027943072c0..36b9e3e2ba1720 100644
--- a/clang/lib/Serialization/ASTWriter.cpp
+++ b/clang/lib/Serialization/ASTWriter.cpp
@@ -6227,8 +6227,10 @@ void ASTWriter::WriteDeclUpdatesBlocks(ASTContext &Context,
     // Add a trailing update record, if any. These must go last because we
     // lazily load their attached statement.
     if (!GeneratingReducedBMI || !CanElideDeclDef(D)) {
-      if (HasUpdatedBody) {
-        const auto *Def = cast<FunctionDecl>(D);
+      assert(!(HasUpdatedBody && HasAddedVarDefinition) &&
+             "Declaration can not be both a FunctionDecl and a VarDecl");
+      if (const auto *Def = dyn_cast<FunctionDecl>(D);
+          HasUpdatedBody && Def->doesThisDeclarationHaveABody()) {
         Record.push_back(UPD_CXX_ADDED_FUNCTION_DEFINITION);
         Record.push_back(Def->isInlined());
         Record.AddSourceLocation(Def->getInnerLocStart());
diff --git a/clang/test/Modules/missing-body-in-import.cpp b/clang/test/Modules/missing-body-in-import.cpp
new file mode 100644
index 00000000000000..b52ebba15087a3
--- /dev/null
+++ b/clang/test/Modules/missing-body-in-import.cpp
@@ -0,0 +1,42 @@
+// RUN: rm -rf %t
+// RUN: split-file %s %t
+// RUN: cd %t
+
+// RUN: %clang_cc1 -std=c++23 mod1.cppm -emit-module-interface -o mod1.pcm -fallow-pcm-with-compiler-errors -verify
+// RUN: %clang_cc1 -std=c++23 mod2.cppm -emit-module-interface -o mod2.pcm -fmodule-file=mod1=mod1.pcm -verify -fallow-pcm-with-compiler-errors
+// RUN: %clang_cc1 -std=c++23 mod3.cppm -emit-module-interface -o mod3.pcm -fmodule-file=mod1=mod1.pcm -fmodule-file=mod2=mod2.pcm -verify -fallow-pcm-with-compiler-errors
+// RUN: %clang_cc1 -std=c++23 main.cpp -fmodule-file=mod1=mod1.pcm -fmodule-file=mod2=mod2.pcm -fmodule-file=mod3=mod3.pcm -verify -fallow-pcm-with-compiler-errors -ast-dump-all
+
+//--- mod1.cppm
+export module mod1;
+
+export template <unsigned N, unsigned M>
+class A {
+public:
+  constexpr A(const char[], const char[]) {
+    auto x = BrokenExpr; // expected-error {{use of undeclared identifier 'BrokenExpr'}}
+  }
+};
+
+export template<A<1,1> NTTP>
+struct B {};
+
+template < unsigned N, unsigned M >
+A(const char (&)[N], const char (&)[M]) -> A< 1, 1 >;
+
+//--- mod2.cppm
+export module mod2;
+import mod1;
+
+struct C: B <A{"a", "b"}> { // expected-error {{non-type template argument is not a constant expression}}
+  constexpr C(int a) { }
+};
+
+//--- mod3.cppm
+// expected-no-diagnostics
+export module mod3;
+export import mod2;
+
+//--- main.cpp
+// expected-no-diagnostics
+import mod3; // no crash

@llvmbot
Copy link
Member

llvmbot commented Jan 3, 2025

@llvm/pr-subscribers-clang

Author: Alejandro Álvarez Ayllón (alejandro-alvarez-sonarsource)

Changes

An instantiated templated function definition may not have a body due to parsing errors inside the templated function. When serializing, an assert is triggered inside ASTRecordWriter::AddFunctionDefinition.

The instantiation may happen on an intermediate module.

The test case was reduced from mp-units.


Full diff: https://github.com/llvm/llvm-project/pull/121550.diff

2 Files Affected:

  • (modified) clang/lib/Serialization/ASTWriter.cpp (+4-2)
  • (added) clang/test/Modules/missing-body-in-import.cpp (+42)
diff --git a/clang/lib/Serialization/ASTWriter.cpp b/clang/lib/Serialization/ASTWriter.cpp
index 4a6027943072c0..36b9e3e2ba1720 100644
--- a/clang/lib/Serialization/ASTWriter.cpp
+++ b/clang/lib/Serialization/ASTWriter.cpp
@@ -6227,8 +6227,10 @@ void ASTWriter::WriteDeclUpdatesBlocks(ASTContext &Context,
     // Add a trailing update record, if any. These must go last because we
     // lazily load their attached statement.
     if (!GeneratingReducedBMI || !CanElideDeclDef(D)) {
-      if (HasUpdatedBody) {
-        const auto *Def = cast<FunctionDecl>(D);
+      assert(!(HasUpdatedBody && HasAddedVarDefinition) &&
+             "Declaration can not be both a FunctionDecl and a VarDecl");
+      if (const auto *Def = dyn_cast<FunctionDecl>(D);
+          HasUpdatedBody && Def->doesThisDeclarationHaveABody()) {
         Record.push_back(UPD_CXX_ADDED_FUNCTION_DEFINITION);
         Record.push_back(Def->isInlined());
         Record.AddSourceLocation(Def->getInnerLocStart());
diff --git a/clang/test/Modules/missing-body-in-import.cpp b/clang/test/Modules/missing-body-in-import.cpp
new file mode 100644
index 00000000000000..b52ebba15087a3
--- /dev/null
+++ b/clang/test/Modules/missing-body-in-import.cpp
@@ -0,0 +1,42 @@
+// RUN: rm -rf %t
+// RUN: split-file %s %t
+// RUN: cd %t
+
+// RUN: %clang_cc1 -std=c++23 mod1.cppm -emit-module-interface -o mod1.pcm -fallow-pcm-with-compiler-errors -verify
+// RUN: %clang_cc1 -std=c++23 mod2.cppm -emit-module-interface -o mod2.pcm -fmodule-file=mod1=mod1.pcm -verify -fallow-pcm-with-compiler-errors
+// RUN: %clang_cc1 -std=c++23 mod3.cppm -emit-module-interface -o mod3.pcm -fmodule-file=mod1=mod1.pcm -fmodule-file=mod2=mod2.pcm -verify -fallow-pcm-with-compiler-errors
+// RUN: %clang_cc1 -std=c++23 main.cpp -fmodule-file=mod1=mod1.pcm -fmodule-file=mod2=mod2.pcm -fmodule-file=mod3=mod3.pcm -verify -fallow-pcm-with-compiler-errors -ast-dump-all
+
+//--- mod1.cppm
+export module mod1;
+
+export template <unsigned N, unsigned M>
+class A {
+public:
+  constexpr A(const char[], const char[]) {
+    auto x = BrokenExpr; // expected-error {{use of undeclared identifier 'BrokenExpr'}}
+  }
+};
+
+export template<A<1,1> NTTP>
+struct B {};
+
+template < unsigned N, unsigned M >
+A(const char (&)[N], const char (&)[M]) -> A< 1, 1 >;
+
+//--- mod2.cppm
+export module mod2;
+import mod1;
+
+struct C: B <A{"a", "b"}> { // expected-error {{non-type template argument is not a constant expression}}
+  constexpr C(int a) { }
+};
+
+//--- mod3.cppm
+// expected-no-diagnostics
+export module mod3;
+export import mod2;
+
+//--- main.cpp
+// expected-no-diagnostics
+import mod3; // no crash

Copy link
Contributor

@cor3ntin cor3ntin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this looks correct but I'd like @ChuanqiXu9 to confirm

Copy link
Member

@ChuanqiXu9 ChuanqiXu9 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, previously we didn't allow -fallow-pcm-with-compiler-errors so we didn't meet such problems. Out of curiosity (not a blocking issue), why do you want this feature for C++20 modules?

Comment on lines 6232 to 6233
if (const auto *Def = dyn_cast<FunctionDecl>(D);
HasUpdatedBody && Def->doesThisDeclarationHaveABody()) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Logically, it looks better to not set HasUpdatedBody if the declaration doesn't have a body.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed as suggested.

@alejandro-alvarez-sonarsource
Copy link
Contributor Author

Out of curiosity (not a blocking issue), why do you want this feature for C++20 modules?

Sanitizers, linters and indexers have to deal with code using non supported C++ language features or extensions. This is also true for C++20 modules. While conceivably one may implement module support without going through the intermediate PCM, to keep the semantics close to what the compiler does, the most straightforward is to do what the compiler does and go throught the same codepath as when actually compiling 😃
And also, they would be able to not having to reparse everything and leverage one of the benefits of C++20 modules.

@ChuanqiXu9
Copy link
Member

Out of curiosity (not a blocking issue), why do you want this feature for C++20 modules?

Sanitizers, linters and indexers have to deal with code using non supported C++ language features or extensions. This is also true for C++20 modules. While conceivably one may implement module support without going through the intermediate PCM, to keep the semantics close to what the compiler does, the most straightforward is to do what the compiler does and go throught the same codepath as when actually compiling 😃 And also, they would be able to not having to reparse everything and leverage one of the benefits of C++20 modules.

Just to make sure I understand things correctly, you're saying:

// some.cc
import mod;
...

You hope the linter doesn't give up some.cc if mod contains some errors? If yes, that makes sense to me.

But why Sanitizers? IIRC, they should only work if the project compiles. Or are you saying the static analyzer?

@alejandro-alvarez-sonarsource
Copy link
Contributor Author

You hope the linter doesn't give up some.cc if mod contains some errors? If yes, that makes sense to me.

Precisely.

But why Sanitizers? IIRC, they should only work if the project compiles. Or are you saying the static analyzer?

Sorry, my mind slipped there. Indeed I meant analyzers.

if (Kind == UPD_CXX_ADDED_FUNCTION_DEFINITION) {
assert(isa<FunctionDecl>(D) && "expected FunctionDecl");
HasUpdatedBody = dyn_cast<FunctionDecl>(D)->hasBody();
} else if (Kind == UPD_CXX_ADDED_VAR_DEFINITION)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
} else if (Kind == UPD_CXX_ADDED_VAR_DEFINITION)
} else if (Kind == UPD_CXX_ADDED_VAR_DEFINITION) {

Please see LLVM's policy for the use of { : https://llvm.org/docs/CodingStandards.html#don-t-use-braces-on-simple-single-statement-bodies-of-if-else-loop-statements

we hope the style is consistent with the if-else branches

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added

if (Kind == UPD_CXX_ADDED_FUNCTION_DEFINITION) {
assert(isa<FunctionDecl>(D) && "expected FunctionDecl");
HasUpdatedBody = dyn_cast<FunctionDecl>(D)->hasBody();
} else if (Kind == UPD_CXX_ADDED_VAR_DEFINITION)
HasAddedVarDefinition = true;
else
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto,

Suggested change
else
else {

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added

Copy link
Member

@ChuanqiXu9 ChuanqiXu9 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM with nits

else if (Kind == UPD_CXX_ADDED_VAR_DEFINITION)
if (Kind == UPD_CXX_ADDED_FUNCTION_DEFINITION) {
assert(isa<FunctionDecl>(D) && "expected FunctionDecl");
HasUpdatedBody = cast<FunctionDecl>(D)->hasBody();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry to skip this on the last iteration. We should use doesThisDeclarationHaveABody instead of hasBody. Since hasBody may be more expensive.

And also it is still slightly better to not set UPD_CXX_ADDED_FUNCTION_DEFINITION if the function doesn't have a body. We can check this in CompletedImplicitDefinition and VariableDefinitionInstantiated.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No problem! I want the get it right, not to rush it :D

Changed, although I imagine you meant FunctionDefinitionInstantiated (not VariableDefinitionInstantiated)

Copy link

github-actions bot commented Jan 6, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

Copy link
Member

@ChuanqiXu9 ChuanqiXu9 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM now.

We can land this after you fix the formatting issue. I generally use the following command to format my patch:

git diff -U0 --no-color --relative HEAD^ | clang/tools/clang-format/clang-format-diff.py -p1 -i

feel free to use other methods

@@ -7230,6 +7230,9 @@ void ASTWriter::CompletedImplicitDefinition(const FunctionDecl *D) {
if (!D->isFromASTFile())
return; // Declaration not imported from PCH.

if (!D->doesThisDeclarationHaveABody())
return; // The function definition may not have a body due to parsing errors.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer to see the comments above the if.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved

@alejandro-alvarez-sonarsource
Copy link
Contributor Author

Thanks for the review! Would you mind doing the merge? I do not have permission.

@ChuanqiXu9
Copy link
Member

Thanks for the review! Would you mind doing the merge? I do not have permission.

I am just waiting the CI to be green : )

@ChuanqiXu9 ChuanqiXu9 merged commit a13bcf3 into llvm:main Jan 6, 2025
8 checks passed
paulhuggett pushed a commit to paulhuggett/llvm-project that referenced this pull request Jan 7, 2025
…1550)

An instantiated templated function definition may not have a body due to
parsing errors inside the templated function. When serializing, an
assert is triggered inside `ASTRecordWriter::AddFunctionDefinition`.

The instantiation may happen on an intermediate module.

The test case was reduced from `mp-units`.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
clang:modules C++20 modules and Clang Header Modules clang Clang issues not falling into any other category
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants