Skip to content

[clang] Fix ASTWriter crash after merging named enums #114240

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions clang/docs/ReleaseNotes.rst
Original file line number Diff line number Diff line change
Expand Up @@ -248,6 +248,7 @@ Bug Fixes in This Version

- Clang now outputs correct values when #embed data contains bytes with negative
signed char values (#GH102798).
- Fixed a crash when merging named enumerations in modules (#GH114240).
- Fixed rejects-valid problem when #embed appears in std::initializer_list or
when it can affect template argument deduction (#GH122306).
- Fix crash on code completion of function calls involving partial order of function templates
Expand Down
8 changes: 7 additions & 1 deletion clang/include/clang/Sema/Sema.h
Original file line number Diff line number Diff line change
Expand Up @@ -4008,7 +4008,7 @@ class Sema final : public SemaBase {
/// Perform ODR-like check for C/ObjC when merging tag types from modules.
/// Differently from C++, actually parse the body and reject / error out
/// in case of a structural mismatch.
bool ActOnDuplicateDefinition(Decl *Prev, SkipBodyInfo &SkipBody);
bool ActOnDuplicateDefinition(Scope *S, Decl *Prev, SkipBodyInfo &SkipBody);

typedef void *SkippedDefinitionContext;

Expand Down Expand Up @@ -4132,6 +4132,12 @@ class Sema final : public SemaBase {
void MergeTypedefNameDecl(Scope *S, TypedefNameDecl *New,
LookupResult &OldDecls);

/// CleanupMergedEnum - We have just merged the decl 'New' by making another
/// definition visible.
/// This method performs any necessary cleanup on the parser state to discard
/// child nodes from newly parsed decl we are retiring.
void CleanupMergedEnum(Scope *S, Decl *New);

/// MergeFunctionDecl - We just parsed a function 'New' from
/// declarator D which has the same name and scope as a previous
/// declaration 'Old'. Figure out how to resolve this situation,
Expand Down
2 changes: 1 addition & 1 deletion clang/lib/Parse/ParseDecl.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -5652,7 +5652,7 @@ void Parser::ParseEnumSpecifier(SourceLocation StartLoc, DeclSpec &DS,
Decl *D = SkipBody.CheckSameAsPrevious ? SkipBody.New : TagDecl;
ParseEnumBody(StartLoc, D);
if (SkipBody.CheckSameAsPrevious &&
!Actions.ActOnDuplicateDefinition(TagDecl, SkipBody)) {
!Actions.ActOnDuplicateDefinition(getCurScope(), TagDecl, SkipBody)) {
DS.SetTypeSpecError();
return;
}
Expand Down
3 changes: 2 additions & 1 deletion clang/lib/Parse/ParseDeclCXX.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -2350,7 +2350,8 @@ void Parser::ParseClassSpecifier(tok::TokenKind TagTokKind,
// Parse the definition body.
ParseStructUnionBody(StartLoc, TagType, cast<RecordDecl>(D));
if (SkipBody.CheckSameAsPrevious &&
!Actions.ActOnDuplicateDefinition(TagOrTempResult.get(), SkipBody)) {
!Actions.ActOnDuplicateDefinition(getCurScope(),
TagOrTempResult.get(), SkipBody)) {
DS.SetTypeSpecError();
return;
}
Expand Down
30 changes: 17 additions & 13 deletions clang/lib/Sema/SemaDecl.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -2564,18 +2564,7 @@ void Sema::MergeTypedefNameDecl(Scope *S, TypedefNameDecl *New,
// Make the old tag definition visible.
makeMergedDefinitionVisible(Hidden);

// If this was an unscoped enumeration, yank all of its enumerators
// out of the scope.
if (isa<EnumDecl>(NewTag)) {
Scope *EnumScope = getNonFieldDeclScope(S);
for (auto *D : NewTag->decls()) {
auto *ED = cast<EnumConstantDecl>(D);
assert(EnumScope->isDeclScope(ED));
EnumScope->RemoveDecl(ED);
IdResolver.RemoveDecl(ED);
ED->getLexicalDeclContext()->removeDecl(ED);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not super confident about this change, is it related or could it be separate? @AaronBallman

Copy link
Contributor Author

@michael-jabbour-sonarsource michael-jabbour-sonarsource Jan 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for raising the concern. I am also not entirely sure, but I chose to remove it for two reasons:

  1. This removeDecl() call causes an incorrect ODR violation diagnostic when -ast-dump-all is provided (which I am using in my regression test to expect the AST nodes after merging):

    In module 'ModB' imported from <source>:1:
    In module 'ModA' imported from ./ModBFile.h:2:
    ./shared.h:6:9: error: 'MyEnum3' has different definitions in different modules; definition in module 
    'ModA.ModAFile1' first difference is enum with 1 element
        6 | typedef enum { MyVal_C } MyEnum3;
          |         ^~~~~~~~~~~~~~~~
    ./shared.h:6:9: note: but in 'ModB' found enum with 0 elements
        6 | typedef enum { MyVal_C } MyEnum3;
          |         ^~~~~~~~~~~~~~~~
    1 error generated.
    

    This incorrect ODR violation can be observed before this patch. CE link: here.

  2. I am also not entirely sure if performing the call in the named enum case (the new case I am introducing) is safe. I can understand why it is important to remove the constant from IdResolver and the Enum context, but I don't know if removing the constant from the enum decl doesn't cause problems in some cases.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Additionally, as far as I can see, for EnumDeclConstants created by the parser, the DeclContext is always the containing Enum itself:

return EnumConstantDecl::Create(Context, Enum, IdLoc, Id, EltTy,
Val, EnumVal);

Therefore, after the refactoring I did, I think that the call can be simplified to:

ED->removeDecl(ECD);

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can try to dig deeper to answer the points I mentioned (especially point 2) if desired, but I would need to allocate some time (maybe at the end of the week).

Any pointers/hints/thoughts would be appreciated.

Copy link
Contributor Author

@michael-jabbour-sonarsource michael-jabbour-sonarsource Jan 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had a quick look into this today, and I could see that some other ODR checks aren't emitted unless -ast-dump-all is provided. For example, in this CE example, I have the enum E defined with different enumerators, and I don't get a diagnostic unless I have -Xclang -ast-dump-all on the command line (removing it, somehow removes the diagnostic).

While I still don't have a concrete answer about the points I mentioned in my previous response, it seems to me that this is a separate bug in the ODR check implementation, and that ideally, we should get the diagnostic without the -ast-dump-all.

If my conclusion above is correct, I would argue that dropping this call to removeDecl is necessary for these ODR checks to work (once the bug above is fixed).

I may try to dig deeper later into why the diagnostics are not emitted without -ast-dump-all. I am curious about your thoughts here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ilya-biryukov, this PR is deletes a statement ED->getLexicalDeclContext()->removeDecl(ED); which I think is okay. Is there an easy way to check on your infrastructure if we did not miss some obscure code example which proves us wrong?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, happy to run this over our codebase. Will report back after a day or two.

Copy link
Contributor

@ilya-biryukov ilya-biryukov Feb 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just wanted to circle back and report that the tests are running right now, but may not finish before the end of the working day.

Sorry for a slight delay. We should definitely know if this breaks anything tomorrow, but maybe even later today.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No issues on our side with this change. So LG!

Copy link
Contributor Author

@michael-jabbour-sonarsource michael-jabbour-sonarsource Feb 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you all very much for checking this. I will be updating the failing test in a few hours or tomorrow.

}
}
CleanupMergedEnum(S, NewTag);
}
}

Expand Down Expand Up @@ -2652,6 +2641,19 @@ void Sema::MergeTypedefNameDecl(Scope *S, TypedefNameDecl *New,
notePreviousDefinition(Old, New->getLocation());
}

void Sema::CleanupMergedEnum(Scope *S, Decl *New) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like this is a property of merging. Can we find a common path when merging definitions to put that logic there? Is MergeTypedefNameDecl not called for the motivating example?

Copy link
Contributor Author

@michael-jabbour-sonarsource michael-jabbour-sonarsource Feb 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is MergeTypedefNameDecl not called for the motivating example?

Unfortunately, it seems to me that MergeTypedefNameDecl is only called in the typedef to anonymous enum case (MyEnum3 in the test case I am adding). For the rest of the enum cases, the closest I found was Sema::ActOnDuplicateDefinition, which is called when merging all tags (and this is where I am adding the new call to CleanupMergedEnum).

Can we find a common path when merging definitions to put that logic there?

I could only see that merging enums for C and Obj-C crashes in this case (C++ works differently, see here), and I found Sema::ActOnDuplicateDefinition during my investigation to be the function that handles merging tags.

I am not aware of a central place for merging definitions in general. Could you provide some hints on what should I look for?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I realize that Sema::ActOnDuplicateDefinition maybe our best bet at the moment.

// If this was an unscoped enumeration, yank all of its enumerators
// out of the scope.
if (auto *ED = dyn_cast<EnumDecl>(New); ED && !ED->isScoped()) {
Scope *EnumScope = getNonFieldDeclScope(S);
for (auto *ECD : ED->enumerators()) {
assert(EnumScope->isDeclScope(ECD));
EnumScope->RemoveDecl(ECD);
IdResolver.RemoveDecl(ECD);
}
}
}

/// DeclhasAttr - returns true if decl Declaration already has the target
/// attribute.
static bool DeclHasAttr(const Decl *D, const Attr *A) {
Expand Down Expand Up @@ -18347,12 +18349,14 @@ void Sema::ActOnTagStartDefinition(Scope *S, Decl *TagD) {
AddPushedVisibilityAttribute(Tag);
}

bool Sema::ActOnDuplicateDefinition(Decl *Prev, SkipBodyInfo &SkipBody) {
bool Sema::ActOnDuplicateDefinition(Scope *S, Decl *Prev,
SkipBodyInfo &SkipBody) {
if (!hasStructuralCompatLayout(Prev, SkipBody.New))
return false;

// Make the previous decl visible.
makeMergedDefinitionVisible(SkipBody.Previous);
CleanupMergedEnum(S, SkipBody.New);
return true;
}

Expand Down
111 changes: 111 additions & 0 deletions clang/test/Modules/modules-merge-enum.m
Original file line number Diff line number Diff line change
@@ -0,0 +1,111 @@
// RUN: rm -rf %t
// RUN: split-file %s %t


// Expect no crash
// RUN: %clang_cc1 -fmodules -fmodules-cache-path=%t/modcache -fmodule-map-file=%t/module.modulemap %t/source.m

// Add -ast-dump-all to check that the AST nodes are merged correctly.
// RUN: %clang_cc1 -fmodules -fmodules-cache-path=%t/modcache -fmodule-map-file=%t/module.modulemap %t/source.m -ast-dump-all 2>&1 | FileCheck %s


//--- shared.h
// This header is shared between two modules, but it's not a module itself.
// The enums defined here are parsed in both modules, and merged while building ModB.

typedef enum MyEnum1 { MyVal_A } MyEnum1;
// CHECK: |-EnumDecl 0x{{.*}} imported in ModA.ModAFile1 <undeserialized declarations> MyEnum1
// CHECK-NEXT: | |-also in ModB
// CHECK-NEXT: | `-EnumConstantDecl 0x{{.*}} imported in ModA.ModAFile1 referenced MyVal_A 'int'
// CHECK-NEXT: |-TypedefDecl 0x{{.*}} imported in ModA.ModAFile1 hidden MyEnum1 'enum MyEnum1'
// CHECK-NEXT: | `-ElaboratedType 0x{{.*}} 'enum MyEnum1' sugar imported
// CHECK-NEXT: | `-EnumType 0x{{.*}} 'enum MyEnum1' imported
// CHECK-NEXT: | `-Enum 0x{{.*}} 'MyEnum1'


enum MyEnum2 { MyVal_B };
// CHECK: |-EnumDecl 0x{{.*}} imported in ModA.ModAFile1 <undeserialized declarations> MyEnum2
// CHECK-NEXT: | |-also in ModB
// CHECK-NEXT: | `-EnumConstantDecl 0x{{.*}} imported in ModA.ModAFile1 referenced MyVal_B 'int'


typedef enum { MyVal_C } MyEnum3;
// CHECK: |-EnumDecl 0x{{.*}} imported in ModA.ModAFile1 <undeserialized declarations>
// CHECK-NEXT: | |-also in ModB
// CHECK-NEXT: | `-EnumConstantDecl 0x{{.*}} imported in ModA.ModAFile1 referenced MyVal_C 'int'
// CHECK-NEXT: |-TypedefDecl 0x{{.*}} imported in ModA.ModAFile1 hidden MyEnum3 'enum MyEnum3':'MyEnum3'
// CHECK-NEXT: | `-ElaboratedType 0x{{.*}} 'enum MyEnum3' sugar imported
// CHECK-NEXT: | `-EnumType 0x{{.*}} 'MyEnum3' imported
// CHECK-NEXT: | `-Enum 0x{{.*}}

struct MyStruct {
enum MyEnum5 { MyVal_D } Field;
};

// CHECK: |-RecordDecl 0x{{.*}} imported in ModA.ModAFile1 <undeserialized declarations> struct MyStruct definition
// CHECK-NEXT: | |-also in ModB
// CHECK-NEXT: | |-EnumDecl 0x{{.*}} imported in ModA.ModAFile1 <undeserialized declarations> MyEnum5
// CHECK-NEXT: | | |-also in ModB
// CHECK-NEXT: | | `-EnumConstantDecl 0x{{.*}} imported in ModA.ModAFile1 referenced MyVal_D 'int'
// CHECK-NEXT: | `-FieldDecl 0x{{.*}} imported in ModA.ModAFile1 hidden Field 'enum MyEnum5'

// In this case, no merging happens on the EnumDecl in Objective-C, and ASTWriter writes both EnumConstantDecls when building ModB.
enum { MyVal_E };
// CHECK: |-EnumDecl 0x{{.*}} imported in ModA.ModAFile1 hidden <undeserialized declarations>
// CHECK-NEXT: | `-EnumConstantDecl 0x{{.*}} imported in ModA.ModAFile1 hidden MyVal_E 'int'


// Redeclarations coming from ModB.
// CHECK: |-TypedefDecl 0x{{.*}} prev 0x{{.*}} imported in ModB MyEnum1 'enum MyEnum1'
// CHECK-NEXT: | `-ElaboratedType 0x{{.*}} 'enum MyEnum1' sugar imported
// CHECK-NEXT: | `-EnumType 0x{{.*}} 'enum MyEnum1' imported
// CHECK-NEXT: | `-Enum 0x{{.*}} 'MyEnum1'

// CHECK: |-EnumDecl 0x{{.*}} prev 0x{{.*}} imported in ModB <undeserialized declarations>
// CHECK-NEXT: | |-also in ModB
// CHECK-NEXT: | `-EnumConstantDecl 0x{{.*}} imported in ModB MyVal_C 'int'
// CHECK-NEXT: |-TypedefDecl 0x{{.*}} prev 0x{{.*}} imported in ModB MyEnum3 'enum MyEnum3':'MyEnum3'
// CHECK-NEXT: | `-ElaboratedType 0x{{.*}} 'enum MyEnum3' sugar imported
// CHECK-NEXT: | `-EnumType 0x{{.*}} 'MyEnum3' imported
// CHECK-NEXT: | `-Enum 0x{{.*}}

// CHECK: |-EnumDecl 0x{{.*}} imported in ModB <undeserialized declarations>
// CHECK-NEXT: | `-EnumConstantDecl 0x{{.*}} first 0x{{.*}} imported in ModB referenced MyVal_E 'int'



//--- module.modulemap
module ModA {
module ModAFile1 {
header "ModAFile1.h"
export *
}
module ModAFile2 {
header "ModAFile2.h"
export *
}
}
// The goal of writing ModB is to test that ASTWriter can handle the merged AST nodes correctly.
// For example, a stale declaration in IdResolver can cause an assertion failure while writing the identifier table.
module ModB {
header "ModBFile.h"
export *
}

//--- ModAFile1.h
#include "shared.h"

//--- ModAFile2.h
// Including this file, triggers loading of the module ModA with nodes coming ModAFile1.h being hidden.

//--- ModBFile.h
// ModBFile depends on ModAFile2.h only.
#include "ModAFile2.h"
// Including shared.h here causes Sema to merge the AST nodes from shared.h with the hidden ones from ModA.
#include "shared.h"


//--- source.m
#include "ModBFile.h"

int main() { return MyVal_A + MyVal_B + MyVal_C + MyVal_D + MyVal_E; }