Skip to content

[TBAA] Generate tbaa.struct single field with char tag for unions. #84370

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Mar 11, 2024

Conversation

fhahn
Copy link
Contributor

@fhahn fhahn commented Mar 7, 2024

At the moment,distinct fields for each union member are generated. When copying a union, we don't know which union member is active, so there's no benefit from recording the different fields. It can result in converting tbaa.struct fields to incorrect tbaa nodes when extracting fields.

At the moment,distinct fields for each union member are generated. When
copying a union, we don't know which union member is active, so there's
no benefit from recording the different fields. It can also result in
converting tbaa.struct fields to incorrect tbaa nodes when extracting
fields.
@llvmbot llvmbot added clang Clang issues not falling into any other category clang:codegen IR generation bugs: mangling, exceptions, etc. labels Mar 7, 2024
@llvmbot
Copy link
Member

llvmbot commented Mar 7, 2024

@llvm/pr-subscribers-clang

@llvm/pr-subscribers-clang-codegen

Author: Florian Hahn (fhahn)

Changes

At the moment,distinct fields for each union member are generated. When copying a union, we don't know which union member is active, so there's no benefit from recording the different fields. It can result in converting tbaa.struct fields to incorrect tbaa nodes when extracting fields.


Full diff: https://github.com/llvm/llvm-project/pull/84370.diff

2 Files Affected:

  • (modified) clang/lib/CodeGen/CodeGenTBAA.cpp (+8)
  • (modified) clang/test/CodeGen/tbaa-struct.cpp (+3-5)
diff --git a/clang/lib/CodeGen/CodeGenTBAA.cpp b/clang/lib/CodeGen/CodeGenTBAA.cpp
index 8a081612193978..69fb1e11edc7dc 100644
--- a/clang/lib/CodeGen/CodeGenTBAA.cpp
+++ b/clang/lib/CodeGen/CodeGenTBAA.cpp
@@ -286,6 +286,14 @@ CodeGenTBAA::CollectFields(uint64_t BaseOffset,
   /* Things not handled yet include: C++ base classes, bitfields, */
 
   if (const RecordType *TTy = QTy->getAs<RecordType>()) {
+    if (TTy->isUnionType()) {
+      uint64_t Size = Context.getTypeSizeInChars(QTy).getQuantity();
+      llvm::MDNode *TBAAType = getChar();
+      llvm::MDNode *TBAATag = getAccessTagInfo(TBAAAccessInfo(TBAAType, Size));
+      Fields.push_back(
+          llvm::MDBuilder::TBAAStructField(BaseOffset, Size, TBAATag));
+      return true;
+    }
     const RecordDecl *RD = TTy->getDecl()->getDefinition();
     if (RD->hasFlexibleArrayMember())
       return false;
diff --git a/clang/test/CodeGen/tbaa-struct.cpp b/clang/test/CodeGen/tbaa-struct.cpp
index 63e4097946448e..9b4b7415142d93 100644
--- a/clang/test/CodeGen/tbaa-struct.cpp
+++ b/clang/test/CodeGen/tbaa-struct.cpp
@@ -191,7 +191,7 @@ void copy12(UnionMember2 *a1, UnionMember2 *a2) {
 // (offset, size) = (0,1) char; (4,2) short; (8,4) int; (12,1) char; (16,4) int; (20,4) int
 // CHECK-OLD: [[TS2]] = !{i64 0, i64 1, !{{.*}}, i64 4, i64 2, !{{.*}}, i64 8, i64 4, !{{.*}}, i64 12, i64 1, !{{.*}}, i64 16, i64 4, {{.*}}, i64 20, i64 4, {{.*}}}
 // (offset, size) = (0,8) char; (0,2) char; (4,8) char
-// CHECK-OLD: [[TS3]] = !{i64 0, i64 8, !{{.*}}, i64 0, i64 2, !{{.*}}, i64 4, i64 8, !{{.*}}}
+// CHECK-OLD: [[TS3]] = !{i64 0, i64 12, [[TAG_CHAR]]}
 // CHECK-OLD: [[TS4]] = !{i64 0, i64 1, [[TAG_CHAR]], i64 1, i64 1, [[TAG_CHAR]], i64 2, i64 1, [[TAG_CHAR]]}
 // CHECK-OLD: [[TS5]] = !{i64 0, i64 1, [[TAG_CHAR]], i64 4, i64 1, [[TAG_CHAR]], i64 5, i64 1, [[TAG_CHAR]]}
 // CHECK-OLD: [[TS6]] = !{i64 0, i64 2, [[TAG_CHAR]], i64 2, i64 1, [[TAG_CHAR]], i64 8, i64 8, [[TAG_DOUBLE:!.+]]}
@@ -199,10 +199,8 @@ void copy12(UnionMember2 *a1, UnionMember2 *a2) {
 // CHECK-OLD  [[DOUBLE]] = !{!"double", [[CHAR]], i64 0}
 // CHECK-OLD: [[TS7]] = !{i64 0, i64 1, [[TAG_CHAR]], i64 1, i64 1, [[TAG_CHAR]], i64 2, i64 1, [[TAG_CHAR]], i64 3, i64 1, [[TAG_CHAR]], i64 4, i64 1, [[TAG_CHAR]], i64 8, i64 8, [[TAG_DOUBLE]], i64 16, i64 1, [[TAG_CHAR]]}
 // CHECK-OLD: [[TS8]] = !{i64 0, i64 4, [[TAG_CHAR]], i64 8, i64 8, [[TAG_DOUBLE]]}
-// CHECK-OLD: [[TS9]] = !{i64 0, i64 8, [[TAG_DOUBLE]], i64 0, i64 4, [[TAG_FLOAT:!.+]], i64 8, i64 4, [[TAG_INT]]}
-// CHECK-OLD: [[TAG_FLOAT]]  = !{[[FLOAT:!.+]], [[FLOAT]], i64 0}
-// CHECK-OLD: [[FLOAT]] = !{!"float", [[CHAR]], i64 0}
-// CHECK-OLD: [[TS10]] = !{i64 0, i64 4, [[TAG_INT]], i64 8, i64 8, [[TAG_DOUBLE]], i64 8, i64 4, [[TAG_FLOAT:!.+]]}
+// CHECK-OLD: [[TS9]] = !{i64 0, i64 8, [[TAG_CHAR]], i64 8, i64 4, [[TAG_INT]]}
+// CHECK-OLD: [[TS10]] = !{i64 0, i64 4, [[TAG_INT]], i64 8, i64 8, [[TAG_CHAR]]}
 
 // CHECK-NEW-DAG: [[TYPE_char:!.*]] = !{{{.*}}, i64 1, !"omnipotent char"}
 // CHECK-NEW-DAG: [[TAG_char]] = !{[[TYPE_char]], [[TYPE_char]], i64 0, i64 0}

@fhahn
Copy link
Contributor Author

fhahn commented Mar 11, 2024

ping :) (apologies for early ping, but I'd like to bump this as this is a potential mis-compile I think(

@dobbelaj-snps
Copy link
Contributor

Do you have a testcase that shows how this goes wrong without this change ?

@fhahn
Copy link
Contributor Author

fhahn commented Mar 11, 2024

Do you have a testcase that shows how this goes wrong without this change ?

I am not 100% if the type-based aliasing rules allow aliasing a union access containing a float and a plain pointer to float, but I am thinking about something like the below, where we remove the first store to f, even though f may alias the union if the float is selected. Again, not sure if that's 100% within the C/C++ type-based aliasing rules, but it is what I could construct easily.

Irrespective of this it should show how the TBAA tags for the load of u incorrectly claims it is loading an int.

union U2 {
  int d;
  float f;
};

struct UnionMember2 {
  int p;
  U2 u;
};

void use(int, U2);
void copy12( UnionMember2 *a2, float *f) {
  *f = 1.0;
  UnionMember2 a1 = *a2;
  int p = a1.p;
  U2 u = a1.u;
  *f = 2.0;
  use(p, u);
}

https://clang.godbolt.org/z/Y6PPWEo69

Copy link
Contributor

@dobbelaj-snps dobbelaj-snps left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@fhahn fhahn merged commit 5feaef6 into llvm:main Mar 11, 2024
@fhahn fhahn deleted the tbaa-struct-union branch March 11, 2024 19:40
@efriedma-quic
Copy link
Collaborator

This seems fine.

In some cases, we could maybe prove certain bytes are padding regardless of the current representation, but probably not worth the effort to implement.

fhahn added a commit to fhahn/llvm-project that referenced this pull request Mar 11, 2024
…lvm#84370)

At the moment,distinct fields for each union member are generated. When
copying a union, we don't know which union member is active, so there's
no benefit from recording the different fields. It can result in
converting tbaa.struct fields to incorrect tbaa nodes when extracting
fields.

PR: llvm#84370

(cherry-picked from 5feaef6)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
clang:codegen IR generation bugs: mangling, exceptions, etc. clang Clang issues not falling into any other category
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants