-
Notifications
You must be signed in to change notification settings - Fork 14.4k
[Clang][Sema] Fix malformed AST for anonymous class access in template. #90842
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Clang][Sema] Fix malformed AST for anonymous class access in template. #90842
Conversation
# Observed erroneous behavior Prior to this change, a `MemberExpr` that accesses an anonymous class might have a prvalue as its base (even though C++ mandates that the base of a `MemberExpr` must be a glvalue), if the code containing the `MemberExpr` was in a template. Here's an example on [godbolt](https://godbolt.org/z/Gz1Mer9oz) (that is essentially identical to the new test this patch adds). This example sets up a struct containing an anonymous struct: ```cxx struct S { struct { int i; }; }; ``` It then accesses the member `i` using the expression `S().i`. When we do this in a non-template function, we get the following AST: ``` `-ExprWithCleanups <col:10, col:14> 'int' `-ImplicitCastExpr <col:10, col:14> 'int' <LValueToRValue> `-MemberExpr <col:10, col:14> 'int' xvalue .i 0xbdcb3c0 `-MemberExpr <col:10, col:14> 'S::(anonymous struct at line:2:3)' xvalue .S::(anonymous struct at line:2:3) 0xbdcb488 `-MaterializeTemporaryExpr <col:10, col:12> 'S' xvalue `-CXXTemporaryObjectExpr <col:10, col:12> 'S' 'void () noexcept' zeroing ``` As expected, the AST contains a `MaterializeTemporarExpr` to materialize the prvalue `S()` before accessing its members. When we perform this access in a function template (that doesn't actually even use its template parameter), the AST for the template itself looks the same as above. However, the AST for an instantiation of the template looks different: ``` `-ExprWithCleanups <col:10, col:14> 'int' `-ImplicitCastExpr <col:10, col:14> 'int' <LValueToRValue> `-MemberExpr <col:10, col:14> 'int' xvalue .i 0xbdcb3c0 `-MaterializeTemporaryExpr <col:10, col:14> 'S::(anonymous struct at line:2:3)' xvalue `-MemberExpr <col:10, col:14> 'S::(anonymous struct at line:2:3)' .S::(anonymous struct at line:2:3) 0xbdcb488 `-CXXTemporaryObjectExpr <col:10, col:12> 'S' 'void () noexcept' zeroing ``` Note how the inner `MemberExpr` (the one accessing the anonymous struct) acts on a prvalue. Interestingly, this does not appear to cause any problems for CodeGen, probably because CodeGen is set up to deal with `MemberExpr`s on rvalues in C. However, it does cause issues in the dataflow framework, which only supports C++ today and expects the base of a `MemberExpr` to be a glvalue. Beyond the issues with the dataflow framework, I think this issue should be fixed because it goes contrary to what the C++ standard mandates, and the AST produced for the non-template case indicates that we want to follow the C++ rules here. # Reasons for erroneous behavior Here's why we're getting this malformed AST. First of all, `TreeTransform` [strips any `MaterializeTemporaryExpr`s](https://github.com/llvm/llvm-project/blob/cd132dcbeb0fc79fd657bd5e0a8e9244c3fb5da6/clang/lib/Sema/TreeTransform.h#L14853) from the AST. It is therefore up to [`TreeTransform::RebuildMemberExpr()`](https://github.com/llvm/llvm-project/blob/cd132dcbeb0fc79fd657bd5e0a8e9244c3fb5da6/clang/lib/Sema/TreeTransform.h#L2853) to recreate a `MaterializeTemporaryExpr` if needed. In the [general case](https://github.com/llvm/llvm-project/blob/cd132dcbeb0fc79fd657bd5e0a8e9244c3fb5da6/clang/lib/Sema/TreeTransform.h#L2915), it does this: It calls `Sema::BuildMemberReferenceExpr()`, which ensures that the base is a glvalue by [materializing a temporary](https://github.com/llvm/llvm-project/blob/cd132dcbeb0fc79fd657bd5e0a8e9244c3fb5da6/clang/lib/Sema/SemaExprMember.cpp#L1016) if needed. However, when `TreeTransform::RebuildMemberExpr()` encounters an anonymous class, it [calls `Sema::BuildFieldReferenceExpr()`](https://github.com/llvm/llvm-project/blob/cd132dcbeb0fc79fd657bd5e0a8e9244c3fb5da6/clang/lib/Sema/TreeTransform.h#L2880), which, unlike `Sema::BuildMemberReferenceExpr()`, does not make sure that the base is a glvalue. # Proposed fix I considered several possible ways to fix this issue: - Add logic to `Sema::BuildFieldReferenceExpr()` that materializes a temporary if needed. This appears to work, but it feels like the fix is in the wrong place: - AFAIU, other callers of `Sema::BuildFieldReferenceExpr()` don't need this logic. - The issue is caused by `TreeTransform` removing the `MaterializeTemporaryExpr`, so it seems the fix should also be in `TreeTransform` - Materialize the temporary directly in `TreeTransform::RebuildMemberExpr()` if needed (within the case that deals with anonymous classes). This would work, too, but it would duplicate logic that already exists in `Sema::BuildMemberReferenceExpr()` (which we leverage for the general case). - Use `Sema::BuildMemberReferenceExpr()` instead of `Sema::BuildFieldReferenceExpr()` for the anonymous class case, so that it also uses the existing logic for materializing the temporary. This is the option I've decided to go with here. There's a slight wrinkle in that we create a `LookupResult` that claims we looked up the unnamed field for the anonymous class -- even though we would obviously never be able to look up an unnamed field. I think this is defensible and still better than the other alternatives, but I would welcome feedback on this from others who know the code better.
@llvm/pr-subscribers-clang Author: None (martinboehme) ChangesObserved erroneous behaviorPrior to this change, a Here's an example on godbolt (that is This example sets up a struct containing an anonymous struct: struct S {
struct {
int i;
};
}; It then accesses the member When we do this in a non-template function, we get the following AST:
As expected, the AST contains a When we perform this access in a function template (that doesn't actually even
Note how the inner Interestingly, this does not appear to cause any problems for CodeGen, probably Beyond the issues with the dataflow framework, I think this issue should be Reasons for erroneous behaviorHere's why we're getting this malformed AST. First of all, It is therefore up to Proposed fixI considered several possible ways to fix this issue:
Full diff: https://github.com/llvm/llvm-project/pull/90842.diff 3 Files Affected:
diff --git a/clang/lib/Sema/TreeTransform.h b/clang/lib/Sema/TreeTransform.h
index f47bc219e6fa32..9c800c5705c7c6 100644
--- a/clang/lib/Sema/TreeTransform.h
+++ b/clang/lib/Sema/TreeTransform.h
@@ -2876,10 +2876,21 @@ class TreeTransform {
return ExprError();
Base = BaseResult.get();
+ // We want to use `BuildMemberReferenceExpr()` so we can use its logic
+ // that materializes `Base` into a temporary if it's a prvalue.
+ // To do so, we need to create a `LookupResult` for `Member`, even though
+ // it's an unnamed field (that we could never actually have looked up).
+ // This small hack seems preferable to duplicating the logic for
+ // materializing the temporary.
+ LookupResult R(getSema(), MemberNameInfo, Sema::LookupMemberName);
+ R.addDecl(Member);
+ R.resolveKind();
+
CXXScopeSpec EmptySS;
- return getSema().BuildFieldReferenceExpr(
- Base, isArrow, OpLoc, EmptySS, cast<FieldDecl>(Member),
- DeclAccessPair::make(FoundDecl, FoundDecl->getAccess()), MemberNameInfo);
+ return getSema().BuildMemberReferenceExpr(
+ Base, Base->getType(), OpLoc, isArrow, EmptySS, TemplateKWLoc,
+ FirstQualifierInScope, R, ExplicitTemplateArgs,
+ /*S*/ nullptr);
}
CXXScopeSpec SS;
diff --git a/clang/test/AST/ast-dump-anonymous-class.cpp b/clang/test/AST/ast-dump-anonymous-class.cpp
new file mode 100644
index 00000000000000..393c084c913d15
--- /dev/null
+++ b/clang/test/AST/ast-dump-anonymous-class.cpp
@@ -0,0 +1,49 @@
+// RUN: %clang_cc1 -std=c++17 -triple x86_64-unknown-unknown -ast-dump %s \
+// RUN: | FileCheck -strict-whitespace %s
+
+struct S {
+ struct {
+ int i;
+ };
+};
+
+int accessInRegularFunction() {
+ return S().i;
+ // CHECK: FunctionDecl {{.*}} accessInRegularFunction 'int ()'
+ // CHECK: | `-ReturnStmt {{.*}}
+ // CHECK-NEXT: | `-ExprWithCleanups {{.*}} 'int'
+ // CHECK-NEXT: | `-ImplicitCastExpr {{.*}} 'int' <LValueToRValue>
+ // CHECK-NEXT: | `-MemberExpr {{.*}} 'int' xvalue .i
+ // CHECK-NEXT: | `-MemberExpr {{.*}} 'S::(anonymous struct at {{.*}})
+ // CHECK-NEXT: | `-MaterializeTemporaryExpr {{.*}} 'S' xvalue
+ // CHECK-NEXT: | `-CXXTemporaryObjectExpr {{.*}} 'S' 'void () noexcept' zeroing
+}
+
+// AST should look the same in a function template with an unused template
+// parameter.
+template <class>
+int accessInFunctionTemplate() {
+ return S().i;
+ // CHECK: FunctionDecl {{.*}} accessInFunctionTemplate 'int ()'
+ // CHECK: | `-ReturnStmt {{.*}}
+ // CHECK-NEXT: | `-ExprWithCleanups {{.*}} 'int'
+ // CHECK-NEXT: | `-ImplicitCastExpr {{.*}} 'int' <LValueToRValue>
+ // CHECK-NEXT: | `-MemberExpr {{.*}} 'int' xvalue .i
+ // CHECK-NEXT: | `-MemberExpr {{.*}} 'S::(anonymous struct at {{.*}})
+ // CHECK-NEXT: | `-MaterializeTemporaryExpr {{.*}} 'S' xvalue
+ // CHECK-NEXT: | `-CXXTemporaryObjectExpr {{.*}} 'S' 'void () noexcept' zeroing
+}
+
+// AST should look the same in an instantiation of the function template.
+// This is a regression test: The AST used to contain the
+// `MaterializeTemporaryExpr` in the wrong place, causing a `MemberExpr` to have
+// a prvalue base (which is not allowed in C++).
+template int accessInFunctionTemplate<int>();
+ // CHECK: FunctionDecl {{.*}} accessInFunctionTemplate 'int ()' explicit_instantiation_definition
+ // CHECK: `-ReturnStmt {{.*}}
+ // CHECK-NEXT: `-ExprWithCleanups {{.*}} 'int'
+ // CHECK-NEXT: `-ImplicitCastExpr {{.*}} 'int' <LValueToRValue>
+ // CHECK-NEXT: `-MemberExpr {{.*}} 'int' xvalue .i
+ // CHECK-NEXT: `-MemberExpr {{.*}} 'S::(anonymous struct at {{.*}})
+ // CHECK-NEXT: `-MaterializeTemporaryExpr {{.*}} 'S' xvalue
+ // CHECK-NEXT: `-CXXTemporaryObjectExpr {{.*}} 'S' 'void () noexcept' zeroing
diff --git a/clang/unittests/Analysis/FlowSensitive/TransferTest.cpp b/clang/unittests/Analysis/FlowSensitive/TransferTest.cpp
index 301bec32c0cf1d..3f68adaf1c62d2 100644
--- a/clang/unittests/Analysis/FlowSensitive/TransferTest.cpp
+++ b/clang/unittests/Analysis/FlowSensitive/TransferTest.cpp
@@ -27,6 +27,14 @@
#include <string>
#include <utility>
+namespace clang {
+namespace dataflow {
+namespace {
+AST_MATCHER(FunctionDecl, isTemplated) { return Node.isTemplated(); }
+} // namespace
+} // namespace dataflow
+} // namespace clang
+
namespace {
using namespace clang;
@@ -7205,4 +7213,31 @@ TEST(TransferTest, ConditionalRelation) {
});
}
+// This is a crash repro.
+// We used to crash while transferring `S().i` because Clang contained a bug
+// causing the AST to be malformed.
+TEST(TransferTest, AnonymousUnionMemberExprInTemplate) {
+ using ast_matchers::functionDecl;
+ using ast_matchers::hasName;
+ using ast_matchers::unless;
+
+ std::string Code = R"cc(
+ struct S {
+ struct {
+ int i;
+ };
+ };
+
+ template <class>
+ void target() {
+ S().i;
+ }
+
+ template void target<int>();
+ )cc";
+ auto Matcher = functionDecl(hasName("target"), unless(isTemplated()));
+ ASSERT_THAT_ERROR(checkDataflowWithNoopAnalysis(Code, Matcher),
+ llvm::Succeeded());
+}
+
} // namespace
|
@AaronBallman You're probably not the right person to review this yourself, but I'm told you probably have an idea of who might be an appropriate reviewer? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for this fix.
You should reference the issue that this fixes in your summary.
This also need a release note.
clang/lib/Sema/TreeTransform.h
Outdated
return getSema().BuildMemberReferenceExpr( | ||
Base, Base->getType(), OpLoc, isArrow, EmptySS, TemplateKWLoc, | ||
FirstQualifierInScope, R, ExplicitTemplateArgs, | ||
/*S*/ nullptr); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/*S*/ nullptr); | |
/*S=*/nullptr); |
We use bugprone-argument-comment format for these.
I actually prefer option#2. This doesn't really do what it should, so I think we should just make sure we materialize the temporary if necessary in |
Note there is a |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another possibility to consider: when transforming a member access, strip off any implicit member accesses from the base expression before transforming the base. That way, we'll rebuild the implicit member access from scratch, which might be a little more work, but should get details like this one right. (We can then probably also remove the logic to deal with an absent DeclName
in RebuildMemberExpr
since that shouldn't happen any more.)
There isn't currently a github issue for this. Should I create one?
Added. How does this look? (Is it in the appropriate section?) |
Done. WDYT? |
This sounds enticing, but after experimenting with this a bit, I’ve concluded that it’s unfortunately not so simple to get this working (unless there’s something I’m overlooking). Looking at the relevant part of the AST: `-MemberExpr <col:10, col:14> 'int' xvalue .i 0xbdcb3c0
`-MemberExpr <col:10, col:14> 'S::(anonymous struct at line:2:3)' xvalue .S::(anonymous struct at line:2:3) 0xbdcb488
`-MaterializeTemporaryExpr <col:10, col:12> 'S' xvalue
`-CXXTemporaryObjectExpr <col:10, col:12> 'S' 'void () noexcept' zeroing What I think you’re suggesting is that while processing the base for the outer This would work if we handed an The My feeling is that rather than trying to deal with all of the various cases involved in stripping, then recreating the implicit |
Thanks. The issue with this is that it requires an |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems simple enough, and the AST output looks reasonable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nothing further on my side, LGTM
CI failure (DataFlowSanitizer-x86_64 :: release_shadow_space.c) looks unrelated. Will merge now. @shafik If you have any remaining comments, happy to address those in a followup PR. |
Observed erroneous behavior
Prior to this change, a
MemberExpr
that accesses an anonymous class might have a prvalue as its base (even though C++ mandates that the base of aMemberExpr
must be a glvalue), if the code containing theMemberExpr
was in a template.Here's an example on godbolt (that is essentially identical to the new test this patch adds).
This example sets up a struct containing an anonymous struct:
It then accesses the member
i
using the expressionS().i
.When we do this in a non-template function, we get the following AST:
As expected, the AST contains a
MaterializeTemporarExpr
to materialize the prvalueS()
before accessing its members.When we perform this access in a function template (that doesn't actually even use its template parameter), the AST for the template itself looks the same as above. However, the AST for an instantiation of the template looks different:
Note how the inner
MemberExpr
(the one accessing the anonymous struct) acts on a prvalue.Interestingly, this does not appear to cause any problems for CodeGen, probably because CodeGen is set up to deal with
MemberExpr
s on rvalues in C. However, it does cause issues in the dataflow framework, which only supports C++ today and expects the base of aMemberExpr
to be a glvalue.Beyond the issues with the dataflow framework, I think this issue should be fixed because it goes contrary to what the C++ standard mandates, and the AST produced for the non-template case indicates that we want to follow the C++ rules here.
Reasons for erroneous behavior
Here's why we're getting this malformed AST.
First of all,
TreeTransform
strips anyMaterializeTemporaryExpr
s from the AST.It is therefore up to
TreeTransform::RebuildMemberExpr()
to recreate aMaterializeTemporaryExpr
if needed. In the general case, it does this: It callsSema::BuildMemberReferenceExpr()
, which ensures that the base is a glvalue by materializing a temporary if needed. However, whenTreeTransform::RebuildMemberExpr()
encounters an anonymous class, it callsSema::BuildFieldReferenceExpr()
, which, unlikeSema::BuildMemberReferenceExpr()
, does not make sure that the base is a glvalue.Proposed fix
I considered several possible ways to fix this issue:
Add logic to
Sema::BuildFieldReferenceExpr()
that materializes a temporary if needed. This appears to work, but it feels like the fix is in the wrong place:Sema::BuildFieldReferenceExpr()
don't need this logic.TreeTransform
removing theMaterializeTemporaryExpr
, so it seems the fix should also be inTreeTransform
Materialize the temporary directly in
TreeTransform::RebuildMemberExpr()
if needed (within the case that deals with anonymous classes).This would work, too, but it would duplicate logic that already exists in
Sema::BuildMemberReferenceExpr()
(which we leverage for the general case).Use
Sema::BuildMemberReferenceExpr()
instead ofSema::BuildFieldReferenceExpr()
for the anonymous class case, so that it also uses the existing logic for materializing the temporary.This is the option I've decided to go with here. There's a slight wrinkle in that we create a
LookupResult
that claims we looked up the unnamed field for the anonymous class -- even though we would obviously never be able to look up an unnamed field. I think this is defensible and still better than the other alternatives, but I would welcome feedback on this from others who know the code better.