Skip to content

[clang][dataflow] Fix crash when operator= result type is not destination type. #90898

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
May 6, 2024

Conversation

martinboehme
Copy link
Contributor

The existing code was full of comments about how we assume this is always the
case, but it's not mandated by the standard, and there is code out there that
returns a different type. So check that the result type is in fact the same as
the destination type before attempting to copy to the result.

To make sure that we don't bail out in more cases than intended, I've extended
existing tests to verify that in the common case, we do return the destination
object (by reference or value, as the case may be).

…nation type.

The existing code was full of comments about how we assume this is always the
case, but it's not mandated by the standard, and there is code out there that
returns a different type. So check that the result type is in fact the same as
the destination type before attempting to copy to the result.

To make sure that we don't bail out in more cases than intended, I've extended
existing tests to verify that in the common case, we do return the destination
object (by reference or value, as the case may be).
@martinboehme martinboehme requested a review from ymand May 2, 2024 19:51
@llvmbot llvmbot added clang Clang issues not falling into any other category clang:dataflow Clang Dataflow Analysis framework - https://clang.llvm.org/docs/DataFlowAnalysisIntro.html clang:analysis labels May 2, 2024
@martinboehme martinboehme requested a review from Xazax-hun May 2, 2024 19:51
@llvmbot
Copy link
Member

llvmbot commented May 2, 2024

@llvm/pr-subscribers-clang

Author: None (martinboehme)

Changes

The existing code was full of comments about how we assume this is always the
case, but it's not mandated by the standard, and there is code out there that
returns a different type. So check that the result type is in fact the same as
the destination type before attempting to copy to the result.

To make sure that we don't bail out in more cases than intended, I've extended
existing tests to verify that in the common case, we do return the destination
object (by reference or value, as the case may be).


Full diff: https://github.com/llvm/llvm-project/pull/90898.diff

2 Files Affected:

  • (modified) clang/lib/Analysis/FlowSensitive/Transfer.cpp (+15-6)
  • (modified) clang/unittests/Analysis/FlowSensitive/TransferTest.cpp (+71-2)
diff --git a/clang/lib/Analysis/FlowSensitive/Transfer.cpp b/clang/lib/Analysis/FlowSensitive/Transfer.cpp
index fd224aeb79b151..5a57f11b000d8a 100644
--- a/clang/lib/Analysis/FlowSensitive/Transfer.cpp
+++ b/clang/lib/Analysis/FlowSensitive/Transfer.cpp
@@ -556,14 +556,23 @@ class TransferVisitor : public ConstStmtVisitor<TransferVisitor> {
 
       copyRecord(*LocSrc, *LocDst, Env);
 
-      // If the expr is a glvalue, we can reasonably assume the operator is
-      // returning T& and thus we can assign it `LocDst`.
-      if (S->isGLValue()) {
+      // The assignment operator can have an arbitrary return type. We model the
+      // return value only if the return type is the same as or a base class of
+      // the destination type.
+      if (S->getType().getCanonicalType().getUnqualifiedType() !=
+          LocDst->getType().getCanonicalType().getUnqualifiedType()) {
+        auto ReturnDecl = S->getType()->getAsCXXRecordDecl();
+        auto DstDecl = LocDst->getType()->getAsCXXRecordDecl();
+        if (ReturnDecl == nullptr || DstDecl == nullptr)
+          return;
+        if (!DstDecl->isDerivedFrom(ReturnDecl))
+          return;
+      }
+
+      if (S->isGLValue())
         Env.setStorageLocation(*S, *LocDst);
-      } else if (S->getType()->isRecordType()) {
-        // Assume that the assignment returns the assigned value.
+      else
         copyRecord(*LocDst, Env.getResultObjectLocation(*S), Env);
-      }
 
       return;
     }
diff --git a/clang/unittests/Analysis/FlowSensitive/TransferTest.cpp b/clang/unittests/Analysis/FlowSensitive/TransferTest.cpp
index 301bec32c0cf1d..0b441faa6c76da 100644
--- a/clang/unittests/Analysis/FlowSensitive/TransferTest.cpp
+++ b/clang/unittests/Analysis/FlowSensitive/TransferTest.cpp
@@ -2228,7 +2228,7 @@ TEST(TransferTest, AssignmentOperator) {
       A Foo = { 1 };
       A Bar = { 2 };
       // [[p1]]
-      Foo = Bar;
+      A &Rval = (Foo = Bar);
       // [[p2]]
       Foo.Baz = 3;
       // [[p3]]
@@ -2274,6 +2274,7 @@ TEST(TransferTest, AssignmentOperator) {
               cast<RecordStorageLocation>(Env2.getStorageLocation(*BarDecl));
 
           EXPECT_TRUE(recordsEqual(*FooLoc2, *BarLoc2, Env2));
+          EXPECT_EQ(&getLocForDecl(ASTCtx, Env2, "Rval"), FooLoc2);
 
           const auto *FooBazVal2 =
               cast<IntegerValue>(getFieldValue(FooLoc2, *BazDecl, Env2));
@@ -2441,7 +2442,75 @@ TEST(TransferTest, AssignmentOperatorReturnsByValue) {
   // This is a crash repro.
   std::string Code = R"(
     struct S {
-      S operator=(S&& other);
+      S operator=(const S&);
+      int i;
+    };
+    void target() {
+      S S1 = { 1 };
+      S S2 = { 2 };
+      S S3 = { 3 };
+      // [[before]]
+      // Test that the returned value is modeled by assigning to another value.
+      S1 = (S2 = S3);
+      (void)0;
+      // [[after]]
+    }
+  )";
+  runDataflow(
+      Code,
+      [](const llvm::StringMap<DataflowAnalysisState<NoopLattice>> &Results,
+         ASTContext &ASTCtx) {
+        const ValueDecl *S1Decl = findValueDecl(ASTCtx, "S1");
+        const ValueDecl *S2Decl = findValueDecl(ASTCtx, "S2");
+        const ValueDecl *S3Decl = findValueDecl(ASTCtx, "S3");
+
+        const Environment &EnvBefore =
+            getEnvironmentAtAnnotation(Results, "before");
+
+        EXPECT_FALSE(recordsEqual(
+            *EnvBefore.get<RecordStorageLocation>(*S1Decl),
+            *EnvBefore.get<RecordStorageLocation>(*S2Decl), EnvBefore));
+        EXPECT_FALSE(recordsEqual(
+            *EnvBefore.get<RecordStorageLocation>(*S2Decl),
+            *EnvBefore.get<RecordStorageLocation>(*S3Decl), EnvBefore));
+
+        const Environment &EnvAfter =
+            getEnvironmentAtAnnotation(Results, "after");
+
+        EXPECT_TRUE(recordsEqual(*EnvAfter.get<RecordStorageLocation>(*S1Decl),
+                                 *EnvAfter.get<RecordStorageLocation>(*S2Decl),
+                                 EnvAfter));
+        EXPECT_TRUE(recordsEqual(*EnvAfter.get<RecordStorageLocation>(*S2Decl),
+                                 *EnvAfter.get<RecordStorageLocation>(*S3Decl),
+                                 EnvAfter));
+      });
+}
+
+TEST(TransferTest, AssignmentOperatorReturnsDifferentTypeByRef) {
+  // This is a crash repro.
+  std::string Code = R"(
+    struct DifferentType {};
+    struct S {
+      DifferentType& operator=(const S&);
+    };
+    void target() {
+      S s;
+      s = S();
+      // [[p]]
+    }
+  )";
+  runDataflow(
+      Code,
+      [](const llvm::StringMap<DataflowAnalysisState<NoopLattice>> &Results,
+         ASTContext &ASTCtx) {});
+}
+
+TEST(TransferTest, AssignmentOperatorReturnsDifferentTypeByValue) {
+  // This is a crash repro.
+  std::string Code = R"(
+    struct DifferentType {};
+    struct S {
+      DifferentType operator=(const S&);
     };
     void target() {
       S s;

@llvmbot
Copy link
Member

llvmbot commented May 2, 2024

@llvm/pr-subscribers-clang-analysis

Author: None (martinboehme)

Changes

The existing code was full of comments about how we assume this is always the
case, but it's not mandated by the standard, and there is code out there that
returns a different type. So check that the result type is in fact the same as
the destination type before attempting to copy to the result.

To make sure that we don't bail out in more cases than intended, I've extended
existing tests to verify that in the common case, we do return the destination
object (by reference or value, as the case may be).


Full diff: https://github.com/llvm/llvm-project/pull/90898.diff

2 Files Affected:

  • (modified) clang/lib/Analysis/FlowSensitive/Transfer.cpp (+15-6)
  • (modified) clang/unittests/Analysis/FlowSensitive/TransferTest.cpp (+71-2)
diff --git a/clang/lib/Analysis/FlowSensitive/Transfer.cpp b/clang/lib/Analysis/FlowSensitive/Transfer.cpp
index fd224aeb79b151..5a57f11b000d8a 100644
--- a/clang/lib/Analysis/FlowSensitive/Transfer.cpp
+++ b/clang/lib/Analysis/FlowSensitive/Transfer.cpp
@@ -556,14 +556,23 @@ class TransferVisitor : public ConstStmtVisitor<TransferVisitor> {
 
       copyRecord(*LocSrc, *LocDst, Env);
 
-      // If the expr is a glvalue, we can reasonably assume the operator is
-      // returning T& and thus we can assign it `LocDst`.
-      if (S->isGLValue()) {
+      // The assignment operator can have an arbitrary return type. We model the
+      // return value only if the return type is the same as or a base class of
+      // the destination type.
+      if (S->getType().getCanonicalType().getUnqualifiedType() !=
+          LocDst->getType().getCanonicalType().getUnqualifiedType()) {
+        auto ReturnDecl = S->getType()->getAsCXXRecordDecl();
+        auto DstDecl = LocDst->getType()->getAsCXXRecordDecl();
+        if (ReturnDecl == nullptr || DstDecl == nullptr)
+          return;
+        if (!DstDecl->isDerivedFrom(ReturnDecl))
+          return;
+      }
+
+      if (S->isGLValue())
         Env.setStorageLocation(*S, *LocDst);
-      } else if (S->getType()->isRecordType()) {
-        // Assume that the assignment returns the assigned value.
+      else
         copyRecord(*LocDst, Env.getResultObjectLocation(*S), Env);
-      }
 
       return;
     }
diff --git a/clang/unittests/Analysis/FlowSensitive/TransferTest.cpp b/clang/unittests/Analysis/FlowSensitive/TransferTest.cpp
index 301bec32c0cf1d..0b441faa6c76da 100644
--- a/clang/unittests/Analysis/FlowSensitive/TransferTest.cpp
+++ b/clang/unittests/Analysis/FlowSensitive/TransferTest.cpp
@@ -2228,7 +2228,7 @@ TEST(TransferTest, AssignmentOperator) {
       A Foo = { 1 };
       A Bar = { 2 };
       // [[p1]]
-      Foo = Bar;
+      A &Rval = (Foo = Bar);
       // [[p2]]
       Foo.Baz = 3;
       // [[p3]]
@@ -2274,6 +2274,7 @@ TEST(TransferTest, AssignmentOperator) {
               cast<RecordStorageLocation>(Env2.getStorageLocation(*BarDecl));
 
           EXPECT_TRUE(recordsEqual(*FooLoc2, *BarLoc2, Env2));
+          EXPECT_EQ(&getLocForDecl(ASTCtx, Env2, "Rval"), FooLoc2);
 
           const auto *FooBazVal2 =
               cast<IntegerValue>(getFieldValue(FooLoc2, *BazDecl, Env2));
@@ -2441,7 +2442,75 @@ TEST(TransferTest, AssignmentOperatorReturnsByValue) {
   // This is a crash repro.
   std::string Code = R"(
     struct S {
-      S operator=(S&& other);
+      S operator=(const S&);
+      int i;
+    };
+    void target() {
+      S S1 = { 1 };
+      S S2 = { 2 };
+      S S3 = { 3 };
+      // [[before]]
+      // Test that the returned value is modeled by assigning to another value.
+      S1 = (S2 = S3);
+      (void)0;
+      // [[after]]
+    }
+  )";
+  runDataflow(
+      Code,
+      [](const llvm::StringMap<DataflowAnalysisState<NoopLattice>> &Results,
+         ASTContext &ASTCtx) {
+        const ValueDecl *S1Decl = findValueDecl(ASTCtx, "S1");
+        const ValueDecl *S2Decl = findValueDecl(ASTCtx, "S2");
+        const ValueDecl *S3Decl = findValueDecl(ASTCtx, "S3");
+
+        const Environment &EnvBefore =
+            getEnvironmentAtAnnotation(Results, "before");
+
+        EXPECT_FALSE(recordsEqual(
+            *EnvBefore.get<RecordStorageLocation>(*S1Decl),
+            *EnvBefore.get<RecordStorageLocation>(*S2Decl), EnvBefore));
+        EXPECT_FALSE(recordsEqual(
+            *EnvBefore.get<RecordStorageLocation>(*S2Decl),
+            *EnvBefore.get<RecordStorageLocation>(*S3Decl), EnvBefore));
+
+        const Environment &EnvAfter =
+            getEnvironmentAtAnnotation(Results, "after");
+
+        EXPECT_TRUE(recordsEqual(*EnvAfter.get<RecordStorageLocation>(*S1Decl),
+                                 *EnvAfter.get<RecordStorageLocation>(*S2Decl),
+                                 EnvAfter));
+        EXPECT_TRUE(recordsEqual(*EnvAfter.get<RecordStorageLocation>(*S2Decl),
+                                 *EnvAfter.get<RecordStorageLocation>(*S3Decl),
+                                 EnvAfter));
+      });
+}
+
+TEST(TransferTest, AssignmentOperatorReturnsDifferentTypeByRef) {
+  // This is a crash repro.
+  std::string Code = R"(
+    struct DifferentType {};
+    struct S {
+      DifferentType& operator=(const S&);
+    };
+    void target() {
+      S s;
+      s = S();
+      // [[p]]
+    }
+  )";
+  runDataflow(
+      Code,
+      [](const llvm::StringMap<DataflowAnalysisState<NoopLattice>> &Results,
+         ASTContext &ASTCtx) {});
+}
+
+TEST(TransferTest, AssignmentOperatorReturnsDifferentTypeByValue) {
+  // This is a crash repro.
+  std::string Code = R"(
+    struct DifferentType {};
+    struct S {
+      DifferentType operator=(const S&);
     };
     void target() {
       S s;

auto DstDecl = LocDst->getType()->getAsCXXRecordDecl();
if (ReturnDecl == nullptr || DstDecl == nullptr)
return;
if (!DstDecl->isDerivedFrom(ReturnDecl))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would we want to create a fresh storage location here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's consistent with our more general approach -- we treat "storage location not set" as meaning "we have no information" 1. For example, if above we discover that this isn't a copy or move assignment operator, we bail out without doing anything.

Granted, this case is a bit different in that we have already modeled the assignment itself (by doing the copyRecord()) above, but I think it's still consistent to do this but decide we don't know how to model the return value / location.

Footnotes

  1. Of course, a fresh storage location that isn't otherwise constrained expresses essentially the same thing, but it's simpler to instead not assign a storage location at all.

@martinboehme
Copy link
Contributor Author

CI failures appear to be unrelated.

@martinboehme martinboehme merged commit 0348e71 into llvm:main May 6, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
clang:analysis clang:dataflow Clang Dataflow Analysis framework - https://clang.llvm.org/docs/DataFlowAnalysisIntro.html clang Clang issues not falling into any other category
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants