Skip to content

[clang][dataflow] Propagate locations from result objects to initializers. #87320

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Apr 10, 2024

Conversation

martinboehme
Copy link
Contributor

Previously, we were propagating storage locations the other way around, i.e.
from initializers to result objects, using RecordValue::getLoc(). This gave
the wrong behavior in some cases -- see the newly added or fixed tests in this
patch.

In addition, this patch now unblocks removing the RecordValue class entirely,
as we no longer need RecordValue::getLoc().

With this patch, the test TransferTest.DifferentReferenceLocInJoin started to
fail because the framework now always uses the same storge location for a
MaterializeTemporaryExpr, meaning that the code under test no longer set up
the desired state where a variable of reference type is mapped to two different
storage locations in environments being joined. Rather than trying to modify
this test to set up the test condition again, I have chosen to replace the test
with an equivalent test in DataflowEnvironmentTest.cpp that sets up the test
condition directly; because this test is more direct, it will also be less
brittle in the face of future changes.

@martinboehme martinboehme requested a review from ymand April 2, 2024 08:00
@llvmbot llvmbot added clang Clang issues not falling into any other category clang:dataflow Clang Dataflow Analysis framework - https://clang.llvm.org/docs/DataFlowAnalysisIntro.html clang:analysis labels Apr 2, 2024
@martinboehme martinboehme requested a review from Xazax-hun April 2, 2024 08:00
@llvmbot
Copy link
Member

llvmbot commented Apr 2, 2024

@llvm/pr-subscribers-clang

@llvm/pr-subscribers-clang-analysis

Author: None (martinboehme)

Changes

Previously, we were propagating storage locations the other way around, i.e.
from initializers to result objects, using RecordValue::getLoc(). This gave
the wrong behavior in some cases -- see the newly added or fixed tests in this
patch.

In addition, this patch now unblocks removing the RecordValue class entirely,
as we no longer need RecordValue::getLoc().

With this patch, the test TransferTest.DifferentReferenceLocInJoin started to
fail because the framework now always uses the same storge location for a
MaterializeTemporaryExpr, meaning that the code under test no longer set up
the desired state where a variable of reference type is mapped to two different
storage locations in environments being joined. Rather than trying to modify
this test to set up the test condition again, I have chosen to replace the test
with an equivalent test in DataflowEnvironmentTest.cpp that sets up the test
condition directly; because this test is more direct, it will also be less
brittle in the face of future changes.


Patch is 52.94 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/87320.diff

6 Files Affected:

  • (modified) clang/include/clang/Analysis/FlowSensitive/DataflowEnvironment.h (+44-20)
  • (modified) clang/lib/Analysis/FlowSensitive/DataflowEnvironment.cpp (+304-98)
  • (modified) clang/lib/Analysis/FlowSensitive/Transfer.cpp (+73-99)
  • (modified) clang/lib/Analysis/FlowSensitive/TypeErasedDataflowAnalysis.cpp (+3-10)
  • (modified) clang/unittests/Analysis/FlowSensitive/DataflowEnvironmentTest.cpp (+43)
  • (modified) clang/unittests/Analysis/FlowSensitive/TransferTest.cpp (+115-57)
diff --git a/clang/include/clang/Analysis/FlowSensitive/DataflowEnvironment.h b/clang/include/clang/Analysis/FlowSensitive/DataflowEnvironment.h
index c30bccd06674a4..dc3a9c239552be 100644
--- a/clang/include/clang/Analysis/FlowSensitive/DataflowEnvironment.h
+++ b/clang/include/clang/Analysis/FlowSensitive/DataflowEnvironment.h
@@ -30,6 +30,7 @@
 #include "llvm/ADT/MapVector.h"
 #include "llvm/Support/Compiler.h"
 #include "llvm/Support/ErrorHandling.h"
+#include <memory>
 #include <type_traits>
 #include <utility>
 
@@ -330,17 +331,6 @@ class Environment {
   /// location of the result object to pass in `this`, even though prvalues are
   /// otherwise not associated with storage locations.
   ///
-  /// FIXME: Currently, this simply returns a stable storage location for `E`,
-  /// but this doesn't do the right thing in scenarios like the following:
-  /// ```
-  /// MyClass c = some_condition()? MyClass(foo) : MyClass(bar);
-  /// ```
-  /// Here, `MyClass(foo)` and `MyClass(bar)` will have two different storage
-  /// locations, when in fact their storage locations should be the same.
-  /// Eventually, we want to propagate storage locations from result objects
-  /// down to the prvalues that initialize them, similar to the way that this is
-  /// done in Clang's CodeGen.
-  ///
   /// Requirements:
   ///  `E` must be a prvalue of record type.
   RecordStorageLocation &
@@ -448,7 +438,13 @@ class Environment {
   /// Initializes the fields (including synthetic fields) of `Loc` with values,
   /// unless values of the field type are not supported or we hit one of the
   /// limits at which we stop producing values.
-  void initializeFieldsWithValues(RecordStorageLocation &Loc);
+  /// If `Type` is provided, initializes only those fields that are modeled for
+  /// `Type`; this is intended for use in cases where `Loc` is a derived type
+  /// and we only want to initialize the fields of a base type.
+  void initializeFieldsWithValues(RecordStorageLocation &Loc, QualType Type);
+  void initializeFieldsWithValues(RecordStorageLocation &Loc) {
+    initializeFieldsWithValues(Loc, Loc.getType());
+  }
 
   /// Assigns `Val` as the value of `Loc` in the environment.
   void setValue(const StorageLocation &Loc, Value &Val);
@@ -639,6 +635,9 @@ class Environment {
   LLVM_DUMP_METHOD void dump(raw_ostream &OS) const;
 
 private:
+  using PrValueToResultObject =
+      llvm::DenseMap<const Expr *, RecordStorageLocation *>;
+
   // The copy-constructor is for use in fork() only.
   Environment(const Environment &) = default;
 
@@ -668,8 +667,10 @@ class Environment {
   /// Initializes the fields (including synthetic fields) of `Loc` with values,
   /// unless values of the field type are not supported or we hit one of the
   /// limits at which we stop producing values (controlled by `Visited`,
-  /// `Depth`, and `CreatedValuesCount`).
-  void initializeFieldsWithValues(RecordStorageLocation &Loc,
+  /// `Depth`, and `CreatedValuesCount`). If `Type` is different from
+  /// `Loc.getType()`, initializes only those fields that are modeled for
+  /// `Type`.
+  void initializeFieldsWithValues(RecordStorageLocation &Loc, QualType Type,
                                   llvm::DenseSet<QualType> &Visited, int Depth,
                                   int &CreatedValuesCount);
 
@@ -688,22 +689,45 @@ class Environment {
   /// and functions referenced in `FuncDecl`. `FuncDecl` must have a body.
   void initFieldsGlobalsAndFuncs(const FunctionDecl *FuncDecl);
 
+  static PrValueToResultObject
+  buildResultObjectMap(DataflowAnalysisContext *DACtx,
+                       const FunctionDecl *FuncDecl,
+                       RecordStorageLocation *ThisPointeeLoc,
+                       RecordStorageLocation *LocForRecordReturnVal);
+
   // `DACtx` is not null and not owned by this object.
   DataflowAnalysisContext *DACtx;
 
-  // FIXME: move the fields `CallStack`, `ReturnVal`, `ReturnLoc` and
-  // `ThisPointeeLoc` into a separate call-context object, shared between
-  // environments in the same call.
+  // FIXME: move the fields `CallStack`, `ResultObjectMap`, `ReturnVal`,
+  // `ReturnLoc` and `ThisPointeeLoc` into a separate call-context object,
+  // shared between environments in the same call.
   // https://github.com/llvm/llvm-project/issues/59005
 
   // `DeclContext` of the block being analysed if provided.
   std::vector<const DeclContext *> CallStack;
 
-  // Value returned by the function (if it has non-reference return type).
+  // Maps from prvalues of record type to their result objects. Shared between
+  // all environments for the same function.
+  // FIXME: It's somewhat unsatisfactory that we have to use a `shared_ptr`
+  // here, though the cost is acceptable: The overhead of a `shared_ptr` is
+  // incurred when it is copied, and this happens only relatively rarely (when
+  // we fork the environment). The need for a `shared_ptr` will go away once we
+  // introduce a shared call-context object (see above).
+  std::shared_ptr<PrValueToResultObject> ResultObjectMap;
+
+  // The following three member variables handle various different types of
+  // return values.
+  // - If the return type is not a reference and not a record: Value returned
+  //   by the function.
   Value *ReturnVal = nullptr;
-  // Storage location of the reference returned by the function (if it has
-  // reference return type).
+  // - If the return type is a reference: Storage location of the reference
+  //   returned by the function.
   StorageLocation *ReturnLoc = nullptr;
+  // - If the return type is a record or the function being analyzed is a
+  //   constructor: Storage location into which the return value should be
+  //   constructed.
+  RecordStorageLocation *LocForRecordReturnVal = nullptr;
+
   // The storage location of the `this` pointee. Should only be null if the
   // function being analyzed is only a function and not a method.
   RecordStorageLocation *ThisPointeeLoc = nullptr;
diff --git a/clang/lib/Analysis/FlowSensitive/DataflowEnvironment.cpp b/clang/lib/Analysis/FlowSensitive/DataflowEnvironment.cpp
index f729d676dd0de8..67d5c444d52cb6 100644
--- a/clang/lib/Analysis/FlowSensitive/DataflowEnvironment.cpp
+++ b/clang/lib/Analysis/FlowSensitive/DataflowEnvironment.cpp
@@ -15,6 +15,7 @@
 #include "clang/Analysis/FlowSensitive/DataflowEnvironment.h"
 #include "clang/AST/Decl.h"
 #include "clang/AST/DeclCXX.h"
+#include "clang/AST/RecursiveASTVisitor.h"
 #include "clang/AST/Type.h"
 #include "clang/Analysis/FlowSensitive/DataflowLattice.h"
 #include "clang/Analysis/FlowSensitive/Value.h"
@@ -353,6 +354,8 @@ getFieldsGlobalsAndFuncs(const Stmt &S, FieldSet &Fields,
   for (auto *Child : S.children())
     if (Child != nullptr)
       getFieldsGlobalsAndFuncs(*Child, Fields, Vars, Funcs);
+  if (const auto *DefaultArg = dyn_cast<CXXDefaultArgExpr>(&S))
+    getFieldsGlobalsAndFuncs(*DefaultArg->getExpr(), Fields, Vars, Funcs);
   if (const auto *DefaultInit = dyn_cast<CXXDefaultInitExpr>(&S))
     getFieldsGlobalsAndFuncs(*DefaultInit->getExpr(), Fields, Vars, Funcs);
 
@@ -385,6 +388,185 @@ getFieldsGlobalsAndFuncs(const Stmt &S, FieldSet &Fields,
   }
 }
 
+namespace {
+
+// Visitor that builds a map from record prvalues to result objects.
+// This traverses the body of the function to be analyzed; for each result
+// object that it encounters, it propagates the storage location of the result
+// object to all record prvalues that can initialize it.
+class ResultObjectVisitor : public RecursiveASTVisitor<ResultObjectVisitor> {
+public:
+  // `ResultObjectMap` will be filled with a map from record prvalues to result
+  // object. If the function being analyzed returns a record by value,
+  // `LocForRecordReturnVal` is the location to which this record should be
+  // written; otherwise, it is null.
+  explicit ResultObjectVisitor(
+      llvm::DenseMap<const Expr *, RecordStorageLocation *> &ResultObjectMap,
+      RecordStorageLocation *LocForRecordReturnVal,
+      DataflowAnalysisContext &DACtx)
+      : ResultObjectMap(ResultObjectMap),
+        LocForRecordReturnVal(LocForRecordReturnVal), DACtx(DACtx) {}
+
+  bool shouldVisitImplicitCode() { return true; }
+
+  bool shouldVisitLambdaBody() const { return false; }
+
+  // Traverse all member and base initializers of `Ctor`. This function is not
+  // called by `RecursiveASTVisitor`; it should be called manually if we are
+  // analyzing a constructor. `ThisPointeeLoc` is the storage location that
+  // `this` points to.
+  void TraverseConstructorInits(const CXXConstructorDecl *Ctor,
+                                RecordStorageLocation *ThisPointeeLoc) {
+    assert(ThisPointeeLoc != nullptr);
+    for (const CXXCtorInitializer *Init : Ctor->inits()) {
+      Expr *InitExpr = Init->getInit();
+      if (FieldDecl *Field = Init->getMember();
+          Field != nullptr && Field->getType()->isRecordType()) {
+        PropagateResultObject(InitExpr, cast<RecordStorageLocation>(
+                                            ThisPointeeLoc->getChild(*Field)));
+      } else if (Init->getBaseClass()) {
+        PropagateResultObject(InitExpr, ThisPointeeLoc);
+      }
+
+      // Ensure that any result objects within `InitExpr` (e.g. temporaries)
+      // are also propagated to the prvalues that initialize them.
+      TraverseStmt(InitExpr);
+
+      // If this is a `CXXDefaultInitExpr`, also propagate any result objects
+      // within the default expression.
+      if (auto *DefaultInit = dyn_cast<CXXDefaultInitExpr>(InitExpr))
+        TraverseStmt(DefaultInit->getExpr());
+    }
+  }
+
+  bool TraverseBindingDecl(BindingDecl *BD) {
+    // `RecursiveASTVisitor` doesn't traverse holding variables for
+    // `BindingDecl`s by itself, so we need to tell it to.
+    if (VarDecl *HoldingVar = BD->getHoldingVar())
+      TraverseDecl(HoldingVar);
+    return RecursiveASTVisitor<ResultObjectVisitor>::TraverseBindingDecl(BD);
+  }
+
+  bool VisitVarDecl(VarDecl *VD) {
+    if (VD->getType()->isRecordType() && VD->hasInit())
+      PropagateResultObject(
+          VD->getInit(),
+          &cast<RecordStorageLocation>(DACtx.getStableStorageLocation(*VD)));
+    return true;
+  }
+
+  bool VisitMaterializeTemporaryExpr(MaterializeTemporaryExpr *MTE) {
+    if (MTE->getType()->isRecordType())
+      PropagateResultObject(
+          MTE->getSubExpr(),
+          &cast<RecordStorageLocation>(DACtx.getStableStorageLocation(*MTE)));
+    return true;
+  }
+
+  bool VisitReturnStmt(ReturnStmt *Return) {
+    Expr *RetValue = Return->getRetValue();
+    if (RetValue != nullptr && RetValue->getType()->isRecordType() &&
+        RetValue->isPRValue())
+      PropagateResultObject(RetValue, LocForRecordReturnVal);
+    return true;
+  }
+
+  bool VisitExpr(Expr *E) {
+    // Clang's AST can have record-type prvalues without a result object -- for
+    // example as full-expressions contained in a compound statement or as
+    // arguments of call expressions. We notice this if we get here and a
+    // storage location has not yet been associated with `E`. In this case,
+    // treat this as if it was a `MaterializeTemporaryExpr`.
+    if (E->isPRValue() && E->getType()->isRecordType() &&
+        !ResultObjectMap.contains(E))
+      PropagateResultObject(
+          E, &cast<RecordStorageLocation>(DACtx.getStableStorageLocation(*E)));
+    return true;
+  }
+
+  // Assigns `Loc` as the result object location of `E`, then propagates the
+  // location to all lower-level prvalues that initialize the same object as
+  // `E` (or one of its base classes or member variables).
+  void PropagateResultObject(Expr *E, RecordStorageLocation *Loc) {
+    if (!E->isPRValue() || !E->getType()->isRecordType()) {
+      assert(false);
+      // Ensure we don't propagate the result object if we hit this in a
+      // release build.
+      return;
+    }
+
+    ResultObjectMap[E] = Loc;
+
+    // The following AST node kinds are "original initializers": They are the
+    // lowest-level AST node that initializes a given object, and nothing
+    // below them can initialize the same object (or part of it).
+    if (isa<CXXConstructExpr>(E) || isa<CallExpr>(E) || isa<LambdaExpr>(E) ||
+        isa<CXXDefaultArgExpr>(E) || isa<CXXDefaultInitExpr>(E) ||
+        isa<CXXStdInitializerListExpr>(E)) {
+      return;
+    }
+
+    if (auto *InitList = dyn_cast<InitListExpr>(E)) {
+      if (!InitList->isSemanticForm())
+        return;
+      if (InitList->isTransparent()) {
+        for (Expr *Init : InitList->inits())
+          PropagateResultObject(Init, Loc);
+        return;
+      }
+
+      RecordInitListHelper InitListHelper(InitList);
+
+      for (auto [Base, Init] : InitListHelper.base_inits()) {
+        assert(Base->getType().getCanonicalType() ==
+               Init->getType().getCanonicalType());
+
+        // Storage location for the base class is the same as that of the
+        // derived class because we "flatten" the object hierarchy and put all
+        // fields in `RecordStorageLocation` of the derived class.
+        PropagateResultObject(Init, Loc);
+      }
+
+      for (auto [Field, Init] : InitListHelper.field_inits()) {
+        // Fields of non-record type are handled in
+        // `TransferVisitor::VisitInitListExpr()`.
+        if (!Field->getType()->isRecordType())
+          continue;
+        PropagateResultObject(
+            Init, cast<RecordStorageLocation>(Loc->getChild(*Field)));
+      }
+      return;
+    }
+
+    if (auto *Op = dyn_cast<BinaryOperator>(E); Op && Op->isCommaOp()) {
+      PropagateResultObject(Op->getRHS(), Loc);
+      return;
+    }
+
+    if (auto *Cond = dyn_cast<AbstractConditionalOperator>(E)) {
+      PropagateResultObject(Cond->getTrueExpr(), Loc);
+      PropagateResultObject(Cond->getFalseExpr(), Loc);
+      return;
+    }
+
+    // All other expression nodes that propagate a record prvalue should have
+    // exactly one child.
+    llvm::SmallVector<Stmt *> Children(E->child_begin(), E->child_end());
+    if (Children.size() != 1)
+      E->dump();
+    assert(Children.size() == 1);
+    for (Stmt *S : Children)
+      PropagateResultObject(cast<Expr>(S), Loc);
+  }
+
+private:
+  llvm::DenseMap<const Expr *, RecordStorageLocation *> &ResultObjectMap;
+  RecordStorageLocation *LocForRecordReturnVal;
+  DataflowAnalysisContext &DACtx;
+};
+
+} // namespace
+
 Environment::Environment(DataflowAnalysisContext &DACtx)
     : DACtx(&DACtx),
       FlowConditionToken(DACtx.arena().makeFlowConditionToken()) {}
@@ -400,17 +582,23 @@ void Environment::initialize() {
   if (DeclCtx == nullptr)
     return;
 
-  if (const auto *FuncDecl = dyn_cast<FunctionDecl>(DeclCtx)) {
-    assert(FuncDecl->doesThisDeclarationHaveABody());
+  const auto *FuncDecl = dyn_cast<FunctionDecl>(DeclCtx);
+  if (FuncDecl == nullptr)
+    return;
+
+  assert(FuncDecl->doesThisDeclarationHaveABody());
 
-    initFieldsGlobalsAndFuncs(FuncDecl);
+  initFieldsGlobalsAndFuncs(FuncDecl);
 
-    for (const auto *ParamDecl : FuncDecl->parameters()) {
-      assert(ParamDecl != nullptr);
-      setStorageLocation(*ParamDecl, createObject(*ParamDecl, nullptr));
-    }
+  for (const auto *ParamDecl : FuncDecl->parameters()) {
+    assert(ParamDecl != nullptr);
+    setStorageLocation(*ParamDecl, createObject(*ParamDecl, nullptr));
   }
 
+  if (FuncDecl->getReturnType()->isRecordType())
+    LocForRecordReturnVal = &cast<RecordStorageLocation>(
+        createStorageLocation(FuncDecl->getReturnType()));
+
   if (const auto *MethodDecl = dyn_cast<CXXMethodDecl>(DeclCtx)) {
     auto *Parent = MethodDecl->getParent();
     assert(Parent != nullptr);
@@ -443,6 +631,12 @@ void Environment::initialize() {
         initializeFieldsWithValues(ThisLoc);
     }
   }
+
+  // We do this below the handling of `CXXMethodDecl` above so that we can
+  // be sure that the storage location for `this` has been set.
+  ResultObjectMap = std::make_shared<PrValueToResultObject>(
+      buildResultObjectMap(DACtx, FuncDecl, getThisPointeeStorageLocation(),
+                           LocForRecordReturnVal));
 }
 
 // FIXME: Add support for resetting globals after function calls to enable
@@ -483,13 +677,18 @@ void Environment::initFieldsGlobalsAndFuncs(const FunctionDecl *FuncDecl) {
     if (getStorageLocation(*D) != nullptr)
       continue;
 
-    setStorageLocation(*D, createObject(*D));
+    // We don't run transfer functions on the initializers of global variables,
+    // so they won't be associated with a value or storage location. We
+    // therefore intentionally don't pass an initializer to `createObject()`;
+    // in particular, this ensures that `createObject()` will initialize the
+    // fields of record-type variables with values.
+    setStorageLocation(*D, createObject(*D, nullptr));
   }
 
   for (const FunctionDecl *FD : Funcs) {
     if (getStorageLocation(*FD) != nullptr)
       continue;
-    auto &Loc = createStorageLocation(FD->getType());
+    auto &Loc = createStorageLocation(*FD);
     setStorageLocation(*FD, Loc);
   }
 }
@@ -518,6 +717,9 @@ Environment Environment::pushCall(const CallExpr *Call) const {
     }
   }
 
+  if (Call->getType()->isRecordType() && Call->isPRValue())
+    Env.LocForRecordReturnVal = &Env.getResultObjectLocation(*Call);
+
   Env.pushCallInternal(Call->getDirectCallee(),
                        llvm::ArrayRef(Call->getArgs(), Call->getNumArgs()));
 
@@ -528,6 +730,7 @@ Environment Environment::pushCall(const CXXConstructExpr *Call) const {
   Environment Env(*this);
 
   Env.ThisPointeeLoc = &Env.getResultObjectLocation(*Call);
+  Env.LocForRecordReturnVal = &Env.getResultObjectLocation(*Call);
 
   Env.pushCallInternal(Call->getConstructor(),
                        llvm::ArrayRef(Call->getArgs(), Call->getNumArgs()));
@@ -556,6 +759,10 @@ void Environment::pushCallInternal(const FunctionDecl *FuncDecl,
     const VarDecl *Param = *ParamIt;
     setStorageLocation(*Param, createObject(*Param, Args[ArgIndex]));
   }
+
+  ResultObjectMap = std::make_shared<PrValueToResultObject>(
+      buildResultObjectMap(DACtx, FuncDecl, getThisPointeeStorageLocation(),
+                           LocForRecordReturnVal));
 }
 
 void Environment::popCall(const CallExpr *Call, const Environment &CalleeEnv) {
@@ -599,6 +806,9 @@ bool Environment::equivalentTo(const Environment &Other,
   if (ReturnLoc != Other.ReturnLoc)
     return false;
 
+  if (LocForRecordReturnVal != Other.LocForRecordReturnVal)
+    return false;
+
   if (ThisPointeeLoc != Other.ThisPointeeLoc)
     return false;
 
@@ -622,8 +832,10 @@ LatticeJoinEffect Environment::widen(const Environment &PrevEnv,
   assert(DACtx == PrevEnv.DACtx);
   assert(ReturnVal == PrevEnv.ReturnVal);
   assert(ReturnLoc == PrevEnv.ReturnLoc);
+  assert(LocForRecordReturnVal == PrevEnv.LocForRecordReturnVal);
   assert(ThisPointeeLoc == PrevEnv.ThisPointeeLoc);
   assert(CallStack == PrevEnv.CallStack);
+  assert(ResultObjectMap == PrevEnv.ResultObjectMap);
 
   auto Effect = LatticeJoinEffect::Unchanged;
 
@@ -655,12 +867,16 @@ Environment Environment::join(const Environment &EnvA, const Environment &EnvB,
                               Environment::ValueModel &Model,
                               ExprJoinBehavior ExprBehavior) {
   assert(EnvA.DACtx == EnvB.DACtx);
+  assert(EnvA.LocForRecordReturnVal == EnvB.LocForRecordReturnVal);
   assert(EnvA.ThisPointeeLoc == EnvB.ThisPointeeLoc);
   assert(EnvA.CallStack == EnvB.CallStack);
+  assert(EnvA.ResultObjectMap == EnvB.ResultObjectMap);
 
   Environment JoinedEnv(*EnvA.DACtx);
 
   JoinedEnv.CallStack = EnvA.CallStack;
+  JoinedEnv.ResultObjectMap = EnvA.ResultObjectMap;
+  JoinedEnv.LocForRecordReturnVal = EnvA.LocForRecordReturnVal;
   JoinedEnv.ThisPointeeLoc = EnvA.ThisPointeeLoc;
 
   if (EnvA.ReturnVal == nullptr || EnvB.ReturnVal == nullptr) {
@@ -729,6 +945,12 @@ StorageLocation &Environment::createStorageLocation(const Expr &E) {
 
 void Environment::setStorageLocation(const ValueDecl &D, StorageLocation &Loc) {
   assert(!DeclToLoc.contains(&D));
+  // The only kinds of declarations that may have a "variable" storage location
+  // are declarations of reference type and `BindingDecl`. For all other
+  // declaration, the storage location should be the stable storage location
+  // returned by `createStorageLocation()`.
+  assert(D.getType()->isReferenceType() || isa<BindingDecl>(D) ||
+         &Loc...
[truncated]

Copy link
Collaborator

@Xazax-hun Xazax-hun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did not finish reviewing this, but wanted to send a couple of questions early to parallelize the process a bit.

// incurred when it is copied, and this happens only relatively rarely (when
// we fork the environment). The need for a `shared_ptr` will go away once we
// introduce a shared call-context object (see above).
std::shared_ptr<PrValueToResultObject> ResultObjectMap;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What would be the model for something like:

while (cond)
{
  f();
}

where f returns a value. Do we have the same storage location for the prvalue in all iterations? In that case, the location is in fact a "summary" for the different iterations. I think this is fine, as we want convergence. I am just wondering if we need to make this explicit in the future, so checks would know if a location were a summary, or an exact location.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it make sense for this to be owned by the DataflowAnalysisContext?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What would be the model for something like:

while (cond)
{
  f();
}

where f returns a value. Do we have the same storage location for the prvalue in all iterations?

Let me clarify what you're asking.

I think what you mean is that f() is declared like this, correct?

struct S {};
S f();

In this case, there is indeed no glvalue of type S -- just the prvalue for the call f(). Normally, of course, prvalues don't have storage locations, but in this case, because there isn't an actual result object, we treat the CallExpr as if it was a MaterializeTemporaryExpr (see comment in ResultObjectVisitor::VisitExpr()).

(Just making sure up to here that we agree on the setting -- do you agree with all of the above?)

First of all, yes, we use the same storage location on all iterations of the loop -- witness the call to DACtx.getStableStorageLocation(*E).

In that case, the location is in fact a "summary" for the different iterations.

I'm not sure I would agree with this. Let's assume this was a case where we actually have a MaterializeTemporaryExpr (imagine we're accessing some member of S, i.e. assume the expression in the loop is actually f().some_member_fn()). Then the compiler's codegen would very likely place the temporary in the same storage location on every iteration of the loop. (Sure, nothing in the standard mandates this AFAIK, but it's almost certainly what the compiler is going to do.) So the stable storage location that we use for the expression isn't a "summary" storage location, but it reflects the fact that the generated code, too, would use the same storage location.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it make sense for this to be owned by the DataflowAnalysisContext?

Unfortunately, no.

The DataflowAnalysisContext is shared by the whole analysis -- which may span several functions if we're doing a context-sensitive analysis.

In contrast, the result object map is specific to a given function, just like some of the other fields that are in Environment (CallStack, ReturnVal, ReturnLoc and ThisPointeeLoc). See the comment above saying that we should create a call-context object (which would be a per-function data structure) and move all of these fields to it.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should create a call-context object

Ah, ok. +1, I like this approach.

}

bool TraverseBindingDecl(BindingDecl *BD) {
// `RecursiveASTVisitor` doesn't traverse holding variables for
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if this omission should be addressed in RecursiveASTVisitor. This is a relatively new construct, so I can imagine whoever added them just forgot to update the RecursiveASTVisitor.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting point!

This makes sense to me, but I expect that making this change to RecursiveASTVisitor would require followon changes elsewhere, so I'll handle this here for now. I've put this on my list of things to investigate.

// exactly one child.
llvm::SmallVector<Stmt *> Children(E->child_begin(), E->child_end());
if (Children.size() != 1)
E->dump();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have some policies about dumps like this? Do we want to also emit a call to action like report the bug with a reproducer?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TBH, this wasn't intended as a user-facing dump in the first place -- just assistance in case the assert fails. Moved to an LLVM_DEBUG block.

@martinboehme
Copy link
Contributor Author

I think all comments should be addressed.

@Xazax-hun Should be ready for further review.

@ymand LMK if you have any additional comments.

Copy link
Collaborator

@Xazax-hun Xazax-hun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know it is not exactly what we need, but I found Expr::skipRValueSubobjectAdjustments in the compiler. I think it might be a good idea to take a look just in case there is something we forgot to cover, and maybe in the future we could look for factoring out some common code.

if (auto *InitList = dyn_cast<InitListExpr>(E)) {
if (!InitList->isSemanticForm())
return;
if (InitList->isTransparent()) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this the branch that would be triggered for the initialization of the array member in code like:

struct A {
    int b[5];
};

void f() {
    A{{1, 2, 3, 4, 5}};
}

I am mostly interested in what do we expect to happen for the array elements.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this the branch that would be triggered for the initialization of the array member in code like:
[snip]

No. (Neither of the two InitListExprs here is transparent. More details below.)

I am mostly interested in what do we expect to happen for the array elements.

AFAIK we don't handle arrays at all yet, neither here nor in the rest of the framework. (For this particular function, see the bit at the top that bails out if this isn't a record type.) So the answer to this particular case is: Nothing.

But in any case, the InitListExprs in the code you posted aren't transparent. (Unfortunately, the AST dumper doesn't output whether an InitListExpr is transparent; I locally modified TextNodeDumper::VisitInitListExpr() to verify my understanding of when an InitListExpr is transparent.)

Here's the documentation of InitListExpr::isTransparent():

  /// Is this a transparent initializer list (that is, an InitListExpr that is
  /// purely syntactic, and whose semantics are that of the sole contained
  /// initializer)?

This isn't, unfortunately, entirely clear, but I think what this is saying is that an InitListExpr is transparent if it contains exactly one initializer and has the same semantics as simply replacing the InitListExpr with that initializer, i.e. leaving off the curly braces.

For example, here's a transparent InitListExpr:

struct S {};

void f() {
  S s = {S()};
}

Thinking about this, I realize that the code here is too general -- a transparent initializer list always has exactly one initializer, so there's no need to loop over inits(). (See here for an example in Clang that simply accesses getInit(0).) I'll push a fixup that changes this.


// All other expression nodes that propagate a record prvalue should have
// exactly one child.
llvm::SmallVector<Stmt *> Children(E->child_begin(), E->child_end());
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: do we need llvm:: here? Also, while there might be a reasonable default in SmallVector, here we really expect to have only one element in correct code. Should we specify the size?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: do we need llvm:: here?

Oops, no -- removed.

Also, while there might be a reasonable default in SmallVector, here we really expect to have only one element in correct code. Should we specify the size?

Good point -- this is really a case where we should specify the size explicitly. Done.

@@ -556,6 +763,10 @@ void Environment::pushCallInternal(const FunctionDecl *FuncDecl,
const VarDecl *Param = *ParamIt;
setStorageLocation(*Param, createObject(*Param, Args[ArgIndex]));
}

ResultObjectMap = std::make_shared<PrValueToResultObject>(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if this is something that we might want to get cached. That being said, we'd probably want to benchmark first.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True -- if we're making several calls to the same function, it might indeed be worth caching this.

I think this isn't a priority for the time being though because a) I'm not aware of any non-test models that use context-sensitive analysis yet, and b) as you say, we should benchmark to see whether this makes up a significant portion of the total effort for analyzing the callee (which in itself will be expensive).

@martinboehme
Copy link
Contributor Author

I know it is not exactly what we need, but I found Expr::skipRValueSubobjectAdjustments in the compiler. I think it might be a good idea to take a look just in case there is something we forgot to cover, and maybe in the future we could look for factoring out some common code.

Thanks! I took a look and didn't see anything in there that we would need to cover but don't yet.

Copy link
Collaborator

@Xazax-hun Xazax-hun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks!

…zers.

Previously, we were propagating storage locations the other way around, i.e.
from initializers to result objects, using `RecordValue::getLoc()`. This gave
the wrong behavior in some cases -- see the newly added or fixed tests in this
patch.

In addition, this patch now unblocks removing the `RecordValue` class entirely,
as we no longer need `RecordValue::getLoc()`.

With this patch, the test `TransferTest.DifferentReferenceLocInJoin` started to
fail because the framework now always uses the same storge location for a
`MaterializeTemporaryExpr`, meaning that the code under test no longer set up
the desired state where a variable of reference type is mapped to two different
storage locations in environments being joined. Rather than trying to modify
this test to set up the test condition again, I have chosen to replace the test
with an equivalent test in DataflowEnvironmentTest.cpp that sets up the test
condition directly; because this test is more direct, it will also be less
brittle in the face of future changes.
@martinboehme martinboehme force-pushed the piper_export_cl_608290664 branch from eb3392d to 4ae9d7e Compare April 10, 2024 14:13
@martinboehme
Copy link
Contributor Author

Rebased to head and resolved merge conflicts.

Will merge once checks pass.

@martinboehme
Copy link
Contributor Author

CI failure looks unrelated

@martinboehme martinboehme merged commit 21009f4 into llvm:main Apr 10, 2024
@martinboehme
Copy link
Contributor Author

This is causing buildbots to fail because isOriginalRecordConstructor() is now unused. Reverting.

martinboehme added a commit that referenced this pull request Apr 10, 2024
…initializers." (#88315)

Reverts #87320

This is causing buildbots to fail because
`isOriginalRecordConstructor()` is now unused.
martinboehme added a commit that referenced this pull request Apr 11, 2024
…cts to initializers. (#88316)

This relands #87320 and additionally removes the now-unused function
`isOriginalRecordConstructor()`, which was causing buildbots to fail.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
clang:analysis clang:dataflow Clang Dataflow Analysis framework - https://clang.llvm.org/docs/DataFlowAnalysisIntro.html clang Clang issues not falling into any other category
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants