Skip to content

[clang][dataflow] Add synthetic fields to RecordStorageLocation #73860

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Dec 4, 2023

Conversation

martinboehme
Copy link
Contributor

Synthetic fields are intended to model the internal state of a class (e.g. the value stored in a std::optional) without having to depend on that class's implementation details.

Today, this is typically done with properties on RecordValues, but these have several drawbacks:

  • Care must be taken to call refreshRecordValue() before modifying a property so that the modified property values aren’t seen by other environments that may have access to the same RecordValue.

  • Properties aren’t associated with a storage location. If an analysis needs to associate a location with the value stored in a property (e.g. to model the reference returned by std::optional::value()), it needs to manually add an indirection using a PointerValue. (See for example the way this is done in UncheckedOptionalAccessModel.cpp, specifically in maybeInitializeOptionalValueMember().)

  • Properties don’t participate in the builtin compare, join, and widen operations. If an analysis needs to apply these operations to properties, it needs to override the corresponding methods of ValueModel.

  • Longer-term, we plan to eliminate RecordValue, as by-value operations on records aren’t really “a thing” in C++ (see https://discourse.llvm.org/t/70086#changed-structvalue-api-14). This would obviously eliminate the ability to set properties on RecordValues.

To demonstrate the advantages of synthetic fields, this patch converts UncheckedOptionalAccessModel.cpp to synthetic fields. This greatly simplifies the implementation of the check.

This PR is pretty big; to make it easier to review, I have broken it down into a stack of three commits, each of which contains a set of logically related changes. I considered submitting each of these as a separate PR, but the commits only really make sense when taken together.

To review, I suggest first looking at the changes in UncheckedOptionalAccessModel.cpp. This gives a flavor for how the various API changes work together in the context of an analysis. Then, review the rest of the changes.

The `Environment` is now only initialized with storage locations and values for
global variables, parameters, and the like after the analysis has been created.

Followup commits will add the ability to set callbacks that create synthetic
properties and associate these synthetic properties with values. The analysis
needs to be able to set these callbacks before any storage locations and values
are created.
@llvmbot llvmbot added clang Clang issues not falling into any other category clang:dataflow Clang Dataflow Analysis framework - https://clang.llvm.org/docs/DataFlowAnalysisIntro.html clang:analysis labels Nov 29, 2023
@martinboehme martinboehme requested a review from ymand November 29, 2023 21:42
@llvmbot
Copy link
Member

llvmbot commented Nov 29, 2023

@llvm/pr-subscribers-clang-analysis

Author: None (martinboehme)

Changes

Synthetic fields are intended to model the internal state of a class (e.g. the value stored in a std::optional) without having to depend on that class's implementation details.

Today, this is typically done with properties on RecordValues, but these have several drawbacks:

  • Care must be taken to call refreshRecordValue() before modifying a property so that the modified property values aren’t seen by other environments that may have access to the same RecordValue.

  • Properties aren’t associated with a storage location. If an analysis needs to associate a location with the value stored in a property (e.g. to model the reference returned by std::optional::value()), it needs to manually add an indirection using a PointerValue. (See for example the way this is done in UncheckedOptionalAccessModel.cpp, specifically in maybeInitializeOptionalValueMember().)

  • Properties don’t participate in the builtin compare, join, and widen operations. If an analysis needs to apply these operations to properties, it needs to override the corresponding methods of ValueModel.

  • Longer-term, we plan to eliminate RecordValue, as by-value operations on records aren’t really “a thing” in C++ (see https://discourse.llvm.org/t/70086#changed-structvalue-api-14). This would obviously eliminate the ability to set properties on RecordValues.

To demonstrate the advantages of synthetic fields, this patch converts UncheckedOptionalAccessModel.cpp to synthetic fields. This greatly simplifies the implementation of the check.

This PR is pretty big; to make it easier to review, I have broken it down into a stack of three commits, each of which contains a set of logically related changes. I considered submitting each of these as a separate PR, but the commits only really make sense when taken together.

To review, I suggest first looking at the changes in UncheckedOptionalAccessModel.cpp. This gives a flavor for how the various API changes work together in the context of an analysis. Then, review the rest of the changes.


Patch is 63.88 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/73860.diff

18 Files Affected:

  • (modified) clang/include/clang/Analysis/FlowSensitive/DataflowAnalysis.h (+17-1)
  • (modified) clang/include/clang/Analysis/FlowSensitive/DataflowAnalysisContext.h (+46)
  • (modified) clang/include/clang/Analysis/FlowSensitive/DataflowEnvironment.h (+19-6)
  • (modified) clang/include/clang/Analysis/FlowSensitive/Models/UncheckedOptionalAccessModel.h (+1-12)
  • (modified) clang/include/clang/Analysis/FlowSensitive/RecordOps.h (+9-8)
  • (modified) clang/include/clang/Analysis/FlowSensitive/StorageLocation.h (+26-6)
  • (modified) clang/lib/Analysis/FlowSensitive/DataflowAnalysisContext.cpp (+39-1)
  • (modified) clang/lib/Analysis/FlowSensitive/DataflowEnvironment.cpp (+20-53)
  • (modified) clang/lib/Analysis/FlowSensitive/HTMLLogger.cpp (+4)
  • (modified) clang/lib/Analysis/FlowSensitive/Models/UncheckedOptionalAccessModel.cpp (+108-270)
  • (modified) clang/lib/Analysis/FlowSensitive/RecordOps.cpp (+24)
  • (modified) clang/lib/Analysis/FlowSensitive/Transfer.cpp (+12-14)
  • (modified) clang/lib/Analysis/FlowSensitive/TypeErasedDataflowAnalysis.cpp (+59-2)
  • (modified) clang/unittests/Analysis/FlowSensitive/DataflowEnvironmentTest.cpp (+5)
  • (modified) clang/unittests/Analysis/FlowSensitive/RecordOpsTest.cpp (+50-9)
  • (modified) clang/unittests/Analysis/FlowSensitive/TestingSupport.cpp (+6-3)
  • (modified) clang/unittests/Analysis/FlowSensitive/TestingSupport.h (+3-1)
  • (modified) clang/unittests/Analysis/FlowSensitive/UncheckedOptionalAccessModelTest.cpp (+2-2)
diff --git a/clang/include/clang/Analysis/FlowSensitive/DataflowAnalysis.h b/clang/include/clang/Analysis/FlowSensitive/DataflowAnalysis.h
index c5f14cfcd4f7bee..1dffbe8a09c3609 100644
--- a/clang/include/clang/Analysis/FlowSensitive/DataflowAnalysis.h
+++ b/clang/include/clang/Analysis/FlowSensitive/DataflowAnalysis.h
@@ -234,6 +234,22 @@ runDataflowAnalysis(
   return std::move(BlockStates);
 }
 
+// Create an analysis class that is derived from `DataflowAnalysis`. This is an
+// SFINAE adapter that allows us to call two different variants of constructor
+// (either with or without the optional `Environment` parameter).
+// FIXME: Make all classes derived from `DataflowAnalysis` take an `Environment`
+// parameter in their constructor so that we can get rid of this abomination.
+template <typename AnalysisT>
+auto createAnalysis(ASTContext &ASTCtx, Environment &Env)
+    -> decltype(AnalysisT(ASTCtx, Env)) {
+  return AnalysisT(ASTCtx, Env);
+}
+template <typename AnalysisT>
+auto createAnalysis(ASTContext &ASTCtx, Environment &Env)
+    -> decltype(AnalysisT(ASTCtx)) {
+  return AnalysisT(ASTCtx);
+}
+
 /// Runs a dataflow analysis over the given function and then runs `Diagnoser`
 /// over the results. Returns a list of diagnostics for `FuncDecl` or an
 /// error. Currently, errors can occur (at least) because the analysis requires
@@ -262,7 +278,7 @@ llvm::Expected<llvm::SmallVector<Diagnostic>> diagnoseFunction(
   const WatchedLiteralsSolver *Solver = OwnedSolver.get();
   DataflowAnalysisContext AnalysisContext(std::move(OwnedSolver));
   Environment Env(AnalysisContext, FuncDecl);
-  AnalysisT Analysis(ASTCtx);
+  AnalysisT Analysis = createAnalysis<AnalysisT>(ASTCtx, Env);
   llvm::SmallVector<Diagnostic> Diagnostics;
   if (llvm::Error Err =
           runTypeErasedDataflowAnalysis(
diff --git a/clang/include/clang/Analysis/FlowSensitive/DataflowAnalysisContext.h b/clang/include/clang/Analysis/FlowSensitive/DataflowAnalysisContext.h
index c1aa8484817a9d9..20e45cc27b01fa8 100644
--- a/clang/include/clang/Analysis/FlowSensitive/DataflowAnalysisContext.h
+++ b/clang/include/clang/Analysis/FlowSensitive/DataflowAnalysisContext.h
@@ -58,6 +58,10 @@ using FieldSet = llvm::SmallSetVector<const FieldDecl *, 4>;
 /// Returns the set of all fields in the type.
 FieldSet getObjectFields(QualType Type);
 
+/// Returns whether `Fields` and `FieldLocs` contain the same fields.
+bool containsSameFields(const FieldSet &Fields,
+                        const RecordStorageLocation::FieldToLoc &FieldLocs);
+
 struct ContextSensitiveOptions {
   /// The maximum depth to analyze. A value of zero is equivalent to disabling
   /// context-sensitive analysis entirely.
@@ -92,11 +96,39 @@ class DataflowAnalysisContext {
                               /*Logger=*/nullptr});
   ~DataflowAnalysisContext();
 
+  /// Sets a callback that returns the names and types of the synthetic fields
+  /// to add to a `RecordStorageLocation` of a given type.
+  /// Typically, this is called from the constructor of a `DataflowAnalysis`
+  ///
+  /// To maintain the invariant that all `RecordStorageLocation`s of a given
+  /// type have the same fields:
+  /// *  The callback must always return the same result for a given type
+  /// *  `setSyntheticFieldCallback()` must be called before any
+  //     `RecordStorageLocation`s are created.
+  void setSyntheticFieldCallback(
+      std::function<llvm::StringMap<QualType>(QualType)> CB) {
+    assert(!RecordStorageLocationCreated);
+    SyntheticFieldCallback = CB;
+  }
+
   /// Returns a new storage location appropriate for `Type`.
   ///
   /// A null `Type` is interpreted as the pointee type of `std::nullptr_t`.
   StorageLocation &createStorageLocation(QualType Type);
 
+  /// Creates a `RecordStorageLocation` for the given type and with the given
+  /// fields.
+  ///
+  /// Requirements:
+  ///
+  ///  `FieldLocs` must contain exactly the fields returned by
+  ///  `getModeledFields(Type)`.
+  ///  `SyntheticFields` must contain exactly the fields returned by
+  ///  `getSyntheticFields(Type)`.
+  RecordStorageLocation &createRecordStorageLocation(
+      QualType Type, RecordStorageLocation::FieldToLoc FieldLocs,
+      RecordStorageLocation::SyntheticFieldMap SyntheticFields);
+
   /// Returns a stable storage location for `D`.
   StorageLocation &getStableStorageLocation(const ValueDecl &D);
 
@@ -169,6 +201,15 @@ class DataflowAnalysisContext {
   /// context.
   FieldSet getModeledFields(QualType Type);
 
+  /// Returns the names and types of the synthetic fields for the given record
+  /// type.
+  llvm::StringMap<QualType> getSyntheticFields(QualType Type) {
+    assert(Type->isRecordType());
+    if (SyntheticFieldCallback)
+      return SyntheticFieldCallback(Type);
+    return {};
+  }
+
 private:
   friend class Environment;
 
@@ -250,6 +291,11 @@ class DataflowAnalysisContext {
   FieldSet ModeledFields;
 
   std::unique_ptr<Logger> LogOwner; // If created via flags.
+
+  std::function<llvm::StringMap<QualType>(QualType)> SyntheticFieldCallback;
+
+  /// Has any `RecordStorageLocation` been created yet?
+  bool RecordStorageLocationCreated = false;
 };
 
 } // namespace dataflow
diff --git a/clang/include/clang/Analysis/FlowSensitive/DataflowEnvironment.h b/clang/include/clang/Analysis/FlowSensitive/DataflowEnvironment.h
index 7c1f5491096326b..8502f4da7e5414f 100644
--- a/clang/include/clang/Analysis/FlowSensitive/DataflowEnvironment.h
+++ b/clang/include/clang/Analysis/FlowSensitive/DataflowEnvironment.h
@@ -166,6 +166,10 @@ class Environment {
   /// with a symbolic representation of the `this` pointee.
   Environment(DataflowAnalysisContext &DACtx, const DeclContext &DeclCtx);
 
+  /// Assigns storage locations and values to all global variables, fields
+  /// and functions referenced in `FuncDecl`. `FuncDecl` must have a body.
+  void initFieldsGlobalsAndFuncs(const FunctionDecl *FuncDecl);
+
   /// Returns a new environment that is a copy of this one.
   ///
   /// The state of the program is initially the same, but can be mutated without
@@ -283,7 +287,15 @@ class Environment {
   /// Returns the storage location assigned to the `this` pointee in the
   /// environment or null if the `this` pointee has no assigned storage location
   /// in the environment.
-  RecordStorageLocation *getThisPointeeStorageLocation() const;
+  RecordStorageLocation *getThisPointeeStorageLocation() const {
+    return ThisPointeeLoc;
+  }
+
+  /// Sets the storage location assigned to the `this` pointee in the
+  /// environment.
+  void setThisPointeeStorageLocation(RecordStorageLocation &Loc) {
+    ThisPointeeLoc = &Loc;
+  }
 
   /// Returns the location of the result object for a record-type prvalue.
   ///
@@ -367,7 +379,8 @@ class Environment {
   /// storage locations and values for indirections until it finds a
   /// non-pointer/non-reference type.
   ///
-  /// If `Type` is a class, struct, or union type, calls `setValue()` to
+  /// If `Type` is a class, struct, or union type, creates values for all
+  /// modeled fields (including synthetic fields) and calls `setValue()` to
   /// associate the `RecordValue` with its storage location
   /// (`RecordValue::getLoc()`).
   ///
@@ -570,6 +583,9 @@ class Environment {
     return dyn_cast<FunctionDecl>(getDeclCtx());
   }
 
+  /// Returns the size of the call stack.
+  size_t callStackSize() const { return CallStack.size(); }
+
   /// Returns whether this `Environment` can be extended to analyze the given
   /// `Callee` (i.e. if `pushCall` can be used), with recursion disallowed and a
   /// given `MaxDepth`.
@@ -629,10 +645,7 @@ class Environment {
   void pushCallInternal(const FunctionDecl *FuncDecl,
                         ArrayRef<const Expr *> Args);
 
-  /// Assigns storage locations and values to all global variables, fields
-  /// and functions referenced in `FuncDecl`. `FuncDecl` must have a body.
-  void initFieldsGlobalsAndFuncs(const FunctionDecl *FuncDecl);
-
+private:
   // `DACtx` is not null and not owned by this object.
   DataflowAnalysisContext *DACtx;
 
diff --git a/clang/include/clang/Analysis/FlowSensitive/Models/UncheckedOptionalAccessModel.h b/clang/include/clang/Analysis/FlowSensitive/Models/UncheckedOptionalAccessModel.h
index aeaf75b2154222c..09eb8b93822612f 100644
--- a/clang/include/clang/Analysis/FlowSensitive/Models/UncheckedOptionalAccessModel.h
+++ b/clang/include/clang/Analysis/FlowSensitive/Models/UncheckedOptionalAccessModel.h
@@ -45,7 +45,7 @@ struct UncheckedOptionalAccessModelOptions {
 class UncheckedOptionalAccessModel
     : public DataflowAnalysis<UncheckedOptionalAccessModel, NoopLattice> {
 public:
-  UncheckedOptionalAccessModel(ASTContext &Ctx);
+  UncheckedOptionalAccessModel(ASTContext &Ctx, dataflow::Environment &Env);
 
   /// Returns a matcher for the optional classes covered by this model.
   static ast_matchers::DeclarationMatcher optionalClassDecl();
@@ -54,17 +54,6 @@ class UncheckedOptionalAccessModel
 
   void transfer(const CFGElement &Elt, NoopLattice &L, Environment &Env);
 
-  ComparisonResult compare(QualType Type, const Value &Val1,
-                           const Environment &Env1, const Value &Val2,
-                           const Environment &Env2) override;
-
-  bool merge(QualType Type, const Value &Val1, const Environment &Env1,
-             const Value &Val2, const Environment &Env2, Value &MergedVal,
-             Environment &MergedEnv) override;
-
-  Value *widen(QualType Type, Value &Prev, const Environment &PrevEnv,
-               Value &Current, Environment &CurrentEnv) override;
-
 private:
   CFGMatchSwitch<TransferState<NoopLattice>> TransferMatchSwitch;
 };
diff --git a/clang/include/clang/Analysis/FlowSensitive/RecordOps.h b/clang/include/clang/Analysis/FlowSensitive/RecordOps.h
index f007a947a82f132..7b87840d626b4b6 100644
--- a/clang/include/clang/Analysis/FlowSensitive/RecordOps.h
+++ b/clang/include/clang/Analysis/FlowSensitive/RecordOps.h
@@ -21,10 +21,10 @@ namespace dataflow {
 
 /// Copies a record (struct, class, or union) from `Src` to `Dst`.
 ///
-/// This performs a deep copy, i.e. it copies every field and recurses on
-/// fields of record type. It also copies properties from the `RecordValue`
-/// associated with `Src` to the `RecordValue` associated with `Dst` (if these
-/// `RecordValue`s exist).
+/// This performs a deep copy, i.e. it copies every field (including synthetic
+/// fields) and recurses on fields of record type. It also copies properties
+/// from the `RecordValue` associated with `Src` to the `RecordValue` associated
+/// with `Dst` (if these `RecordValue`s exist).
 ///
 /// If there is a `RecordValue` associated with `Dst` in the environment, this
 /// function creates a new `RecordValue` and associates it with `Dst`; clients
@@ -47,10 +47,11 @@ void copyRecord(RecordStorageLocation &Src, RecordStorageLocation &Dst,
 /// retrieved from `Env2`. A convenience overload retrieves values for `Loc1`
 /// and `Loc2` from the same environment.
 ///
-/// This performs a deep comparison, i.e. it compares every field and recurses
-/// on fields of record type. Fields of reference type compare equal if they
-/// refer to the same storage location. If `RecordValue`s are associated with
-/// `Loc1` and `Loc2`, it also compares the properties on those `RecordValue`s.
+/// This performs a deep comparison, i.e. it compares every field (including
+/// synthetic fields) and recurses on fields of record type. Fields of reference
+/// type compare equal if they refer to the same storage location. If
+/// `RecordValue`s are associated with `Loc1` and Loc2`, it also compares the
+/// properties on those `RecordValue`s.
 ///
 /// Note on how to interpret the result:
 /// - If this returns true, the records are guaranteed to be equal at runtime.
diff --git a/clang/include/clang/Analysis/FlowSensitive/StorageLocation.h b/clang/include/clang/Analysis/FlowSensitive/StorageLocation.h
index 89d2bbfbb69f9fb..b7203378189d6bf 100644
--- a/clang/include/clang/Analysis/FlowSensitive/StorageLocation.h
+++ b/clang/include/clang/Analysis/FlowSensitive/StorageLocation.h
@@ -73,7 +73,13 @@ class ScalarStorageLocation final : public StorageLocation {
 ///
 /// Contains storage locations for all modeled fields of the record (also
 /// referred to as "children"). The child map is flat, so accessible members of
-/// the base class are directly accesible as children of this location.
+/// the base class are directly accessible as children of this location.
+///
+/// Record storage locations may also contain so-called synthetic fields. These
+/// are typically used to the internal state of a class (e.g. the value stored
+/// in a `std::optional`) without having to depend on that class's
+/// implementation details. All `RecordStorageLocation`s of a given type should
+/// have the same synthetic fields.
 ///
 /// The storage location for a field of reference type may be null. This
 /// typically occurs in one of two situations:
@@ -88,12 +94,12 @@ class ScalarStorageLocation final : public StorageLocation {
 class RecordStorageLocation final : public StorageLocation {
 public:
   using FieldToLoc = llvm::DenseMap<const ValueDecl *, StorageLocation *>;
+  using SyntheticFieldMap = llvm::StringMap<StorageLocation *>;
 
-  explicit RecordStorageLocation(QualType Type)
-      : RecordStorageLocation(Type, FieldToLoc()) {}
-
-  RecordStorageLocation(QualType Type, FieldToLoc TheChildren)
-      : StorageLocation(Kind::Record, Type), Children(std::move(TheChildren)) {
+  RecordStorageLocation(QualType Type, FieldToLoc TheChildren,
+                        SyntheticFieldMap TheSyntheticFields)
+      : StorageLocation(Kind::Record, Type), Children(std::move(TheChildren)),
+        SyntheticFields(std::move(TheSyntheticFields)) {
     assert(!Type.isNull());
     assert(Type->isRecordType());
     assert([this] {
@@ -133,6 +139,19 @@ class RecordStorageLocation final : public StorageLocation {
     return It->second;
   }
 
+  /// Returns the storage location for the synthetic field `Name`.
+  /// The synthetic field must exist.
+  StorageLocation &getSyntheticField(llvm::StringRef Name) const {
+    StorageLocation *Loc = SyntheticFields.lookup(Name);
+    assert(Loc != nullptr);
+    return *Loc;
+  }
+
+  llvm::iterator_range<SyntheticFieldMap::const_iterator>
+  synthetic_fields() const {
+    return {SyntheticFields.begin(), SyntheticFields.end()};
+  }
+
   /// Changes the child storage location for a field `D` of reference type.
   /// All other fields cannot change their storage location and always retain
   /// the storage location passed to the `RecordStorageLocation` constructor.
@@ -151,6 +170,7 @@ class RecordStorageLocation final : public StorageLocation {
 
 private:
   FieldToLoc Children;
+  SyntheticFieldMap SyntheticFields;
 };
 
 } // namespace dataflow
diff --git a/clang/lib/Analysis/FlowSensitive/DataflowAnalysisContext.cpp b/clang/lib/Analysis/FlowSensitive/DataflowAnalysisContext.cpp
index 0a2fcd4ad695a30..fa114979c8e3263 100644
--- a/clang/lib/Analysis/FlowSensitive/DataflowAnalysisContext.cpp
+++ b/clang/lib/Analysis/FlowSensitive/DataflowAnalysisContext.cpp
@@ -68,11 +68,38 @@ StorageLocation &DataflowAnalysisContext::createStorageLocation(QualType Type) {
       else
         FieldLocs.insert({Field, &createStorageLocation(
                                      Field->getType().getNonReferenceType())});
-    return arena().create<RecordStorageLocation>(Type, std::move(FieldLocs));
+
+    RecordStorageLocation::SyntheticFieldMap SyntheticFields;
+    for (const auto &Entry : getSyntheticFields(Type))
+      SyntheticFields.insert(
+          {Entry.getKey(),
+           &createStorageLocation(Entry.getValue().getNonReferenceType())});
+
+    return createRecordStorageLocation(Type, std::move(FieldLocs),
+                                       std::move(SyntheticFields));
   }
   return arena().create<ScalarStorageLocation>(Type);
 }
 
+// Returns the keys for a given `StringMap`.
+// Can't use `StringSet` as the return type as it doesn't support `operator==`.
+template <typename T>
+static llvm::DenseSet<llvm::StringRef> getKeys(const llvm::StringMap<T> &Map) {
+  return llvm::DenseSet<llvm::StringRef>(Map.keys().begin(), Map.keys().end());
+}
+
+RecordStorageLocation &DataflowAnalysisContext::createRecordStorageLocation(
+    QualType Type, RecordStorageLocation::FieldToLoc FieldLocs,
+    RecordStorageLocation::SyntheticFieldMap SyntheticFields) {
+  assert(Type->isRecordType());
+  assert(containsSameFields(getModeledFields(Type), FieldLocs));
+  assert(getKeys(getSyntheticFields(Type)) == getKeys(SyntheticFields));
+
+  RecordStorageLocationCreated = true;
+  return arena().create<RecordStorageLocation>(Type, std::move(FieldLocs),
+                                               std::move(SyntheticFields));
+}
+
 StorageLocation &
 DataflowAnalysisContext::getStableStorageLocation(const ValueDecl &D) {
   if (auto *Loc = DeclToLoc.lookup(&D))
@@ -367,3 +394,14 @@ clang::dataflow::FieldSet clang::dataflow::getObjectFields(QualType Type) {
   getFieldsFromClassHierarchy(Type, Fields);
   return Fields;
 }
+
+bool clang::dataflow::containsSameFields(
+    const clang::dataflow::FieldSet &Fields,
+    const clang::dataflow::RecordStorageLocation::FieldToLoc &FieldLocs) {
+  if (Fields.size() != FieldLocs.size())
+    return false;
+  for ([[maybe_unused]] auto [Field, Loc] : FieldLocs)
+    if (!Fields.contains(cast_or_null<FieldDecl>(Field)))
+      return false;
+  return true;
+}
diff --git a/clang/lib/Analysis/FlowSensitive/DataflowEnvironment.cpp b/clang/lib/Analysis/FlowSensitive/DataflowEnvironment.cpp
index 525ab188b01b8aa..983375da9cc3a83 100644
--- a/clang/lib/Analysis/FlowSensitive/DataflowEnvironment.cpp
+++ b/clang/lib/Analysis/FlowSensitive/DataflowEnvironment.cpp
@@ -367,6 +367,16 @@ getFieldsGlobalsAndFuncs(const Stmt &S, FieldSet &Fields,
   }
 }
 
+Environment::Environment(DataflowAnalysisContext &DACtx)
+    : DACtx(&DACtx),
+      FlowConditionToken(DACtx.arena().makeFlowConditionToken()) {}
+
+Environment::Environment(DataflowAnalysisContext &DACtx,
+                         const DeclContext &DeclCtx)
+    : Environment(DACtx) {
+  CallStack.push_back(&DeclCtx);
+}
+
 // FIXME: Add support for resetting globals after function calls to enable
 // the implementation of sound analyses.
 void Environment::initFieldsGlobalsAndFuncs(const FunctionDecl *FuncDecl) {
@@ -416,59 +426,12 @@ void Environment::initFieldsGlobalsAndFuncs(const FunctionDecl *FuncDecl) {
   }
 }
 
-Environment::Environment(DataflowAnalysisContext &DACtx)
-    : DACtx(&DACtx),
-      FlowConditionToken(DACtx.arena().makeFlowConditionToken()) {}
-
 Environment Environment::fork() const {
   Environment Copy(*this);
   Copy.FlowConditionToken = DACtx->forkFlowCondition(FlowConditionToken);
   return Copy;
 }
 
-Environment::Environment(DataflowAnalysisContext &DACtx,
-                         const DeclContext &DeclCtx)
-    : Environment(DACtx) {
-  CallStack.push_back(&DeclCtx);
-
-  if (const auto *FuncDecl = dyn_cast<FunctionDecl>(&DeclCtx)) {
-    assert(FuncDecl->getBody() != nullptr);
-
-    initFieldsGlobalsAndFuncs(FuncDecl);
-
-    for (const auto *ParamDecl : FuncDecl->parameters()) {
-      assert(ParamDecl != nullptr);
-      setStorageLocation(*ParamDecl, createObject(*ParamDecl, nullptr));
-    }
-  }
-
-  if (const auto *MethodDecl = dyn_cast<CXXMethodDecl>(&DeclCtx)) {
-    auto *Parent = MethodDecl->getParent();
-    assert(Parent != nullptr);
-
-    if (Parent->isLambda()) {
-      for (auto Capture : Parent->captures()) {
-        if (Capture.capturesVariable()) {
-          const auto *VarDecl = Capture.getCapturedVar();
-          assert(VarDecl != nullptr);
-          setStorageLocation(*VarDecl, createObject(*VarDecl, nullptr));
-        } else if (Capture.capturesThis()) {
-          const auto *SurroundingMethodDecl =
-              cast<CXXMethodDecl>(DeclCtx.getNonClosureAncestor());
-          QualType ThisPointeeType =
-              SurroundingMethodDecl->getFunctionObjectParameterType();
-          ThisPointeeLoc =
-              &cast<RecordValue>(createValue(ThisPointeeType))->getLoc();
-        }
-      }
-    } else if (MethodDecl->isImplicitObjectMemberFunction()) {
-      QualType ThisPointeeType = MethodDecl->getFunctionObjectParameterType();
-      ThisPointeeLoc =
-          &cast<RecordVal...
[truncated]

@llvmbot
Copy link
Member

llvmbot commented Nov 29, 2023

@llvm/pr-subscribers-clang

Author: None (martinboehme)

Changes

Synthetic fields are intended to model the internal state of a class (e.g. the value stored in a std::optional) without having to depend on that class's implementation details.

Today, this is typically done with properties on RecordValues, but these have several drawbacks:

  • Care must be taken to call refreshRecordValue() before modifying a property so that the modified property values aren’t seen by other environments that may have access to the same RecordValue.

  • Properties aren’t associated with a storage location. If an analysis needs to associate a location with the value stored in a property (e.g. to model the reference returned by std::optional::value()), it needs to manually add an indirection using a PointerValue. (See for example the way this is done in UncheckedOptionalAccessModel.cpp, specifically in maybeInitializeOptionalValueMember().)

  • Properties don’t participate in the builtin compare, join, and widen operations. If an analysis needs to apply these operations to properties, it needs to override the corresponding methods of ValueModel.

  • Longer-term, we plan to eliminate RecordValue, as by-value operations on records aren’t really “a thing” in C++ (see https://discourse.llvm.org/t/70086#changed-structvalue-api-14). This would obviously eliminate the ability to set properties on RecordValues.

To demonstrate the advantages of synthetic fields, this patch converts UncheckedOptionalAccessModel.cpp to synthetic fields. This greatly simplifies the implementation of the check.

This PR is pretty big; to make it easier to review, I have broken it down into a stack of three commits, each of which contains a set of logically related changes. I considered submitting each of these as a separate PR, but the commits only really make sense when taken together.

To review, I suggest first looking at the changes in UncheckedOptionalAccessModel.cpp. This gives a flavor for how the various API changes work together in the context of an analysis. Then, review the rest of the changes.


Patch is 63.88 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/73860.diff

18 Files Affected:

  • (modified) clang/include/clang/Analysis/FlowSensitive/DataflowAnalysis.h (+17-1)
  • (modified) clang/include/clang/Analysis/FlowSensitive/DataflowAnalysisContext.h (+46)
  • (modified) clang/include/clang/Analysis/FlowSensitive/DataflowEnvironment.h (+19-6)
  • (modified) clang/include/clang/Analysis/FlowSensitive/Models/UncheckedOptionalAccessModel.h (+1-12)
  • (modified) clang/include/clang/Analysis/FlowSensitive/RecordOps.h (+9-8)
  • (modified) clang/include/clang/Analysis/FlowSensitive/StorageLocation.h (+26-6)
  • (modified) clang/lib/Analysis/FlowSensitive/DataflowAnalysisContext.cpp (+39-1)
  • (modified) clang/lib/Analysis/FlowSensitive/DataflowEnvironment.cpp (+20-53)
  • (modified) clang/lib/Analysis/FlowSensitive/HTMLLogger.cpp (+4)
  • (modified) clang/lib/Analysis/FlowSensitive/Models/UncheckedOptionalAccessModel.cpp (+108-270)
  • (modified) clang/lib/Analysis/FlowSensitive/RecordOps.cpp (+24)
  • (modified) clang/lib/Analysis/FlowSensitive/Transfer.cpp (+12-14)
  • (modified) clang/lib/Analysis/FlowSensitive/TypeErasedDataflowAnalysis.cpp (+59-2)
  • (modified) clang/unittests/Analysis/FlowSensitive/DataflowEnvironmentTest.cpp (+5)
  • (modified) clang/unittests/Analysis/FlowSensitive/RecordOpsTest.cpp (+50-9)
  • (modified) clang/unittests/Analysis/FlowSensitive/TestingSupport.cpp (+6-3)
  • (modified) clang/unittests/Analysis/FlowSensitive/TestingSupport.h (+3-1)
  • (modified) clang/unittests/Analysis/FlowSensitive/UncheckedOptionalAccessModelTest.cpp (+2-2)
diff --git a/clang/include/clang/Analysis/FlowSensitive/DataflowAnalysis.h b/clang/include/clang/Analysis/FlowSensitive/DataflowAnalysis.h
index c5f14cfcd4f7bee..1dffbe8a09c3609 100644
--- a/clang/include/clang/Analysis/FlowSensitive/DataflowAnalysis.h
+++ b/clang/include/clang/Analysis/FlowSensitive/DataflowAnalysis.h
@@ -234,6 +234,22 @@ runDataflowAnalysis(
   return std::move(BlockStates);
 }
 
+// Create an analysis class that is derived from `DataflowAnalysis`. This is an
+// SFINAE adapter that allows us to call two different variants of constructor
+// (either with or without the optional `Environment` parameter).
+// FIXME: Make all classes derived from `DataflowAnalysis` take an `Environment`
+// parameter in their constructor so that we can get rid of this abomination.
+template <typename AnalysisT>
+auto createAnalysis(ASTContext &ASTCtx, Environment &Env)
+    -> decltype(AnalysisT(ASTCtx, Env)) {
+  return AnalysisT(ASTCtx, Env);
+}
+template <typename AnalysisT>
+auto createAnalysis(ASTContext &ASTCtx, Environment &Env)
+    -> decltype(AnalysisT(ASTCtx)) {
+  return AnalysisT(ASTCtx);
+}
+
 /// Runs a dataflow analysis over the given function and then runs `Diagnoser`
 /// over the results. Returns a list of diagnostics for `FuncDecl` or an
 /// error. Currently, errors can occur (at least) because the analysis requires
@@ -262,7 +278,7 @@ llvm::Expected<llvm::SmallVector<Diagnostic>> diagnoseFunction(
   const WatchedLiteralsSolver *Solver = OwnedSolver.get();
   DataflowAnalysisContext AnalysisContext(std::move(OwnedSolver));
   Environment Env(AnalysisContext, FuncDecl);
-  AnalysisT Analysis(ASTCtx);
+  AnalysisT Analysis = createAnalysis<AnalysisT>(ASTCtx, Env);
   llvm::SmallVector<Diagnostic> Diagnostics;
   if (llvm::Error Err =
           runTypeErasedDataflowAnalysis(
diff --git a/clang/include/clang/Analysis/FlowSensitive/DataflowAnalysisContext.h b/clang/include/clang/Analysis/FlowSensitive/DataflowAnalysisContext.h
index c1aa8484817a9d9..20e45cc27b01fa8 100644
--- a/clang/include/clang/Analysis/FlowSensitive/DataflowAnalysisContext.h
+++ b/clang/include/clang/Analysis/FlowSensitive/DataflowAnalysisContext.h
@@ -58,6 +58,10 @@ using FieldSet = llvm::SmallSetVector<const FieldDecl *, 4>;
 /// Returns the set of all fields in the type.
 FieldSet getObjectFields(QualType Type);
 
+/// Returns whether `Fields` and `FieldLocs` contain the same fields.
+bool containsSameFields(const FieldSet &Fields,
+                        const RecordStorageLocation::FieldToLoc &FieldLocs);
+
 struct ContextSensitiveOptions {
   /// The maximum depth to analyze. A value of zero is equivalent to disabling
   /// context-sensitive analysis entirely.
@@ -92,11 +96,39 @@ class DataflowAnalysisContext {
                               /*Logger=*/nullptr});
   ~DataflowAnalysisContext();
 
+  /// Sets a callback that returns the names and types of the synthetic fields
+  /// to add to a `RecordStorageLocation` of a given type.
+  /// Typically, this is called from the constructor of a `DataflowAnalysis`
+  ///
+  /// To maintain the invariant that all `RecordStorageLocation`s of a given
+  /// type have the same fields:
+  /// *  The callback must always return the same result for a given type
+  /// *  `setSyntheticFieldCallback()` must be called before any
+  //     `RecordStorageLocation`s are created.
+  void setSyntheticFieldCallback(
+      std::function<llvm::StringMap<QualType>(QualType)> CB) {
+    assert(!RecordStorageLocationCreated);
+    SyntheticFieldCallback = CB;
+  }
+
   /// Returns a new storage location appropriate for `Type`.
   ///
   /// A null `Type` is interpreted as the pointee type of `std::nullptr_t`.
   StorageLocation &createStorageLocation(QualType Type);
 
+  /// Creates a `RecordStorageLocation` for the given type and with the given
+  /// fields.
+  ///
+  /// Requirements:
+  ///
+  ///  `FieldLocs` must contain exactly the fields returned by
+  ///  `getModeledFields(Type)`.
+  ///  `SyntheticFields` must contain exactly the fields returned by
+  ///  `getSyntheticFields(Type)`.
+  RecordStorageLocation &createRecordStorageLocation(
+      QualType Type, RecordStorageLocation::FieldToLoc FieldLocs,
+      RecordStorageLocation::SyntheticFieldMap SyntheticFields);
+
   /// Returns a stable storage location for `D`.
   StorageLocation &getStableStorageLocation(const ValueDecl &D);
 
@@ -169,6 +201,15 @@ class DataflowAnalysisContext {
   /// context.
   FieldSet getModeledFields(QualType Type);
 
+  /// Returns the names and types of the synthetic fields for the given record
+  /// type.
+  llvm::StringMap<QualType> getSyntheticFields(QualType Type) {
+    assert(Type->isRecordType());
+    if (SyntheticFieldCallback)
+      return SyntheticFieldCallback(Type);
+    return {};
+  }
+
 private:
   friend class Environment;
 
@@ -250,6 +291,11 @@ class DataflowAnalysisContext {
   FieldSet ModeledFields;
 
   std::unique_ptr<Logger> LogOwner; // If created via flags.
+
+  std::function<llvm::StringMap<QualType>(QualType)> SyntheticFieldCallback;
+
+  /// Has any `RecordStorageLocation` been created yet?
+  bool RecordStorageLocationCreated = false;
 };
 
 } // namespace dataflow
diff --git a/clang/include/clang/Analysis/FlowSensitive/DataflowEnvironment.h b/clang/include/clang/Analysis/FlowSensitive/DataflowEnvironment.h
index 7c1f5491096326b..8502f4da7e5414f 100644
--- a/clang/include/clang/Analysis/FlowSensitive/DataflowEnvironment.h
+++ b/clang/include/clang/Analysis/FlowSensitive/DataflowEnvironment.h
@@ -166,6 +166,10 @@ class Environment {
   /// with a symbolic representation of the `this` pointee.
   Environment(DataflowAnalysisContext &DACtx, const DeclContext &DeclCtx);
 
+  /// Assigns storage locations and values to all global variables, fields
+  /// and functions referenced in `FuncDecl`. `FuncDecl` must have a body.
+  void initFieldsGlobalsAndFuncs(const FunctionDecl *FuncDecl);
+
   /// Returns a new environment that is a copy of this one.
   ///
   /// The state of the program is initially the same, but can be mutated without
@@ -283,7 +287,15 @@ class Environment {
   /// Returns the storage location assigned to the `this` pointee in the
   /// environment or null if the `this` pointee has no assigned storage location
   /// in the environment.
-  RecordStorageLocation *getThisPointeeStorageLocation() const;
+  RecordStorageLocation *getThisPointeeStorageLocation() const {
+    return ThisPointeeLoc;
+  }
+
+  /// Sets the storage location assigned to the `this` pointee in the
+  /// environment.
+  void setThisPointeeStorageLocation(RecordStorageLocation &Loc) {
+    ThisPointeeLoc = &Loc;
+  }
 
   /// Returns the location of the result object for a record-type prvalue.
   ///
@@ -367,7 +379,8 @@ class Environment {
   /// storage locations and values for indirections until it finds a
   /// non-pointer/non-reference type.
   ///
-  /// If `Type` is a class, struct, or union type, calls `setValue()` to
+  /// If `Type` is a class, struct, or union type, creates values for all
+  /// modeled fields (including synthetic fields) and calls `setValue()` to
   /// associate the `RecordValue` with its storage location
   /// (`RecordValue::getLoc()`).
   ///
@@ -570,6 +583,9 @@ class Environment {
     return dyn_cast<FunctionDecl>(getDeclCtx());
   }
 
+  /// Returns the size of the call stack.
+  size_t callStackSize() const { return CallStack.size(); }
+
   /// Returns whether this `Environment` can be extended to analyze the given
   /// `Callee` (i.e. if `pushCall` can be used), with recursion disallowed and a
   /// given `MaxDepth`.
@@ -629,10 +645,7 @@ class Environment {
   void pushCallInternal(const FunctionDecl *FuncDecl,
                         ArrayRef<const Expr *> Args);
 
-  /// Assigns storage locations and values to all global variables, fields
-  /// and functions referenced in `FuncDecl`. `FuncDecl` must have a body.
-  void initFieldsGlobalsAndFuncs(const FunctionDecl *FuncDecl);
-
+private:
   // `DACtx` is not null and not owned by this object.
   DataflowAnalysisContext *DACtx;
 
diff --git a/clang/include/clang/Analysis/FlowSensitive/Models/UncheckedOptionalAccessModel.h b/clang/include/clang/Analysis/FlowSensitive/Models/UncheckedOptionalAccessModel.h
index aeaf75b2154222c..09eb8b93822612f 100644
--- a/clang/include/clang/Analysis/FlowSensitive/Models/UncheckedOptionalAccessModel.h
+++ b/clang/include/clang/Analysis/FlowSensitive/Models/UncheckedOptionalAccessModel.h
@@ -45,7 +45,7 @@ struct UncheckedOptionalAccessModelOptions {
 class UncheckedOptionalAccessModel
     : public DataflowAnalysis<UncheckedOptionalAccessModel, NoopLattice> {
 public:
-  UncheckedOptionalAccessModel(ASTContext &Ctx);
+  UncheckedOptionalAccessModel(ASTContext &Ctx, dataflow::Environment &Env);
 
   /// Returns a matcher for the optional classes covered by this model.
   static ast_matchers::DeclarationMatcher optionalClassDecl();
@@ -54,17 +54,6 @@ class UncheckedOptionalAccessModel
 
   void transfer(const CFGElement &Elt, NoopLattice &L, Environment &Env);
 
-  ComparisonResult compare(QualType Type, const Value &Val1,
-                           const Environment &Env1, const Value &Val2,
-                           const Environment &Env2) override;
-
-  bool merge(QualType Type, const Value &Val1, const Environment &Env1,
-             const Value &Val2, const Environment &Env2, Value &MergedVal,
-             Environment &MergedEnv) override;
-
-  Value *widen(QualType Type, Value &Prev, const Environment &PrevEnv,
-               Value &Current, Environment &CurrentEnv) override;
-
 private:
   CFGMatchSwitch<TransferState<NoopLattice>> TransferMatchSwitch;
 };
diff --git a/clang/include/clang/Analysis/FlowSensitive/RecordOps.h b/clang/include/clang/Analysis/FlowSensitive/RecordOps.h
index f007a947a82f132..7b87840d626b4b6 100644
--- a/clang/include/clang/Analysis/FlowSensitive/RecordOps.h
+++ b/clang/include/clang/Analysis/FlowSensitive/RecordOps.h
@@ -21,10 +21,10 @@ namespace dataflow {
 
 /// Copies a record (struct, class, or union) from `Src` to `Dst`.
 ///
-/// This performs a deep copy, i.e. it copies every field and recurses on
-/// fields of record type. It also copies properties from the `RecordValue`
-/// associated with `Src` to the `RecordValue` associated with `Dst` (if these
-/// `RecordValue`s exist).
+/// This performs a deep copy, i.e. it copies every field (including synthetic
+/// fields) and recurses on fields of record type. It also copies properties
+/// from the `RecordValue` associated with `Src` to the `RecordValue` associated
+/// with `Dst` (if these `RecordValue`s exist).
 ///
 /// If there is a `RecordValue` associated with `Dst` in the environment, this
 /// function creates a new `RecordValue` and associates it with `Dst`; clients
@@ -47,10 +47,11 @@ void copyRecord(RecordStorageLocation &Src, RecordStorageLocation &Dst,
 /// retrieved from `Env2`. A convenience overload retrieves values for `Loc1`
 /// and `Loc2` from the same environment.
 ///
-/// This performs a deep comparison, i.e. it compares every field and recurses
-/// on fields of record type. Fields of reference type compare equal if they
-/// refer to the same storage location. If `RecordValue`s are associated with
-/// `Loc1` and `Loc2`, it also compares the properties on those `RecordValue`s.
+/// This performs a deep comparison, i.e. it compares every field (including
+/// synthetic fields) and recurses on fields of record type. Fields of reference
+/// type compare equal if they refer to the same storage location. If
+/// `RecordValue`s are associated with `Loc1` and Loc2`, it also compares the
+/// properties on those `RecordValue`s.
 ///
 /// Note on how to interpret the result:
 /// - If this returns true, the records are guaranteed to be equal at runtime.
diff --git a/clang/include/clang/Analysis/FlowSensitive/StorageLocation.h b/clang/include/clang/Analysis/FlowSensitive/StorageLocation.h
index 89d2bbfbb69f9fb..b7203378189d6bf 100644
--- a/clang/include/clang/Analysis/FlowSensitive/StorageLocation.h
+++ b/clang/include/clang/Analysis/FlowSensitive/StorageLocation.h
@@ -73,7 +73,13 @@ class ScalarStorageLocation final : public StorageLocation {
 ///
 /// Contains storage locations for all modeled fields of the record (also
 /// referred to as "children"). The child map is flat, so accessible members of
-/// the base class are directly accesible as children of this location.
+/// the base class are directly accessible as children of this location.
+///
+/// Record storage locations may also contain so-called synthetic fields. These
+/// are typically used to the internal state of a class (e.g. the value stored
+/// in a `std::optional`) without having to depend on that class's
+/// implementation details. All `RecordStorageLocation`s of a given type should
+/// have the same synthetic fields.
 ///
 /// The storage location for a field of reference type may be null. This
 /// typically occurs in one of two situations:
@@ -88,12 +94,12 @@ class ScalarStorageLocation final : public StorageLocation {
 class RecordStorageLocation final : public StorageLocation {
 public:
   using FieldToLoc = llvm::DenseMap<const ValueDecl *, StorageLocation *>;
+  using SyntheticFieldMap = llvm::StringMap<StorageLocation *>;
 
-  explicit RecordStorageLocation(QualType Type)
-      : RecordStorageLocation(Type, FieldToLoc()) {}
-
-  RecordStorageLocation(QualType Type, FieldToLoc TheChildren)
-      : StorageLocation(Kind::Record, Type), Children(std::move(TheChildren)) {
+  RecordStorageLocation(QualType Type, FieldToLoc TheChildren,
+                        SyntheticFieldMap TheSyntheticFields)
+      : StorageLocation(Kind::Record, Type), Children(std::move(TheChildren)),
+        SyntheticFields(std::move(TheSyntheticFields)) {
     assert(!Type.isNull());
     assert(Type->isRecordType());
     assert([this] {
@@ -133,6 +139,19 @@ class RecordStorageLocation final : public StorageLocation {
     return It->second;
   }
 
+  /// Returns the storage location for the synthetic field `Name`.
+  /// The synthetic field must exist.
+  StorageLocation &getSyntheticField(llvm::StringRef Name) const {
+    StorageLocation *Loc = SyntheticFields.lookup(Name);
+    assert(Loc != nullptr);
+    return *Loc;
+  }
+
+  llvm::iterator_range<SyntheticFieldMap::const_iterator>
+  synthetic_fields() const {
+    return {SyntheticFields.begin(), SyntheticFields.end()};
+  }
+
   /// Changes the child storage location for a field `D` of reference type.
   /// All other fields cannot change their storage location and always retain
   /// the storage location passed to the `RecordStorageLocation` constructor.
@@ -151,6 +170,7 @@ class RecordStorageLocation final : public StorageLocation {
 
 private:
   FieldToLoc Children;
+  SyntheticFieldMap SyntheticFields;
 };
 
 } // namespace dataflow
diff --git a/clang/lib/Analysis/FlowSensitive/DataflowAnalysisContext.cpp b/clang/lib/Analysis/FlowSensitive/DataflowAnalysisContext.cpp
index 0a2fcd4ad695a30..fa114979c8e3263 100644
--- a/clang/lib/Analysis/FlowSensitive/DataflowAnalysisContext.cpp
+++ b/clang/lib/Analysis/FlowSensitive/DataflowAnalysisContext.cpp
@@ -68,11 +68,38 @@ StorageLocation &DataflowAnalysisContext::createStorageLocation(QualType Type) {
       else
         FieldLocs.insert({Field, &createStorageLocation(
                                      Field->getType().getNonReferenceType())});
-    return arena().create<RecordStorageLocation>(Type, std::move(FieldLocs));
+
+    RecordStorageLocation::SyntheticFieldMap SyntheticFields;
+    for (const auto &Entry : getSyntheticFields(Type))
+      SyntheticFields.insert(
+          {Entry.getKey(),
+           &createStorageLocation(Entry.getValue().getNonReferenceType())});
+
+    return createRecordStorageLocation(Type, std::move(FieldLocs),
+                                       std::move(SyntheticFields));
   }
   return arena().create<ScalarStorageLocation>(Type);
 }
 
+// Returns the keys for a given `StringMap`.
+// Can't use `StringSet` as the return type as it doesn't support `operator==`.
+template <typename T>
+static llvm::DenseSet<llvm::StringRef> getKeys(const llvm::StringMap<T> &Map) {
+  return llvm::DenseSet<llvm::StringRef>(Map.keys().begin(), Map.keys().end());
+}
+
+RecordStorageLocation &DataflowAnalysisContext::createRecordStorageLocation(
+    QualType Type, RecordStorageLocation::FieldToLoc FieldLocs,
+    RecordStorageLocation::SyntheticFieldMap SyntheticFields) {
+  assert(Type->isRecordType());
+  assert(containsSameFields(getModeledFields(Type), FieldLocs));
+  assert(getKeys(getSyntheticFields(Type)) == getKeys(SyntheticFields));
+
+  RecordStorageLocationCreated = true;
+  return arena().create<RecordStorageLocation>(Type, std::move(FieldLocs),
+                                               std::move(SyntheticFields));
+}
+
 StorageLocation &
 DataflowAnalysisContext::getStableStorageLocation(const ValueDecl &D) {
   if (auto *Loc = DeclToLoc.lookup(&D))
@@ -367,3 +394,14 @@ clang::dataflow::FieldSet clang::dataflow::getObjectFields(QualType Type) {
   getFieldsFromClassHierarchy(Type, Fields);
   return Fields;
 }
+
+bool clang::dataflow::containsSameFields(
+    const clang::dataflow::FieldSet &Fields,
+    const clang::dataflow::RecordStorageLocation::FieldToLoc &FieldLocs) {
+  if (Fields.size() != FieldLocs.size())
+    return false;
+  for ([[maybe_unused]] auto [Field, Loc] : FieldLocs)
+    if (!Fields.contains(cast_or_null<FieldDecl>(Field)))
+      return false;
+  return true;
+}
diff --git a/clang/lib/Analysis/FlowSensitive/DataflowEnvironment.cpp b/clang/lib/Analysis/FlowSensitive/DataflowEnvironment.cpp
index 525ab188b01b8aa..983375da9cc3a83 100644
--- a/clang/lib/Analysis/FlowSensitive/DataflowEnvironment.cpp
+++ b/clang/lib/Analysis/FlowSensitive/DataflowEnvironment.cpp
@@ -367,6 +367,16 @@ getFieldsGlobalsAndFuncs(const Stmt &S, FieldSet &Fields,
   }
 }
 
+Environment::Environment(DataflowAnalysisContext &DACtx)
+    : DACtx(&DACtx),
+      FlowConditionToken(DACtx.arena().makeFlowConditionToken()) {}
+
+Environment::Environment(DataflowAnalysisContext &DACtx,
+                         const DeclContext &DeclCtx)
+    : Environment(DACtx) {
+  CallStack.push_back(&DeclCtx);
+}
+
 // FIXME: Add support for resetting globals after function calls to enable
 // the implementation of sound analyses.
 void Environment::initFieldsGlobalsAndFuncs(const FunctionDecl *FuncDecl) {
@@ -416,59 +426,12 @@ void Environment::initFieldsGlobalsAndFuncs(const FunctionDecl *FuncDecl) {
   }
 }
 
-Environment::Environment(DataflowAnalysisContext &DACtx)
-    : DACtx(&DACtx),
-      FlowConditionToken(DACtx.arena().makeFlowConditionToken()) {}
-
 Environment Environment::fork() const {
   Environment Copy(*this);
   Copy.FlowConditionToken = DACtx->forkFlowCondition(FlowConditionToken);
   return Copy;
 }
 
-Environment::Environment(DataflowAnalysisContext &DACtx,
-                         const DeclContext &DeclCtx)
-    : Environment(DACtx) {
-  CallStack.push_back(&DeclCtx);
-
-  if (const auto *FuncDecl = dyn_cast<FunctionDecl>(&DeclCtx)) {
-    assert(FuncDecl->getBody() != nullptr);
-
-    initFieldsGlobalsAndFuncs(FuncDecl);
-
-    for (const auto *ParamDecl : FuncDecl->parameters()) {
-      assert(ParamDecl != nullptr);
-      setStorageLocation(*ParamDecl, createObject(*ParamDecl, nullptr));
-    }
-  }
-
-  if (const auto *MethodDecl = dyn_cast<CXXMethodDecl>(&DeclCtx)) {
-    auto *Parent = MethodDecl->getParent();
-    assert(Parent != nullptr);
-
-    if (Parent->isLambda()) {
-      for (auto Capture : Parent->captures()) {
-        if (Capture.capturesVariable()) {
-          const auto *VarDecl = Capture.getCapturedVar();
-          assert(VarDecl != nullptr);
-          setStorageLocation(*VarDecl, createObject(*VarDecl, nullptr));
-        } else if (Capture.capturesThis()) {
-          const auto *SurroundingMethodDecl =
-              cast<CXXMethodDecl>(DeclCtx.getNonClosureAncestor());
-          QualType ThisPointeeType =
-              SurroundingMethodDecl->getFunctionObjectParameterType();
-          ThisPointeeLoc =
-              &cast<RecordValue>(createValue(ThisPointeeType))->getLoc();
-        }
-      }
-    } else if (MethodDecl->isImplicitObjectMemberFunction()) {
-      QualType ThisPointeeType = MethodDecl->getFunctionObjectParameterType();
-      ThisPointeeLoc =
-          &cast<RecordVal...
[truncated]

Copy link
Collaborator

@ymand ymand left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is amazing! I spent a while searching for changes to join/widen to account for synthetic fields and then realized that (of course!) no special handling is needed!

I also love that we now have the hooks we've wanted since the beginning.

Thank you!

/// type have the same fields:
/// * The callback must always return the same result for a given type
/// * `setSyntheticFieldCallback()` must be called before any
// `RecordStorageLocation`s are created.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this be integrated into the constructor? Could be in a followup patch (to minimize further churn in this one).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was wondering if having a single SyntheticFieldCallback will be too restrictive in the future. What if an ananlysis wants to use modeling both for optionals and some other data structure, both having its own callbacks?

Although this is mostly food for thought for the future, no need to address in this PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this be integrated into the constructor? Could be in a followup patch (to minimize further churn in this one).

Unfortunately, that doesn't work because setSyntheticFieldCallback() needs to be called by the DataflowAnalysis, and the DataflowAnalysis doesn't create the DataflowAnalysisContext (i.e. it can't pass anything to its constructor).

I think we could consider changing this though: I don't think there's anything that would prevent the DataflowAnalysis itself from creating the DataflowAnalysisContext, and then the synthetic field callback could simply be a constructor parameter. This would require changing existing analyses, however, so I don't think this is something to tackle as part of this patch.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was wondering if having a single SyntheticFieldCallback will be too restrictive in the future. What if an ananlysis wants to use modeling both for optionals and some other data structure, both having its own callbacks?

A single synthetic field callback can handle any number of different types, so the analysis wouldn't need to set more than one callback.

Maybe, however, you're thinking of a scenario where we might want to run several analyses at the same time? I think this is something we've discussed in the past, and in this scenario each analysis would of course potentially want to be able to set its own synthetic field callback. As we currently don't have the ability to run more than one analysis at the same time, I think it is currently sufficient to only support a single callback as well.

Copy link
Collaborator

@Xazax-hun Xazax-hun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I generally love this direction, this was just a first pass, not a full review. I plan to take a second look but wanted to familiarize myself with the PR and start some conversations.

///
/// Requirements:
///
/// `FieldLocs` must contain exactly the fields returned by
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With requirements like this, I was wondering if this should be a private API.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree this would be nice (this is not the API that most callers should call), but this API is not just called from Environment (which is already a friend of the DataflowAnalysisContext) but also from Transfer.cpp -- and it's harder to set things up so that that latter call site is a friend.

In the end, I don't think it's worth the effort of "locking down" this API -- the documentation makes it clear that the function has pretty exacting preconditions (which we assert within the implementation), which makes the function unattractive to a potential caller anyway.

@@ -88,12 +94,12 @@ class ScalarStorageLocation final : public StorageLocation {
class RecordStorageLocation final : public StorageLocation {
public:
using FieldToLoc = llvm::DenseMap<const ValueDecl *, StorageLocation *>;
using SyntheticFieldMap = llvm::StringMap<StorageLocation *>;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was wondering, if we want this map to not be indexed by strings. An alternative would be to somehow access the IdentifierTable and create an identifier from the names and use the pointer to the identifier as the key. This has the potential to be a bit lighter on memory (identifiers only stored once as opposed to storing the keys in each RecordStorageLocation instance), and potentially a bit quicker lookup.

That being said, I am not insisting on the change, and even if we want to do it, it is probably OK in a follow-up PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an interesting idea, but I'm hesitant about mutating the IdentifierTable in something that is going to be used from a clang-tidy check, which will be sharing the IdentifierTable with other checks.

I suppose we could always create our own mapping from strings to integers or pointers of some sort, which could then serve as the keys for the SyntheticFieldMap -- but I would lean towards leaving this as a potential future optimization that we would do if this does indeed turn out to be a performance issue.

/// type have the same fields:
/// * The callback must always return the same result for a given type
/// * `setSyntheticFieldCallback()` must be called before any
// `RecordStorageLocation`s are created.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was wondering if having a single SyntheticFieldCallback will be too restrictive in the future. What if an ananlysis wants to use modeling both for optionals and some other data structure, both having its own callbacks?

Although this is mostly food for thought for the future, no need to address in this PR.

Copy link
Collaborator

@Xazax-hun Xazax-hun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, I love it! I have some comments on some features in the future, but those are probably not going to be addressed any time soon. First of all, I think in the future when we reason about the values of the pointers, synthetic fields might need to have offsets to be able to know things like &modeled.getField1() != &modeled.getField2(). Offsets might also be a way to support unions, multiple synthetic fields starting at the same offset might express just that.

Another big item is properly supporting inheritance. Consider:

struct B { int a; };
struct D : /*virtual*/ B {};
struct E : /*virtual*/ B {};
struct F: D, E {
    void m() {
        int* l = &(D::a);
        int* m = &(E::a);
    }
};

Here, whether we have the same or different memory locations for D::a and E::a depends on whether we are doing virtual inheritance. This is something that should work both for regular fields and synthetic fields.

if (Fields.size() != FieldLocs.size())
return false;
for ([[maybe_unused]] auto [Field, Loc] : FieldLocs)
if (!Fields.contains(cast_or_null<FieldDecl>(Field)))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What are the cases when we expect to see a null pointer here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can happen for fields of reference type in the cases discussed here.

@martinboehme
Copy link
Contributor Author

Overall, I love it! I have some comments on some features in the future, but those are probably not going to be addressed any time soon. First of all, I think in the future when we reason about the values of the pointers, synthetic fields might need to have offsets to be able to know things like &modeled.getField1() != &modeled.getField2().

Agree.

We do already mostly support this particular case; the transfer function for (in)equality on pointer values would need to be strengthened to compare not the PointerValues themselves but the StorageLocations they point to. As it currently stands, we only return true if the PointerValues themselves are equal, and otherwise we return an atomic. Making pointer comparison stronger would benefit our modeling of pointers in general, not just synthetic fields.

Offsets might also be a way to support unions, multiple synthetic fields starting at the same offset might express just that.

True. And this isn't just restricted to synthetic fields -- regular fields of a union should receive the same treatment.

If I understand the standard right, the only thing that's really guaranteed about the fields of a union is that they all share the same address; writing to one field, then reading from another is undefined behavior (though many compilers implement this as a non-standard language extension).

Currently, when comparing the addresses of different fields of a union, our transfer function produces an atomic (see above). Interestingly, this isn't actually wrong, per se, just inaccurate.

Anyway, though, I think it's relatively rare that a program depends on the behavior of comparing the addresses of two different fields of a union, so I don't think it's a high priority to fix this.

Another big item is properly supporting inheritance. Consider:

struct B { int a; };
struct D : /*virtual*/ B {};
struct E : /*virtual*/ B {};
struct F: D, E {
    void m() {
        int* l = &(D::a);
        int* m = &(E::a);
    }
};

Here, whether we have the same or different memory locations for D::a and E::a depends on whether we are doing virtual inheritance. This is something that should work both for regular fields and synthetic fields.

Agreed, we also don't do this correctly yet. Again, though, I think this is lower priority than many of the other gaps in functionality that we have, as many style guides discourage virtual inheritance anyway, and in the cases where we see it in the code, I would guess that the result of our analysis often doesn't depend on modelling this correctly.

Thanks for all of these observations -- I think at least modelling pointer comparison more accurately is something we should look at soon.

@martinboehme martinboehme merged commit 71f2ec2 into llvm:main Dec 4, 2023
@martinboehme martinboehme deleted the piper_export_cl_583969847 branch December 4, 2023 08:29
martinboehme added a commit to martinboehme/llvm-project that referenced this pull request Dec 13, 2023
-  Instead of comparing the identity of the `PointerValue`s, compare the
   underlying `StorageLocation`s.

-  If the `StorageLocation`s are different, return a definite "false" as the
   result of the comparison. Before, if the `PointerValue`s were different, the
   best we could do was to return an atom (because the `StorageLocation`s might
   still be the same).

On an internal codebase, this change reduces SAT solver timeouts by over 20% and
"maximum iterations reached" errors by over 50%. In addition, it obviously
improves the precision of the analysis.

@Xazax-hun inspired me to think about this with his
[comments](llvm#73860 (review))
on a different PR.

The one thing where the new code currently does the wrong thing is when
comparing the addresses of different members of a union. By the language
standard, all members of a union should have the same address, but we currently
model them with different `StorageLocation`s, and so with this change, we will
return false when comparing the addreses.

I propose that this is acceptable because is unlikely to affect the behavior of
real-world code in meaningful ways.

With this change, the test TransferTest.DifferentReferenceLocInJoin started to
fail because the code under test no longer set up the desired state where a
variable of reference type is mapped to two different storage locations in
environments being joined. Rather than trying to modify this test to set up the
test condition again, I have chosen to replace the test with an equivalent
test in DataflowEnvironmentTest.cpp that sets up the test condition directly;
because this test is more direct, it will also be less brittle in the face of
future changes.
martinboehme added a commit to martinboehme/llvm-project that referenced this pull request Dec 21, 2023
Instead, synthetic fields should now be used for the same purpose. These have a
number of advantages, as described in
llvm#73860, and longer-term, we want to
eliminate `RecordValue` entirely.

As `RecordValue`s cannot have properties any more, I have replaced the
`OptionalIntAnalysis` with an equivalent analysis that tracks nullness of
pointers (instead of whether an optional has a value). This serves the same
purpose, namely to check whether the framework applies a custom `merge()`
operation to widen properties.
martinboehme added a commit that referenced this pull request Dec 21, 2023
)

Instead, synthetic fields should now be used for the same purpose. These
have a
number of advantages, as described in
#73860, and longer-term, we
want to
eliminate `RecordValue` entirely.

As `RecordValue`s cannot have properties any more, I have replaced the
`OptionalIntAnalysis` with an equivalent analysis that tracks nullness
of
pointers (instead of whether an optional has a value). This serves the
same
purpose, namely to check whether the framework applies a custom
`merge()`
operation to widen properties.
qihangkong pushed a commit to rvgpu/rvgpu-llvm that referenced this pull request Apr 23, 2024
…042)

Instead, synthetic fields should now be used for the same purpose. These
have a
number of advantages, as described in
llvm/llvm-project#73860, and longer-term, we
want to
eliminate `RecordValue` entirely.

As `RecordValue`s cannot have properties any more, I have replaced the
`OptionalIntAnalysis` with an equivalent analysis that tracks nullness
of
pointers (instead of whether an optional has a value). This serves the
same
purpose, namely to check whether the framework applies a custom
`merge()`
operation to widen properties.
martinboehme added a commit to martinboehme/llvm-project that referenced this pull request May 6, 2024
-  Instead of comparing the identity of the `PointerValue`s, compare the
   underlying `StorageLocation`s.

-  If the `StorageLocation`s are the same, return a definite "true" as the
   result of the comparison. Before, if the `PointerValue`s were different, we
   would return an atom, even if the storage locations themselves were the same.

-  If the `StorageLocation`s are different, return an atom (as before). Pointers
   that have different storage locations may still alias, so we can't return a
   definite "false" in this case.

The application-level gains from this are relatively modest. For the Crubit
nullability check running on an internal codebase, this change reduces the
number of functions on which the SAT solver times out from 223 to 221; the
number of "pointer expression not modeled" errors reduces from 3815 to 3778.

Still, it seems that the gain in precision is generally worthwhile.

@Xazax-hun inspired me to think about this with his
[comments](llvm#73860 (review))
on a different PR.
martinboehme added a commit that referenced this pull request May 7, 2024
-  Instead of comparing the identity of the `PointerValue`s, compare the
   underlying `StorageLocation`s.

- If the `StorageLocation`s are the same, return a definite "true" as
the
result of the comparison. Before, if the `PointerValue`s were different,
we
would return an atom, even if the storage locations themselves were the
same.

- If the `StorageLocation`s are different, return an atom (as before).
Pointers
that have different storage locations may still alias, so we can't
return a
   definite "false" in this case.

The application-level gains from this are relatively modest. For the
Crubit
nullability check running on an internal codebase, this change reduces
the
number of functions on which the SAT solver times out from 223 to 221;
the
number of "pointer expression not modeled" errors reduces from 3815 to
3778.

Still, it seems that the gain in precision is generally worthwhile.

@Xazax-hun inspired me to think about this with his

[comments](#73860 (review))
on a different PR.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
clang:analysis clang:dataflow Clang Dataflow Analysis framework - https://clang.llvm.org/docs/DataFlowAnalysisIntro.html clang Clang issues not falling into any other category
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants