-
Notifications
You must be signed in to change notification settings - Fork 14.2k
[clang][dataflow] Add synthetic fields to RecordStorageLocation
#73860
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[clang][dataflow] Add synthetic fields to RecordStorageLocation
#73860
Conversation
The `Environment` is now only initialized with storage locations and values for global variables, parameters, and the like after the analysis has been created. Followup commits will add the ability to set callbacks that create synthetic properties and associate these synthetic properties with values. The analysis needs to be able to set these callbacks before any storage locations and values are created.
@llvm/pr-subscribers-clang-analysis Author: None (martinboehme) ChangesSynthetic fields are intended to model the internal state of a class (e.g. the value stored in a Today, this is typically done with properties on
To demonstrate the advantages of synthetic fields, this patch converts UncheckedOptionalAccessModel.cpp to synthetic fields. This greatly simplifies the implementation of the check. This PR is pretty big; to make it easier to review, I have broken it down into a stack of three commits, each of which contains a set of logically related changes. I considered submitting each of these as a separate PR, but the commits only really make sense when taken together. To review, I suggest first looking at the changes in UncheckedOptionalAccessModel.cpp. This gives a flavor for how the various API changes work together in the context of an analysis. Then, review the rest of the changes. Patch is 63.88 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/73860.diff 18 Files Affected:
diff --git a/clang/include/clang/Analysis/FlowSensitive/DataflowAnalysis.h b/clang/include/clang/Analysis/FlowSensitive/DataflowAnalysis.h
index c5f14cfcd4f7bee..1dffbe8a09c3609 100644
--- a/clang/include/clang/Analysis/FlowSensitive/DataflowAnalysis.h
+++ b/clang/include/clang/Analysis/FlowSensitive/DataflowAnalysis.h
@@ -234,6 +234,22 @@ runDataflowAnalysis(
return std::move(BlockStates);
}
+// Create an analysis class that is derived from `DataflowAnalysis`. This is an
+// SFINAE adapter that allows us to call two different variants of constructor
+// (either with or without the optional `Environment` parameter).
+// FIXME: Make all classes derived from `DataflowAnalysis` take an `Environment`
+// parameter in their constructor so that we can get rid of this abomination.
+template <typename AnalysisT>
+auto createAnalysis(ASTContext &ASTCtx, Environment &Env)
+ -> decltype(AnalysisT(ASTCtx, Env)) {
+ return AnalysisT(ASTCtx, Env);
+}
+template <typename AnalysisT>
+auto createAnalysis(ASTContext &ASTCtx, Environment &Env)
+ -> decltype(AnalysisT(ASTCtx)) {
+ return AnalysisT(ASTCtx);
+}
+
/// Runs a dataflow analysis over the given function and then runs `Diagnoser`
/// over the results. Returns a list of diagnostics for `FuncDecl` or an
/// error. Currently, errors can occur (at least) because the analysis requires
@@ -262,7 +278,7 @@ llvm::Expected<llvm::SmallVector<Diagnostic>> diagnoseFunction(
const WatchedLiteralsSolver *Solver = OwnedSolver.get();
DataflowAnalysisContext AnalysisContext(std::move(OwnedSolver));
Environment Env(AnalysisContext, FuncDecl);
- AnalysisT Analysis(ASTCtx);
+ AnalysisT Analysis = createAnalysis<AnalysisT>(ASTCtx, Env);
llvm::SmallVector<Diagnostic> Diagnostics;
if (llvm::Error Err =
runTypeErasedDataflowAnalysis(
diff --git a/clang/include/clang/Analysis/FlowSensitive/DataflowAnalysisContext.h b/clang/include/clang/Analysis/FlowSensitive/DataflowAnalysisContext.h
index c1aa8484817a9d9..20e45cc27b01fa8 100644
--- a/clang/include/clang/Analysis/FlowSensitive/DataflowAnalysisContext.h
+++ b/clang/include/clang/Analysis/FlowSensitive/DataflowAnalysisContext.h
@@ -58,6 +58,10 @@ using FieldSet = llvm::SmallSetVector<const FieldDecl *, 4>;
/// Returns the set of all fields in the type.
FieldSet getObjectFields(QualType Type);
+/// Returns whether `Fields` and `FieldLocs` contain the same fields.
+bool containsSameFields(const FieldSet &Fields,
+ const RecordStorageLocation::FieldToLoc &FieldLocs);
+
struct ContextSensitiveOptions {
/// The maximum depth to analyze. A value of zero is equivalent to disabling
/// context-sensitive analysis entirely.
@@ -92,11 +96,39 @@ class DataflowAnalysisContext {
/*Logger=*/nullptr});
~DataflowAnalysisContext();
+ /// Sets a callback that returns the names and types of the synthetic fields
+ /// to add to a `RecordStorageLocation` of a given type.
+ /// Typically, this is called from the constructor of a `DataflowAnalysis`
+ ///
+ /// To maintain the invariant that all `RecordStorageLocation`s of a given
+ /// type have the same fields:
+ /// * The callback must always return the same result for a given type
+ /// * `setSyntheticFieldCallback()` must be called before any
+ // `RecordStorageLocation`s are created.
+ void setSyntheticFieldCallback(
+ std::function<llvm::StringMap<QualType>(QualType)> CB) {
+ assert(!RecordStorageLocationCreated);
+ SyntheticFieldCallback = CB;
+ }
+
/// Returns a new storage location appropriate for `Type`.
///
/// A null `Type` is interpreted as the pointee type of `std::nullptr_t`.
StorageLocation &createStorageLocation(QualType Type);
+ /// Creates a `RecordStorageLocation` for the given type and with the given
+ /// fields.
+ ///
+ /// Requirements:
+ ///
+ /// `FieldLocs` must contain exactly the fields returned by
+ /// `getModeledFields(Type)`.
+ /// `SyntheticFields` must contain exactly the fields returned by
+ /// `getSyntheticFields(Type)`.
+ RecordStorageLocation &createRecordStorageLocation(
+ QualType Type, RecordStorageLocation::FieldToLoc FieldLocs,
+ RecordStorageLocation::SyntheticFieldMap SyntheticFields);
+
/// Returns a stable storage location for `D`.
StorageLocation &getStableStorageLocation(const ValueDecl &D);
@@ -169,6 +201,15 @@ class DataflowAnalysisContext {
/// context.
FieldSet getModeledFields(QualType Type);
+ /// Returns the names and types of the synthetic fields for the given record
+ /// type.
+ llvm::StringMap<QualType> getSyntheticFields(QualType Type) {
+ assert(Type->isRecordType());
+ if (SyntheticFieldCallback)
+ return SyntheticFieldCallback(Type);
+ return {};
+ }
+
private:
friend class Environment;
@@ -250,6 +291,11 @@ class DataflowAnalysisContext {
FieldSet ModeledFields;
std::unique_ptr<Logger> LogOwner; // If created via flags.
+
+ std::function<llvm::StringMap<QualType>(QualType)> SyntheticFieldCallback;
+
+ /// Has any `RecordStorageLocation` been created yet?
+ bool RecordStorageLocationCreated = false;
};
} // namespace dataflow
diff --git a/clang/include/clang/Analysis/FlowSensitive/DataflowEnvironment.h b/clang/include/clang/Analysis/FlowSensitive/DataflowEnvironment.h
index 7c1f5491096326b..8502f4da7e5414f 100644
--- a/clang/include/clang/Analysis/FlowSensitive/DataflowEnvironment.h
+++ b/clang/include/clang/Analysis/FlowSensitive/DataflowEnvironment.h
@@ -166,6 +166,10 @@ class Environment {
/// with a symbolic representation of the `this` pointee.
Environment(DataflowAnalysisContext &DACtx, const DeclContext &DeclCtx);
+ /// Assigns storage locations and values to all global variables, fields
+ /// and functions referenced in `FuncDecl`. `FuncDecl` must have a body.
+ void initFieldsGlobalsAndFuncs(const FunctionDecl *FuncDecl);
+
/// Returns a new environment that is a copy of this one.
///
/// The state of the program is initially the same, but can be mutated without
@@ -283,7 +287,15 @@ class Environment {
/// Returns the storage location assigned to the `this` pointee in the
/// environment or null if the `this` pointee has no assigned storage location
/// in the environment.
- RecordStorageLocation *getThisPointeeStorageLocation() const;
+ RecordStorageLocation *getThisPointeeStorageLocation() const {
+ return ThisPointeeLoc;
+ }
+
+ /// Sets the storage location assigned to the `this` pointee in the
+ /// environment.
+ void setThisPointeeStorageLocation(RecordStorageLocation &Loc) {
+ ThisPointeeLoc = &Loc;
+ }
/// Returns the location of the result object for a record-type prvalue.
///
@@ -367,7 +379,8 @@ class Environment {
/// storage locations and values for indirections until it finds a
/// non-pointer/non-reference type.
///
- /// If `Type` is a class, struct, or union type, calls `setValue()` to
+ /// If `Type` is a class, struct, or union type, creates values for all
+ /// modeled fields (including synthetic fields) and calls `setValue()` to
/// associate the `RecordValue` with its storage location
/// (`RecordValue::getLoc()`).
///
@@ -570,6 +583,9 @@ class Environment {
return dyn_cast<FunctionDecl>(getDeclCtx());
}
+ /// Returns the size of the call stack.
+ size_t callStackSize() const { return CallStack.size(); }
+
/// Returns whether this `Environment` can be extended to analyze the given
/// `Callee` (i.e. if `pushCall` can be used), with recursion disallowed and a
/// given `MaxDepth`.
@@ -629,10 +645,7 @@ class Environment {
void pushCallInternal(const FunctionDecl *FuncDecl,
ArrayRef<const Expr *> Args);
- /// Assigns storage locations and values to all global variables, fields
- /// and functions referenced in `FuncDecl`. `FuncDecl` must have a body.
- void initFieldsGlobalsAndFuncs(const FunctionDecl *FuncDecl);
-
+private:
// `DACtx` is not null and not owned by this object.
DataflowAnalysisContext *DACtx;
diff --git a/clang/include/clang/Analysis/FlowSensitive/Models/UncheckedOptionalAccessModel.h b/clang/include/clang/Analysis/FlowSensitive/Models/UncheckedOptionalAccessModel.h
index aeaf75b2154222c..09eb8b93822612f 100644
--- a/clang/include/clang/Analysis/FlowSensitive/Models/UncheckedOptionalAccessModel.h
+++ b/clang/include/clang/Analysis/FlowSensitive/Models/UncheckedOptionalAccessModel.h
@@ -45,7 +45,7 @@ struct UncheckedOptionalAccessModelOptions {
class UncheckedOptionalAccessModel
: public DataflowAnalysis<UncheckedOptionalAccessModel, NoopLattice> {
public:
- UncheckedOptionalAccessModel(ASTContext &Ctx);
+ UncheckedOptionalAccessModel(ASTContext &Ctx, dataflow::Environment &Env);
/// Returns a matcher for the optional classes covered by this model.
static ast_matchers::DeclarationMatcher optionalClassDecl();
@@ -54,17 +54,6 @@ class UncheckedOptionalAccessModel
void transfer(const CFGElement &Elt, NoopLattice &L, Environment &Env);
- ComparisonResult compare(QualType Type, const Value &Val1,
- const Environment &Env1, const Value &Val2,
- const Environment &Env2) override;
-
- bool merge(QualType Type, const Value &Val1, const Environment &Env1,
- const Value &Val2, const Environment &Env2, Value &MergedVal,
- Environment &MergedEnv) override;
-
- Value *widen(QualType Type, Value &Prev, const Environment &PrevEnv,
- Value &Current, Environment &CurrentEnv) override;
-
private:
CFGMatchSwitch<TransferState<NoopLattice>> TransferMatchSwitch;
};
diff --git a/clang/include/clang/Analysis/FlowSensitive/RecordOps.h b/clang/include/clang/Analysis/FlowSensitive/RecordOps.h
index f007a947a82f132..7b87840d626b4b6 100644
--- a/clang/include/clang/Analysis/FlowSensitive/RecordOps.h
+++ b/clang/include/clang/Analysis/FlowSensitive/RecordOps.h
@@ -21,10 +21,10 @@ namespace dataflow {
/// Copies a record (struct, class, or union) from `Src` to `Dst`.
///
-/// This performs a deep copy, i.e. it copies every field and recurses on
-/// fields of record type. It also copies properties from the `RecordValue`
-/// associated with `Src` to the `RecordValue` associated with `Dst` (if these
-/// `RecordValue`s exist).
+/// This performs a deep copy, i.e. it copies every field (including synthetic
+/// fields) and recurses on fields of record type. It also copies properties
+/// from the `RecordValue` associated with `Src` to the `RecordValue` associated
+/// with `Dst` (if these `RecordValue`s exist).
///
/// If there is a `RecordValue` associated with `Dst` in the environment, this
/// function creates a new `RecordValue` and associates it with `Dst`; clients
@@ -47,10 +47,11 @@ void copyRecord(RecordStorageLocation &Src, RecordStorageLocation &Dst,
/// retrieved from `Env2`. A convenience overload retrieves values for `Loc1`
/// and `Loc2` from the same environment.
///
-/// This performs a deep comparison, i.e. it compares every field and recurses
-/// on fields of record type. Fields of reference type compare equal if they
-/// refer to the same storage location. If `RecordValue`s are associated with
-/// `Loc1` and `Loc2`, it also compares the properties on those `RecordValue`s.
+/// This performs a deep comparison, i.e. it compares every field (including
+/// synthetic fields) and recurses on fields of record type. Fields of reference
+/// type compare equal if they refer to the same storage location. If
+/// `RecordValue`s are associated with `Loc1` and Loc2`, it also compares the
+/// properties on those `RecordValue`s.
///
/// Note on how to interpret the result:
/// - If this returns true, the records are guaranteed to be equal at runtime.
diff --git a/clang/include/clang/Analysis/FlowSensitive/StorageLocation.h b/clang/include/clang/Analysis/FlowSensitive/StorageLocation.h
index 89d2bbfbb69f9fb..b7203378189d6bf 100644
--- a/clang/include/clang/Analysis/FlowSensitive/StorageLocation.h
+++ b/clang/include/clang/Analysis/FlowSensitive/StorageLocation.h
@@ -73,7 +73,13 @@ class ScalarStorageLocation final : public StorageLocation {
///
/// Contains storage locations for all modeled fields of the record (also
/// referred to as "children"). The child map is flat, so accessible members of
-/// the base class are directly accesible as children of this location.
+/// the base class are directly accessible as children of this location.
+///
+/// Record storage locations may also contain so-called synthetic fields. These
+/// are typically used to the internal state of a class (e.g. the value stored
+/// in a `std::optional`) without having to depend on that class's
+/// implementation details. All `RecordStorageLocation`s of a given type should
+/// have the same synthetic fields.
///
/// The storage location for a field of reference type may be null. This
/// typically occurs in one of two situations:
@@ -88,12 +94,12 @@ class ScalarStorageLocation final : public StorageLocation {
class RecordStorageLocation final : public StorageLocation {
public:
using FieldToLoc = llvm::DenseMap<const ValueDecl *, StorageLocation *>;
+ using SyntheticFieldMap = llvm::StringMap<StorageLocation *>;
- explicit RecordStorageLocation(QualType Type)
- : RecordStorageLocation(Type, FieldToLoc()) {}
-
- RecordStorageLocation(QualType Type, FieldToLoc TheChildren)
- : StorageLocation(Kind::Record, Type), Children(std::move(TheChildren)) {
+ RecordStorageLocation(QualType Type, FieldToLoc TheChildren,
+ SyntheticFieldMap TheSyntheticFields)
+ : StorageLocation(Kind::Record, Type), Children(std::move(TheChildren)),
+ SyntheticFields(std::move(TheSyntheticFields)) {
assert(!Type.isNull());
assert(Type->isRecordType());
assert([this] {
@@ -133,6 +139,19 @@ class RecordStorageLocation final : public StorageLocation {
return It->second;
}
+ /// Returns the storage location for the synthetic field `Name`.
+ /// The synthetic field must exist.
+ StorageLocation &getSyntheticField(llvm::StringRef Name) const {
+ StorageLocation *Loc = SyntheticFields.lookup(Name);
+ assert(Loc != nullptr);
+ return *Loc;
+ }
+
+ llvm::iterator_range<SyntheticFieldMap::const_iterator>
+ synthetic_fields() const {
+ return {SyntheticFields.begin(), SyntheticFields.end()};
+ }
+
/// Changes the child storage location for a field `D` of reference type.
/// All other fields cannot change their storage location and always retain
/// the storage location passed to the `RecordStorageLocation` constructor.
@@ -151,6 +170,7 @@ class RecordStorageLocation final : public StorageLocation {
private:
FieldToLoc Children;
+ SyntheticFieldMap SyntheticFields;
};
} // namespace dataflow
diff --git a/clang/lib/Analysis/FlowSensitive/DataflowAnalysisContext.cpp b/clang/lib/Analysis/FlowSensitive/DataflowAnalysisContext.cpp
index 0a2fcd4ad695a30..fa114979c8e3263 100644
--- a/clang/lib/Analysis/FlowSensitive/DataflowAnalysisContext.cpp
+++ b/clang/lib/Analysis/FlowSensitive/DataflowAnalysisContext.cpp
@@ -68,11 +68,38 @@ StorageLocation &DataflowAnalysisContext::createStorageLocation(QualType Type) {
else
FieldLocs.insert({Field, &createStorageLocation(
Field->getType().getNonReferenceType())});
- return arena().create<RecordStorageLocation>(Type, std::move(FieldLocs));
+
+ RecordStorageLocation::SyntheticFieldMap SyntheticFields;
+ for (const auto &Entry : getSyntheticFields(Type))
+ SyntheticFields.insert(
+ {Entry.getKey(),
+ &createStorageLocation(Entry.getValue().getNonReferenceType())});
+
+ return createRecordStorageLocation(Type, std::move(FieldLocs),
+ std::move(SyntheticFields));
}
return arena().create<ScalarStorageLocation>(Type);
}
+// Returns the keys for a given `StringMap`.
+// Can't use `StringSet` as the return type as it doesn't support `operator==`.
+template <typename T>
+static llvm::DenseSet<llvm::StringRef> getKeys(const llvm::StringMap<T> &Map) {
+ return llvm::DenseSet<llvm::StringRef>(Map.keys().begin(), Map.keys().end());
+}
+
+RecordStorageLocation &DataflowAnalysisContext::createRecordStorageLocation(
+ QualType Type, RecordStorageLocation::FieldToLoc FieldLocs,
+ RecordStorageLocation::SyntheticFieldMap SyntheticFields) {
+ assert(Type->isRecordType());
+ assert(containsSameFields(getModeledFields(Type), FieldLocs));
+ assert(getKeys(getSyntheticFields(Type)) == getKeys(SyntheticFields));
+
+ RecordStorageLocationCreated = true;
+ return arena().create<RecordStorageLocation>(Type, std::move(FieldLocs),
+ std::move(SyntheticFields));
+}
+
StorageLocation &
DataflowAnalysisContext::getStableStorageLocation(const ValueDecl &D) {
if (auto *Loc = DeclToLoc.lookup(&D))
@@ -367,3 +394,14 @@ clang::dataflow::FieldSet clang::dataflow::getObjectFields(QualType Type) {
getFieldsFromClassHierarchy(Type, Fields);
return Fields;
}
+
+bool clang::dataflow::containsSameFields(
+ const clang::dataflow::FieldSet &Fields,
+ const clang::dataflow::RecordStorageLocation::FieldToLoc &FieldLocs) {
+ if (Fields.size() != FieldLocs.size())
+ return false;
+ for ([[maybe_unused]] auto [Field, Loc] : FieldLocs)
+ if (!Fields.contains(cast_or_null<FieldDecl>(Field)))
+ return false;
+ return true;
+}
diff --git a/clang/lib/Analysis/FlowSensitive/DataflowEnvironment.cpp b/clang/lib/Analysis/FlowSensitive/DataflowEnvironment.cpp
index 525ab188b01b8aa..983375da9cc3a83 100644
--- a/clang/lib/Analysis/FlowSensitive/DataflowEnvironment.cpp
+++ b/clang/lib/Analysis/FlowSensitive/DataflowEnvironment.cpp
@@ -367,6 +367,16 @@ getFieldsGlobalsAndFuncs(const Stmt &S, FieldSet &Fields,
}
}
+Environment::Environment(DataflowAnalysisContext &DACtx)
+ : DACtx(&DACtx),
+ FlowConditionToken(DACtx.arena().makeFlowConditionToken()) {}
+
+Environment::Environment(DataflowAnalysisContext &DACtx,
+ const DeclContext &DeclCtx)
+ : Environment(DACtx) {
+ CallStack.push_back(&DeclCtx);
+}
+
// FIXME: Add support for resetting globals after function calls to enable
// the implementation of sound analyses.
void Environment::initFieldsGlobalsAndFuncs(const FunctionDecl *FuncDecl) {
@@ -416,59 +426,12 @@ void Environment::initFieldsGlobalsAndFuncs(const FunctionDecl *FuncDecl) {
}
}
-Environment::Environment(DataflowAnalysisContext &DACtx)
- : DACtx(&DACtx),
- FlowConditionToken(DACtx.arena().makeFlowConditionToken()) {}
-
Environment Environment::fork() const {
Environment Copy(*this);
Copy.FlowConditionToken = DACtx->forkFlowCondition(FlowConditionToken);
return Copy;
}
-Environment::Environment(DataflowAnalysisContext &DACtx,
- const DeclContext &DeclCtx)
- : Environment(DACtx) {
- CallStack.push_back(&DeclCtx);
-
- if (const auto *FuncDecl = dyn_cast<FunctionDecl>(&DeclCtx)) {
- assert(FuncDecl->getBody() != nullptr);
-
- initFieldsGlobalsAndFuncs(FuncDecl);
-
- for (const auto *ParamDecl : FuncDecl->parameters()) {
- assert(ParamDecl != nullptr);
- setStorageLocation(*ParamDecl, createObject(*ParamDecl, nullptr));
- }
- }
-
- if (const auto *MethodDecl = dyn_cast<CXXMethodDecl>(&DeclCtx)) {
- auto *Parent = MethodDecl->getParent();
- assert(Parent != nullptr);
-
- if (Parent->isLambda()) {
- for (auto Capture : Parent->captures()) {
- if (Capture.capturesVariable()) {
- const auto *VarDecl = Capture.getCapturedVar();
- assert(VarDecl != nullptr);
- setStorageLocation(*VarDecl, createObject(*VarDecl, nullptr));
- } else if (Capture.capturesThis()) {
- const auto *SurroundingMethodDecl =
- cast<CXXMethodDecl>(DeclCtx.getNonClosureAncestor());
- QualType ThisPointeeType =
- SurroundingMethodDecl->getFunctionObjectParameterType();
- ThisPointeeLoc =
- &cast<RecordValue>(createValue(ThisPointeeType))->getLoc();
- }
- }
- } else if (MethodDecl->isImplicitObjectMemberFunction()) {
- QualType ThisPointeeType = MethodDecl->getFunctionObjectParameterType();
- ThisPointeeLoc =
- &cast<RecordVal...
[truncated]
|
@llvm/pr-subscribers-clang Author: None (martinboehme) ChangesSynthetic fields are intended to model the internal state of a class (e.g. the value stored in a Today, this is typically done with properties on
To demonstrate the advantages of synthetic fields, this patch converts UncheckedOptionalAccessModel.cpp to synthetic fields. This greatly simplifies the implementation of the check. This PR is pretty big; to make it easier to review, I have broken it down into a stack of three commits, each of which contains a set of logically related changes. I considered submitting each of these as a separate PR, but the commits only really make sense when taken together. To review, I suggest first looking at the changes in UncheckedOptionalAccessModel.cpp. This gives a flavor for how the various API changes work together in the context of an analysis. Then, review the rest of the changes. Patch is 63.88 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/73860.diff 18 Files Affected:
diff --git a/clang/include/clang/Analysis/FlowSensitive/DataflowAnalysis.h b/clang/include/clang/Analysis/FlowSensitive/DataflowAnalysis.h
index c5f14cfcd4f7bee..1dffbe8a09c3609 100644
--- a/clang/include/clang/Analysis/FlowSensitive/DataflowAnalysis.h
+++ b/clang/include/clang/Analysis/FlowSensitive/DataflowAnalysis.h
@@ -234,6 +234,22 @@ runDataflowAnalysis(
return std::move(BlockStates);
}
+// Create an analysis class that is derived from `DataflowAnalysis`. This is an
+// SFINAE adapter that allows us to call two different variants of constructor
+// (either with or without the optional `Environment` parameter).
+// FIXME: Make all classes derived from `DataflowAnalysis` take an `Environment`
+// parameter in their constructor so that we can get rid of this abomination.
+template <typename AnalysisT>
+auto createAnalysis(ASTContext &ASTCtx, Environment &Env)
+ -> decltype(AnalysisT(ASTCtx, Env)) {
+ return AnalysisT(ASTCtx, Env);
+}
+template <typename AnalysisT>
+auto createAnalysis(ASTContext &ASTCtx, Environment &Env)
+ -> decltype(AnalysisT(ASTCtx)) {
+ return AnalysisT(ASTCtx);
+}
+
/// Runs a dataflow analysis over the given function and then runs `Diagnoser`
/// over the results. Returns a list of diagnostics for `FuncDecl` or an
/// error. Currently, errors can occur (at least) because the analysis requires
@@ -262,7 +278,7 @@ llvm::Expected<llvm::SmallVector<Diagnostic>> diagnoseFunction(
const WatchedLiteralsSolver *Solver = OwnedSolver.get();
DataflowAnalysisContext AnalysisContext(std::move(OwnedSolver));
Environment Env(AnalysisContext, FuncDecl);
- AnalysisT Analysis(ASTCtx);
+ AnalysisT Analysis = createAnalysis<AnalysisT>(ASTCtx, Env);
llvm::SmallVector<Diagnostic> Diagnostics;
if (llvm::Error Err =
runTypeErasedDataflowAnalysis(
diff --git a/clang/include/clang/Analysis/FlowSensitive/DataflowAnalysisContext.h b/clang/include/clang/Analysis/FlowSensitive/DataflowAnalysisContext.h
index c1aa8484817a9d9..20e45cc27b01fa8 100644
--- a/clang/include/clang/Analysis/FlowSensitive/DataflowAnalysisContext.h
+++ b/clang/include/clang/Analysis/FlowSensitive/DataflowAnalysisContext.h
@@ -58,6 +58,10 @@ using FieldSet = llvm::SmallSetVector<const FieldDecl *, 4>;
/// Returns the set of all fields in the type.
FieldSet getObjectFields(QualType Type);
+/// Returns whether `Fields` and `FieldLocs` contain the same fields.
+bool containsSameFields(const FieldSet &Fields,
+ const RecordStorageLocation::FieldToLoc &FieldLocs);
+
struct ContextSensitiveOptions {
/// The maximum depth to analyze. A value of zero is equivalent to disabling
/// context-sensitive analysis entirely.
@@ -92,11 +96,39 @@ class DataflowAnalysisContext {
/*Logger=*/nullptr});
~DataflowAnalysisContext();
+ /// Sets a callback that returns the names and types of the synthetic fields
+ /// to add to a `RecordStorageLocation` of a given type.
+ /// Typically, this is called from the constructor of a `DataflowAnalysis`
+ ///
+ /// To maintain the invariant that all `RecordStorageLocation`s of a given
+ /// type have the same fields:
+ /// * The callback must always return the same result for a given type
+ /// * `setSyntheticFieldCallback()` must be called before any
+ // `RecordStorageLocation`s are created.
+ void setSyntheticFieldCallback(
+ std::function<llvm::StringMap<QualType>(QualType)> CB) {
+ assert(!RecordStorageLocationCreated);
+ SyntheticFieldCallback = CB;
+ }
+
/// Returns a new storage location appropriate for `Type`.
///
/// A null `Type` is interpreted as the pointee type of `std::nullptr_t`.
StorageLocation &createStorageLocation(QualType Type);
+ /// Creates a `RecordStorageLocation` for the given type and with the given
+ /// fields.
+ ///
+ /// Requirements:
+ ///
+ /// `FieldLocs` must contain exactly the fields returned by
+ /// `getModeledFields(Type)`.
+ /// `SyntheticFields` must contain exactly the fields returned by
+ /// `getSyntheticFields(Type)`.
+ RecordStorageLocation &createRecordStorageLocation(
+ QualType Type, RecordStorageLocation::FieldToLoc FieldLocs,
+ RecordStorageLocation::SyntheticFieldMap SyntheticFields);
+
/// Returns a stable storage location for `D`.
StorageLocation &getStableStorageLocation(const ValueDecl &D);
@@ -169,6 +201,15 @@ class DataflowAnalysisContext {
/// context.
FieldSet getModeledFields(QualType Type);
+ /// Returns the names and types of the synthetic fields for the given record
+ /// type.
+ llvm::StringMap<QualType> getSyntheticFields(QualType Type) {
+ assert(Type->isRecordType());
+ if (SyntheticFieldCallback)
+ return SyntheticFieldCallback(Type);
+ return {};
+ }
+
private:
friend class Environment;
@@ -250,6 +291,11 @@ class DataflowAnalysisContext {
FieldSet ModeledFields;
std::unique_ptr<Logger> LogOwner; // If created via flags.
+
+ std::function<llvm::StringMap<QualType>(QualType)> SyntheticFieldCallback;
+
+ /// Has any `RecordStorageLocation` been created yet?
+ bool RecordStorageLocationCreated = false;
};
} // namespace dataflow
diff --git a/clang/include/clang/Analysis/FlowSensitive/DataflowEnvironment.h b/clang/include/clang/Analysis/FlowSensitive/DataflowEnvironment.h
index 7c1f5491096326b..8502f4da7e5414f 100644
--- a/clang/include/clang/Analysis/FlowSensitive/DataflowEnvironment.h
+++ b/clang/include/clang/Analysis/FlowSensitive/DataflowEnvironment.h
@@ -166,6 +166,10 @@ class Environment {
/// with a symbolic representation of the `this` pointee.
Environment(DataflowAnalysisContext &DACtx, const DeclContext &DeclCtx);
+ /// Assigns storage locations and values to all global variables, fields
+ /// and functions referenced in `FuncDecl`. `FuncDecl` must have a body.
+ void initFieldsGlobalsAndFuncs(const FunctionDecl *FuncDecl);
+
/// Returns a new environment that is a copy of this one.
///
/// The state of the program is initially the same, but can be mutated without
@@ -283,7 +287,15 @@ class Environment {
/// Returns the storage location assigned to the `this` pointee in the
/// environment or null if the `this` pointee has no assigned storage location
/// in the environment.
- RecordStorageLocation *getThisPointeeStorageLocation() const;
+ RecordStorageLocation *getThisPointeeStorageLocation() const {
+ return ThisPointeeLoc;
+ }
+
+ /// Sets the storage location assigned to the `this` pointee in the
+ /// environment.
+ void setThisPointeeStorageLocation(RecordStorageLocation &Loc) {
+ ThisPointeeLoc = &Loc;
+ }
/// Returns the location of the result object for a record-type prvalue.
///
@@ -367,7 +379,8 @@ class Environment {
/// storage locations and values for indirections until it finds a
/// non-pointer/non-reference type.
///
- /// If `Type` is a class, struct, or union type, calls `setValue()` to
+ /// If `Type` is a class, struct, or union type, creates values for all
+ /// modeled fields (including synthetic fields) and calls `setValue()` to
/// associate the `RecordValue` with its storage location
/// (`RecordValue::getLoc()`).
///
@@ -570,6 +583,9 @@ class Environment {
return dyn_cast<FunctionDecl>(getDeclCtx());
}
+ /// Returns the size of the call stack.
+ size_t callStackSize() const { return CallStack.size(); }
+
/// Returns whether this `Environment` can be extended to analyze the given
/// `Callee` (i.e. if `pushCall` can be used), with recursion disallowed and a
/// given `MaxDepth`.
@@ -629,10 +645,7 @@ class Environment {
void pushCallInternal(const FunctionDecl *FuncDecl,
ArrayRef<const Expr *> Args);
- /// Assigns storage locations and values to all global variables, fields
- /// and functions referenced in `FuncDecl`. `FuncDecl` must have a body.
- void initFieldsGlobalsAndFuncs(const FunctionDecl *FuncDecl);
-
+private:
// `DACtx` is not null and not owned by this object.
DataflowAnalysisContext *DACtx;
diff --git a/clang/include/clang/Analysis/FlowSensitive/Models/UncheckedOptionalAccessModel.h b/clang/include/clang/Analysis/FlowSensitive/Models/UncheckedOptionalAccessModel.h
index aeaf75b2154222c..09eb8b93822612f 100644
--- a/clang/include/clang/Analysis/FlowSensitive/Models/UncheckedOptionalAccessModel.h
+++ b/clang/include/clang/Analysis/FlowSensitive/Models/UncheckedOptionalAccessModel.h
@@ -45,7 +45,7 @@ struct UncheckedOptionalAccessModelOptions {
class UncheckedOptionalAccessModel
: public DataflowAnalysis<UncheckedOptionalAccessModel, NoopLattice> {
public:
- UncheckedOptionalAccessModel(ASTContext &Ctx);
+ UncheckedOptionalAccessModel(ASTContext &Ctx, dataflow::Environment &Env);
/// Returns a matcher for the optional classes covered by this model.
static ast_matchers::DeclarationMatcher optionalClassDecl();
@@ -54,17 +54,6 @@ class UncheckedOptionalAccessModel
void transfer(const CFGElement &Elt, NoopLattice &L, Environment &Env);
- ComparisonResult compare(QualType Type, const Value &Val1,
- const Environment &Env1, const Value &Val2,
- const Environment &Env2) override;
-
- bool merge(QualType Type, const Value &Val1, const Environment &Env1,
- const Value &Val2, const Environment &Env2, Value &MergedVal,
- Environment &MergedEnv) override;
-
- Value *widen(QualType Type, Value &Prev, const Environment &PrevEnv,
- Value &Current, Environment &CurrentEnv) override;
-
private:
CFGMatchSwitch<TransferState<NoopLattice>> TransferMatchSwitch;
};
diff --git a/clang/include/clang/Analysis/FlowSensitive/RecordOps.h b/clang/include/clang/Analysis/FlowSensitive/RecordOps.h
index f007a947a82f132..7b87840d626b4b6 100644
--- a/clang/include/clang/Analysis/FlowSensitive/RecordOps.h
+++ b/clang/include/clang/Analysis/FlowSensitive/RecordOps.h
@@ -21,10 +21,10 @@ namespace dataflow {
/// Copies a record (struct, class, or union) from `Src` to `Dst`.
///
-/// This performs a deep copy, i.e. it copies every field and recurses on
-/// fields of record type. It also copies properties from the `RecordValue`
-/// associated with `Src` to the `RecordValue` associated with `Dst` (if these
-/// `RecordValue`s exist).
+/// This performs a deep copy, i.e. it copies every field (including synthetic
+/// fields) and recurses on fields of record type. It also copies properties
+/// from the `RecordValue` associated with `Src` to the `RecordValue` associated
+/// with `Dst` (if these `RecordValue`s exist).
///
/// If there is a `RecordValue` associated with `Dst` in the environment, this
/// function creates a new `RecordValue` and associates it with `Dst`; clients
@@ -47,10 +47,11 @@ void copyRecord(RecordStorageLocation &Src, RecordStorageLocation &Dst,
/// retrieved from `Env2`. A convenience overload retrieves values for `Loc1`
/// and `Loc2` from the same environment.
///
-/// This performs a deep comparison, i.e. it compares every field and recurses
-/// on fields of record type. Fields of reference type compare equal if they
-/// refer to the same storage location. If `RecordValue`s are associated with
-/// `Loc1` and `Loc2`, it also compares the properties on those `RecordValue`s.
+/// This performs a deep comparison, i.e. it compares every field (including
+/// synthetic fields) and recurses on fields of record type. Fields of reference
+/// type compare equal if they refer to the same storage location. If
+/// `RecordValue`s are associated with `Loc1` and Loc2`, it also compares the
+/// properties on those `RecordValue`s.
///
/// Note on how to interpret the result:
/// - If this returns true, the records are guaranteed to be equal at runtime.
diff --git a/clang/include/clang/Analysis/FlowSensitive/StorageLocation.h b/clang/include/clang/Analysis/FlowSensitive/StorageLocation.h
index 89d2bbfbb69f9fb..b7203378189d6bf 100644
--- a/clang/include/clang/Analysis/FlowSensitive/StorageLocation.h
+++ b/clang/include/clang/Analysis/FlowSensitive/StorageLocation.h
@@ -73,7 +73,13 @@ class ScalarStorageLocation final : public StorageLocation {
///
/// Contains storage locations for all modeled fields of the record (also
/// referred to as "children"). The child map is flat, so accessible members of
-/// the base class are directly accesible as children of this location.
+/// the base class are directly accessible as children of this location.
+///
+/// Record storage locations may also contain so-called synthetic fields. These
+/// are typically used to the internal state of a class (e.g. the value stored
+/// in a `std::optional`) without having to depend on that class's
+/// implementation details. All `RecordStorageLocation`s of a given type should
+/// have the same synthetic fields.
///
/// The storage location for a field of reference type may be null. This
/// typically occurs in one of two situations:
@@ -88,12 +94,12 @@ class ScalarStorageLocation final : public StorageLocation {
class RecordStorageLocation final : public StorageLocation {
public:
using FieldToLoc = llvm::DenseMap<const ValueDecl *, StorageLocation *>;
+ using SyntheticFieldMap = llvm::StringMap<StorageLocation *>;
- explicit RecordStorageLocation(QualType Type)
- : RecordStorageLocation(Type, FieldToLoc()) {}
-
- RecordStorageLocation(QualType Type, FieldToLoc TheChildren)
- : StorageLocation(Kind::Record, Type), Children(std::move(TheChildren)) {
+ RecordStorageLocation(QualType Type, FieldToLoc TheChildren,
+ SyntheticFieldMap TheSyntheticFields)
+ : StorageLocation(Kind::Record, Type), Children(std::move(TheChildren)),
+ SyntheticFields(std::move(TheSyntheticFields)) {
assert(!Type.isNull());
assert(Type->isRecordType());
assert([this] {
@@ -133,6 +139,19 @@ class RecordStorageLocation final : public StorageLocation {
return It->second;
}
+ /// Returns the storage location for the synthetic field `Name`.
+ /// The synthetic field must exist.
+ StorageLocation &getSyntheticField(llvm::StringRef Name) const {
+ StorageLocation *Loc = SyntheticFields.lookup(Name);
+ assert(Loc != nullptr);
+ return *Loc;
+ }
+
+ llvm::iterator_range<SyntheticFieldMap::const_iterator>
+ synthetic_fields() const {
+ return {SyntheticFields.begin(), SyntheticFields.end()};
+ }
+
/// Changes the child storage location for a field `D` of reference type.
/// All other fields cannot change their storage location and always retain
/// the storage location passed to the `RecordStorageLocation` constructor.
@@ -151,6 +170,7 @@ class RecordStorageLocation final : public StorageLocation {
private:
FieldToLoc Children;
+ SyntheticFieldMap SyntheticFields;
};
} // namespace dataflow
diff --git a/clang/lib/Analysis/FlowSensitive/DataflowAnalysisContext.cpp b/clang/lib/Analysis/FlowSensitive/DataflowAnalysisContext.cpp
index 0a2fcd4ad695a30..fa114979c8e3263 100644
--- a/clang/lib/Analysis/FlowSensitive/DataflowAnalysisContext.cpp
+++ b/clang/lib/Analysis/FlowSensitive/DataflowAnalysisContext.cpp
@@ -68,11 +68,38 @@ StorageLocation &DataflowAnalysisContext::createStorageLocation(QualType Type) {
else
FieldLocs.insert({Field, &createStorageLocation(
Field->getType().getNonReferenceType())});
- return arena().create<RecordStorageLocation>(Type, std::move(FieldLocs));
+
+ RecordStorageLocation::SyntheticFieldMap SyntheticFields;
+ for (const auto &Entry : getSyntheticFields(Type))
+ SyntheticFields.insert(
+ {Entry.getKey(),
+ &createStorageLocation(Entry.getValue().getNonReferenceType())});
+
+ return createRecordStorageLocation(Type, std::move(FieldLocs),
+ std::move(SyntheticFields));
}
return arena().create<ScalarStorageLocation>(Type);
}
+// Returns the keys for a given `StringMap`.
+// Can't use `StringSet` as the return type as it doesn't support `operator==`.
+template <typename T>
+static llvm::DenseSet<llvm::StringRef> getKeys(const llvm::StringMap<T> &Map) {
+ return llvm::DenseSet<llvm::StringRef>(Map.keys().begin(), Map.keys().end());
+}
+
+RecordStorageLocation &DataflowAnalysisContext::createRecordStorageLocation(
+ QualType Type, RecordStorageLocation::FieldToLoc FieldLocs,
+ RecordStorageLocation::SyntheticFieldMap SyntheticFields) {
+ assert(Type->isRecordType());
+ assert(containsSameFields(getModeledFields(Type), FieldLocs));
+ assert(getKeys(getSyntheticFields(Type)) == getKeys(SyntheticFields));
+
+ RecordStorageLocationCreated = true;
+ return arena().create<RecordStorageLocation>(Type, std::move(FieldLocs),
+ std::move(SyntheticFields));
+}
+
StorageLocation &
DataflowAnalysisContext::getStableStorageLocation(const ValueDecl &D) {
if (auto *Loc = DeclToLoc.lookup(&D))
@@ -367,3 +394,14 @@ clang::dataflow::FieldSet clang::dataflow::getObjectFields(QualType Type) {
getFieldsFromClassHierarchy(Type, Fields);
return Fields;
}
+
+bool clang::dataflow::containsSameFields(
+ const clang::dataflow::FieldSet &Fields,
+ const clang::dataflow::RecordStorageLocation::FieldToLoc &FieldLocs) {
+ if (Fields.size() != FieldLocs.size())
+ return false;
+ for ([[maybe_unused]] auto [Field, Loc] : FieldLocs)
+ if (!Fields.contains(cast_or_null<FieldDecl>(Field)))
+ return false;
+ return true;
+}
diff --git a/clang/lib/Analysis/FlowSensitive/DataflowEnvironment.cpp b/clang/lib/Analysis/FlowSensitive/DataflowEnvironment.cpp
index 525ab188b01b8aa..983375da9cc3a83 100644
--- a/clang/lib/Analysis/FlowSensitive/DataflowEnvironment.cpp
+++ b/clang/lib/Analysis/FlowSensitive/DataflowEnvironment.cpp
@@ -367,6 +367,16 @@ getFieldsGlobalsAndFuncs(const Stmt &S, FieldSet &Fields,
}
}
+Environment::Environment(DataflowAnalysisContext &DACtx)
+ : DACtx(&DACtx),
+ FlowConditionToken(DACtx.arena().makeFlowConditionToken()) {}
+
+Environment::Environment(DataflowAnalysisContext &DACtx,
+ const DeclContext &DeclCtx)
+ : Environment(DACtx) {
+ CallStack.push_back(&DeclCtx);
+}
+
// FIXME: Add support for resetting globals after function calls to enable
// the implementation of sound analyses.
void Environment::initFieldsGlobalsAndFuncs(const FunctionDecl *FuncDecl) {
@@ -416,59 +426,12 @@ void Environment::initFieldsGlobalsAndFuncs(const FunctionDecl *FuncDecl) {
}
}
-Environment::Environment(DataflowAnalysisContext &DACtx)
- : DACtx(&DACtx),
- FlowConditionToken(DACtx.arena().makeFlowConditionToken()) {}
-
Environment Environment::fork() const {
Environment Copy(*this);
Copy.FlowConditionToken = DACtx->forkFlowCondition(FlowConditionToken);
return Copy;
}
-Environment::Environment(DataflowAnalysisContext &DACtx,
- const DeclContext &DeclCtx)
- : Environment(DACtx) {
- CallStack.push_back(&DeclCtx);
-
- if (const auto *FuncDecl = dyn_cast<FunctionDecl>(&DeclCtx)) {
- assert(FuncDecl->getBody() != nullptr);
-
- initFieldsGlobalsAndFuncs(FuncDecl);
-
- for (const auto *ParamDecl : FuncDecl->parameters()) {
- assert(ParamDecl != nullptr);
- setStorageLocation(*ParamDecl, createObject(*ParamDecl, nullptr));
- }
- }
-
- if (const auto *MethodDecl = dyn_cast<CXXMethodDecl>(&DeclCtx)) {
- auto *Parent = MethodDecl->getParent();
- assert(Parent != nullptr);
-
- if (Parent->isLambda()) {
- for (auto Capture : Parent->captures()) {
- if (Capture.capturesVariable()) {
- const auto *VarDecl = Capture.getCapturedVar();
- assert(VarDecl != nullptr);
- setStorageLocation(*VarDecl, createObject(*VarDecl, nullptr));
- } else if (Capture.capturesThis()) {
- const auto *SurroundingMethodDecl =
- cast<CXXMethodDecl>(DeclCtx.getNonClosureAncestor());
- QualType ThisPointeeType =
- SurroundingMethodDecl->getFunctionObjectParameterType();
- ThisPointeeLoc =
- &cast<RecordValue>(createValue(ThisPointeeType))->getLoc();
- }
- }
- } else if (MethodDecl->isImplicitObjectMemberFunction()) {
- QualType ThisPointeeType = MethodDecl->getFunctionObjectParameterType();
- ThisPointeeLoc =
- &cast<RecordVal...
[truncated]
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is amazing! I spent a while searching for changes to join/widen to account for synthetic fields and then realized that (of course!) no special handling is needed!
I also love that we now have the hooks we've wanted since the beginning.
Thank you!
clang/lib/Analysis/FlowSensitive/TypeErasedDataflowAnalysis.cpp
Outdated
Show resolved
Hide resolved
clang/lib/Analysis/FlowSensitive/TypeErasedDataflowAnalysis.cpp
Outdated
Show resolved
Hide resolved
/// type have the same fields: | ||
/// * The callback must always return the same result for a given type | ||
/// * `setSyntheticFieldCallback()` must be called before any | ||
// `RecordStorageLocation`s are created. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can this be integrated into the constructor? Could be in a followup patch (to minimize further churn in this one).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was wondering if having a single SyntheticFieldCallback
will be too restrictive in the future. What if an ananlysis wants to use modeling both for optionals and some other data structure, both having its own callbacks?
Although this is mostly food for thought for the future, no need to address in this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can this be integrated into the constructor? Could be in a followup patch (to minimize further churn in this one).
Unfortunately, that doesn't work because setSyntheticFieldCallback()
needs to be called by the DataflowAnalysis
, and the DataflowAnalysis
doesn't create the DataflowAnalysisContext
(i.e. it can't pass anything to its constructor).
I think we could consider changing this though: I don't think there's anything that would prevent the DataflowAnalysis
itself from creating the DataflowAnalysisContext
, and then the synthetic field callback could simply be a constructor parameter. This would require changing existing analyses, however, so I don't think this is something to tackle as part of this patch.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was wondering if having a single
SyntheticFieldCallback
will be too restrictive in the future. What if an ananlysis wants to use modeling both for optionals and some other data structure, both having its own callbacks?
A single synthetic field callback can handle any number of different types, so the analysis wouldn't need to set more than one callback.
Maybe, however, you're thinking of a scenario where we might want to run several analyses at the same time? I think this is something we've discussed in the past, and in this scenario each analysis would of course potentially want to be able to set its own synthetic field callback. As we currently don't have the ability to run more than one analysis at the same time, I think it is currently sufficient to only support a single callback as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I generally love this direction, this was just a first pass, not a full review. I plan to take a second look but wanted to familiarize myself with the PR and start some conversations.
/// | ||
/// Requirements: | ||
/// | ||
/// `FieldLocs` must contain exactly the fields returned by |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With requirements like this, I was wondering if this should be a private API.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree this would be nice (this is not the API that most callers should call), but this API is not just called from Environment
(which is already a friend of the DataflowAnalysisContext) but also from Transfer.cpp -- and it's harder to set things up so that that latter call site is a friend.
In the end, I don't think it's worth the effort of "locking down" this API -- the documentation makes it clear that the function has pretty exacting preconditions (which we assert within the implementation), which makes the function unattractive to a potential caller anyway.
@@ -88,12 +94,12 @@ class ScalarStorageLocation final : public StorageLocation { | |||
class RecordStorageLocation final : public StorageLocation { | |||
public: | |||
using FieldToLoc = llvm::DenseMap<const ValueDecl *, StorageLocation *>; | |||
using SyntheticFieldMap = llvm::StringMap<StorageLocation *>; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was wondering, if we want this map to not be indexed by strings. An alternative would be to somehow access the IdentifierTable
and create an identifier from the names and use the pointer to the identifier as the key. This has the potential to be a bit lighter on memory (identifiers only stored once as opposed to storing the keys in each RecordStorageLocation instance), and potentially a bit quicker lookup.
That being said, I am not insisting on the change, and even if we want to do it, it is probably OK in a follow-up PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is an interesting idea, but I'm hesitant about mutating the IdentifierTable
in something that is going to be used from a clang-tidy check, which will be sharing the IdentifierTable
with other checks.
I suppose we could always create our own mapping from strings to integers or pointers of some sort, which could then serve as the keys for the SyntheticFieldMap
-- but I would lean towards leaving this as a potential future optimization that we would do if this does indeed turn out to be a performance issue.
/// type have the same fields: | ||
/// * The callback must always return the same result for a given type | ||
/// * `setSyntheticFieldCallback()` must be called before any | ||
// `RecordStorageLocation`s are created. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was wondering if having a single SyntheticFieldCallback
will be too restrictive in the future. What if an ananlysis wants to use modeling both for optionals and some other data structure, both having its own callbacks?
Although this is mostly food for thought for the future, no need to address in this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall, I love it! I have some comments on some features in the future, but those are probably not going to be addressed any time soon. First of all, I think in the future when we reason about the values of the pointers, synthetic fields might need to have offsets to be able to know things like &modeled.getField1() != &modeled.getField2()
. Offsets might also be a way to support unions, multiple synthetic fields starting at the same offset might express just that.
Another big item is properly supporting inheritance. Consider:
struct B { int a; };
struct D : /*virtual*/ B {};
struct E : /*virtual*/ B {};
struct F: D, E {
void m() {
int* l = &(D::a);
int* m = &(E::a);
}
};
Here, whether we have the same or different memory locations for D::a
and E::a
depends on whether we are doing virtual inheritance. This is something that should work both for regular fields and synthetic fields.
if (Fields.size() != FieldLocs.size()) | ||
return false; | ||
for ([[maybe_unused]] auto [Field, Loc] : FieldLocs) | ||
if (!Fields.contains(cast_or_null<FieldDecl>(Field))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What are the cases when we expect to see a null pointer here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can happen for fields of reference type in the cases discussed here.
Agree. We do already mostly support this particular case; the transfer function for (in)equality on pointer values would need to be strengthened to compare not the
True. And this isn't just restricted to synthetic fields -- regular fields of a union should receive the same treatment. If I understand the standard right, the only thing that's really guaranteed about the fields of a union is that they all share the same address; writing to one field, then reading from another is undefined behavior (though many compilers implement this as a non-standard language extension). Currently, when comparing the addresses of different fields of a union, our transfer function produces an atomic (see above). Interestingly, this isn't actually wrong, per se, just inaccurate. Anyway, though, I think it's relatively rare that a program depends on the behavior of comparing the addresses of two different fields of a union, so I don't think it's a high priority to fix this.
Agreed, we also don't do this correctly yet. Again, though, I think this is lower priority than many of the other gaps in functionality that we have, as many style guides discourage virtual inheritance anyway, and in the cases where we see it in the code, I would guess that the result of our analysis often doesn't depend on modelling this correctly. Thanks for all of these observations -- I think at least modelling pointer comparison more accurately is something we should look at soon. |
- Instead of comparing the identity of the `PointerValue`s, compare the underlying `StorageLocation`s. - If the `StorageLocation`s are different, return a definite "false" as the result of the comparison. Before, if the `PointerValue`s were different, the best we could do was to return an atom (because the `StorageLocation`s might still be the same). On an internal codebase, this change reduces SAT solver timeouts by over 20% and "maximum iterations reached" errors by over 50%. In addition, it obviously improves the precision of the analysis. @Xazax-hun inspired me to think about this with his [comments](llvm#73860 (review)) on a different PR. The one thing where the new code currently does the wrong thing is when comparing the addresses of different members of a union. By the language standard, all members of a union should have the same address, but we currently model them with different `StorageLocation`s, and so with this change, we will return false when comparing the addreses. I propose that this is acceptable because is unlikely to affect the behavior of real-world code in meaningful ways. With this change, the test TransferTest.DifferentReferenceLocInJoin started to fail because the code under test no longer set up the desired state where a variable of reference type is mapped to two different storage locations in environments being joined. Rather than trying to modify this test to set up the test condition again, I have chosen to replace the test with an equivalent test in DataflowEnvironmentTest.cpp that sets up the test condition directly; because this test is more direct, it will also be less brittle in the face of future changes.
Instead, synthetic fields should now be used for the same purpose. These have a number of advantages, as described in llvm#73860, and longer-term, we want to eliminate `RecordValue` entirely. As `RecordValue`s cannot have properties any more, I have replaced the `OptionalIntAnalysis` with an equivalent analysis that tracks nullness of pointers (instead of whether an optional has a value). This serves the same purpose, namely to check whether the framework applies a custom `merge()` operation to widen properties.
) Instead, synthetic fields should now be used for the same purpose. These have a number of advantages, as described in #73860, and longer-term, we want to eliminate `RecordValue` entirely. As `RecordValue`s cannot have properties any more, I have replaced the `OptionalIntAnalysis` with an equivalent analysis that tracks nullness of pointers (instead of whether an optional has a value). This serves the same purpose, namely to check whether the framework applies a custom `merge()` operation to widen properties.
…042) Instead, synthetic fields should now be used for the same purpose. These have a number of advantages, as described in llvm/llvm-project#73860, and longer-term, we want to eliminate `RecordValue` entirely. As `RecordValue`s cannot have properties any more, I have replaced the `OptionalIntAnalysis` with an equivalent analysis that tracks nullness of pointers (instead of whether an optional has a value). This serves the same purpose, namely to check whether the framework applies a custom `merge()` operation to widen properties.
- Instead of comparing the identity of the `PointerValue`s, compare the underlying `StorageLocation`s. - If the `StorageLocation`s are the same, return a definite "true" as the result of the comparison. Before, if the `PointerValue`s were different, we would return an atom, even if the storage locations themselves were the same. - If the `StorageLocation`s are different, return an atom (as before). Pointers that have different storage locations may still alias, so we can't return a definite "false" in this case. The application-level gains from this are relatively modest. For the Crubit nullability check running on an internal codebase, this change reduces the number of functions on which the SAT solver times out from 223 to 221; the number of "pointer expression not modeled" errors reduces from 3815 to 3778. Still, it seems that the gain in precision is generally worthwhile. @Xazax-hun inspired me to think about this with his [comments](llvm#73860 (review)) on a different PR.
- Instead of comparing the identity of the `PointerValue`s, compare the underlying `StorageLocation`s. - If the `StorageLocation`s are the same, return a definite "true" as the result of the comparison. Before, if the `PointerValue`s were different, we would return an atom, even if the storage locations themselves were the same. - If the `StorageLocation`s are different, return an atom (as before). Pointers that have different storage locations may still alias, so we can't return a definite "false" in this case. The application-level gains from this are relatively modest. For the Crubit nullability check running on an internal codebase, this change reduces the number of functions on which the SAT solver times out from 223 to 221; the number of "pointer expression not modeled" errors reduces from 3815 to 3778. Still, it seems that the gain in precision is generally worthwhile. @Xazax-hun inspired me to think about this with his [comments](#73860 (review)) on a different PR.
Synthetic fields are intended to model the internal state of a class (e.g. the value stored in a
std::optional
) without having to depend on that class's implementation details.Today, this is typically done with properties on
RecordValue
s, but these have several drawbacks:Care must be taken to call
refreshRecordValue()
before modifying a property so that the modified property values aren’t seen by other environments that may have access to the sameRecordValue
.Properties aren’t associated with a storage location. If an analysis needs to associate a location with the value stored in a property (e.g. to model the reference returned by
std::optional::value()
), it needs to manually add an indirection using aPointerValue
. (See for example the way this is done in UncheckedOptionalAccessModel.cpp, specifically inmaybeInitializeOptionalValueMember()
.)Properties don’t participate in the builtin compare, join, and widen operations. If an analysis needs to apply these operations to properties, it needs to override the corresponding methods of
ValueModel
.Longer-term, we plan to eliminate
RecordValue
, as by-value operations on records aren’t really “a thing” in C++ (see https://discourse.llvm.org/t/70086#changed-structvalue-api-14). This would obviously eliminate the ability to set properties onRecordValue
s.To demonstrate the advantages of synthetic fields, this patch converts UncheckedOptionalAccessModel.cpp to synthetic fields. This greatly simplifies the implementation of the check.
This PR is pretty big; to make it easier to review, I have broken it down into a stack of three commits, each of which contains a set of logically related changes. I considered submitting each of these as a separate PR, but the commits only really make sense when taken together.
To review, I suggest first looking at the changes in UncheckedOptionalAccessModel.cpp. This gives a flavor for how the various API changes work together in the context of an analysis. Then, review the rest of the changes.