Skip to content

[clang][dataflow] Add synthetic fields to RecordStorageLocation #73860

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Dec 4, 2023
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
18 changes: 17 additions & 1 deletion clang/include/clang/Analysis/FlowSensitive/DataflowAnalysis.h
Original file line number Diff line number Diff line change
Expand Up @@ -234,6 +234,22 @@ runDataflowAnalysis(
return std::move(BlockStates);
}

// Create an analysis class that is derived from `DataflowAnalysis`. This is an
// SFINAE adapter that allows us to call two different variants of constructor
// (either with or without the optional `Environment` parameter).
// FIXME: Make all classes derived from `DataflowAnalysis` take an `Environment`
// parameter in their constructor so that we can get rid of this abomination.
template <typename AnalysisT>
auto createAnalysis(ASTContext &ASTCtx, Environment &Env)
-> decltype(AnalysisT(ASTCtx, Env)) {
return AnalysisT(ASTCtx, Env);
}
template <typename AnalysisT>
auto createAnalysis(ASTContext &ASTCtx, Environment &Env)
-> decltype(AnalysisT(ASTCtx)) {
return AnalysisT(ASTCtx);
}

/// Runs a dataflow analysis over the given function and then runs `Diagnoser`
/// over the results. Returns a list of diagnostics for `FuncDecl` or an
/// error. Currently, errors can occur (at least) because the analysis requires
Expand Down Expand Up @@ -262,7 +278,7 @@ llvm::Expected<llvm::SmallVector<Diagnostic>> diagnoseFunction(
const WatchedLiteralsSolver *Solver = OwnedSolver.get();
DataflowAnalysisContext AnalysisContext(std::move(OwnedSolver));
Environment Env(AnalysisContext, FuncDecl);
AnalysisT Analysis(ASTCtx);
AnalysisT Analysis = createAnalysis<AnalysisT>(ASTCtx, Env);
llvm::SmallVector<Diagnostic> Diagnostics;
if (llvm::Error Err =
runTypeErasedDataflowAnalysis(
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -58,6 +58,10 @@ using FieldSet = llvm::SmallSetVector<const FieldDecl *, 4>;
/// Returns the set of all fields in the type.
FieldSet getObjectFields(QualType Type);

/// Returns whether `Fields` and `FieldLocs` contain the same fields.
bool containsSameFields(const FieldSet &Fields,
const RecordStorageLocation::FieldToLoc &FieldLocs);

struct ContextSensitiveOptions {
/// The maximum depth to analyze. A value of zero is equivalent to disabling
/// context-sensitive analysis entirely.
Expand Down Expand Up @@ -92,11 +96,39 @@ class DataflowAnalysisContext {
/*Logger=*/nullptr});
~DataflowAnalysisContext();

/// Sets a callback that returns the names and types of the synthetic fields
/// to add to a `RecordStorageLocation` of a given type.
/// Typically, this is called from the constructor of a `DataflowAnalysis`
///
/// To maintain the invariant that all `RecordStorageLocation`s of a given
/// type have the same fields:
/// * The callback must always return the same result for a given type
/// * `setSyntheticFieldCallback()` must be called before any
// `RecordStorageLocation`s are created.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this be integrated into the constructor? Could be in a followup patch (to minimize further churn in this one).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was wondering if having a single SyntheticFieldCallback will be too restrictive in the future. What if an ananlysis wants to use modeling both for optionals and some other data structure, both having its own callbacks?

Although this is mostly food for thought for the future, no need to address in this PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this be integrated into the constructor? Could be in a followup patch (to minimize further churn in this one).

Unfortunately, that doesn't work because setSyntheticFieldCallback() needs to be called by the DataflowAnalysis, and the DataflowAnalysis doesn't create the DataflowAnalysisContext (i.e. it can't pass anything to its constructor).

I think we could consider changing this though: I don't think there's anything that would prevent the DataflowAnalysis itself from creating the DataflowAnalysisContext, and then the synthetic field callback could simply be a constructor parameter. This would require changing existing analyses, however, so I don't think this is something to tackle as part of this patch.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was wondering if having a single SyntheticFieldCallback will be too restrictive in the future. What if an ananlysis wants to use modeling both for optionals and some other data structure, both having its own callbacks?

A single synthetic field callback can handle any number of different types, so the analysis wouldn't need to set more than one callback.

Maybe, however, you're thinking of a scenario where we might want to run several analyses at the same time? I think this is something we've discussed in the past, and in this scenario each analysis would of course potentially want to be able to set its own synthetic field callback. As we currently don't have the ability to run more than one analysis at the same time, I think it is currently sufficient to only support a single callback as well.

void setSyntheticFieldCallback(
std::function<llvm::StringMap<QualType>(QualType)> CB) {
assert(!RecordStorageLocationCreated);
SyntheticFieldCallback = CB;
}

/// Returns a new storage location appropriate for `Type`.
///
/// A null `Type` is interpreted as the pointee type of `std::nullptr_t`.
StorageLocation &createStorageLocation(QualType Type);

/// Creates a `RecordStorageLocation` for the given type and with the given
/// fields.
///
/// Requirements:
///
/// `FieldLocs` must contain exactly the fields returned by
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With requirements like this, I was wondering if this should be a private API.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree this would be nice (this is not the API that most callers should call), but this API is not just called from Environment (which is already a friend of the DataflowAnalysisContext) but also from Transfer.cpp -- and it's harder to set things up so that that latter call site is a friend.

In the end, I don't think it's worth the effort of "locking down" this API -- the documentation makes it clear that the function has pretty exacting preconditions (which we assert within the implementation), which makes the function unattractive to a potential caller anyway.

/// `getModeledFields(Type)`.
/// `SyntheticFields` must contain exactly the fields returned by
/// `getSyntheticFields(Type)`.
RecordStorageLocation &createRecordStorageLocation(
QualType Type, RecordStorageLocation::FieldToLoc FieldLocs,
RecordStorageLocation::SyntheticFieldMap SyntheticFields);

/// Returns a stable storage location for `D`.
StorageLocation &getStableStorageLocation(const ValueDecl &D);

Expand Down Expand Up @@ -169,6 +201,15 @@ class DataflowAnalysisContext {
/// context.
FieldSet getModeledFields(QualType Type);

/// Returns the names and types of the synthetic fields for the given record
/// type.
llvm::StringMap<QualType> getSyntheticFields(QualType Type) {
assert(Type->isRecordType());
if (SyntheticFieldCallback)
return SyntheticFieldCallback(Type);
return {};
}

private:
friend class Environment;

Expand Down Expand Up @@ -250,6 +291,11 @@ class DataflowAnalysisContext {
FieldSet ModeledFields;

std::unique_ptr<Logger> LogOwner; // If created via flags.

std::function<llvm::StringMap<QualType>(QualType)> SyntheticFieldCallback;

/// Has any `RecordStorageLocation` been created yet?
bool RecordStorageLocationCreated = false;
};

} // namespace dataflow
Expand Down
25 changes: 23 additions & 2 deletions clang/include/clang/Analysis/FlowSensitive/DataflowEnvironment.h
Original file line number Diff line number Diff line change
Expand Up @@ -166,6 +166,15 @@ class Environment {
/// with a symbolic representation of the `this` pointee.
Environment(DataflowAnalysisContext &DACtx, const DeclContext &DeclCtx);

/// Assigns storage locations and values to all parameters, captures, global
/// variables, fields and functions referenced in the function currently being
/// analyzed.
///
/// Requirements:
///
/// The function must have a body.
void initialize();

/// Returns a new environment that is a copy of this one.
///
/// The state of the program is initially the same, but can be mutated without
Expand Down Expand Up @@ -283,7 +292,15 @@ class Environment {
/// Returns the storage location assigned to the `this` pointee in the
/// environment or null if the `this` pointee has no assigned storage location
/// in the environment.
RecordStorageLocation *getThisPointeeStorageLocation() const;
RecordStorageLocation *getThisPointeeStorageLocation() const {
return ThisPointeeLoc;
}

/// Sets the storage location assigned to the `this` pointee in the
/// environment.
void setThisPointeeStorageLocation(RecordStorageLocation &Loc) {
ThisPointeeLoc = &Loc;
}

/// Returns the location of the result object for a record-type prvalue.
///
Expand Down Expand Up @@ -367,7 +384,8 @@ class Environment {
/// storage locations and values for indirections until it finds a
/// non-pointer/non-reference type.
///
/// If `Type` is a class, struct, or union type, calls `setValue()` to
/// If `Type` is a class, struct, or union type, creates values for all
/// modeled fields (including synthetic fields) and calls `setValue()` to
/// associate the `RecordValue` with its storage location
/// (`RecordValue::getLoc()`).
///
Expand Down Expand Up @@ -570,6 +588,9 @@ class Environment {
return dyn_cast<FunctionDecl>(getDeclCtx());
}

/// Returns the size of the call stack.
size_t callStackSize() const { return CallStack.size(); }

/// Returns whether this `Environment` can be extended to analyze the given
/// `Callee` (i.e. if `pushCall` can be used), with recursion disallowed and a
/// given `MaxDepth`.
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -45,7 +45,7 @@ struct UncheckedOptionalAccessModelOptions {
class UncheckedOptionalAccessModel
: public DataflowAnalysis<UncheckedOptionalAccessModel, NoopLattice> {
public:
UncheckedOptionalAccessModel(ASTContext &Ctx);
UncheckedOptionalAccessModel(ASTContext &Ctx, dataflow::Environment &Env);

/// Returns a matcher for the optional classes covered by this model.
static ast_matchers::DeclarationMatcher optionalClassDecl();
Expand All @@ -54,17 +54,6 @@ class UncheckedOptionalAccessModel

void transfer(const CFGElement &Elt, NoopLattice &L, Environment &Env);

ComparisonResult compare(QualType Type, const Value &Val1,
const Environment &Env1, const Value &Val2,
const Environment &Env2) override;

bool merge(QualType Type, const Value &Val1, const Environment &Env1,
const Value &Val2, const Environment &Env2, Value &MergedVal,
Environment &MergedEnv) override;

Value *widen(QualType Type, Value &Prev, const Environment &PrevEnv,
Value &Current, Environment &CurrentEnv) override;

private:
CFGMatchSwitch<TransferState<NoopLattice>> TransferMatchSwitch;
};
Expand Down
17 changes: 9 additions & 8 deletions clang/include/clang/Analysis/FlowSensitive/RecordOps.h
Original file line number Diff line number Diff line change
Expand Up @@ -21,10 +21,10 @@ namespace dataflow {

/// Copies a record (struct, class, or union) from `Src` to `Dst`.
///
/// This performs a deep copy, i.e. it copies every field and recurses on
/// fields of record type. It also copies properties from the `RecordValue`
/// associated with `Src` to the `RecordValue` associated with `Dst` (if these
/// `RecordValue`s exist).
/// This performs a deep copy, i.e. it copies every field (including synthetic
/// fields) and recurses on fields of record type. It also copies properties
/// from the `RecordValue` associated with `Src` to the `RecordValue` associated
/// with `Dst` (if these `RecordValue`s exist).
///
/// If there is a `RecordValue` associated with `Dst` in the environment, this
/// function creates a new `RecordValue` and associates it with `Dst`; clients
Expand All @@ -47,10 +47,11 @@ void copyRecord(RecordStorageLocation &Src, RecordStorageLocation &Dst,
/// retrieved from `Env2`. A convenience overload retrieves values for `Loc1`
/// and `Loc2` from the same environment.
///
/// This performs a deep comparison, i.e. it compares every field and recurses
/// on fields of record type. Fields of reference type compare equal if they
/// refer to the same storage location. If `RecordValue`s are associated with
/// `Loc1` and `Loc2`, it also compares the properties on those `RecordValue`s.
/// This performs a deep comparison, i.e. it compares every field (including
/// synthetic fields) and recurses on fields of record type. Fields of reference
/// type compare equal if they refer to the same storage location. If
/// `RecordValue`s are associated with `Loc1` and Loc2`, it also compares the
/// properties on those `RecordValue`s.
///
/// Note on how to interpret the result:
/// - If this returns true, the records are guaranteed to be equal at runtime.
Expand Down
32 changes: 26 additions & 6 deletions clang/include/clang/Analysis/FlowSensitive/StorageLocation.h
Original file line number Diff line number Diff line change
Expand Up @@ -73,7 +73,13 @@ class ScalarStorageLocation final : public StorageLocation {
///
/// Contains storage locations for all modeled fields of the record (also
/// referred to as "children"). The child map is flat, so accessible members of
/// the base class are directly accesible as children of this location.
/// the base class are directly accessible as children of this location.
///
/// Record storage locations may also contain so-called synthetic fields. These
/// are typically used to model the internal state of a class (e.g. the value
/// stored in a `std::optional`) without having to depend on that class's
/// implementation details. All `RecordStorageLocation`s of a given type should
/// have the same synthetic fields.
///
/// The storage location for a field of reference type may be null. This
/// typically occurs in one of two situations:
Expand All @@ -88,12 +94,12 @@ class ScalarStorageLocation final : public StorageLocation {
class RecordStorageLocation final : public StorageLocation {
public:
using FieldToLoc = llvm::DenseMap<const ValueDecl *, StorageLocation *>;
using SyntheticFieldMap = llvm::StringMap<StorageLocation *>;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was wondering, if we want this map to not be indexed by strings. An alternative would be to somehow access the IdentifierTable and create an identifier from the names and use the pointer to the identifier as the key. This has the potential to be a bit lighter on memory (identifiers only stored once as opposed to storing the keys in each RecordStorageLocation instance), and potentially a bit quicker lookup.

That being said, I am not insisting on the change, and even if we want to do it, it is probably OK in a follow-up PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an interesting idea, but I'm hesitant about mutating the IdentifierTable in something that is going to be used from a clang-tidy check, which will be sharing the IdentifierTable with other checks.

I suppose we could always create our own mapping from strings to integers or pointers of some sort, which could then serve as the keys for the SyntheticFieldMap -- but I would lean towards leaving this as a potential future optimization that we would do if this does indeed turn out to be a performance issue.


explicit RecordStorageLocation(QualType Type)
: RecordStorageLocation(Type, FieldToLoc()) {}

RecordStorageLocation(QualType Type, FieldToLoc TheChildren)
: StorageLocation(Kind::Record, Type), Children(std::move(TheChildren)) {
RecordStorageLocation(QualType Type, FieldToLoc TheChildren,
SyntheticFieldMap TheSyntheticFields)
: StorageLocation(Kind::Record, Type), Children(std::move(TheChildren)),
SyntheticFields(std::move(TheSyntheticFields)) {
assert(!Type.isNull());
assert(Type->isRecordType());
assert([this] {
Expand Down Expand Up @@ -133,6 +139,19 @@ class RecordStorageLocation final : public StorageLocation {
return It->second;
}

/// Returns the storage location for the synthetic field `Name`.
/// The synthetic field must exist.
StorageLocation &getSyntheticField(llvm::StringRef Name) const {
StorageLocation *Loc = SyntheticFields.lookup(Name);
assert(Loc != nullptr);
return *Loc;
}

llvm::iterator_range<SyntheticFieldMap::const_iterator>
synthetic_fields() const {
return {SyntheticFields.begin(), SyntheticFields.end()};
}

/// Changes the child storage location for a field `D` of reference type.
/// All other fields cannot change their storage location and always retain
/// the storage location passed to the `RecordStorageLocation` constructor.
Expand All @@ -151,6 +170,7 @@ class RecordStorageLocation final : public StorageLocation {

private:
FieldToLoc Children;
SyntheticFieldMap SyntheticFields;
};

} // namespace dataflow
Expand Down
40 changes: 39 additions & 1 deletion clang/lib/Analysis/FlowSensitive/DataflowAnalysisContext.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -68,11 +68,38 @@ StorageLocation &DataflowAnalysisContext::createStorageLocation(QualType Type) {
else
FieldLocs.insert({Field, &createStorageLocation(
Field->getType().getNonReferenceType())});
return arena().create<RecordStorageLocation>(Type, std::move(FieldLocs));

RecordStorageLocation::SyntheticFieldMap SyntheticFields;
for (const auto &Entry : getSyntheticFields(Type))
SyntheticFields.insert(
{Entry.getKey(),
&createStorageLocation(Entry.getValue().getNonReferenceType())});

return createRecordStorageLocation(Type, std::move(FieldLocs),
std::move(SyntheticFields));
}
return arena().create<ScalarStorageLocation>(Type);
}

// Returns the keys for a given `StringMap`.
// Can't use `StringSet` as the return type as it doesn't support `operator==`.
template <typename T>
static llvm::DenseSet<llvm::StringRef> getKeys(const llvm::StringMap<T> &Map) {
return llvm::DenseSet<llvm::StringRef>(Map.keys().begin(), Map.keys().end());
}

RecordStorageLocation &DataflowAnalysisContext::createRecordStorageLocation(
QualType Type, RecordStorageLocation::FieldToLoc FieldLocs,
RecordStorageLocation::SyntheticFieldMap SyntheticFields) {
assert(Type->isRecordType());
assert(containsSameFields(getModeledFields(Type), FieldLocs));
assert(getKeys(getSyntheticFields(Type)) == getKeys(SyntheticFields));

RecordStorageLocationCreated = true;
return arena().create<RecordStorageLocation>(Type, std::move(FieldLocs),
std::move(SyntheticFields));
}

StorageLocation &
DataflowAnalysisContext::getStableStorageLocation(const ValueDecl &D) {
if (auto *Loc = DeclToLoc.lookup(&D))
Expand Down Expand Up @@ -367,3 +394,14 @@ clang::dataflow::FieldSet clang::dataflow::getObjectFields(QualType Type) {
getFieldsFromClassHierarchy(Type, Fields);
return Fields;
}

bool clang::dataflow::containsSameFields(
const clang::dataflow::FieldSet &Fields,
const clang::dataflow::RecordStorageLocation::FieldToLoc &FieldLocs) {
if (Fields.size() != FieldLocs.size())
return false;
for ([[maybe_unused]] auto [Field, Loc] : FieldLocs)
if (!Fields.contains(cast_or_null<FieldDecl>(Field)))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What are the cases when we expect to see a null pointer here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can happen for fields of reference type in the cases discussed here.

return false;
return true;
}
Loading