Skip to content

[clang][dataflow] Propagate locations from result objects to initializers. #87320

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Apr 10, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
64 changes: 44 additions & 20 deletions clang/include/clang/Analysis/FlowSensitive/DataflowEnvironment.h
Original file line number Diff line number Diff line change
Expand Up @@ -30,6 +30,7 @@
#include "llvm/ADT/MapVector.h"
#include "llvm/Support/Compiler.h"
#include "llvm/Support/ErrorHandling.h"
#include <memory>
#include <type_traits>
#include <utility>

Expand Down Expand Up @@ -344,17 +345,6 @@ class Environment {
/// location of the result object to pass in `this`, even though prvalues are
/// otherwise not associated with storage locations.
///
/// FIXME: Currently, this simply returns a stable storage location for `E`,
/// but this doesn't do the right thing in scenarios like the following:
/// ```
/// MyClass c = some_condition()? MyClass(foo) : MyClass(bar);
/// ```
/// Here, `MyClass(foo)` and `MyClass(bar)` will have two different storage
/// locations, when in fact their storage locations should be the same.
/// Eventually, we want to propagate storage locations from result objects
/// down to the prvalues that initialize them, similar to the way that this is
/// done in Clang's CodeGen.
///
/// Requirements:
/// `E` must be a prvalue of record type.
RecordStorageLocation &
Expand Down Expand Up @@ -462,7 +452,13 @@ class Environment {
/// Initializes the fields (including synthetic fields) of `Loc` with values,
/// unless values of the field type are not supported or we hit one of the
/// limits at which we stop producing values.
void initializeFieldsWithValues(RecordStorageLocation &Loc);
/// If `Type` is provided, initializes only those fields that are modeled for
/// `Type`; this is intended for use in cases where `Loc` is a derived type
/// and we only want to initialize the fields of a base type.
void initializeFieldsWithValues(RecordStorageLocation &Loc, QualType Type);
void initializeFieldsWithValues(RecordStorageLocation &Loc) {
initializeFieldsWithValues(Loc, Loc.getType());
}

/// Assigns `Val` as the value of `Loc` in the environment.
void setValue(const StorageLocation &Loc, Value &Val);
Expand Down Expand Up @@ -653,6 +649,9 @@ class Environment {
LLVM_DUMP_METHOD void dump(raw_ostream &OS) const;

private:
using PrValueToResultObject =
llvm::DenseMap<const Expr *, RecordStorageLocation *>;

// The copy-constructor is for use in fork() only.
Environment(const Environment &) = default;

Expand Down Expand Up @@ -682,8 +681,10 @@ class Environment {
/// Initializes the fields (including synthetic fields) of `Loc` with values,
/// unless values of the field type are not supported or we hit one of the
/// limits at which we stop producing values (controlled by `Visited`,
/// `Depth`, and `CreatedValuesCount`).
void initializeFieldsWithValues(RecordStorageLocation &Loc,
/// `Depth`, and `CreatedValuesCount`). If `Type` is different from
/// `Loc.getType()`, initializes only those fields that are modeled for
/// `Type`.
void initializeFieldsWithValues(RecordStorageLocation &Loc, QualType Type,
llvm::DenseSet<QualType> &Visited, int Depth,
int &CreatedValuesCount);

Expand All @@ -702,22 +703,45 @@ class Environment {
/// and functions referenced in `FuncDecl`. `FuncDecl` must have a body.
void initFieldsGlobalsAndFuncs(const FunctionDecl *FuncDecl);

static PrValueToResultObject
buildResultObjectMap(DataflowAnalysisContext *DACtx,
const FunctionDecl *FuncDecl,
RecordStorageLocation *ThisPointeeLoc,
RecordStorageLocation *LocForRecordReturnVal);

// `DACtx` is not null and not owned by this object.
DataflowAnalysisContext *DACtx;

// FIXME: move the fields `CallStack`, `ReturnVal`, `ReturnLoc` and
// `ThisPointeeLoc` into a separate call-context object, shared between
// environments in the same call.
// FIXME: move the fields `CallStack`, `ResultObjectMap`, `ReturnVal`,
// `ReturnLoc` and `ThisPointeeLoc` into a separate call-context object,
// shared between environments in the same call.
// https://github.com/llvm/llvm-project/issues/59005

// `DeclContext` of the block being analysed if provided.
std::vector<const DeclContext *> CallStack;

// Value returned by the function (if it has non-reference return type).
// Maps from prvalues of record type to their result objects. Shared between
// all environments for the same function.
// FIXME: It's somewhat unsatisfactory that we have to use a `shared_ptr`
// here, though the cost is acceptable: The overhead of a `shared_ptr` is
// incurred when it is copied, and this happens only relatively rarely (when
// we fork the environment). The need for a `shared_ptr` will go away once we
// introduce a shared call-context object (see above).
std::shared_ptr<PrValueToResultObject> ResultObjectMap;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What would be the model for something like:

while (cond)
{
  f();
}

where f returns a value. Do we have the same storage location for the prvalue in all iterations? In that case, the location is in fact a "summary" for the different iterations. I think this is fine, as we want convergence. I am just wondering if we need to make this explicit in the future, so checks would know if a location were a summary, or an exact location.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it make sense for this to be owned by the DataflowAnalysisContext?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What would be the model for something like:

while (cond)
{
  f();
}

where f returns a value. Do we have the same storage location for the prvalue in all iterations?

Let me clarify what you're asking.

I think what you mean is that f() is declared like this, correct?

struct S {};
S f();

In this case, there is indeed no glvalue of type S -- just the prvalue for the call f(). Normally, of course, prvalues don't have storage locations, but in this case, because there isn't an actual result object, we treat the CallExpr as if it was a MaterializeTemporaryExpr (see comment in ResultObjectVisitor::VisitExpr()).

(Just making sure up to here that we agree on the setting -- do you agree with all of the above?)

First of all, yes, we use the same storage location on all iterations of the loop -- witness the call to DACtx.getStableStorageLocation(*E).

In that case, the location is in fact a "summary" for the different iterations.

I'm not sure I would agree with this. Let's assume this was a case where we actually have a MaterializeTemporaryExpr (imagine we're accessing some member of S, i.e. assume the expression in the loop is actually f().some_member_fn()). Then the compiler's codegen would very likely place the temporary in the same storage location on every iteration of the loop. (Sure, nothing in the standard mandates this AFAIK, but it's almost certainly what the compiler is going to do.) So the stable storage location that we use for the expression isn't a "summary" storage location, but it reflects the fact that the generated code, too, would use the same storage location.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it make sense for this to be owned by the DataflowAnalysisContext?

Unfortunately, no.

The DataflowAnalysisContext is shared by the whole analysis -- which may span several functions if we're doing a context-sensitive analysis.

In contrast, the result object map is specific to a given function, just like some of the other fields that are in Environment (CallStack, ReturnVal, ReturnLoc and ThisPointeeLoc). See the comment above saying that we should create a call-context object (which would be a per-function data structure) and move all of these fields to it.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should create a call-context object

Ah, ok. +1, I like this approach.


// The following three member variables handle various different types of
// return values.
// - If the return type is not a reference and not a record: Value returned
// by the function.
Value *ReturnVal = nullptr;
// Storage location of the reference returned by the function (if it has
// reference return type).
// - If the return type is a reference: Storage location of the reference
// returned by the function.
StorageLocation *ReturnLoc = nullptr;
// - If the return type is a record or the function being analyzed is a
// constructor: Storage location into which the return value should be
// constructed.
RecordStorageLocation *LocForRecordReturnVal = nullptr;

// The storage location of the `this` pointee. Should only be null if the
// function being analyzed is only a function and not a method.
RecordStorageLocation *ThisPointeeLoc = nullptr;
Expand Down
Loading