-
Notifications
You must be signed in to change notification settings - Fork 14.3k
[analyzer] Never create Regions wrapping reference TypedValueRegions (NFCI) #118096
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…(NFCI) Like in the test case: ```c++ struct String { String(const String &) {} }; struct MatchComponent { unsigned numbers[2]; String prerelease; MatchComponent(MatchComponent const &) = default; }; MatchComponent get(); void consume(MatchComponent const &); MatchComponent parseMatchComponent() { MatchComponent component = get(); component.numbers[0] = 10; component.numbers[1] = 20; return component; // We should have no stack addr escape warning here. } void top() { consume(parseMatchComponent()); } ``` When calling `consume(parseMatchComponent())` the `parseMatchComponent()` would return a copy of a temporary of `component`. That copy would invoke the `MatchComponent::MatchComponent(const MatchComponent &)` ctor. That ctor would have a (reference typed) ParamVarRegion, holding the location (lvalue) of the object we are about to copy (&component). So far so good, but just before evaluating the binding operation for initializing the `numbers` field of the temporary, we evaluate the ArrayInitLoopExpr representing the by-value elementwise copy of the array `component.numbers`. This is represented by a LazyCompoundVal, because we (usually) don't just copy large arrays and bind many individual direct bindings. Rather, we take a snapshot by using a LCV. However, notice that the LCV representing this copy would look like this: lazyCompoundVal{ParamVarRegion{"reference param of cctor"}.numbers} Notice that it refers to the numbers field of a reference. It would be much better to desugar the reference to the actual object, thus it should be: `lazyCompoundVal{component.numbers}` Actually, when binding the result of the ArrayInitLoopExpr to the `temp_object.numbers` in the compiler-generated member initializer of the cctor, we should have two individual direct bindings because this is a "small array": binding &Element{temp_object.numbers, 0} <- loadFrom(&Element{component.numbers, 0}) binding &Element{temp_object.numbers, 1} <- loadFrom(&Element{component.numbers, 1}) Where loadFrom(...) would be: loadFrom(&Element{component.numbers, 0}): 10 U32b loadFrom(&Element{component.numbers, 1}): 20 U32b So the store should look like this, after PostInitializer of `temp_object.numbers`: temp_object at offset 0: 10 U32b temp_object at offset 32: 20 U32b The lesson is that it's okay to have TypedValueRegions of references as long as we don't form subregions of those. If we ever want to refer to a subregion of a "reference" we actually meant to "desugar" the reference and slice a subregion of the pointee of the reference instead. Once this canonicalization is in place, we can also drop the special handling of references in `ProcessInitializer`, because now reference TypedValueRegions are eagerly desugared into their referree region when forming a subregion of it. There should be no practical differences, but there are of course bugs that this patch may surface.
@llvm/pr-subscribers-clang @llvm/pr-subscribers-clang-static-analyzer-1 Author: Balazs Benics (steakhal) ChangesLike in the test case: struct String {
String(const String &) {}
};
struct MatchComponent {
unsigned numbers[2];
String prerelease;
MatchComponent(MatchComponent const &) = default;
};
MatchComponent get();
void consume(MatchComponent const &);
MatchComponent parseMatchComponent() {
MatchComponent component = get();
component.numbers[0] = 10;
component.numbers[1] = 20;
return component; // We should have no stack addr escape warning here.
}
void top() {
consume(parseMatchComponent());
} When calling That ctor would have a (reference typed) ParamVarRegion, holding the location (lvalue) of the object we are about to copy (&component). So far so good, but just before evaluating the binding operation for initializing the However, notice that the LCV representing this copy would look like this: Notice that it refers to the numbers field of a reference. It would be much better to desugar the reference to the actual object, thus it should be: Actually, when binding the result of the ArrayInitLoopExpr to the
Where
So the store should look like this, after PostInitializer of
The lesson is that it's okay to have TypedValueRegions of references as long as we don't form subregions of those. If we ever want to refer to a subregion of a "reference" we actually meant to "desugar" the reference and slice a subregion of the pointee of the reference instead. Once this canonicalization is in place, we can also drop the special handling of references in There should be no practical differences, but there are of course bugs that this patch may surface. Full diff: https://github.com/llvm/llvm-project/pull/118096.diff 5 Files Affected:
diff --git a/clang/include/clang/StaticAnalyzer/Core/PathSensitive/ProgramState.h b/clang/include/clang/StaticAnalyzer/Core/PathSensitive/ProgramState.h
index eef7a54f03bf11..29f534eba2a265 100644
--- a/clang/include/clang/StaticAnalyzer/Core/PathSensitive/ProgramState.h
+++ b/clang/include/clang/StaticAnalyzer/Core/PathSensitive/ProgramState.h
@@ -487,6 +487,7 @@ class ProgramState : public llvm::FoldingSetNode {
friend void ProgramStateRetain(const ProgramState *state);
friend void ProgramStateRelease(const ProgramState *state);
+ SVal desugarReference(SVal Val) const;
SVal wrapSymbolicRegion(SVal Base) const;
};
diff --git a/clang/lib/StaticAnalyzer/Core/ExprEngine.cpp b/clang/lib/StaticAnalyzer/Core/ExprEngine.cpp
index 22eab9f66418d4..648c7daf823ed3 100644
--- a/clang/lib/StaticAnalyzer/Core/ExprEngine.cpp
+++ b/clang/lib/StaticAnalyzer/Core/ExprEngine.cpp
@@ -1206,15 +1206,7 @@ void ExprEngine::ProcessInitializer(const CFGInitializer CFGInit,
while ((ASE = dyn_cast<ArraySubscriptExpr>(Init)))
Init = ASE->getBase()->IgnoreImplicit();
- SVal LValue = State->getSVal(Init, stackFrame);
- if (!Field->getType()->isReferenceType()) {
- if (std::optional<Loc> LValueLoc = LValue.getAs<Loc>()) {
- InitVal = State->getSVal(*LValueLoc);
- } else if (auto CV = LValue.getAs<nonloc::CompoundVal>()) {
- // Initializer list for an array.
- InitVal = *CV;
- }
- }
+ InitVal = State->getSVal(Init, stackFrame);
// If we fail to get the value for some reason, use a symbolic value.
if (InitVal.isUnknownOrUndef()) {
diff --git a/clang/lib/StaticAnalyzer/Core/MemRegion.cpp b/clang/lib/StaticAnalyzer/Core/MemRegion.cpp
index 02d1358a2001ef..ad4e43630dd44e 100644
--- a/clang/lib/StaticAnalyzer/Core/MemRegion.cpp
+++ b/clang/lib/StaticAnalyzer/Core/MemRegion.cpp
@@ -65,6 +65,11 @@ using namespace ento;
// MemRegion Construction.
//===----------------------------------------------------------------------===//
+[[maybe_unused]] static bool isAReferenceTypedValueRegion(const MemRegion *R) {
+ const auto *TyReg = llvm::dyn_cast<TypedValueRegion>(R);
+ return TyReg && TyReg->getValueType()->isReferenceType();
+}
+
template <typename RegionTy, typename SuperTy, typename Arg1Ty>
RegionTy* MemRegionManager::getSubRegion(const Arg1Ty arg1,
const SuperTy *superRegion) {
@@ -76,6 +81,7 @@ RegionTy* MemRegionManager::getSubRegion(const Arg1Ty arg1,
if (!R) {
R = new (A) RegionTy(arg1, superRegion);
Regions.InsertNode(R, InsertPos);
+ assert(!isAReferenceTypedValueRegion(superRegion));
}
return R;
@@ -92,6 +98,7 @@ RegionTy* MemRegionManager::getSubRegion(const Arg1Ty arg1, const Arg2Ty arg2,
if (!R) {
R = new (A) RegionTy(arg1, arg2, superRegion);
Regions.InsertNode(R, InsertPos);
+ assert(!isAReferenceTypedValueRegion(superRegion));
}
return R;
@@ -110,6 +117,7 @@ RegionTy* MemRegionManager::getSubRegion(const Arg1Ty arg1, const Arg2Ty arg2,
if (!R) {
R = new (A) RegionTy(arg1, arg2, arg3, superRegion);
Regions.InsertNode(R, InsertPos);
+ assert(!isAReferenceTypedValueRegion(superRegion));
}
return R;
diff --git a/clang/lib/StaticAnalyzer/Core/ProgramState.cpp b/clang/lib/StaticAnalyzer/Core/ProgramState.cpp
index 0be2709f0907d8..d4f56342d934c9 100644
--- a/clang/lib/StaticAnalyzer/Core/ProgramState.cpp
+++ b/clang/lib/StaticAnalyzer/Core/ProgramState.cpp
@@ -205,6 +205,16 @@ ProgramStateRef ProgramState::killBinding(Loc LV) const {
return makeWithStore(newStore);
}
+/// We should never form a MemRegion that would wrap a TypedValueRegion of a
+/// reference type. What we actually wanted was to create a MemRegion refering
+/// to the pointee of that reference.
+SVal ProgramState::desugarReference(SVal Val) const {
+ const auto *TyReg = dyn_cast_or_null<TypedValueRegion>(Val.getAsRegion());
+ if (!TyReg || !TyReg->getValueType()->isReferenceType())
+ return Val;
+ return getSVal(TyReg);
+}
+
/// SymbolicRegions are expected to be wrapped by an ElementRegion as a
/// canonical representation. As a canonical representation, SymbolicRegions
/// should be wrapped by ElementRegions before getting a FieldRegion.
@@ -445,12 +455,14 @@ void ProgramState::setStore(const StoreRef &newStore) {
}
SVal ProgramState::getLValue(const FieldDecl *D, SVal Base) const {
+ Base = desugarReference(Base);
Base = wrapSymbolicRegion(Base);
return getStateManager().StoreMgr->getLValueField(D, Base);
}
SVal ProgramState::getLValue(const IndirectFieldDecl *D, SVal Base) const {
StoreManager &SM = *getStateManager().StoreMgr;
+ Base = desugarReference(Base);
Base = wrapSymbolicRegion(Base);
// FIXME: This should work with `SM.getLValueField(D->getAnonField(), Base)`,
diff --git a/clang/test/Analysis/initializer.cpp b/clang/test/Analysis/initializer.cpp
index 16d7a348fdfb6d..f50afff25d2450 100644
--- a/clang/test/Analysis/initializer.cpp
+++ b/clang/test/Analysis/initializer.cpp
@@ -366,3 +366,29 @@ void testI() {
clang_analyzer_eval(B::b == 2); // expected-warning{{TRUE}}
}
} // namespace dont_skip_vbase_initializers_in_most_derived_class
+
+namespace elementwise_copy_small_array_from_post_initializer_of_cctor {
+struct String {
+ String(const String &) {}
+};
+
+struct MatchComponent {
+ unsigned numbers[2];
+ String prerelease;
+ MatchComponent(MatchComponent const &) = default;
+};
+
+MatchComponent get();
+void consume(MatchComponent const &);
+
+MatchComponent parseMatchComponent() {
+ MatchComponent component = get();
+ component.numbers[0] = 10;
+ component.numbers[1] = 20;
+ return component; // We should have no stack addr escape warning here.
+}
+
+void top() {
+ consume(parseMatchComponent());
+}
+} // namespace elementwise_copy_small_array_from_post_initializer_of_cctor
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The change itself looks good to me, but I wonder what is the reason that we only want to do this canonicalization when we access fields?
It took me a while to unwrap your question. Actually, think of it as an invariant. We should never have reference typed value regions wrapped by subregions. This is checked/enforced by how we create subregions (see the asserts). The second is, how can we end up with such subregions. I think the only way is by field regions. If we had something that I missed, the assert would catch it. Do you have something in mind? |
Sorry, I might have been a bit unclear.
My question is what makes the "wrapped by subregions" part special? Why not just "never have reference typed value regions"? |
Quick guesses: can we form a base or derived object subregion from them? Is there some hacky pathway where we create an element region to represent something that's not a real element access but something similar? ( |
Like in the test case:
When calling
consume(parseMatchComponent())
theparseMatchComponent()
would return a copy of a temporary ofcomponent
. That copy would invoke theMatchComponent::MatchComponent(const MatchComponent &)
ctor.That ctor would have a (reference typed) ParamVarRegion, holding the location (lvalue) of the object we are about to copy (&component). So far so good, but just before evaluating the binding operation for initializing the
numbers
field of the temporary, we evaluate the ArrayInitLoopExpr representing the by-value elementwise copy of the arraycomponent.numbers
. This is represented by a LazyCompoundVal, because we (usually) don't just copy large arrays and bind many individual direct bindings. Rather, we take a snapshot by using a LCV.However, notice that the LCV representing this copy would look like this:
lazyCompoundVal{ParamVarRegion{"reference param of cctor"}.numbers}
Notice that it refers to the numbers field of a reference. It would be much better to desugar the reference to the actual object, thus it should be:
lazyCompoundVal{component.numbers}
Actually, when binding the result of the ArrayInitLoopExpr to the
temp_object.numbers
in the compiler-generated member initializer of the cctor, we should have two individual direct bindings because this is a "small array":Where
loadFrom(...)
would be:So the store should look like this, after PostInitializer of
temp_object.numbers
:The lesson is that it's okay to have TypedValueRegions of references as long as we don't form subregions of those. If we ever want to refer to a subregion of a "reference" we actually meant to "desugar" the reference and slice a subregion of the pointee of the reference instead.
Once this canonicalization is in place, we can also drop the special handling of references in
ProcessInitializer
, because now reference TypedValueRegions are eagerly desugared into their referee region when forming a subregion of it.There should be no practical differences, but there are of course bugs that this patch may surface.