-
Notifications
You must be signed in to change notification settings - Fork 14.3k
[analyzer] Refine LCV handling in Store for better taint propagation #114835
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@llvm/pr-subscribers-clang @llvm/pr-subscribers-clang-static-analyzer-1 Author: Balazs Benics (steakhal) ChangesReview commit by commit, but I'm presenting here a stacked PR to let you know of the motivation, and more importantly how the pieces would fit together. Each commit has it's own description to help you focus on what matters. The original intent was to fix #114270. Full diff: https://github.com/llvm/llvm-project/pull/114835.diff 5 Files Affected:
diff --git a/clang/lib/StaticAnalyzer/Checkers/Taint.cpp b/clang/lib/StaticAnalyzer/Checkers/Taint.cpp
index 0bb5739db4b756..e55d064253b844 100644
--- a/clang/lib/StaticAnalyzer/Checkers/Taint.cpp
+++ b/clang/lib/StaticAnalyzer/Checkers/Taint.cpp
@@ -207,6 +207,14 @@ std::vector<SymbolRef> taint::getTaintedSymbolsImpl(ProgramStateRef State,
return getTaintedSymbolsImpl(State, Sym, Kind, returnFirstOnly);
if (const MemRegion *Reg = V.getAsRegion())
return getTaintedSymbolsImpl(State, Reg, Kind, returnFirstOnly);
+
+ if (auto LCV = V.getAs<nonloc::LazyCompoundVal>()) {
+ StoreManager &StoreMgr = State->getStateManager().getStoreManager();
+ if (auto DefaultVal = StoreMgr.getDefaultBinding(*LCV)) {
+ return getTaintedSymbolsImpl(State, *DefaultVal, Kind, returnFirstOnly);
+ }
+ }
+
return {};
}
diff --git a/clang/lib/StaticAnalyzer/Core/ExprEngineCXX.cpp b/clang/lib/StaticAnalyzer/Core/ExprEngineCXX.cpp
index ccc3097e8d2f97..17ee1f7c945edd 100644
--- a/clang/lib/StaticAnalyzer/Core/ExprEngineCXX.cpp
+++ b/clang/lib/StaticAnalyzer/Core/ExprEngineCXX.cpp
@@ -68,23 +68,15 @@ void ExprEngine::performTrivialCopy(NodeBuilder &Bldr, ExplodedNode *Pred,
Bldr.takeNodes(Pred);
assert(ThisRD);
- if (!ThisRD->isEmpty()) {
- // Load the source value only for non-empty classes.
- // Otherwise it'd retrieve an UnknownVal
- // and bind it and RegionStore would think that the actual value
- // in this region at this offset is unknown.
- SVal V = Call.getArgSVal(0);
-
- // If the value being copied is not unknown, load from its location to get
- // an aggregate rvalue.
- if (std::optional<Loc> L = V.getAs<Loc>())
- V = Pred->getState()->getSVal(*L);
- else
- assert(V.isUnknownOrUndef());
- evalBind(Dst, CallExpr, Pred, ThisVal, V, true);
- } else {
- Dst.Add(Pred);
- }
+ SVal V = Call.getArgSVal(0);
+
+ // If the value being copied is not unknown, load from its location to get
+ // an aggregate rvalue.
+ if (std::optional<Loc> L = V.getAs<Loc>())
+ V = Pred->getState()->getSVal(*L);
+ else
+ assert(V.isUnknownOrUndef());
+ evalBind(Dst, CallExpr, Pred, ThisVal, V, true);
PostStmt PS(CallExpr, LCtx);
for (ExplodedNode *N : Dst) {
diff --git a/clang/lib/StaticAnalyzer/Core/RegionStore.cpp b/clang/lib/StaticAnalyzer/Core/RegionStore.cpp
index 674099dd7e1f0f..ef41112c2d5ce5 100644
--- a/clang/lib/StaticAnalyzer/Core/RegionStore.cpp
+++ b/clang/lib/StaticAnalyzer/Core/RegionStore.cpp
@@ -548,6 +548,12 @@ class RegionStoreManager : public StoreManager {
return getBinding(getRegionBindings(S), L, T);
}
+ /// Returns the value of the default binding of region \p BaseR
+ /// if and only if that is the unique binding in the cluster of \p BaseR.
+ /// \p BaseR must be a base region.
+ std::optional<SVal> getUniqueDefaultBinding(Store S,
+ const MemRegion *BaseR) const;
+
std::optional<SVal> getDefaultBinding(Store S, const MemRegion *R) override {
RegionBindingsRef B = getRegionBindings(S);
// Default bindings are always applied over a base region so look up the
@@ -2276,18 +2282,10 @@ NonLoc RegionStoreManager::createLazyBinding(RegionBindingsConstRef B,
return svalBuilder.makeLazyCompoundVal(StoreRef(B.asStore(), *this), R);
}
-static bool isRecordEmpty(const RecordDecl *RD) {
- if (!RD->field_empty())
- return false;
- if (const CXXRecordDecl *CRD = dyn_cast<CXXRecordDecl>(RD))
- return CRD->getNumBases() == 0;
- return true;
-}
-
SVal RegionStoreManager::getBindingForStruct(RegionBindingsConstRef B,
const TypedValueRegion *R) {
const RecordDecl *RD = R->getValueType()->castAs<RecordType>()->getDecl();
- if (!RD->getDefinition() || isRecordEmpty(RD))
+ if (!RD->getDefinition())
return UnknownVal();
return createLazyBinding(B, R);
@@ -2549,9 +2547,42 @@ RegionBindingsRef RegionStoreManager::bindVector(RegionBindingsConstRef B,
return NewB;
}
+std::optional<SVal>
+RegionStoreManager::getUniqueDefaultBinding(Store S,
+ const MemRegion *BaseR) const {
+ assert(BaseR == BaseR->getBaseRegion() && "Expecting a base region");
+ const auto *Cluster = getRegionBindings(S).lookup(BaseR);
+ if (!Cluster || !llvm::hasSingleElement(*Cluster))
+ return std::nullopt;
+
+ const auto [Key, Value] = *Cluster->begin();
+ return Key.isDirect() ? std::optional<SVal>{} : Value;
+}
+
std::optional<RegionBindingsRef> RegionStoreManager::tryBindSmallStruct(
RegionBindingsConstRef B, const TypedValueRegion *R, const RecordDecl *RD,
nonloc::LazyCompoundVal LCV) {
+ // If we try to copy a Conjured value representing the value of the whole
+ // struct, don't try to element-wise copy each field.
+ // That would unnecessarily bind Derived symbols slicing off the subregion for
+ // the field from the whole Conjured symbol.
+ //
+ // struct Window { int width; int height; };
+ // Window getWindow(); <-- opaque fn.
+ // Window w = getWindow(); <-- conjures a new Window.
+ // Window w2 = w; <-- trivial copy "w", calling "tryBindSmallStruct"
+ //
+ // We should not end up with a new Store for "w2" like this:
+ // Direct [ 0..31]: Derived{Conj{}, w.width}
+ // Direct [32..63]: Derived{Conj{}, w.height}
+ // Instead, we should just bind that Conjured value instead.
+ if (LCV.getRegion()->getBaseRegion() == LCV.getRegion()) {
+ if (auto Val = getUniqueDefaultBinding(LCV.getStore(), LCV.getRegion())) {
+ return B.addBinding(BindingKey::Make(R, BindingKey::Default),
+ Val.value());
+ }
+ }
+
FieldVector Fields;
if (const CXXRecordDecl *Class = dyn_cast<CXXRecordDecl>(RD))
diff --git a/clang/test/Analysis/ctor-trivial-copy.cpp b/clang/test/Analysis/ctor-trivial-copy.cpp
index 5ed188aa8f1eae..41d0d97161bba1 100644
--- a/clang/test/Analysis/ctor-trivial-copy.cpp
+++ b/clang/test/Analysis/ctor-trivial-copy.cpp
@@ -1,8 +1,12 @@
-// RUN: %clang_analyze_cc1 -analyzer-checker=core,debug.ExprInspection -analyzer-config c++-inlining=constructors -verify %s
+// RUN: %clang_analyze_cc1 -analyzer-checker=core,debug.ExprInspection -analyzer-config c++-inlining=constructors -verify %s \
+// RUN: 2>&1 | FileCheck %s
-template<typename T>
-void clang_analyzer_dump(T&);
+void clang_analyzer_printState();
+template<typename T> void clang_analyzer_dump_lref(T&);
+template<typename T> void clang_analyzer_dump_val(T);
+template <typename T> T conjure();
+template <typename... Ts> void nop(const Ts &... args) {}
struct aggr {
int x;
@@ -15,20 +19,104 @@ struct empty {
void test_copy_return() {
aggr s1 = {1, 2};
aggr const& cr1 = aggr(s1);
- clang_analyzer_dump(cr1); // expected-warning-re {{&lifetime_extended_object{aggr, cr1, S{{[0-9]+}}} }}
+ clang_analyzer_dump_lref(cr1); // expected-warning-re {{&lifetime_extended_object{aggr, cr1, S{{[0-9]+}}} }}
empty s2;
empty const& cr2 = empty{s2};
- clang_analyzer_dump(cr2); // expected-warning-re {{&lifetime_extended_object{empty, cr2, S{{[0-9]+}}} }}
+ clang_analyzer_dump_lref(cr2); // expected-warning-re {{&lifetime_extended_object{empty, cr2, S{{[0-9]+}}} }}
}
void test_assign_return() {
aggr s1 = {1, 2};
aggr d1;
- clang_analyzer_dump(d1 = s1); // expected-warning {{&d1 }}
+ clang_analyzer_dump_lref(d1 = s1); // expected-warning {{&d1 }}
empty s2;
empty d2;
- clang_analyzer_dump(d2 = s2); // expected-warning {{&d2 }} was Unknown
+ clang_analyzer_dump_lref(d2 = s2); // expected-warning {{&d2 }} was Unknown
}
+
+namespace trivial_struct_copy {
+
+void _01_empty_structs() {
+ clang_analyzer_dump_val(conjure<empty>()); // expected-warning {{lazyCompoundVal}}
+ empty Empty = conjure<empty>();
+ empty Empty2 = Empty;
+ empty Empty3 = Empty2;
+ // All of these should refer to the exact same LCV, because all of
+ // these trivial copies refer to the original conjured value.
+ // There were Unknown before:
+ clang_analyzer_dump_val(Empty); // expected-warning {{lazyCompoundVal}}
+ clang_analyzer_dump_val(Empty2); // expected-warning {{lazyCompoundVal}}
+ clang_analyzer_dump_val(Empty3); // expected-warning {{lazyCompoundVal}}
+
+ // We should have the same Conjured symbol for "Empty", "Empty2" and "Empty3".
+ clang_analyzer_printState();
+ // CHECK: "store": { "pointer": "0x{{[0-9a-f]+}}", "items": [
+ // CHECK-NEXT: { "cluster": "GlobalInternalSpaceRegion", "pointer": "0x{{[0-9a-f]+}}", "items": [
+ // CHECK-NEXT: { "kind": "Default", "offset": 0, "value": "conj_$
+ // CHECK-NEXT: ]},
+ // CHECK-NEXT: { "cluster": "GlobalSystemSpaceRegion", "pointer": "0x{{[0-9a-f]+}}", "items": [
+ // CHECK-NEXT: { "kind": "Default", "offset": 0, "value": "conj_$
+ // CHECK-NEXT: ]},
+ // CHECK-NEXT: { "cluster": "Empty", "pointer": "0x{{[0-9a-f]+}}", "items": [
+ // CHECK-NEXT: { "kind": "Default", "offset": 0, "value": "[[EMPTY_CONJ:conj_\$[0-9]+{int, LC[0-9]+, S[0-9]+, #[0-9]+}]]" }
+ // CHECK-NEXT: ]},
+ // CHECK-NEXT: { "cluster": "Empty2", "pointer": "0x{{[0-9a-f]+}}", "items": [
+ // CHECK-NEXT: { "kind": "Default", "offset": 0, "value": "[[EMPTY_CONJ]]" }
+ // CHECK-NEXT: ]},
+ // CHECK-NEXT: { "cluster": "Empty3", "pointer": "0x{{[0-9a-f]+}}", "items": [
+ // CHECK-NEXT: { "kind": "Default", "offset": 0, "value": "[[EMPTY_CONJ]]" }
+ // CHECK-NEXT: ]}
+ // CHECK-NEXT: ]},
+
+ nop(Empty, Empty2, Empty3);
+}
+
+void _02_structs_with_members() {
+ clang_analyzer_dump_val(conjure<aggr>()); // expected-warning {{lazyCompoundVal}}
+ aggr Aggr = conjure<aggr>();
+ aggr Aggr2 = Aggr;
+ aggr Aggr3 = Aggr2;
+ // All of these should refer to the exact same LCV, because all of
+ // these trivial copies refer to the original conjured value.
+ clang_analyzer_dump_val(Aggr); // expected-warning {{lazyCompoundVal}}
+ clang_analyzer_dump_val(Aggr2); // expected-warning {{lazyCompoundVal}}
+ clang_analyzer_dump_val(Aggr3); // expected-warning {{lazyCompoundVal}}
+
+ // We should have the same Conjured symbol for "Aggr", "Aggr2" and "Aggr3".
+ // We used to have Derived symbols for the individual fields that were
+ // copied as part of copying the whole struct.
+ clang_analyzer_printState();
+ // CHECK: "store": { "pointer": "0x{{[0-9a-f]+}}", "items": [
+ // CHECK-NEXT: { "cluster": "GlobalInternalSpaceRegion", "pointer": "0x{{[0-9a-f]+}}", "items": [
+ // CHECK-NEXT: { "kind": "Default", "offset": 0, "value": "conj_$
+ // CHECK-NEXT: ]},
+ // CHECK-NEXT: { "cluster": "GlobalSystemSpaceRegion", "pointer": "0x{{[0-9a-f]+}}", "items": [
+ // CHECK-NEXT: { "kind": "Default", "offset": 0, "value": "conj_$
+ // CHECK-NEXT: ]},
+ // CHECK-NEXT: { "cluster": "Aggr", "pointer": "0x{{[0-9a-f]+}}", "items": [
+ // CHECK-NEXT: { "kind": "Default", "offset": 0, "value": "[[AGGR_CONJ:conj_\$[0-9]+{int, LC[0-9]+, S[0-9]+, #[0-9]+}]]" }
+ // CHECK-NEXT: ]},
+ // CHECK-NEXT: { "cluster": "Aggr2", "pointer": "0x{{[0-9a-f]+}}", "items": [
+ // CHECK-NEXT: { "kind": "Default", "offset": 0, "value": "[[AGGR_CONJ]]" }
+ // CHECK-NEXT: ]},
+ // CHECK-NEXT: { "cluster": "Aggr3", "pointer": "0x{{[0-9a-f]+}}", "items": [
+ // CHECK-NEXT: { "kind": "Default", "offset": 0, "value": "[[AGGR_CONJ]]" }
+ // CHECK-NEXT: ]}
+ // CHECK-NEXT: ]},
+
+ nop(Aggr, Aggr2, Aggr3);
+}
+
+// Tests that use `clang_analyzer_printState()` must share the analysis entry
+// point, and have a strict ordering between. This is to meet the different
+// `clang_analyzer_printState()` calls in a fixed relative ordering, thus
+// FileCheck could check the stdouts.
+void entrypoint() {
+ _01_empty_structs();
+ _02_structs_with_members();
+}
+
+} // namespace trivial_struct_copy
diff --git a/clang/test/Analysis/taint-generic.cpp b/clang/test/Analysis/taint-generic.cpp
index 8092ac6f270b2a..881c5baf889f6c 100644
--- a/clang/test/Analysis/taint-generic.cpp
+++ b/clang/test/Analysis/taint-generic.cpp
@@ -1,10 +1,15 @@
-// RUN: %clang_analyze_cc1 -analyzer-checker=optin.taint,core,alpha.security.ArrayBoundV2 -analyzer-config optin.taint.TaintPropagation:Config=%S/Inputs/taint-generic-config.yaml -Wno-format-security -verify -std=c++11 %s
+// RUN: %clang_analyze_cc1 -std=c++11 -Wno-format-security \
+// RUN: -analyzer-checker=core,optin.taint,alpha.security.ArrayBoundV2,debug.ExprInspection \
+// RUN: -analyzer-config optin.taint.TaintPropagation:Config=%S/Inputs/taint-generic-config.yaml \
+// RUN: -verify %s
+
+template <typename T> void clang_analyzer_isTainted(T);
#define BUFSIZE 10
int Buffer[BUFSIZE];
int scanf(const char*, ...);
-int mySource1();
+template <typename T = int> T mySource1();
int mySource3();
typedef struct _FILE FILE;
@@ -136,3 +141,23 @@ void testReadingFromStdin(char **p) {
fscanf(stdin, "%d", &n);
Buffer[n] = 1; // expected-warning {{Potential out of bound access }}
}
+
+namespace gh114270 {
+class Empty {};
+class Aggr {
+public:
+ int data;
+};
+
+void top() {
+ int Int = mySource1<int>();
+ clang_analyzer_isTainted(Int); // expected-warning {{YES}}
+
+ Empty E = mySource1<Empty>();
+ clang_analyzer_isTainted(E); // expected-warning {{YES}}
+
+ Aggr A = mySource1<Aggr>();
+ clang_analyzer_isTainted(A); // expected-warning {{YES}}
+ clang_analyzer_isTainted(A.data); // expected-warning {{YES}}
+}
+} // namespace gh114270
|
Thanks for handling this issue! My quick first impression is that I'm satisfied with your changes. However, I'm a bit confused by the first commit because as far as I see, the empty struct is unable to transfer any attacker-controlled data, and therefore I don't know what does it mean that it's tainted. Do you have a practical use case where this would be useful? |
Exactly. Because we don't let the bind operations go through, the copies won't have an identity, thus there is nothing where taint could bind to. However, it only looking at the code users could get confused (rightly so).
I'd say, not really. Another way of looking at that is remaining consistent no matter how many fields we have in a class. But again, I don't really expect this to happen in practice, and we can drop that commit from the stack without affecting the outcome of fixing the issue motivating this stack. However, I'd lean towards a more consistent behavior for structs, no matter their guts. |
Thanks for the explanation -- code example reduction friendliness is a good point that I didn't think about. Based on this, I support keeping that commit, but perhaps add some remarks (in comments or the commit message, wherever you think it's well-placed) that mentions code reduction as a motivation. |
Dump the memory space clusters before the other clusters, in alphabetical order. Then default bindings over direct bindings, and if any has symbolic offset, then those should come before the ones with concrete offsets. In theory, we should either have a symbolic offset OR concrete offsets, but never both at the same time.
We represent copies of structs by LazyCompoundVals, that is basically a snapshot of the Store and Region that your copy would refer to. This snapshot is actually not taken for empty structs (structs that have no non-static data members), because the users won't be able to access any fields anyways, so why bother. However, when it comes to taint propagation, it would be nice if instances of empty structs would behave similar to non-empty structs. For this, we need an identity for which taint can bind, so Unknown - that was used in the past wouldn't work. Consequently, copying the value of an empty struct should behave the same way as a non-empty struct, thus be represented by a LazyCompoundVal.
Previously, ExprEngine would just skip copying empty structs. Let's make trigger the copy event even for empty structs.
Taint propagation rules may want to taint whole objects, that are returned by-value from opaque function calls. If a struct is returned by-value from an opaque call, the "value" of the whole struct is represented by a Conjured symbol. Later fields may slice off smaller subregions by creating Derived symbols of that Conjured symbol, but those are handled well, and "isTainted" returns true as expected. However, passing the whole struct to "isTainted" would be false, because LazyCompoundVals and CompoundVals are not handled. This patch addresses this. Fixes #114270
Added the explanation to the [analyzer] Allow copying empty structs (1/4). |
Dump the memory space clusters before the other clusters, in alphabetical order. Then default bindings over direct bindings, and if any has symbolic offset, then those should come before the ones with concrete offsets. In theory, we should either have a symbolic offset OR concrete offsets, but never both at the same time. Needed for #114835
This is now superseded by the individual commits (PRs). |
We represent copies of structs by LazyCompoundVals, that is basically a snapshot of the Store and Region that your copy would refer to. This snapshot is actually not taken for empty structs (structs that have no non-static data members), because the users won't be able to access any fields anyways, so why bother. However, when it comes to taint propagation, it would be nice if instances of empty structs would behave similar to non-empty structs. For this, we need an identity for which taint can bind, so Unknown - that was used in the past wouldn't work. Consequently, copying the value of an empty struct should behave the same way as a non-empty struct, thus be represented by a LazyCompoundVal. Split from #114835
Previously, ExprEngine would just skip copying empty structs. Let's make trigger the copy event even for empty structs. Split from #114835
returned by-value from opaque function calls. If a struct is returned by-value from an opaque call, the "value" of the whole struct is represented by a Conjured symbol. Later fields may slice off smaller subregions by creating Derived symbols of that Conjured symbol, but those are handled well, and "isTainted" returns true as expected. However, passing the whole struct to "isTainted" would be false, because LazyCompoundVals and CompoundVals are not handled. This patch addresses this. Fixes #114270 Split from #114835
Review commit by commit, but I'm presenting here a stacked PR to let you know of the motivation, and more importantly how the pieces would fit together.
Each commit has it's own description to help you focus on what matters.
Once this PR is reviewed, I'll split this into individual PRs and merge one-by-one.
The original intent was to fix #114270.