Skip to content

[analyzer] Allow copying empty structs (1/4) #115916

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Nov 14, 2024
Merged

[analyzer] Allow copying empty structs (1/4) #115916

merged 1 commit into from
Nov 14, 2024

Conversation

steakhal
Copy link
Contributor

We represent copies of structs by LazyCompoundVals, that is basically a snapshot of the Store and Region that your copy would refer to.

This snapshot is actually not taken for empty structs (structs that have no non-static data members), because the users won't be able to access any fields anyways, so why bother.
However, when it comes to taint propagation, it would be nice if instances of empty structs would behave similar to non-empty structs. For this, we need an identity for which taint can bind, so Unknown - that was used in the past wouldn't work.

Consequently, copying the value of an empty struct should behave the same way as a non-empty struct, thus be represented by a LazyCompoundVal.

Split from #114835

We represent copies of structs by LazyCompoundVals, that is basically a
snapshot of the Store and Region that your copy would refer to.

This snapshot is actually not taken for empty structs (structs that have
no non-static data members), because the users won't be able to access
any fields anyways, so why bother.
However, when it comes to taint propagation, it would be nice if
instances of empty structs would behave similar to non-empty structs.
For this, we need an identity for which taint can bind, so Unknown -
that was used in the past wouldn't work.

Consequently, copying the value of an empty struct should behave the
same way as a non-empty struct, thus be represented by a
LazyCompoundVal.
@steakhal steakhal requested a review from NagyDonat November 12, 2024 18:16
@llvmbot llvmbot added the clang Clang issues not falling into any other category label Nov 12, 2024
@llvmbot
Copy link
Member

llvmbot commented Nov 12, 2024

@llvm/pr-subscribers-clang

@llvm/pr-subscribers-clang-static-analyzer-1

Author: Balazs Benics (steakhal)

Changes

We represent copies of structs by LazyCompoundVals, that is basically a snapshot of the Store and Region that your copy would refer to.

This snapshot is actually not taken for empty structs (structs that have no non-static data members), because the users won't be able to access any fields anyways, so why bother.
However, when it comes to taint propagation, it would be nice if instances of empty structs would behave similar to non-empty structs. For this, we need an identity for which taint can bind, so Unknown - that was used in the past wouldn't work.

Consequently, copying the value of an empty struct should behave the same way as a non-empty struct, thus be represented by a LazyCompoundVal.

Split from #114835


Full diff: https://github.com/llvm/llvm-project/pull/115916.diff

2 Files Affected:

  • (modified) clang/lib/StaticAnalyzer/Core/RegionStore.cpp (+5-9)
  • (modified) clang/test/Analysis/ctor-trivial-copy.cpp (+94-7)
diff --git a/clang/lib/StaticAnalyzer/Core/RegionStore.cpp b/clang/lib/StaticAnalyzer/Core/RegionStore.cpp
index 6bad9a93a30169..f46282a73fbe40 100644
--- a/clang/lib/StaticAnalyzer/Core/RegionStore.cpp
+++ b/clang/lib/StaticAnalyzer/Core/RegionStore.cpp
@@ -2336,20 +2336,16 @@ NonLoc RegionStoreManager::createLazyBinding(RegionBindingsConstRef B,
   return svalBuilder.makeLazyCompoundVal(StoreRef(B.asStore(), *this), R);
 }
 
-static bool isRecordEmpty(const RecordDecl *RD) {
-  if (!RD->field_empty())
-    return false;
-  if (const CXXRecordDecl *CRD = dyn_cast<CXXRecordDecl>(RD))
-    return CRD->getNumBases() == 0;
-  return true;
-}
-
 SVal RegionStoreManager::getBindingForStruct(RegionBindingsConstRef B,
                                              const TypedValueRegion *R) {
   const RecordDecl *RD = R->getValueType()->castAs<RecordType>()->getDecl();
-  if (!RD->getDefinition() || isRecordEmpty(RD))
+  if (!RD->getDefinition())
     return UnknownVal();
 
+  // We also create a LCV for copying empty structs because then the store
+  // behavior doesn't depend on the struct layout.
+  // This way even an empty struct can carry taint, no matter if creduce drops
+  // the last field member or not.
   return createLazyBinding(B, R);
 }
 
diff --git a/clang/test/Analysis/ctor-trivial-copy.cpp b/clang/test/Analysis/ctor-trivial-copy.cpp
index 5ed188aa8f1eae..a1209c24136e20 100644
--- a/clang/test/Analysis/ctor-trivial-copy.cpp
+++ b/clang/test/Analysis/ctor-trivial-copy.cpp
@@ -1,8 +1,12 @@
-// RUN: %clang_analyze_cc1 -analyzer-checker=core,debug.ExprInspection -analyzer-config c++-inlining=constructors -verify %s
+// RUN: %clang_analyze_cc1 -analyzer-checker=core,debug.ExprInspection -analyzer-config c++-inlining=constructors -verify %s \
+// RUN:   2>&1 | FileCheck %s
 
 
-template<typename T>
-void clang_analyzer_dump(T&);
+void clang_analyzer_printState();
+template<typename T> void clang_analyzer_dump_lref(T&);
+template<typename T> void clang_analyzer_dump_val(T);
+template <typename T> T conjure();
+template <typename... Ts> void nop(const Ts &... args) {}
 
 struct aggr {
   int x;
@@ -15,20 +19,103 @@ struct empty {
 void test_copy_return() {
   aggr s1 = {1, 2};
   aggr const& cr1 = aggr(s1);
-  clang_analyzer_dump(cr1); // expected-warning-re {{&lifetime_extended_object{aggr, cr1, S{{[0-9]+}}} }}
+  clang_analyzer_dump_lref(cr1); // expected-warning-re {{&lifetime_extended_object{aggr, cr1, S{{[0-9]+}}} }}
 
   empty s2;
   empty const& cr2 = empty{s2};
-  clang_analyzer_dump(cr2); // expected-warning-re {{&lifetime_extended_object{empty, cr2, S{{[0-9]+}}} }}
+  clang_analyzer_dump_lref(cr2); // expected-warning-re {{&lifetime_extended_object{empty, cr2, S{{[0-9]+}}} }}
 }
 
 void test_assign_return() {
   aggr s1 = {1, 2};
   aggr d1;
-  clang_analyzer_dump(d1 = s1); // expected-warning {{&d1 }}
+  clang_analyzer_dump_lref(d1 = s1); // expected-warning {{&d1 }}
 
   empty s2;
   empty d2;
-  clang_analyzer_dump(d2 = s2); // expected-warning {{&d2 }} was Unknown
+  clang_analyzer_dump_lref(d2 = s2); // expected-warning {{&d2 }} was Unknown
 }
 
+
+namespace trivial_struct_copy {
+
+void _01_empty_structs() {
+  clang_analyzer_dump_val(conjure<empty>()); // expected-warning {{lazyCompoundVal}}
+  empty Empty = conjure<empty>();
+  empty Empty2 = Empty;
+  empty Empty3 = Empty2;
+  // All of these should refer to the exact same LCV, because all of
+  // these trivial copies refer to the original conjured value.
+  // There were Unknown before:
+  clang_analyzer_dump_val(Empty);  // expected-warning {{lazyCompoundVal}}
+  clang_analyzer_dump_val(Empty2); // expected-warning {{lazyCompoundVal}}
+  clang_analyzer_dump_val(Empty3); // expected-warning {{lazyCompoundVal}}
+
+  // Notice that we don't have entries for the copies, only for the original "Empty" object.
+  // That binding was added by the opaque "conjure" call, directly constructing to the variable "Empty" by copy-elision.
+  // The copies are not present because the ExprEngine skips the evalBind of empty structs.
+  // And even if such binds would reach the Store, the Store copies small structs by copying the individual fields,
+  // and there are no fields within "Empty", thus we wouldn't have any entries anyways.
+  clang_analyzer_printState();
+  // CHECK:       "store": { "pointer": "0x{{[0-9a-f]+}}", "items": [
+  // CHECK-NEXT:    { "cluster": "GlobalInternalSpaceRegion", "pointer": "0x{{[0-9a-f]+}}", "items": [
+  // CHECK-NEXT:      { "kind": "Default", "offset": 0, "value": "conj_$
+  // CHECK-NEXT:    ]},
+  // CHECK-NEXT:    { "cluster": "GlobalSystemSpaceRegion", "pointer": "0x{{[0-9a-f]+}}", "items": [
+  // CHECK-NEXT:      { "kind": "Default", "offset": 0, "value": "conj_$
+  // CHECK-NEXT:    ]},
+  // CHECK-NEXT:    { "cluster": "Empty", "pointer": "0x{{[0-9a-f]+}}", "items": [
+  // CHECK-NEXT:      { "kind": "Default", "offset": 0, "value": "[[EMPTY_CONJ:conj_\$[0-9]+{int, LC[0-9]+, S[0-9]+, #[0-9]+}]]" }
+  // CHECK-NEXT:    ]}
+  // CHECK-NEXT:  ]},
+
+  nop(Empty, Empty2, Empty3);
+}
+
+void _02_structs_with_members() {
+  clang_analyzer_dump_val(conjure<aggr>()); // expected-warning {{lazyCompoundVal}}
+  aggr Aggr = conjure<aggr>();
+  aggr Aggr2 = Aggr;
+  aggr Aggr3 = Aggr2;
+  // All of these should refer to the exact same LCV, because all of
+  // these trivial copies refer to the original conjured value.
+  clang_analyzer_dump_val(Aggr);  // expected-warning {{lazyCompoundVal}}
+  clang_analyzer_dump_val(Aggr2); // expected-warning {{lazyCompoundVal}}
+  clang_analyzer_dump_val(Aggr3); // expected-warning {{lazyCompoundVal}}
+
+  // We have fields in the struct we copy, thus we also have the entries for the copies
+  // (and for all of their fields).
+  clang_analyzer_printState();
+  // CHECK:       "store": { "pointer": "0x{{[0-9a-f]+}}", "items": [
+  // CHECK-NEXT:    { "cluster": "GlobalInternalSpaceRegion", "pointer": "0x{{[0-9a-f]+}}", "items": [
+  // CHECK-NEXT:      { "kind": "Default", "offset": 0, "value": "conj_$
+  // CHECK-NEXT:    ]},
+  // CHECK-NEXT:    { "cluster": "GlobalSystemSpaceRegion", "pointer": "0x{{[0-9a-f]+}}", "items": [
+  // CHECK-NEXT:      { "kind": "Default", "offset": 0, "value": "conj_$
+  // CHECK-NEXT:    ]},
+  // CHECK-NEXT:    { "cluster": "Aggr", "pointer": "0x{{[0-9a-f]+}}", "items": [
+  // CHECK-NEXT:      { "kind": "Default", "offset": 0, "value": "[[AGGR_CONJ:conj_\$[0-9]+{int, LC[0-9]+, S[0-9]+, #[0-9]+}]]" }
+  // CHECK-NEXT:    ]},
+  // CHECK-NEXT:    { "cluster": "Aggr2", "pointer": "0x{{[0-9a-f]+}}", "items": [
+  // CHECK-NEXT:      { "kind": "Direct", "offset": 0, "value": "derived_${{[0-9]+}}{[[AGGR_CONJ]],Aggr.x}" },
+  // CHECK-NEXT:      { "kind": "Direct", "offset": 32, "value": "derived_${{[0-9]+}}{[[AGGR_CONJ]],Aggr.y}" }
+  // CHECK-NEXT:    ]},
+  // CHECK-NEXT:    { "cluster": "Aggr3", "pointer": "0x{{[0-9a-f]+}}", "items": [
+  // CHECK-NEXT:      { "kind": "Direct", "offset": 0, "value": "derived_${{[0-9]+}}{[[AGGR_CONJ]],Aggr.x}" },
+  // CHECK-NEXT:      { "kind": "Direct", "offset": 32, "value": "derived_${{[0-9]+}}{[[AGGR_CONJ]],Aggr.y}" }
+  // CHECK-NEXT:    ]}
+  // CHECK-NEXT:  ]},
+
+  nop(Aggr, Aggr2, Aggr3);
+}
+
+// Tests that use `clang_analyzer_printState()` must share the analysis entry
+// point, and have a strict ordering between. This is to meet the different
+// `clang_analyzer_printState()` calls in a fixed relative ordering, thus
+// FileCheck could check the stdouts.
+void entrypoint() {
+  _01_empty_structs();
+  _02_structs_with_members();
+}
+
+} // namespace trivial_struct_copy

Copy link
Contributor

@NagyDonat NagyDonat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, as we discussed earlier.

@steakhal steakhal merged commit b96c24b into llvm:main Nov 14, 2024
11 checks passed
@steakhal steakhal deleted the bb/lcv-patches-1 branch November 14, 2024 13:31
steakhal added a commit that referenced this pull request Nov 29, 2024
In #115916 I allowed copying empty structs.
Later in #115917 I changed how objects are copied, and basically when we
would want to copy a struct (an LCV) of a single symbol (likely coming
from an opaque fncall or invalidation), just directly bind that symbol
instead of creating an LCV referring to the symbol. This was an
optimization to skip a layer of indirection.

Now, it turns out I should have apply the same logic in #115916. I
should not have just blindly created an LCV by calling
`createLazyBinding()`, but rather check if I can apply the shortcut
described in #115917 and only create the LCV if the shortcut doesn't
apply.

In this patch I check if there is a single default binding that the copy
would refer to and if so, just return that symbol instead of creating an
LCV.

There shouldn't be any observable changes besides that we should have
fewer LCVs. This change may surface bugs in checkers that were
associating some metadata with entities in a wrong way. Notably,
STLAlgorithmModeling and DebugIteratorModeling checkers would likely
stop working after this change.
I didn't investigate them deeply because they were broken even prior to
this patch. Let me know if I should migrate these checkers to be just as
bugged as they were prior to this patch - thus make the tests pass.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
clang:static analyzer clang Clang issues not falling into any other category
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants