-
Notifications
You must be signed in to change notification settings - Fork 14.3k
[clang][analyzer] Stable order for SymbolRef-keyed containers #121551
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
This helps with two things: - make sure NextSymbolID is incremented on every allocation - No SymExpr can be allocated without a unique ID
@llvm/pr-subscribers-clang-static-analyzer-1 Author: Arseniy Zaostrovnykh (necto) ChangesGeneralize the These IDs order is stable across runs because SymExprs are allocated in the same order. Stability of the constraint order is important for the stability of the analyzer results. I evaluated this change on a set of 200+ open-source C and C++ projects with the total number of ~78 000 symbolic-execution issues passing Z3 refutation. This patch reduced the run-to-run churn (flakiness) in SE issues from 80-90 to 30-40 (out of 78K) in our CSA deployment (in our setting flaky issues are mostly due to Z3 refutation instability). Note, most of the issue churn (flakiness) is caused by the mentioned Z3 refutation. With Z3 refutation disabled, issue churn goes down to ~10 issues out of 80K and this patch has no effect on appearing/disappearing issues between runs. It however, seems to reduce the volatility of the execution flow: before we had 40-80 issues with changed execution flow, after - 10-30. Importantly, this change is necessary for the next step in stabilizing analysis results by caching Z3 query outcomes between analysis runs (work in progress). Across our admittedly noisy CI runs, I detected no significant effect on memory footprint or analysis time. This change set supersedes #121347 CPP-5919 Patch is 27.12 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/121551.diff 11 Files Affected:
diff --git a/clang/include/clang/StaticAnalyzer/Core/PathSensitive/SymExpr.h b/clang/include/clang/StaticAnalyzer/Core/PathSensitive/SymExpr.h
index 862a30c0e73633..2b6401eb05b72b 100644
--- a/clang/include/clang/StaticAnalyzer/Core/PathSensitive/SymExpr.h
+++ b/clang/include/clang/StaticAnalyzer/Core/PathSensitive/SymExpr.h
@@ -25,6 +25,8 @@ namespace ento {
class MemRegion;
+using SymbolID = unsigned;
+
/// Symbolic value. These values used to capture symbolic execution of
/// the program.
class SymExpr : public llvm::FoldingSetNode {
@@ -39,9 +41,19 @@ class SymExpr : public llvm::FoldingSetNode {
private:
Kind K;
+ /// A unique identifier for this symbol.
+ ///
+ /// It is useful for SymbolData to easily differentiate multiple symbols, but
+ /// also for "ephemeral" symbols, such as binary operations, because this id
+ /// can be used for arranging constraints or equivalence classes instead of
+ /// unstable pointer values.
+ ///
+ /// Note, however, that it can't be used in Profile because SymbolManager
+ /// needs to compute Profile before allocating SymExpr.
+ const SymbolID Sym;
protected:
- SymExpr(Kind k) : K(k) {}
+ SymExpr(Kind k, SymbolID Sym) : K(k), Sym(Sym) {}
static bool isValidTypeForSymbol(QualType T) {
// FIXME: Depending on whether we choose to deprecate structural symbols,
@@ -56,6 +68,8 @@ class SymExpr : public llvm::FoldingSetNode {
Kind getKind() const { return K; }
+ SymbolID getSymbolID() const { return Sym; }
+
virtual void dump() const;
virtual void dumpToStream(raw_ostream &os) const {}
@@ -112,19 +126,14 @@ inline raw_ostream &operator<<(raw_ostream &os,
using SymbolRef = const SymExpr *;
using SymbolRefSmallVectorTy = SmallVector<SymbolRef, 2>;
-using SymbolID = unsigned;
/// A symbol representing data which can be stored in a memory location
/// (region).
class SymbolData : public SymExpr {
- const SymbolID Sym;
-
void anchor() override;
protected:
- SymbolData(Kind k, SymbolID sym) : SymExpr(k), Sym(sym) {
- assert(classof(this));
- }
+ SymbolData(Kind k, SymbolID sym) : SymExpr(k, sym) { assert(classof(this)); }
public:
~SymbolData() override = default;
@@ -132,8 +141,6 @@ class SymbolData : public SymExpr {
/// Get a string representation of the kind of the region.
virtual StringRef getKindStr() const = 0;
- SymbolID getSymbolID() const { return Sym; }
-
unsigned computeComplexity() const override {
return 1;
};
diff --git a/clang/include/clang/StaticAnalyzer/Core/PathSensitive/SymbolManager.h b/clang/include/clang/StaticAnalyzer/Core/PathSensitive/SymbolManager.h
index 73732d532f630f..2b31534ce1a9dd 100644
--- a/clang/include/clang/StaticAnalyzer/Core/PathSensitive/SymbolManager.h
+++ b/clang/include/clang/StaticAnalyzer/Core/PathSensitive/SymbolManager.h
@@ -25,6 +25,7 @@
#include "llvm/ADT/DenseMap.h"
#include "llvm/ADT/DenseSet.h"
#include "llvm/ADT/FoldingSet.h"
+#include "llvm/ADT/ImmutableSet.h"
#include "llvm/ADT/iterator_range.h"
#include "llvm/Support/Allocator.h"
#include <cassert>
@@ -43,15 +44,16 @@ class StoreManager;
class SymbolRegionValue : public SymbolData {
const TypedValueRegion *R;
-public:
+ friend class SymExprAllocator;
SymbolRegionValue(SymbolID sym, const TypedValueRegion *r)
: SymbolData(SymbolRegionValueKind, sym), R(r) {
assert(r);
assert(isValidTypeForSymbol(r->getValueType()));
}
+public:
LLVM_ATTRIBUTE_RETURNS_NONNULL
- const TypedValueRegion* getRegion() const { return R; }
+ const TypedValueRegion *getRegion() const { return R; }
static void Profile(llvm::FoldingSetNodeID& profile, const TypedValueRegion* R) {
profile.AddInteger((unsigned) SymbolRegionValueKind);
@@ -84,7 +86,7 @@ class SymbolConjured : public SymbolData {
const LocationContext *LCtx;
const void *SymbolTag;
-public:
+ friend class SymExprAllocator;
SymbolConjured(SymbolID sym, const Stmt *s, const LocationContext *lctx,
QualType t, unsigned count, const void *symbolTag)
: SymbolData(SymbolConjuredKind, sym), S(s), T(t), Count(count),
@@ -98,6 +100,7 @@ class SymbolConjured : public SymbolData {
assert(isValidTypeForSymbol(t));
}
+public:
/// It might return null.
const Stmt *getStmt() const { return S; }
unsigned getCount() const { return Count; }
@@ -137,7 +140,7 @@ class SymbolDerived : public SymbolData {
SymbolRef parentSymbol;
const TypedValueRegion *R;
-public:
+ friend class SymExprAllocator;
SymbolDerived(SymbolID sym, SymbolRef parent, const TypedValueRegion *r)
: SymbolData(SymbolDerivedKind, sym), parentSymbol(parent), R(r) {
assert(parent);
@@ -145,6 +148,7 @@ class SymbolDerived : public SymbolData {
assert(isValidTypeForSymbol(r->getValueType()));
}
+public:
LLVM_ATTRIBUTE_RETURNS_NONNULL
SymbolRef getParentSymbol() const { return parentSymbol; }
LLVM_ATTRIBUTE_RETURNS_NONNULL
@@ -180,12 +184,13 @@ class SymbolDerived : public SymbolData {
class SymbolExtent : public SymbolData {
const SubRegion *R;
-public:
+ friend class SymExprAllocator;
SymbolExtent(SymbolID sym, const SubRegion *r)
: SymbolData(SymbolExtentKind, sym), R(r) {
assert(r);
}
+public:
LLVM_ATTRIBUTE_RETURNS_NONNULL
const SubRegion *getRegion() const { return R; }
@@ -222,7 +227,7 @@ class SymbolMetadata : public SymbolData {
unsigned Count;
const void *Tag;
-public:
+ friend class SymExprAllocator;
SymbolMetadata(SymbolID sym, const MemRegion* r, const Stmt *s, QualType t,
const LocationContext *LCtx, unsigned count, const void *tag)
: SymbolData(SymbolMetadataKind, sym), R(r), S(s), T(t), LCtx(LCtx),
@@ -234,6 +239,7 @@ class SymbolMetadata : public SymbolData {
assert(tag);
}
+ public:
LLVM_ATTRIBUTE_RETURNS_NONNULL
const MemRegion *getRegion() const { return R; }
@@ -286,15 +292,16 @@ class SymbolCast : public SymExpr {
/// The type of the result.
QualType ToTy;
-public:
- SymbolCast(const SymExpr *In, QualType From, QualType To)
- : SymExpr(SymbolCastKind), Operand(In), FromTy(From), ToTy(To) {
+ friend class SymExprAllocator;
+ SymbolCast(SymbolID Sym, const SymExpr *In, QualType From, QualType To)
+ : SymExpr(SymbolCastKind, Sym), Operand(In), FromTy(From), ToTy(To) {
assert(In);
assert(isValidTypeForSymbol(From));
// FIXME: GenericTaintChecker creates symbols of void type.
// Otherwise, 'To' should also be a valid type.
}
+public:
unsigned computeComplexity() const override {
if (Complexity == 0)
Complexity = 1 + Operand->computeComplexity();
@@ -332,9 +339,10 @@ class UnarySymExpr : public SymExpr {
UnaryOperator::Opcode Op;
QualType T;
-public:
- UnarySymExpr(const SymExpr *In, UnaryOperator::Opcode Op, QualType T)
- : SymExpr(UnarySymExprKind), Operand(In), Op(Op), T(T) {
+ friend class SymExprAllocator;
+ UnarySymExpr(SymbolID Sym, const SymExpr *In, UnaryOperator::Opcode Op,
+ QualType T)
+ : SymExpr(UnarySymExprKind, Sym), Operand(In), Op(Op), T(T) {
// Note, some unary operators are modeled as a binary operator. E.g. ++x is
// modeled as x + 1.
assert((Op == UO_Minus || Op == UO_Not) && "non-supported unary expression");
@@ -345,6 +353,7 @@ class UnarySymExpr : public SymExpr {
assert(!Loc::isLocType(T) && "unary symbol should be nonloc");
}
+public:
unsigned computeComplexity() const override {
if (Complexity == 0)
Complexity = 1 + Operand->computeComplexity();
@@ -381,8 +390,8 @@ class BinarySymExpr : public SymExpr {
QualType T;
protected:
- BinarySymExpr(Kind k, BinaryOperator::Opcode op, QualType t)
- : SymExpr(k), Op(op), T(t) {
+ BinarySymExpr(SymbolID Sym, Kind k, BinaryOperator::Opcode op, QualType t)
+ : SymExpr(k, Sym), Op(op), T(t) {
assert(classof(this));
// Binary expressions are results of arithmetic. Pointer arithmetic is not
// handled by binary expressions, but it is instead handled by applying
@@ -425,14 +434,15 @@ class BinarySymExprImpl : public BinarySymExpr {
LHSTYPE LHS;
RHSTYPE RHS;
-public:
- BinarySymExprImpl(LHSTYPE lhs, BinaryOperator::Opcode op, RHSTYPE rhs,
- QualType t)
- : BinarySymExpr(ClassKind, op, t), LHS(lhs), RHS(rhs) {
+ friend class SymExprAllocator;
+ BinarySymExprImpl(SymbolID Sym, LHSTYPE lhs, BinaryOperator::Opcode op,
+ RHSTYPE rhs, QualType t)
+ : BinarySymExpr(Sym, ClassKind, op, t), LHS(lhs), RHS(rhs) {
assert(getPointer(lhs));
assert(getPointer(rhs));
}
+public:
void dumpToStream(raw_ostream &os) const override {
dumpToStreamImpl(os, LHS);
dumpToStreamImpl(os, getOpcode());
@@ -478,6 +488,21 @@ using IntSymExpr = BinarySymExprImpl<APSIntPtr, const SymExpr *,
using SymSymExpr = BinarySymExprImpl<const SymExpr *, const SymExpr *,
SymExpr::Kind::SymSymExprKind>;
+class SymExprAllocator {
+ SymbolID NextSymbolID = 0;
+ llvm::BumpPtrAllocator &Alloc;
+
+public:
+ explicit SymExprAllocator(llvm::BumpPtrAllocator &Alloc) : Alloc(Alloc) {}
+
+ template <class SymT, typename... ArgsT> SymT *make(ArgsT &&...Args) {
+ return new (Alloc) SymT(nextID(), std::forward<ArgsT>(Args)...);
+ }
+
+private:
+ SymbolID nextID() { return NextSymbolID++; }
+};
+
class SymbolManager {
using DataSetTy = llvm::FoldingSet<SymExpr>;
using SymbolDependTy =
@@ -489,15 +514,14 @@ class SymbolManager {
/// alive as long as the key is live.
SymbolDependTy SymbolDependencies;
- unsigned SymbolCounter = 0;
- llvm::BumpPtrAllocator& BPAlloc;
+ SymExprAllocator Alloc;
BasicValueFactory &BV;
ASTContext &Ctx;
public:
SymbolManager(ASTContext &ctx, BasicValueFactory &bv,
- llvm::BumpPtrAllocator& bpalloc)
- : SymbolDependencies(16), BPAlloc(bpalloc), BV(bv), Ctx(ctx) {}
+ llvm::BumpPtrAllocator &bpalloc)
+ : SymbolDependencies(16), Alloc(bpalloc), BV(bv), Ctx(ctx) {}
static bool canSymbolicate(QualType T);
@@ -687,4 +711,35 @@ class SymbolVisitor {
} // namespace clang
+// Override the default definition that would use pointer values of SymbolRefs
+// to order them, which is unstable due to ASLR.
+// Use the SymbolID instead which reflect the order in which the symbols were
+// allocated. This is usually stable across runs leading to the stability of
+// ConstraintMap and other containers using SymbolRef as keys.
+template <>
+struct ::llvm::ImutContainerInfo<clang::ento::SymbolRef>
+ : public ImutProfileInfo<clang::ento::SymbolRef> {
+ using value_type =
+ typename ImutProfileInfo<clang::ento::SymbolRef>::value_type;
+ using value_type_ref =
+ typename ImutProfileInfo<clang::ento::SymbolRef>::value_type_ref;
+ using key_type = value_type;
+ using key_type_ref = value_type_ref;
+ using data_type = bool;
+ using data_type_ref = bool;
+
+ static key_type_ref KeyOfValue(value_type_ref D) { return D; }
+ static data_type_ref DataOfValue(value_type_ref) { return true; }
+
+ static bool isEqual(key_type_ref LHS, key_type_ref RHS) {
+ return LHS->getSymbolID() == RHS->getSymbolID();
+ }
+
+ static bool isLess(key_type_ref LHS, key_type_ref RHS) {
+ return LHS->getSymbolID() < RHS->getSymbolID();
+ }
+
+ static bool isDataEqual(data_type_ref, data_type_ref) { return true; }
+};
+
#endif // LLVM_CLANG_STATICANALYZER_CORE_PATHSENSITIVE_SYMBOLMANAGER_H
diff --git a/clang/lib/StaticAnalyzer/Core/SymbolManager.cpp b/clang/lib/StaticAnalyzer/Core/SymbolManager.cpp
index f21e5c3ad7bd7c..738b6a175ce6de 100644
--- a/clang/lib/StaticAnalyzer/Core/SymbolManager.cpp
+++ b/clang/lib/StaticAnalyzer/Core/SymbolManager.cpp
@@ -170,9 +170,8 @@ SymbolManager::getRegionValueSymbol(const TypedValueRegion* R) {
void *InsertPos;
SymExpr *SD = DataSet.FindNodeOrInsertPos(profile, InsertPos);
if (!SD) {
- SD = new (BPAlloc) SymbolRegionValue(SymbolCounter, R);
+ SD = Alloc.make<SymbolRegionValue>(R);
DataSet.InsertNode(SD, InsertPos);
- ++SymbolCounter;
}
return cast<SymbolRegionValue>(SD);
@@ -188,9 +187,8 @@ const SymbolConjured* SymbolManager::conjureSymbol(const Stmt *E,
void *InsertPos;
SymExpr *SD = DataSet.FindNodeOrInsertPos(profile, InsertPos);
if (!SD) {
- SD = new (BPAlloc) SymbolConjured(SymbolCounter, E, LCtx, T, Count, SymbolTag);
+ SD = Alloc.make<SymbolConjured>(E, LCtx, T, Count, SymbolTag);
DataSet.InsertNode(SD, InsertPos);
- ++SymbolCounter;
}
return cast<SymbolConjured>(SD);
@@ -204,9 +202,8 @@ SymbolManager::getDerivedSymbol(SymbolRef parentSymbol,
void *InsertPos;
SymExpr *SD = DataSet.FindNodeOrInsertPos(profile, InsertPos);
if (!SD) {
- SD = new (BPAlloc) SymbolDerived(SymbolCounter, parentSymbol, R);
+ SD = Alloc.make<SymbolDerived>(parentSymbol, R);
DataSet.InsertNode(SD, InsertPos);
- ++SymbolCounter;
}
return cast<SymbolDerived>(SD);
@@ -219,9 +216,8 @@ SymbolManager::getExtentSymbol(const SubRegion *R) {
void *InsertPos;
SymExpr *SD = DataSet.FindNodeOrInsertPos(profile, InsertPos);
if (!SD) {
- SD = new (BPAlloc) SymbolExtent(SymbolCounter, R);
+ SD = Alloc.make<SymbolExtent>(R);
DataSet.InsertNode(SD, InsertPos);
- ++SymbolCounter;
}
return cast<SymbolExtent>(SD);
@@ -236,9 +232,8 @@ SymbolManager::getMetadataSymbol(const MemRegion* R, const Stmt *S, QualType T,
void *InsertPos;
SymExpr *SD = DataSet.FindNodeOrInsertPos(profile, InsertPos);
if (!SD) {
- SD = new (BPAlloc) SymbolMetadata(SymbolCounter, R, S, T, LCtx, Count, SymbolTag);
+ SD = Alloc.make<SymbolMetadata>(R, S, T, LCtx, Count, SymbolTag);
DataSet.InsertNode(SD, InsertPos);
- ++SymbolCounter;
}
return cast<SymbolMetadata>(SD);
@@ -252,7 +247,7 @@ SymbolManager::getCastSymbol(const SymExpr *Op,
void *InsertPos;
SymExpr *data = DataSet.FindNodeOrInsertPos(ID, InsertPos);
if (!data) {
- data = new (BPAlloc) SymbolCast(Op, From, To);
+ data = Alloc.make<SymbolCast>(Op, From, To);
DataSet.InsertNode(data, InsertPos);
}
@@ -268,7 +263,7 @@ const SymIntExpr *SymbolManager::getSymIntExpr(const SymExpr *lhs,
SymExpr *data = DataSet.FindNodeOrInsertPos(ID, InsertPos);
if (!data) {
- data = new (BPAlloc) SymIntExpr(lhs, op, v, t);
+ data = Alloc.make<SymIntExpr>(lhs, op, v, t);
DataSet.InsertNode(data, InsertPos);
}
@@ -284,7 +279,7 @@ const IntSymExpr *SymbolManager::getIntSymExpr(APSIntPtr lhs,
SymExpr *data = DataSet.FindNodeOrInsertPos(ID, InsertPos);
if (!data) {
- data = new (BPAlloc) IntSymExpr(lhs, op, rhs, t);
+ data = Alloc.make<IntSymExpr>(lhs, op, rhs, t);
DataSet.InsertNode(data, InsertPos);
}
@@ -301,7 +296,7 @@ const SymSymExpr *SymbolManager::getSymSymExpr(const SymExpr *lhs,
SymExpr *data = DataSet.FindNodeOrInsertPos(ID, InsertPos);
if (!data) {
- data = new (BPAlloc) SymSymExpr(lhs, op, rhs, t);
+ data = Alloc.make<SymSymExpr>(lhs, op, rhs, t);
DataSet.InsertNode(data, InsertPos);
}
@@ -316,7 +311,7 @@ const UnarySymExpr *SymbolManager::getUnarySymExpr(const SymExpr *Operand,
void *InsertPos;
SymExpr *data = DataSet.FindNodeOrInsertPos(ID, InsertPos);
if (!data) {
- data = new (BPAlloc) UnarySymExpr(Operand, Opc, T);
+ data = Alloc.make<UnarySymExpr>(Operand, Opc, T);
DataSet.InsertNode(data, InsertPos);
}
diff --git a/clang/test/Analysis/dump_egraph.cpp b/clang/test/Analysis/dump_egraph.cpp
index d1229b26346740..13459699a06f6f 100644
--- a/clang/test/Analysis/dump_egraph.cpp
+++ b/clang/test/Analysis/dump_egraph.cpp
@@ -21,7 +21,7 @@ void foo() {
// CHECK: \"location_context\": \"#0 Call\", \"calling\": \"T::T\", \"location\": \{ \"line\": 15, \"column\": 5, \"file\": \"{{.*}}dump_egraph.cpp\" \}, \"items\": [\l \{ \"init_id\": {{[0-9]+}}, \"kind\": \"construct into member variable\", \"argument_index\": null, \"pretty\": \"s\", \"value\": \"&t.s\"
-// CHECK: \"cluster\": \"t\", \"pointer\": \"{{0x[0-9a-f]+}}\", \"items\": [\l \{ \"kind\": \"Default\", \"offset\": 0, \"value\": \"conj_$2\{int, LC5, no stmt, #1\}\"
+// CHECK: \"cluster\": \"t\", \"pointer\": \"{{0x[0-9a-f]+}}\", \"items\": [\l \{ \"kind\": \"Default\", \"offset\": 0, \"value\": \"conj_$3\{int, LC5, no stmt, #1\}\"
// CHECK: \"dynamic_types\": [\l \{ \"region\": \"HeapSymRegion\{conj_$1\{S *, LC1, S{{[0-9]+}}, #1\}\}\", \"dyn_type\": \"S\", \"sub_classable\": false \}\l
diff --git a/clang/test/Analysis/expr-inspection-printState-diseq-info.c b/clang/test/Analysis/expr-inspection-printState-diseq-info.c
index c5c31785a600ef..515fcbbd430791 100644
--- a/clang/test/Analysis/expr-inspection-printState-diseq-info.c
+++ b/clang/test/Analysis/expr-inspection-printState-diseq-info.c
@@ -18,17 +18,17 @@ void test_disequality_info(int e0, int b0, int b1, int c0) {
// CHECK-NEXT: {
// CHECK-NEXT: "class": [ "(reg_$0<int e0>) - 2" ],
// CHECK-NEXT: "disequal_to": [
- // CHECK-NEXT: [ "reg_$2<int b1>" ]]
+ // CHECK-NEXT: [ "reg_$7<int b1>" ]]
// CHECK-NEXT: },
// CHECK-NEXT: {
- // CHECK-NEXT: "class": [ "reg_$2<int b1>" ],
+ // CHECK-NEXT: "class": [ "reg_$15<int c0>" ],
// CHECK-NEXT: "disequal_to": [
- // CHECK-NEXT: [ "(reg_$0<int e0>) - 2" ],
- // CHECK-NEXT: [ "reg_$3<int c0>" ]]
+ // CHECK-NEXT: [ "reg_$7<int b1>" ]]
// CHECK-NEXT: },
// CHECK-NEXT: {
- // CHECK-NEXT: "class": [ "reg_$3<int c0>" ],
+ // CHECK-NEXT: "class": [ "reg_$7<int b1>" ],
// CHECK-NEXT: "disequal_to": [
- // CHECK-NEXT: [ "reg_$2<int b1>" ]]
+ // CHECK-NEXT: [ "(reg_$0<int e0>) - 2" ],
+ // CHECK-NEXT: [ "reg_$15<int c0>" ]]
// CHECK-NEXT: }
// CHECK-NEXT: ],
diff --git a/clang/test/Analysis/expr-inspection-printState-eq-classes.c b/clang/test/Analysis/expr-inspection-printState-eq-classes.c
index 38e23d6e838269..19cc13735ab5a6 100644
--- a/clang/test/Analysis/expr-inspection-printState-eq-classes.c
+++ b/clang/test/Analysis/expr-inspection-printState-eq-classes.c
@@ -16,6 +16,6 @@ void test_equivalence_classes(int a, int b, int c, int d) {
}
// CHECK: "equivalence_classes": [
-// CHECK-NEXT: [ "(reg_$0<int a>) != (reg_$2<int c>)" ],
-// CHECK-NEXT: [ "reg_$0<int a>", "reg_$2<int c>", "reg_$3<int d>" ]
+// CHECK-NEXT: [ "(reg_$0<int a>) != (reg_$5<int c>)" ],
+// CHECK-NEXT: [ "reg_$0<int a>", "reg_$20<int d>", "reg_$5<int c>" ]
// CHECK-NEXT: ],
diff --git a/clang/test/Analysis/ptr-arith.cpp b/clang/test/Analysis/ptr-arith.cpp
index a1264a1f04839c..ec1c75c0c40632 100644
--- a/clang/test/Analysis/ptr-arith.cpp
+++ b/clang/test/Analysis/ptr-arith.cpp
@@ -139,10 +139,10 @@ struct parse_t {
int parse(parse_t *p) {
unsigned copy = p->bits2;
clang_analyzer_dump(copy);
- // expected-warning@-1 {{reg_$1<unsigned int Element{SymRegion{reg_$0<parse_t * p>},0 S64b,struct Bug_55934::parse_t}.bits2>}}
+ // expected-warning@-1 {{reg_$2<unsigned int Element{SymRegion{reg_$0<parse_t * p>},0 S64b,struct Bug_55934::parse_t}.bits2>}}
header *bits = (header *)©
clang_analyzer_dump(bits->b);
- // expected-warning@-1 {{derived_$2{reg_$1<unsigned int Element{SymRegion{reg_$0<parse_t * p>},0 S64b,struct Bug_55934::parse_t}.bits2>,Element{copy,0 S64b,struct Bug_55934::header}.b}}}
+ // expected-warning@-1 {{derived_$4{reg_$2<unsigned int Element{SymRegion{reg_$0<parse_t * p>},0 S64b,struct Bug_55934::parse_t}.bits2>,Element{copy,0 S64b,struct Bug_55934::header}.b}}}
return bits->b; // no-warning
}
} // namespace Bug_55934
diff --git a/clang/test/Analysis/symbol-simplification-disequality-info.cpp b/clang/test/Analysis/symbol-simplification-disequality-info.cpp
index 69238b583eb846..33b8f150f5d021 100644
--- a/clang/test/Analysis/symbol-simplification-disequality-info.cpp
+++ b/clang/test/Analysis/symbol-simplification-disequality-info.cpp
@@ -14,14 +14,14 @@ void test(int a, int b, int c, int d) {
clang_analyzer_printState();
// CHECK: "disequality_info": [
// CHECK-NEXT: {
- // CHECK-NEXT: "class": [ "((reg_$0<int a>...
[truncated]
|
@llvm/pr-subscribers-clang Author: Arseniy Zaostrovnykh (necto) ChangesGeneralize the These IDs order is stable across runs because SymExprs are allocated in the same order. Stability of the constraint order is important for the stability of the analyzer results. I evaluated this change on a set of 200+ open-source C and C++ projects with the total number of ~78 000 symbolic-execution issues passing Z3 refutation. This patch reduced the run-to-run churn (flakiness) in SE issues from 80-90 to 30-40 (out of 78K) in our CSA deployment (in our setting flaky issues are mostly due to Z3 refutation instability). Note, most of the issue churn (flakiness) is caused by the mentioned Z3 refutation. With Z3 refutation disabled, issue churn goes down to ~10 issues out of 80K and this patch has no effect on appearing/disappearing issues between runs. It however, seems to reduce the volatility of the execution flow: before we had 40-80 issues with changed execution flow, after - 10-30. Importantly, this change is necessary for the next step in stabilizing analysis results by caching Z3 query outcomes between analysis runs (work in progress). Across our admittedly noisy CI runs, I detected no significant effect on memory footprint or analysis time. This change set supersedes #121347 CPP-5919 Patch is 27.12 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/121551.diff 11 Files Affected:
diff --git a/clang/include/clang/StaticAnalyzer/Core/PathSensitive/SymExpr.h b/clang/include/clang/StaticAnalyzer/Core/PathSensitive/SymExpr.h
index 862a30c0e73633..2b6401eb05b72b 100644
--- a/clang/include/clang/StaticAnalyzer/Core/PathSensitive/SymExpr.h
+++ b/clang/include/clang/StaticAnalyzer/Core/PathSensitive/SymExpr.h
@@ -25,6 +25,8 @@ namespace ento {
class MemRegion;
+using SymbolID = unsigned;
+
/// Symbolic value. These values used to capture symbolic execution of
/// the program.
class SymExpr : public llvm::FoldingSetNode {
@@ -39,9 +41,19 @@ class SymExpr : public llvm::FoldingSetNode {
private:
Kind K;
+ /// A unique identifier for this symbol.
+ ///
+ /// It is useful for SymbolData to easily differentiate multiple symbols, but
+ /// also for "ephemeral" symbols, such as binary operations, because this id
+ /// can be used for arranging constraints or equivalence classes instead of
+ /// unstable pointer values.
+ ///
+ /// Note, however, that it can't be used in Profile because SymbolManager
+ /// needs to compute Profile before allocating SymExpr.
+ const SymbolID Sym;
protected:
- SymExpr(Kind k) : K(k) {}
+ SymExpr(Kind k, SymbolID Sym) : K(k), Sym(Sym) {}
static bool isValidTypeForSymbol(QualType T) {
// FIXME: Depending on whether we choose to deprecate structural symbols,
@@ -56,6 +68,8 @@ class SymExpr : public llvm::FoldingSetNode {
Kind getKind() const { return K; }
+ SymbolID getSymbolID() const { return Sym; }
+
virtual void dump() const;
virtual void dumpToStream(raw_ostream &os) const {}
@@ -112,19 +126,14 @@ inline raw_ostream &operator<<(raw_ostream &os,
using SymbolRef = const SymExpr *;
using SymbolRefSmallVectorTy = SmallVector<SymbolRef, 2>;
-using SymbolID = unsigned;
/// A symbol representing data which can be stored in a memory location
/// (region).
class SymbolData : public SymExpr {
- const SymbolID Sym;
-
void anchor() override;
protected:
- SymbolData(Kind k, SymbolID sym) : SymExpr(k), Sym(sym) {
- assert(classof(this));
- }
+ SymbolData(Kind k, SymbolID sym) : SymExpr(k, sym) { assert(classof(this)); }
public:
~SymbolData() override = default;
@@ -132,8 +141,6 @@ class SymbolData : public SymExpr {
/// Get a string representation of the kind of the region.
virtual StringRef getKindStr() const = 0;
- SymbolID getSymbolID() const { return Sym; }
-
unsigned computeComplexity() const override {
return 1;
};
diff --git a/clang/include/clang/StaticAnalyzer/Core/PathSensitive/SymbolManager.h b/clang/include/clang/StaticAnalyzer/Core/PathSensitive/SymbolManager.h
index 73732d532f630f..2b31534ce1a9dd 100644
--- a/clang/include/clang/StaticAnalyzer/Core/PathSensitive/SymbolManager.h
+++ b/clang/include/clang/StaticAnalyzer/Core/PathSensitive/SymbolManager.h
@@ -25,6 +25,7 @@
#include "llvm/ADT/DenseMap.h"
#include "llvm/ADT/DenseSet.h"
#include "llvm/ADT/FoldingSet.h"
+#include "llvm/ADT/ImmutableSet.h"
#include "llvm/ADT/iterator_range.h"
#include "llvm/Support/Allocator.h"
#include <cassert>
@@ -43,15 +44,16 @@ class StoreManager;
class SymbolRegionValue : public SymbolData {
const TypedValueRegion *R;
-public:
+ friend class SymExprAllocator;
SymbolRegionValue(SymbolID sym, const TypedValueRegion *r)
: SymbolData(SymbolRegionValueKind, sym), R(r) {
assert(r);
assert(isValidTypeForSymbol(r->getValueType()));
}
+public:
LLVM_ATTRIBUTE_RETURNS_NONNULL
- const TypedValueRegion* getRegion() const { return R; }
+ const TypedValueRegion *getRegion() const { return R; }
static void Profile(llvm::FoldingSetNodeID& profile, const TypedValueRegion* R) {
profile.AddInteger((unsigned) SymbolRegionValueKind);
@@ -84,7 +86,7 @@ class SymbolConjured : public SymbolData {
const LocationContext *LCtx;
const void *SymbolTag;
-public:
+ friend class SymExprAllocator;
SymbolConjured(SymbolID sym, const Stmt *s, const LocationContext *lctx,
QualType t, unsigned count, const void *symbolTag)
: SymbolData(SymbolConjuredKind, sym), S(s), T(t), Count(count),
@@ -98,6 +100,7 @@ class SymbolConjured : public SymbolData {
assert(isValidTypeForSymbol(t));
}
+public:
/// It might return null.
const Stmt *getStmt() const { return S; }
unsigned getCount() const { return Count; }
@@ -137,7 +140,7 @@ class SymbolDerived : public SymbolData {
SymbolRef parentSymbol;
const TypedValueRegion *R;
-public:
+ friend class SymExprAllocator;
SymbolDerived(SymbolID sym, SymbolRef parent, const TypedValueRegion *r)
: SymbolData(SymbolDerivedKind, sym), parentSymbol(parent), R(r) {
assert(parent);
@@ -145,6 +148,7 @@ class SymbolDerived : public SymbolData {
assert(isValidTypeForSymbol(r->getValueType()));
}
+public:
LLVM_ATTRIBUTE_RETURNS_NONNULL
SymbolRef getParentSymbol() const { return parentSymbol; }
LLVM_ATTRIBUTE_RETURNS_NONNULL
@@ -180,12 +184,13 @@ class SymbolDerived : public SymbolData {
class SymbolExtent : public SymbolData {
const SubRegion *R;
-public:
+ friend class SymExprAllocator;
SymbolExtent(SymbolID sym, const SubRegion *r)
: SymbolData(SymbolExtentKind, sym), R(r) {
assert(r);
}
+public:
LLVM_ATTRIBUTE_RETURNS_NONNULL
const SubRegion *getRegion() const { return R; }
@@ -222,7 +227,7 @@ class SymbolMetadata : public SymbolData {
unsigned Count;
const void *Tag;
-public:
+ friend class SymExprAllocator;
SymbolMetadata(SymbolID sym, const MemRegion* r, const Stmt *s, QualType t,
const LocationContext *LCtx, unsigned count, const void *tag)
: SymbolData(SymbolMetadataKind, sym), R(r), S(s), T(t), LCtx(LCtx),
@@ -234,6 +239,7 @@ class SymbolMetadata : public SymbolData {
assert(tag);
}
+ public:
LLVM_ATTRIBUTE_RETURNS_NONNULL
const MemRegion *getRegion() const { return R; }
@@ -286,15 +292,16 @@ class SymbolCast : public SymExpr {
/// The type of the result.
QualType ToTy;
-public:
- SymbolCast(const SymExpr *In, QualType From, QualType To)
- : SymExpr(SymbolCastKind), Operand(In), FromTy(From), ToTy(To) {
+ friend class SymExprAllocator;
+ SymbolCast(SymbolID Sym, const SymExpr *In, QualType From, QualType To)
+ : SymExpr(SymbolCastKind, Sym), Operand(In), FromTy(From), ToTy(To) {
assert(In);
assert(isValidTypeForSymbol(From));
// FIXME: GenericTaintChecker creates symbols of void type.
// Otherwise, 'To' should also be a valid type.
}
+public:
unsigned computeComplexity() const override {
if (Complexity == 0)
Complexity = 1 + Operand->computeComplexity();
@@ -332,9 +339,10 @@ class UnarySymExpr : public SymExpr {
UnaryOperator::Opcode Op;
QualType T;
-public:
- UnarySymExpr(const SymExpr *In, UnaryOperator::Opcode Op, QualType T)
- : SymExpr(UnarySymExprKind), Operand(In), Op(Op), T(T) {
+ friend class SymExprAllocator;
+ UnarySymExpr(SymbolID Sym, const SymExpr *In, UnaryOperator::Opcode Op,
+ QualType T)
+ : SymExpr(UnarySymExprKind, Sym), Operand(In), Op(Op), T(T) {
// Note, some unary operators are modeled as a binary operator. E.g. ++x is
// modeled as x + 1.
assert((Op == UO_Minus || Op == UO_Not) && "non-supported unary expression");
@@ -345,6 +353,7 @@ class UnarySymExpr : public SymExpr {
assert(!Loc::isLocType(T) && "unary symbol should be nonloc");
}
+public:
unsigned computeComplexity() const override {
if (Complexity == 0)
Complexity = 1 + Operand->computeComplexity();
@@ -381,8 +390,8 @@ class BinarySymExpr : public SymExpr {
QualType T;
protected:
- BinarySymExpr(Kind k, BinaryOperator::Opcode op, QualType t)
- : SymExpr(k), Op(op), T(t) {
+ BinarySymExpr(SymbolID Sym, Kind k, BinaryOperator::Opcode op, QualType t)
+ : SymExpr(k, Sym), Op(op), T(t) {
assert(classof(this));
// Binary expressions are results of arithmetic. Pointer arithmetic is not
// handled by binary expressions, but it is instead handled by applying
@@ -425,14 +434,15 @@ class BinarySymExprImpl : public BinarySymExpr {
LHSTYPE LHS;
RHSTYPE RHS;
-public:
- BinarySymExprImpl(LHSTYPE lhs, BinaryOperator::Opcode op, RHSTYPE rhs,
- QualType t)
- : BinarySymExpr(ClassKind, op, t), LHS(lhs), RHS(rhs) {
+ friend class SymExprAllocator;
+ BinarySymExprImpl(SymbolID Sym, LHSTYPE lhs, BinaryOperator::Opcode op,
+ RHSTYPE rhs, QualType t)
+ : BinarySymExpr(Sym, ClassKind, op, t), LHS(lhs), RHS(rhs) {
assert(getPointer(lhs));
assert(getPointer(rhs));
}
+public:
void dumpToStream(raw_ostream &os) const override {
dumpToStreamImpl(os, LHS);
dumpToStreamImpl(os, getOpcode());
@@ -478,6 +488,21 @@ using IntSymExpr = BinarySymExprImpl<APSIntPtr, const SymExpr *,
using SymSymExpr = BinarySymExprImpl<const SymExpr *, const SymExpr *,
SymExpr::Kind::SymSymExprKind>;
+class SymExprAllocator {
+ SymbolID NextSymbolID = 0;
+ llvm::BumpPtrAllocator &Alloc;
+
+public:
+ explicit SymExprAllocator(llvm::BumpPtrAllocator &Alloc) : Alloc(Alloc) {}
+
+ template <class SymT, typename... ArgsT> SymT *make(ArgsT &&...Args) {
+ return new (Alloc) SymT(nextID(), std::forward<ArgsT>(Args)...);
+ }
+
+private:
+ SymbolID nextID() { return NextSymbolID++; }
+};
+
class SymbolManager {
using DataSetTy = llvm::FoldingSet<SymExpr>;
using SymbolDependTy =
@@ -489,15 +514,14 @@ class SymbolManager {
/// alive as long as the key is live.
SymbolDependTy SymbolDependencies;
- unsigned SymbolCounter = 0;
- llvm::BumpPtrAllocator& BPAlloc;
+ SymExprAllocator Alloc;
BasicValueFactory &BV;
ASTContext &Ctx;
public:
SymbolManager(ASTContext &ctx, BasicValueFactory &bv,
- llvm::BumpPtrAllocator& bpalloc)
- : SymbolDependencies(16), BPAlloc(bpalloc), BV(bv), Ctx(ctx) {}
+ llvm::BumpPtrAllocator &bpalloc)
+ : SymbolDependencies(16), Alloc(bpalloc), BV(bv), Ctx(ctx) {}
static bool canSymbolicate(QualType T);
@@ -687,4 +711,35 @@ class SymbolVisitor {
} // namespace clang
+// Override the default definition that would use pointer values of SymbolRefs
+// to order them, which is unstable due to ASLR.
+// Use the SymbolID instead which reflect the order in which the symbols were
+// allocated. This is usually stable across runs leading to the stability of
+// ConstraintMap and other containers using SymbolRef as keys.
+template <>
+struct ::llvm::ImutContainerInfo<clang::ento::SymbolRef>
+ : public ImutProfileInfo<clang::ento::SymbolRef> {
+ using value_type =
+ typename ImutProfileInfo<clang::ento::SymbolRef>::value_type;
+ using value_type_ref =
+ typename ImutProfileInfo<clang::ento::SymbolRef>::value_type_ref;
+ using key_type = value_type;
+ using key_type_ref = value_type_ref;
+ using data_type = bool;
+ using data_type_ref = bool;
+
+ static key_type_ref KeyOfValue(value_type_ref D) { return D; }
+ static data_type_ref DataOfValue(value_type_ref) { return true; }
+
+ static bool isEqual(key_type_ref LHS, key_type_ref RHS) {
+ return LHS->getSymbolID() == RHS->getSymbolID();
+ }
+
+ static bool isLess(key_type_ref LHS, key_type_ref RHS) {
+ return LHS->getSymbolID() < RHS->getSymbolID();
+ }
+
+ static bool isDataEqual(data_type_ref, data_type_ref) { return true; }
+};
+
#endif // LLVM_CLANG_STATICANALYZER_CORE_PATHSENSITIVE_SYMBOLMANAGER_H
diff --git a/clang/lib/StaticAnalyzer/Core/SymbolManager.cpp b/clang/lib/StaticAnalyzer/Core/SymbolManager.cpp
index f21e5c3ad7bd7c..738b6a175ce6de 100644
--- a/clang/lib/StaticAnalyzer/Core/SymbolManager.cpp
+++ b/clang/lib/StaticAnalyzer/Core/SymbolManager.cpp
@@ -170,9 +170,8 @@ SymbolManager::getRegionValueSymbol(const TypedValueRegion* R) {
void *InsertPos;
SymExpr *SD = DataSet.FindNodeOrInsertPos(profile, InsertPos);
if (!SD) {
- SD = new (BPAlloc) SymbolRegionValue(SymbolCounter, R);
+ SD = Alloc.make<SymbolRegionValue>(R);
DataSet.InsertNode(SD, InsertPos);
- ++SymbolCounter;
}
return cast<SymbolRegionValue>(SD);
@@ -188,9 +187,8 @@ const SymbolConjured* SymbolManager::conjureSymbol(const Stmt *E,
void *InsertPos;
SymExpr *SD = DataSet.FindNodeOrInsertPos(profile, InsertPos);
if (!SD) {
- SD = new (BPAlloc) SymbolConjured(SymbolCounter, E, LCtx, T, Count, SymbolTag);
+ SD = Alloc.make<SymbolConjured>(E, LCtx, T, Count, SymbolTag);
DataSet.InsertNode(SD, InsertPos);
- ++SymbolCounter;
}
return cast<SymbolConjured>(SD);
@@ -204,9 +202,8 @@ SymbolManager::getDerivedSymbol(SymbolRef parentSymbol,
void *InsertPos;
SymExpr *SD = DataSet.FindNodeOrInsertPos(profile, InsertPos);
if (!SD) {
- SD = new (BPAlloc) SymbolDerived(SymbolCounter, parentSymbol, R);
+ SD = Alloc.make<SymbolDerived>(parentSymbol, R);
DataSet.InsertNode(SD, InsertPos);
- ++SymbolCounter;
}
return cast<SymbolDerived>(SD);
@@ -219,9 +216,8 @@ SymbolManager::getExtentSymbol(const SubRegion *R) {
void *InsertPos;
SymExpr *SD = DataSet.FindNodeOrInsertPos(profile, InsertPos);
if (!SD) {
- SD = new (BPAlloc) SymbolExtent(SymbolCounter, R);
+ SD = Alloc.make<SymbolExtent>(R);
DataSet.InsertNode(SD, InsertPos);
- ++SymbolCounter;
}
return cast<SymbolExtent>(SD);
@@ -236,9 +232,8 @@ SymbolManager::getMetadataSymbol(const MemRegion* R, const Stmt *S, QualType T,
void *InsertPos;
SymExpr *SD = DataSet.FindNodeOrInsertPos(profile, InsertPos);
if (!SD) {
- SD = new (BPAlloc) SymbolMetadata(SymbolCounter, R, S, T, LCtx, Count, SymbolTag);
+ SD = Alloc.make<SymbolMetadata>(R, S, T, LCtx, Count, SymbolTag);
DataSet.InsertNode(SD, InsertPos);
- ++SymbolCounter;
}
return cast<SymbolMetadata>(SD);
@@ -252,7 +247,7 @@ SymbolManager::getCastSymbol(const SymExpr *Op,
void *InsertPos;
SymExpr *data = DataSet.FindNodeOrInsertPos(ID, InsertPos);
if (!data) {
- data = new (BPAlloc) SymbolCast(Op, From, To);
+ data = Alloc.make<SymbolCast>(Op, From, To);
DataSet.InsertNode(data, InsertPos);
}
@@ -268,7 +263,7 @@ const SymIntExpr *SymbolManager::getSymIntExpr(const SymExpr *lhs,
SymExpr *data = DataSet.FindNodeOrInsertPos(ID, InsertPos);
if (!data) {
- data = new (BPAlloc) SymIntExpr(lhs, op, v, t);
+ data = Alloc.make<SymIntExpr>(lhs, op, v, t);
DataSet.InsertNode(data, InsertPos);
}
@@ -284,7 +279,7 @@ const IntSymExpr *SymbolManager::getIntSymExpr(APSIntPtr lhs,
SymExpr *data = DataSet.FindNodeOrInsertPos(ID, InsertPos);
if (!data) {
- data = new (BPAlloc) IntSymExpr(lhs, op, rhs, t);
+ data = Alloc.make<IntSymExpr>(lhs, op, rhs, t);
DataSet.InsertNode(data, InsertPos);
}
@@ -301,7 +296,7 @@ const SymSymExpr *SymbolManager::getSymSymExpr(const SymExpr *lhs,
SymExpr *data = DataSet.FindNodeOrInsertPos(ID, InsertPos);
if (!data) {
- data = new (BPAlloc) SymSymExpr(lhs, op, rhs, t);
+ data = Alloc.make<SymSymExpr>(lhs, op, rhs, t);
DataSet.InsertNode(data, InsertPos);
}
@@ -316,7 +311,7 @@ const UnarySymExpr *SymbolManager::getUnarySymExpr(const SymExpr *Operand,
void *InsertPos;
SymExpr *data = DataSet.FindNodeOrInsertPos(ID, InsertPos);
if (!data) {
- data = new (BPAlloc) UnarySymExpr(Operand, Opc, T);
+ data = Alloc.make<UnarySymExpr>(Operand, Opc, T);
DataSet.InsertNode(data, InsertPos);
}
diff --git a/clang/test/Analysis/dump_egraph.cpp b/clang/test/Analysis/dump_egraph.cpp
index d1229b26346740..13459699a06f6f 100644
--- a/clang/test/Analysis/dump_egraph.cpp
+++ b/clang/test/Analysis/dump_egraph.cpp
@@ -21,7 +21,7 @@ void foo() {
// CHECK: \"location_context\": \"#0 Call\", \"calling\": \"T::T\", \"location\": \{ \"line\": 15, \"column\": 5, \"file\": \"{{.*}}dump_egraph.cpp\" \}, \"items\": [\l \{ \"init_id\": {{[0-9]+}}, \"kind\": \"construct into member variable\", \"argument_index\": null, \"pretty\": \"s\", \"value\": \"&t.s\"
-// CHECK: \"cluster\": \"t\", \"pointer\": \"{{0x[0-9a-f]+}}\", \"items\": [\l \{ \"kind\": \"Default\", \"offset\": 0, \"value\": \"conj_$2\{int, LC5, no stmt, #1\}\"
+// CHECK: \"cluster\": \"t\", \"pointer\": \"{{0x[0-9a-f]+}}\", \"items\": [\l \{ \"kind\": \"Default\", \"offset\": 0, \"value\": \"conj_$3\{int, LC5, no stmt, #1\}\"
// CHECK: \"dynamic_types\": [\l \{ \"region\": \"HeapSymRegion\{conj_$1\{S *, LC1, S{{[0-9]+}}, #1\}\}\", \"dyn_type\": \"S\", \"sub_classable\": false \}\l
diff --git a/clang/test/Analysis/expr-inspection-printState-diseq-info.c b/clang/test/Analysis/expr-inspection-printState-diseq-info.c
index c5c31785a600ef..515fcbbd430791 100644
--- a/clang/test/Analysis/expr-inspection-printState-diseq-info.c
+++ b/clang/test/Analysis/expr-inspection-printState-diseq-info.c
@@ -18,17 +18,17 @@ void test_disequality_info(int e0, int b0, int b1, int c0) {
// CHECK-NEXT: {
// CHECK-NEXT: "class": [ "(reg_$0<int e0>) - 2" ],
// CHECK-NEXT: "disequal_to": [
- // CHECK-NEXT: [ "reg_$2<int b1>" ]]
+ // CHECK-NEXT: [ "reg_$7<int b1>" ]]
// CHECK-NEXT: },
// CHECK-NEXT: {
- // CHECK-NEXT: "class": [ "reg_$2<int b1>" ],
+ // CHECK-NEXT: "class": [ "reg_$15<int c0>" ],
// CHECK-NEXT: "disequal_to": [
- // CHECK-NEXT: [ "(reg_$0<int e0>) - 2" ],
- // CHECK-NEXT: [ "reg_$3<int c0>" ]]
+ // CHECK-NEXT: [ "reg_$7<int b1>" ]]
// CHECK-NEXT: },
// CHECK-NEXT: {
- // CHECK-NEXT: "class": [ "reg_$3<int c0>" ],
+ // CHECK-NEXT: "class": [ "reg_$7<int b1>" ],
// CHECK-NEXT: "disequal_to": [
- // CHECK-NEXT: [ "reg_$2<int b1>" ]]
+ // CHECK-NEXT: [ "(reg_$0<int e0>) - 2" ],
+ // CHECK-NEXT: [ "reg_$15<int c0>" ]]
// CHECK-NEXT: }
// CHECK-NEXT: ],
diff --git a/clang/test/Analysis/expr-inspection-printState-eq-classes.c b/clang/test/Analysis/expr-inspection-printState-eq-classes.c
index 38e23d6e838269..19cc13735ab5a6 100644
--- a/clang/test/Analysis/expr-inspection-printState-eq-classes.c
+++ b/clang/test/Analysis/expr-inspection-printState-eq-classes.c
@@ -16,6 +16,6 @@ void test_equivalence_classes(int a, int b, int c, int d) {
}
// CHECK: "equivalence_classes": [
-// CHECK-NEXT: [ "(reg_$0<int a>) != (reg_$2<int c>)" ],
-// CHECK-NEXT: [ "reg_$0<int a>", "reg_$2<int c>", "reg_$3<int d>" ]
+// CHECK-NEXT: [ "(reg_$0<int a>) != (reg_$5<int c>)" ],
+// CHECK-NEXT: [ "reg_$0<int a>", "reg_$20<int d>", "reg_$5<int c>" ]
// CHECK-NEXT: ],
diff --git a/clang/test/Analysis/ptr-arith.cpp b/clang/test/Analysis/ptr-arith.cpp
index a1264a1f04839c..ec1c75c0c40632 100644
--- a/clang/test/Analysis/ptr-arith.cpp
+++ b/clang/test/Analysis/ptr-arith.cpp
@@ -139,10 +139,10 @@ struct parse_t {
int parse(parse_t *p) {
unsigned copy = p->bits2;
clang_analyzer_dump(copy);
- // expected-warning@-1 {{reg_$1<unsigned int Element{SymRegion{reg_$0<parse_t * p>},0 S64b,struct Bug_55934::parse_t}.bits2>}}
+ // expected-warning@-1 {{reg_$2<unsigned int Element{SymRegion{reg_$0<parse_t * p>},0 S64b,struct Bug_55934::parse_t}.bits2>}}
header *bits = (header *)©
clang_analyzer_dump(bits->b);
- // expected-warning@-1 {{derived_$2{reg_$1<unsigned int Element{SymRegion{reg_$0<parse_t * p>},0 S64b,struct Bug_55934::parse_t}.bits2>,Element{copy,0 S64b,struct Bug_55934::header}.b}}}
+ // expected-warning@-1 {{derived_$4{reg_$2<unsigned int Element{SymRegion{reg_$0<parse_t * p>},0 S64b,struct Bug_55934::parse_t}.bits2>,Element{copy,0 S64b,struct Bug_55934::header}.b}}}
return bits->b; // no-warning
}
} // namespace Bug_55934
diff --git a/clang/test/Analysis/symbol-simplification-disequality-info.cpp b/clang/test/Analysis/symbol-simplification-disequality-info.cpp
index 69238b583eb846..33b8f150f5d021 100644
--- a/clang/test/Analysis/symbol-simplification-disequality-info.cpp
+++ b/clang/test/Analysis/symbol-simplification-disequality-info.cpp
@@ -14,14 +14,14 @@ void test(int a, int b, int c, int d) {
clang_analyzer_printState();
// CHECK: "disequality_info": [
// CHECK-NEXT: {
- // CHECK-NEXT: "class": [ "((reg_$0<int a>...
[truncated]
|
@steakhal @NagyDonat, I revised and improved #121347. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Xazax-hun I'm pretty happy with this PR, but I'm curious about your opinion on adding the SymbolID Sym
member to the SymExprs.
using value_type = | ||
typename ImutProfileInfo<clang::ento::SymbolRef>::value_type; | ||
using value_type_ref = | ||
typename ImutProfileInfo<clang::ento::SymbolRef>::value_type_ref; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you can just desugar these. We don't need these indirections here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
inlined
d32f6ab [NFC] Explain why isDataEqual is necessary; inline type aliases.
static bool isEqual(key_type_ref LHS, key_type_ref RHS) { | ||
return LHS->getSymbolID() == RHS->getSymbolID(); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In these member functions, I'd prefer using directly the aliased types, such as SymbolRef - or whatever key_type_ref
. I don't think we have solid reasons using the type aliases here and in the rest of the mfns.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Inlined
d32f6ab [NFC] Explain why isDataEqual is necessary; inline type aliases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am OK with this change. Having more stable analysis results is valuable.
That being said, do we know how exactly the unstable addresses result in unstable results?
I cannot pinpoint the responsible code fragment, but in general this is a typical property of associative data structures. [Edit: Oops, I didn't notice the "exactly" in your question and wrote a general answer, which you probably already know.] A typical associative map doesn't preserve the order of insertion, and instead stores the key-value pairs in some arbitrary order that facilitates quick and effective access (for example, many implementations sort them according to the keys -- or the hash values of the keys). When the program iterates over a data structure of this kind, it will see the key-value pairs in this arbitrary order, which may depend on pointer values, whose order relationships are undefined and may change if the program is reran with the same input. If the calculations performed by the loop are non-commutative (e.g. "sum all the values" is commutative, "return the first value which is larger than 10" is not), then this unstable iteration order produces unstable output. By the way, the Python language (where performance is not prioritized) rules out these subtle bugs by recording the order of insertion within its |
One mechanism that we studied in the past weeks is the SMT issue refutation: assertion order is that of the constraint order in the There are likely other mechanisms, for example, the SymbolRef
Interesting parallel, as this patch is applying the same strategy for SymbolRef maps |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks for this improvement. Using the "raw" ordering of pointers is always a red flag, I strongly support eliminating it if possible.
By the way, what fraction of the results is perturbed (replaced with other random results) when this commit is activated? Did you run any "without this commit vs with this commit" comparisons?
As a tangential remark, I noticed that there seems to be heavy code duplication between the getXSymbol
methods of SymbolManager
-- it would be nice to unify them into a single template method that takes the symbol class and the argument list types as template parameters (analogously to SymExprAllocator::make
which does similar forwarding). However, this probably belongs to a separate NFC refactoring commit (and I can also implement it if you don't have time for it). What do you think? Do you see any obstacle (or perhaps conflict with changes that you're planning).
The difference is around 5-10 appearing/disappearing issues, and 50-100 issues with changed execution path (out of 83 K issues, with no Z3 refutation).
It makes sense and I have no future developments that would be incompatible with this refactoring. I did not do it in this PR to avoid scope creep. I will see on Monday, if I have some spare time to do this refactoring. @steakhal, I believe I addressed all your feedback. If you are happy with the PR, feel free to merge it. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks.
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/144/builds/14875 Here is the relevant piece of the build log for the reference
|
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/163/builds/11485 Here is the relevant piece of the build log for the reference
|
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/3/builds/9825 Here is the relevant piece of the build log for the reference
|
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/181/builds/11084 Here is the relevant piece of the build log for the reference
|
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/30/builds/13104 Here is the relevant piece of the build log for the reference
|
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/27/builds/4120 Here is the relevant piece of the build log for the reference
|
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/105/builds/5409 Here is the relevant piece of the build log for the reference
|
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/97/builds/4057 Here is the relevant piece of the build log for the reference
|
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/28/builds/5692 Here is the relevant piece of the build log for the reference
|
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/133/builds/9104 Here is the relevant piece of the build log for the reference
|
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/75/builds/4259 Here is the relevant piece of the build log for the reference
|
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/99/builds/3968 Here is the relevant piece of the build log for the reference
|
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/158/builds/5232 Here is the relevant piece of the build log for the reference
|
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/118/builds/4016 Here is the relevant piece of the build log for the reference
|
#121592) Reverts #121551 We had a bunch of build errors caused by this PR. https://lab.llvm.org/buildbot/#/builders/144/builds/14875
@necto Please have a look at the build failures. There are plenty. |
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/8/builds/9271 Here is the relevant piece of the build log for the reference
|
…s" (#121749) Generalize the SymbolIDs used for SymbolData to all SymExprs and use these IDs for comparison SymbolRef keys in various containers, such as ConstraintMap. These IDs are superior to raw pointer values because they are more controllable and are not randomized across executions (unlike [pointers](https://en.wikipedia.org/wiki/Address_space_layout_randomization)). These IDs order is stable across runs because SymExprs are allocated in the same order. Stability of the constraint order is important for the stability of the analyzer results. I evaluated this change on a set of 200+ open-source C and C++ projects with the total number of ~78 000 symbolic-execution issues passing Z3 refutation. This patch reduced the run-to-run churn (flakiness) in SE issues from 80-90 to 30-40 (out of 78K) in our CSA deployment (in our setting flaky issues are mostly due to Z3 refutation instability). Note, most of the issue churn (flakiness) is caused by the mentioned Z3 refutation. With Z3 refutation disabled, issue churn goes down to ~10 issues out of 83K and this patch has no effect on appearing/disappearing issues between runs. It however, seems to reduce the volatility of the execution flow: before we had 40-80 issues with changed execution flow, after - 10-30. Importantly, this change is necessary for the next step in stabilizing analysis results by caching Z3 query outcomes between analysis runs (work in progress). Across our admittedly noisy CI runs, I detected no significant effect on memory footprint or analysis time. This PR reapplies #121551 with a fix to a g++ compiler error reported on some build bots CPP-5919
…s" (llvm#121749) Generalize the SymbolIDs used for SymbolData to all SymExprs and use these IDs for comparison SymbolRef keys in various containers, such as ConstraintMap. These IDs are superior to raw pointer values because they are more controllable and are not randomized across executions (unlike [pointers](https://en.wikipedia.org/wiki/Address_space_layout_randomization)). These IDs order is stable across runs because SymExprs are allocated in the same order. Stability of the constraint order is important for the stability of the analyzer results. I evaluated this change on a set of 200+ open-source C and C++ projects with the total number of ~78 000 symbolic-execution issues passing Z3 refutation. This patch reduced the run-to-run churn (flakiness) in SE issues from 80-90 to 30-40 (out of 78K) in our CSA deployment (in our setting flaky issues are mostly due to Z3 refutation instability). Note, most of the issue churn (flakiness) is caused by the mentioned Z3 refutation. With Z3 refutation disabled, issue churn goes down to ~10 issues out of 83K and this patch has no effect on appearing/disappearing issues between runs. It however, seems to reduce the volatility of the execution flow: before we had 40-80 issues with changed execution flow, after - 10-30. Importantly, this change is necessary for the next step in stabilizing analysis results by caching Z3 query outcomes between analysis runs (work in progress). Across our admittedly noisy CI runs, I detected no significant effect on memory footprint or analysis time. This PR reapplies llvm#121551 with a fix to a g++ compiler error reported on some build bots CPP-5919
… containers" (#121592) Reverts llvm/llvm-project#121551 We had a bunch of build errors caused by this PR. https://lab.llvm.org/buildbot/#/builders/144/builds/14875
…d containers" (#121749) Generalize the SymbolIDs used for SymbolData to all SymExprs and use these IDs for comparison SymbolRef keys in various containers, such as ConstraintMap. These IDs are superior to raw pointer values because they are more controllable and are not randomized across executions (unlike [pointers](https://en.wikipedia.org/wiki/Address_space_layout_randomization)). These IDs order is stable across runs because SymExprs are allocated in the same order. Stability of the constraint order is important for the stability of the analyzer results. I evaluated this change on a set of 200+ open-source C and C++ projects with the total number of ~78 000 symbolic-execution issues passing Z3 refutation. This patch reduced the run-to-run churn (flakiness) in SE issues from 80-90 to 30-40 (out of 78K) in our CSA deployment (in our setting flaky issues are mostly due to Z3 refutation instability). Note, most of the issue churn (flakiness) is caused by the mentioned Z3 refutation. With Z3 refutation disabled, issue churn goes down to ~10 issues out of 83K and this patch has no effect on appearing/disappearing issues between runs. It however, seems to reduce the volatility of the execution flow: before we had 40-80 issues with changed execution flow, after - 10-30. Importantly, this change is necessary for the next step in stabilizing analysis results by caching Z3 query outcomes between analysis runs (work in progress). Across our admittedly noisy CI runs, I detected no significant effect on memory footprint or analysis time. This PR reapplies llvm/llvm-project#121551 with a fix to a g++ compiler error reported on some build bots CPP-5919
Generalize the
SymbolID
s used forSymbolData
to allSymExpr
s and use these IDs for comparisonSymbolRef
keys in various containers, such asConstraintMap
. These IDs are superior to raw pointer values because they are more controllable and are not randomized across executions (unlike pointers).These IDs order is stable across runs because SymExprs are allocated in the same order.
Stability of the constraint order is important for the stability of the analyzer results. I evaluated this change on a set of 200+ open-source C and C++ projects with the total number of ~78 000 symbolic-execution issues passing Z3 refutation.
This patch reduced the run-to-run churn (flakiness) in SE issues from 80-90 to 30-40 (out of 78K) in our CSA deployment (in our setting flaky issues are mostly due to Z3 refutation instability).
Note, most of the issue churn (flakiness) is caused by the mentioned Z3 refutation. With Z3 refutation disabled, issue churn goes down to ~10 issues out of 83K and this patch has no effect on appearing/disappearing issues between runs. It however, seems to reduce the volatility of the execution flow: before we had 40-80 issues with changed execution flow, after - 10-30.
Importantly, this change is necessary for the next step in stabilizing analysis results by caching Z3 query outcomes between analysis runs (work in progress).
Across our admittedly noisy CI runs, I detected no significant effect on memory footprint or analysis time.
This change set supersedes #121347
CPP-5919