Skip to content

[analyzer][NFC] Make RegionStore dumps deterministic #115615

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Nov 12, 2024
Merged

[analyzer][NFC] Make RegionStore dumps deterministic #115615

merged 3 commits into from
Nov 12, 2024

Conversation

steakhal
Copy link
Contributor

@steakhal steakhal commented Nov 9, 2024

Dump the memory space clusters before the other clusters, in alphabetical order. Then default bindings over direct bindings, and if any has symbolic offset, then those should come before the ones with concrete offsets.
In theory, we should either have a symbolic offset OR concrete offsets, but never both at the same time.

Needed for #114835

Dump the memory space clusters before the other clusters, in
alphabetical order. Then default bindings over direct bindings, and if
any has symbolic offset, then those should come before the ones with
concrete offsets.
In theory, we should either have a symbolic offset OR concrete offsets,
but never both at the same time.
@steakhal steakhal requested a review from NagyDonat November 9, 2024 18:31
@llvmbot llvmbot added the clang Clang issues not falling into any other category label Nov 9, 2024
@llvmbot
Copy link
Member

llvmbot commented Nov 9, 2024

@llvm/pr-subscribers-clang

@llvm/pr-subscribers-clang-static-analyzer-1

Author: Balazs Benics (steakhal)

Changes

Dump the memory space clusters before the other clusters, in alphabetical order. Then default bindings over direct bindings, and if any has symbolic offset, then those should come before the ones with concrete offsets.
In theory, we should either have a symbolic offset OR concrete offsets, but never both at the same time.

Needed for #114835


Full diff: https://github.com/llvm/llvm-project/pull/115615.diff

1 Files Affected:

  • (modified) clang/lib/StaticAnalyzer/Core/RegionStore.cpp (+73-13)
diff --git a/clang/lib/StaticAnalyzer/Core/RegionStore.cpp b/clang/lib/StaticAnalyzer/Core/RegionStore.cpp
index 674099dd7e1f0f..6bad9a93a30169 100644
--- a/clang/lib/StaticAnalyzer/Core/RegionStore.cpp
+++ b/clang/lib/StaticAnalyzer/Core/RegionStore.cpp
@@ -67,9 +67,10 @@ class BindingKey {
             isa<ObjCIvarRegion, CXXDerivedObjectRegion>(r)) &&
            "Not a base");
   }
-public:
 
+public:
   bool isDirect() const { return P.getInt() & Direct; }
+  bool isDefault() const { return !isDirect(); }
   bool hasSymbolicOffset() const { return P.getInt() & Symbolic; }
 
   const MemRegion *getRegion() const { return P.getPointer(); }
@@ -232,27 +233,86 @@ class RegionBindingsRef : public llvm::ImmutableMapRef<const MemRegion *,
 
   void printJson(raw_ostream &Out, const char *NL = "\n",
                  unsigned int Space = 0, bool IsDot = false) const {
-    for (iterator I = begin(), E = end(); I != E; ++I) {
-      // TODO: We might need a .printJson for I.getKey() as well.
+    using namespace llvm;
+    DenseMap<const MemRegion *, std::string> StringifyCache;
+    auto ToString = [&StringifyCache](const MemRegion *R) {
+      auto [Place, Inserted] = StringifyCache.try_emplace(R);
+      if (!Inserted)
+        return Place->second;
+      std::string Res;
+      raw_string_ostream OS(Res);
+      OS << R;
+      Place->second = Res;
+      return Res;
+    };
+
+    using Cluster =
+        std::pair<const MemRegion *, ImmutableMap<BindingKey, SVal>>;
+    using Binding = std::pair<BindingKey, SVal>;
+
+    const auto MemSpaceBeforeRegionName = [&ToString](const Cluster *L,
+                                                      const Cluster *R) {
+      if (isa<MemSpaceRegion>(L->first) && !isa<MemSpaceRegion>(R->first))
+        return true;
+      if (!isa<MemSpaceRegion>(L->first) && isa<MemSpaceRegion>(R->first))
+        return false;
+      return ToString(L->first) < ToString(R->first);
+    };
+
+    const auto SymbolicBeforeOffset = [&ToString](const BindingKey &L,
+                                                  const BindingKey &R) {
+      if (L.hasSymbolicOffset() && !R.hasSymbolicOffset())
+        return true;
+      if (!L.hasSymbolicOffset() && R.hasSymbolicOffset())
+        return false;
+      if (L.hasSymbolicOffset() && R.hasSymbolicOffset())
+        return ToString(L.getRegion()) < ToString(R.getRegion());
+      return L.getOffset() < R.getOffset();
+    };
+
+    const auto DefaultBindingBeforeDirectBindings =
+        [&SymbolicBeforeOffset](const Binding *LPtr, const Binding *RPtr) {
+          const BindingKey &L = LPtr->first;
+          const BindingKey &R = RPtr->first;
+          if (L.isDefault() && !R.isDefault())
+            return true;
+          if (!L.isDefault() && R.isDefault())
+            return false;
+          assert(L.isDefault() == R.isDefault());
+          return SymbolicBeforeOffset(L, R);
+        };
+
+    const auto AddrOf = [](const auto &Item) { return &Item; };
+
+    std::vector<const Cluster *> SortedClusters;
+    SortedClusters.reserve(std::distance(begin(), end()));
+    append_range(SortedClusters, map_range(*this, AddrOf));
+    llvm::sort(SortedClusters, MemSpaceBeforeRegionName);
+
+    for (auto [Idx, C] : llvm::enumerate(SortedClusters)) {
+      const auto &[BaseRegion, Bindings] = *C;
       Indent(Out, Space, IsDot)
-          << "{ \"cluster\": \"" << I.getKey() << "\", \"pointer\": \""
-          << (const void *)I.getKey() << "\", \"items\": [" << NL;
+          << "{ \"cluster\": \"" << BaseRegion << "\", \"pointer\": \""
+          << (const void *)BaseRegion << "\", \"items\": [" << NL;
+
+      std::vector<const Binding *> SortedBindings;
+      SortedBindings.reserve(std::distance(Bindings.begin(), Bindings.end()));
+      append_range(SortedBindings, map_range(Bindings, AddrOf));
+      llvm::sort(SortedBindings, DefaultBindingBeforeDirectBindings);
 
       ++Space;
-      const ClusterBindings &CB = I.getData();
-      for (ClusterBindings::iterator CI = CB.begin(), CE = CB.end(); CI != CE;
-           ++CI) {
-        Indent(Out, Space, IsDot) << "{ " << CI.getKey() << ", \"value\": ";
-        CI.getData().printJson(Out, /*AddQuotes=*/true);
+      for (auto [Idx, B] : llvm::enumerate(SortedBindings)) {
+        const auto &[Key, Value] = *B;
+        Indent(Out, Space, IsDot) << "{ " << Key << ", \"value\": ";
+        Value.printJson(Out, /*AddQuotes=*/true);
         Out << " }";
-        if (std::next(CI) != CE)
+        if (Idx != SortedBindings.size() - 1)
           Out << ',';
         Out << NL;
       }
-
       --Space;
       Indent(Out, Space, IsDot) << "]}";
-      if (std::next(I) != E)
+      if (Idx != SortedClusters.size() - 1)
         Out << ',';
       Out << NL;
     }

@@ -232,27 +233,86 @@ class RegionBindingsRef : public llvm::ImmutableMapRef<const MemRegion *,

void printJson(raw_ostream &Out, const char *NL = "\n",
unsigned int Space = 0, bool IsDot = false) const {
for (iterator I = begin(), E = end(); I != E; ++I) {
// TODO: We might need a .printJson for I.getKey() as well.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I dropped this TODO. It's not relevant. The key is always a base memory region, which obeys the json rules.

Copy link
Contributor

@NagyDonat NagyDonat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM overall, feel free to add these NFC improvements.

My only comment is a suggestion to define the comparison logic with a higher-level approach -- but the existing code is also OK if you prefer that.

@steakhal steakhal merged commit e05d91b into llvm:main Nov 12, 2024
5 of 7 checks passed
@steakhal steakhal deleted the bb/region-store-deterministic-dumps branch November 12, 2024 13:50
steakhal added a commit that referenced this pull request Nov 12, 2024
steakhal added a commit that referenced this pull request Nov 12, 2024
Reverts #115615

There are two problems with this PR:
1) If any of the dumps contains a store with a symbolic binding, we
crash.
2) The memory space clusters come last among the clusters, which is not
what I intended.

I'm reverting because of the crash.
steakhal added a commit that referenced this pull request Nov 12, 2024
This is reapplies #115615 without using tuples. The eager call of
`getRegion()` and `getOffset()` could cause crashes when the Store had
symbolic bindings.

Here I'm fixing the crash by lazily calling those getters.

Also, the tuple version poorly sorted the Clusters. The memory spaces
should have come before the regular clusters.
Now, that is also fixed here, demonstrated by the test.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
clang:static analyzer clang Clang issues not falling into any other category
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants