Skip to content

[analyzer][NFC] Rework SVal kind representation #71039

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Nov 4, 2023
Merged

[analyzer][NFC] Rework SVal kind representation #71039

merged 1 commit into from
Nov 4, 2023

Conversation

steakhal
Copy link
Contributor

@steakhal steakhal commented Nov 2, 2023

The goal of this patch is to refine how the SVal base and sub-kinds are represented by forming one unified enum describing the possible SVals. This means that the unsigned SVal::Kind and the attached bit-packing semantics would be replaced by a single unified enum. This is more conventional and leads to a better debugging experience by default. This eases the need of using debug pretty-printers, or the use of runtime functions doing the printing for us like we do today by calling Val.dump() whenever we inspect the values.

Previously, the first 2 bits of the unsigned SVal::Kind discriminated the following quartet: UndefinedVal, UnknownVal, Loc, or NonLoc. The rest of the upper bits represented the sub-kind, where the value represented the index among only the Locs or NonLocs, effectively attaching 2 meanings of the upper bits depending on the base-kind. We don't need to pack these bits, as we have plenty even if we would use just a plan-old unsigned char.

Consequently, in this patch, I propose to lay out all the (non-abstract) SVal kinds into a single enum, along with some metadata (BEGIN_Loc, END_Loc, BEGIN_NonLoc, END_NonLoc) artificial enum values, similar how we do with the MemRegions.

Note that in the unified SVal::Kind enum, to differentiate nonloc::ConcreteInt from loc::ConcreteInt, I had to prefix them with Loc and NonLoc to resolve this ambiguity.
This should not surface in general, because I'm replacing the nonloc::Kind enum items with inline constexpr global constants to mimic the original behavior - and offer nicer spelling to these enum values.

Some SVal constructors were not marked explicit, which I now mark as such to follow best practices, and marked others as /*implicit*/ to clarify the intent.
During refactoring, I also found at least one function not marked LLVM_ATTRIBUTE_RETURNS_NONNULL, so I did that.

The TypeRetrievingVisitor visitor had some accidental dead code, namely: VisitNonLocConcreteInt and VisitLocConcreteInt.

Previously, the SValVisitor expected visit handlers of VisitNonLocXXXXX(nonloc::XXXXX) and VisitLocXXXXX(loc::XXXXX), where I felt that envoding NonLoc and Loc in the name is not necessary as the type of the parameter would select the right overload anyways, so I simplified the naming of those visit functions.

The rest of the diff is a lot of times just formatting, because getKind() by nature, frequently appears in switches, which means that the whole switch gets automatically reformatted. I could probably undo the formatting, but I didn't want to deviate from the rule unless explicitly requested.

@llvmbot llvmbot added the clang Clang issues not falling into any other category label Nov 2, 2023
@llvmbot
Copy link
Member

llvmbot commented Nov 2, 2023

@llvm/pr-subscribers-clang

@llvm/pr-subscribers-clang-static-analyzer-1

Author: Balazs Benics (steakhal)

Changes

The goal of this patch is to refine how the SVal base and sub-kinds are represented by forming one unified enum describing the possible SVals. This means that the unsigned SVal::Kind and the attached bit-packing semantics would be replaced by a single unified enum. This is more conventional and leads to a better debugging experience by default. This eases the need of using debug pretty-printers, or the use of runtime functions doing the printing for us like we do today by calling Val.dump() whenever we inspect the values.

Previously, the first 2 bits of the unsigned SVal::Kind discriminated the following quartet: UndefinedVal, UnknownVal, Loc, or NonLoc. The rest of the upper bits represented the sub-kind, where the value represented the index among only the Locs or NonLocs, effectively attaching 2 meanings of the upper bits depending on the base-kind. We don't need to pack these bits, as we have plenty even if we would use just a plan-old unsigned char.

Consequently, in this patch, I propose to lay out all the (non-abstract) SVal kinds into a single enum, along with some metadata (BEGIN_Loc, END_Loc, BEGIN_NonLoc, END_NonLoc) artificial enum values, similar how we do with the MemRegions.

Note that in the unified SVal::Kind enum, to differentiate nonloc::ConcreteInt from loc::ConcreteInt, I had to prefix them with Loc and NonLoc to resolve this ambiguity.
This should not surface in general, because I'm replacing the nonloc::Kind enum items with inline constexpr global constants to mimic the original behavior - and offer nicer spelling to these enum values.

Some SVal constructors were not marked explicit, which I now mark as such to follow best practices, and marked others as /*implicit*/ to clarify the intent.
During refactoring, I also found at least one function not marked LLVM_ATTRIBUTE_RETURNS_NONNULL, so I did that.

The TypeRetrievingVisitor visitor had some accidental dead code, namely: VisitNonLocConcreteInt and VisitLocConcreteInt.

Previously, the SValVisitor expected visit handlers of VisitNonLocXXXXX(nonloc::XXXXX) and VisitLocXXXXX(loc::XXXXX), where I felt that envoding NonLoc and Loc in the name is not necessary as the type of the parameter would select the right overload anyways, so I simplified the naming of those visit functions.

The rest of the diff is a lot of times just formatting, because getKind() by nature, frequently appears in switches, which means that the whole switch gets automatically reformatted. I could probably undo the formatting, but I didn't want to deviate from the rule unless explicitly requested.


Patch is 40.42 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/71039.diff

9 Files Affected:

  • (modified) clang/include/clang/StaticAnalyzer/Checkers/SValExplainer.h (+5-5)
  • (modified) clang/include/clang/StaticAnalyzer/Core/PathSensitive/SValVisitor.h (+25-32)
  • (modified) clang/include/clang/StaticAnalyzer/Core/PathSensitive/SVals.def (+18-20)
  • (modified) clang/include/clang/StaticAnalyzer/Core/PathSensitive/SVals.h (+77-148)
  • (modified) clang/lib/StaticAnalyzer/Core/SValBuilder.cpp (+14-14)
  • (modified) clang/lib/StaticAnalyzer/Core/SVals.cpp (+42-45)
  • (modified) clang/lib/StaticAnalyzer/Core/SimpleConstraintManager.cpp (+2-2)
  • (modified) clang/lib/StaticAnalyzer/Core/SimpleSValBuilder.cpp (+36-37)
  • (modified) clang/lib/StaticAnalyzer/Core/Store.cpp (+1-1)
diff --git a/clang/include/clang/StaticAnalyzer/Checkers/SValExplainer.h b/clang/include/clang/StaticAnalyzer/Checkers/SValExplainer.h
index 3ae28c1dba3eb5a..43a70f596a4da28 100644
--- a/clang/include/clang/StaticAnalyzer/Checkers/SValExplainer.h
+++ b/clang/include/clang/StaticAnalyzer/Checkers/SValExplainer.h
@@ -65,7 +65,7 @@ class SValExplainer : public FullSValVisitor<SValExplainer, std::string> {
     return "undefined value";
   }
 
-  std::string VisitLocMemRegionVal(loc::MemRegionVal V) {
+  std::string VisitMemRegionVal(loc::MemRegionVal V) {
     const MemRegion *R = V.getRegion();
     // Avoid the weird "pointer to pointee of ...".
     if (auto SR = dyn_cast<SymbolicRegion>(R)) {
@@ -76,7 +76,7 @@ class SValExplainer : public FullSValVisitor<SValExplainer, std::string> {
     return "pointer to " + Visit(R);
   }
 
-  std::string VisitLocConcreteInt(loc::ConcreteInt V) {
+  std::string VisitConcreteInt(loc::ConcreteInt V) {
     const llvm::APSInt &I = V.getValue();
     std::string Str;
     llvm::raw_string_ostream OS(Str);
@@ -84,11 +84,11 @@ class SValExplainer : public FullSValVisitor<SValExplainer, std::string> {
     return Str;
   }
 
-  std::string VisitNonLocSymbolVal(nonloc::SymbolVal V) {
+  std::string VisitSymbolVal(nonloc::SymbolVal V) {
     return Visit(V.getSymbol());
   }
 
-  std::string VisitNonLocConcreteInt(nonloc::ConcreteInt V) {
+  std::string VisitConcreteInt(nonloc::ConcreteInt V) {
     const llvm::APSInt &I = V.getValue();
     std::string Str;
     llvm::raw_string_ostream OS(Str);
@@ -97,7 +97,7 @@ class SValExplainer : public FullSValVisitor<SValExplainer, std::string> {
     return Str;
   }
 
-  std::string VisitNonLocLazyCompoundVal(nonloc::LazyCompoundVal V) {
+  std::string VisitLazyCompoundVal(nonloc::LazyCompoundVal V) {
     return "lazily frozen compound value of " + Visit(V.getRegion());
   }
 
diff --git a/clang/include/clang/StaticAnalyzer/Core/PathSensitive/SValVisitor.h b/clang/include/clang/StaticAnalyzer/Core/PathSensitive/SValVisitor.h
index fc83e26183b3e7d..f7bb37f6591671f 100644
--- a/clang/include/clang/StaticAnalyzer/Core/PathSensitive/SValVisitor.h
+++ b/clang/include/clang/StaticAnalyzer/Core/PathSensitive/SValVisitor.h
@@ -14,9 +14,9 @@
 #ifndef LLVM_CLANG_STATICANALYZER_CORE_PATHSENSITIVE_SVALVISITOR_H
 #define LLVM_CLANG_STATICANALYZER_CORE_PATHSENSITIVE_SVALVISITOR_H
 
+#include "clang/StaticAnalyzer/Core/PathSensitive/MemRegion.h"
 #include "clang/StaticAnalyzer/Core/PathSensitive/SVals.h"
 #include "clang/StaticAnalyzer/Core/PathSensitive/SymbolManager.h"
-#include "clang/StaticAnalyzer/Core/PathSensitive/MemRegion.h"
 
 namespace clang {
 
@@ -25,49 +25,42 @@ namespace ento {
 /// SValVisitor - this class implements a simple visitor for SVal
 /// subclasses.
 template <typename ImplClass, typename RetTy = void> class SValVisitor {
-public:
-
-#define DISPATCH(NAME, CLASS) \
-  return static_cast<ImplClass *>(this)->Visit ## NAME(V.castAs<CLASS>())
+  ImplClass &derived() { return *static_cast<ImplClass *>(this); }
 
+public:
   RetTy Visit(SVal V) {
     // Dispatch to VisitFooVal for each FooVal.
-    // Take namespaces (loc:: and nonloc::) into account.
-    switch (V.getBaseKind()) {
-#define BASIC_SVAL(Id, Parent) case SVal::Id ## Kind: DISPATCH(Id, Id);
-#include "clang/StaticAnalyzer/Core/PathSensitive/SVals.def"
-    case SVal::LocKind:
-      switch (V.getSubKind()) {
-#define LOC_SVAL(Id, Parent) \
-      case loc::Id ## Kind: DISPATCH(Loc ## Id, loc :: Id);
+    switch (V.getKind()) {
+#define BASIC_SVAL(Id, Parent)                                                 \
+  case SVal::Id##Kind:                                                         \
+    return derived().Visit##Id(V.castAs<Id>());
+#define LOC_SVAL(Id, Parent)                                                   \
+  case SVal::Loc##Id##Kind:                                                    \
+    return derived().Visit##Id(V.castAs<loc::Id>());
+#define NONLOC_SVAL(Id, Parent)                                                \
+  case SVal::NonLoc##Id##Kind:                                                 \
+    return derived().Visit##Id(V.castAs<nonloc::Id>());
 #include "clang/StaticAnalyzer/Core/PathSensitive/SVals.def"
-      }
-      llvm_unreachable("Unknown Loc sub-kind!");
-    case SVal::NonLocKind:
-      switch (V.getSubKind()) {
-#define NONLOC_SVAL(Id, Parent) \
-      case nonloc::Id ## Kind: DISPATCH(NonLoc ## Id, nonloc :: Id);
-#include "clang/StaticAnalyzer/Core/PathSensitive/SVals.def"
-      }
-      llvm_unreachable("Unknown NonLoc sub-kind!");
     }
     llvm_unreachable("Unknown SVal kind!");
   }
 
-#define BASIC_SVAL(Id, Parent) \
-  RetTy Visit ## Id(Id V) { DISPATCH(Parent, Id); }
-#define ABSTRACT_SVAL(Id, Parent) \
-  BASIC_SVAL(Id, Parent)
-#define LOC_SVAL(Id, Parent) \
-  RetTy VisitLoc ## Id(loc::Id V) { DISPATCH(Parent, Parent); }
-#define NONLOC_SVAL(Id, Parent) \
-  RetTy VisitNonLoc ## Id(nonloc::Id V) { DISPATCH(Parent, Parent); }
+  // Dispatch to the more generic handler as a default implementation.
+#define BASIC_SVAL(Id, Parent)                                                 \
+  RetTy Visit##Id(Id V) { return derived().Visit##Parent(V.castAs<Id>()); }
+#define ABSTRACT_SVAL(Id, Parent) BASIC_SVAL(Id, Parent)
+#define LOC_SVAL(Id, Parent)                                                   \
+  RetTy Visit##Id(loc::Id V) {                                                 \
+    return derived().Visit##Parent(V.castAs<Parent>());                        \
+  }
+#define NONLOC_SVAL(Id, Parent)                                                \
+  RetTy Visit##Id(nonloc::Id V) {                                              \
+    return derived().Visit##Parent(V.castAs<Parent>());                        \
+  }
 #include "clang/StaticAnalyzer/Core/PathSensitive/SVals.def"
 
   // Base case, ignore it. :)
   RetTy VisitSVal(SVal V) { return RetTy(); }
-
-#undef DISPATCH
 };
 
 /// SymExprVisitor - this class implements a simple visitor for SymExpr
diff --git a/clang/include/clang/StaticAnalyzer/Core/PathSensitive/SVals.def b/clang/include/clang/StaticAnalyzer/Core/PathSensitive/SVals.def
index eb05de6d9933342..36d2425d155a929 100644
--- a/clang/include/clang/StaticAnalyzer/Core/PathSensitive/SVals.def
+++ b/clang/include/clang/StaticAnalyzer/Core/PathSensitive/SVals.def
@@ -6,28 +6,24 @@
 //
 //===----------------------------------------------------------------------===//
 //
-// The list of symbolic values (SVal kinds and sub-kinds) used in the Static
-// Analyzer. The distinction between loc:: and nonloc:: SVal namespaces is
+// The list of symbolic values (SVal kinds) used in the Static Analyzer.
+// The distinction between `loc::` and `nonloc::` SVal namespaces is
 // currently hardcoded, because it is too peculiar and explicit to be handled
 // uniformly. In order to use this information, users of this file must define
 // one or more of the following macros:
 //
-// BASIC_SVAL(Id, Parent) - for specific SVal sub-kinds, which are
-// neither in loc:: nor in nonloc:: namespace; these classes occupy
-// their own base kind IdKind.
+// BASIC_SVAL(Id, Parent) - for specific SVal kinds, which are
+// neither in `loc::` nor in `nonloc::` namespace.
 //
 // ABSTRACT_SVAL(Id, Parent) - for abstract SVal classes which are
-// neither in loc:: nor in nonloc:: namespace,
+// neither in `loc::` nor in `nonloc::` namespace,
 //
-// ABSTRACT_SVAL_WITH_KIND(Id, Parent) - for SVal classes which are also
-// neither in loc:: nor in nonloc:: namespace, but occupy a whole base kind
-// identifier IdKind, much like BASIC_SVALs.
+// LOC_SVAL(Id, Parent) - for values in `loc::` namespace.
 //
-// LOC_SVAL(Id, Parent) - for values in loc:: namespace, which occupy a sub-kind
-// loc::IdKind.
+// NONLOC_SVAL(Id, Parent) - for values in `nonloc::` namespace.
 //
-// NONLOC_SVAL(Id, Parent) - for values in nonloc:: namespace, which occupy a
-// sub-kind nonloc::IdKind.
+// SVAL_RANGE(Id, First, Last) - for defining range of subtypes of
+// the abstract class `Id`.
 //
 //===----------------------------------------------------------------------===//
 
@@ -39,10 +35,6 @@
 #define ABSTRACT_SVAL(Id, Parent)
 #endif
 
-#ifndef ABSTRACT_SVAL_WITH_KIND
-#define ABSTRACT_SVAL_WITH_KIND(Id, Parent) ABSTRACT_SVAL(Id, Parent)
-#endif
-
 #ifndef LOC_SVAL
 #define LOC_SVAL(Id, Parent)
 #endif
@@ -51,24 +43,30 @@
 #define NONLOC_SVAL(Id, Parent)
 #endif
 
+#ifndef SVAL_RANGE
+#define SVAL_RANGE(Id, First, Last)
+#endif
+
 BASIC_SVAL(UndefinedVal, SVal)
 ABSTRACT_SVAL(DefinedOrUnknownSVal, SVal)
   BASIC_SVAL(UnknownVal, DefinedOrUnknownSVal)
   ABSTRACT_SVAL(DefinedSVal, DefinedOrUnknownSVal)
-    ABSTRACT_SVAL_WITH_KIND(Loc, DefinedSVal)
+    ABSTRACT_SVAL(Loc, DefinedSVal)
       LOC_SVAL(ConcreteInt, Loc)
       LOC_SVAL(GotoLabel, Loc)
       LOC_SVAL(MemRegionVal, Loc)
-    ABSTRACT_SVAL_WITH_KIND(NonLoc, DefinedSVal)
+      SVAL_RANGE(Loc, ConcreteInt, MemRegionVal)
+    ABSTRACT_SVAL(NonLoc, DefinedSVal)
       NONLOC_SVAL(CompoundVal, NonLoc)
       NONLOC_SVAL(ConcreteInt, NonLoc)
       NONLOC_SVAL(LazyCompoundVal, NonLoc)
       NONLOC_SVAL(LocAsInteger, NonLoc)
       NONLOC_SVAL(SymbolVal, NonLoc)
       NONLOC_SVAL(PointerToMember, NonLoc)
+      SVAL_RANGE(NonLoc, CompoundVal, PointerToMember)
 
+#undef SVAL_RANGE
 #undef NONLOC_SVAL
 #undef LOC_SVAL
-#undef ABSTRACT_SVAL_WITH_KIND
 #undef ABSTRACT_SVAL
 #undef BASIC_SVAL
diff --git a/clang/include/clang/StaticAnalyzer/Core/PathSensitive/SVals.h b/clang/include/clang/StaticAnalyzer/Core/PathSensitive/SVals.h
index 00cce21151a7073..78239d4060c47d0 100644
--- a/clang/include/clang/StaticAnalyzer/Core/PathSensitive/SVals.h
+++ b/clang/include/clang/StaticAnalyzer/Core/PathSensitive/SVals.h
@@ -18,6 +18,7 @@
 #include "clang/AST/Type.h"
 #include "clang/Basic/LLVM.h"
 #include "clang/StaticAnalyzer/Core/PathSensitive/SymExpr.h"
+#include "llvm/ADT/APSInt.h"
 #include "llvm/ADT/FoldingSet.h"
 #include "llvm/ADT/ImmutableList.h"
 #include "llvm/ADT/PointerUnion.h"
@@ -47,50 +48,30 @@ class PointerToMemberData;
 class SValBuilder;
 class TypedValueRegion;
 
-namespace nonloc {
-
-/// Sub-kinds for NonLoc values.
-enum Kind {
-#define NONLOC_SVAL(Id, Parent) Id ## Kind,
-#include "clang/StaticAnalyzer/Core/PathSensitive/SVals.def"
-};
-
-} // namespace nonloc
-
-namespace loc {
-
-/// Sub-kinds for Loc values.
-enum Kind {
-#define LOC_SVAL(Id, Parent) Id ## Kind,
-#include "clang/StaticAnalyzer/Core/PathSensitive/SVals.def"
-};
-
-} // namespace loc
-
 /// SVal - This represents a symbolic expression, which can be either
 ///  an L-value or an R-value.
 ///
 class SVal {
 public:
-  enum BaseKind {
-    // The enumerators must be representable using 2 bits.
-#define BASIC_SVAL(Id, Parent) Id ## Kind,
-#define ABSTRACT_SVAL_WITH_KIND(Id, Parent) Id ## Kind,
+  enum SValKind : unsigned char {
+#define BASIC_SVAL(Id, Parent) Id##Kind,
+#define LOC_SVAL(Id, Parent) Parent##Id##Kind,
+#define NONLOC_SVAL(Id, Parent) Parent##Id##Kind,
+#define SVAL_RANGE(Id, First, Last)                                            \
+  BEGIN_##Id = Id##First##Kind, END_##Id = Id##Last##Kind,
 #include "clang/StaticAnalyzer/Core/PathSensitive/SVals.def"
   };
-  enum { BaseBits = 2, BaseMask = 0b11 };
 
 protected:
   const void *Data = nullptr;
+  SValKind Kind = UndefinedValKind;
 
-  /// The lowest 2 bits are a BaseKind (0 -- 3).
-  ///  The higher bits are an unsigned "kind" value.
-  unsigned Kind = 0;
-
-  explicit SVal(const void *d, bool isLoc, unsigned ValKind)
-      : Data(d), Kind((isLoc ? LocKind : NonLocKind) | (ValKind << BaseBits)) {}
+  explicit SVal(SValKind Kind, const void *Data = nullptr)
+      : Data(Data), Kind(Kind) {}
 
-  explicit SVal(BaseKind k, const void *D = nullptr) : Data(D), Kind(k) {}
+  template <typename T> const T *castDataAs() const {
+    return static_cast<const T *>(Data);
+  }
 
 public:
   explicit SVal() = default;
@@ -105,38 +86,25 @@ class SVal {
     return llvm::dyn_cast<T>(*this);
   }
 
-  unsigned getRawKind() const { return Kind; }
-  BaseKind getBaseKind() const { return (BaseKind) (Kind & BaseMask); }
-  unsigned getSubKind() const { return Kind >> BaseBits; }
+  SValKind getKind() const { return Kind; }
 
   // This method is required for using SVal in a FoldingSetNode.  It
   // extracts a unique signature for this SVal object.
   void Profile(llvm::FoldingSetNodeID &ID) const {
-    ID.AddInteger((unsigned) getRawKind());
+    ID.AddInteger((unsigned)getKind());
     ID.AddPointer(Data);
   }
 
-  bool operator==(SVal R) const {
-    return getRawKind() == R.getRawKind() && Data == R.Data;
-  }
-
+  bool operator==(SVal R) const { return Kind == R.Kind && Data == R.Data; }
   bool operator!=(SVal R) const { return !(*this == R); }
 
-  bool isUnknown() const {
-    return getRawKind() == UnknownValKind;
-  }
+  bool isUnknown() const { return getKind() == UnknownValKind; }
 
-  bool isUndef() const {
-    return getRawKind() == UndefinedValKind;
-  }
+  bool isUndef() const { return getKind() == UndefinedValKind; }
 
-  bool isUnknownOrUndef() const {
-    return getRawKind() <= UnknownValKind;
-  }
+  bool isUnknownOrUndef() const { return isUnknown() || isUndef(); }
 
-  bool isValid() const {
-    return getRawKind() > UnknownValKind;
-  }
+  bool isValid() const { return !isUnknownOrUndef(); }
 
   bool isConstant() const;
 
@@ -207,10 +175,24 @@ inline raw_ostream &operator<<(raw_ostream &os, clang::ento::SVal V) {
   return os;
 }
 
+namespace nonloc {
+/// Sub-kinds for NonLoc values.
+#define NONLOC_SVAL(Id, Parent)                                                \
+  inline constexpr auto Id##Kind = SVal::SValKind::NonLoc##Id##Kind;
+#include "clang/StaticAnalyzer/Core/PathSensitive/SVals.def"
+} // namespace nonloc
+
+namespace loc {
+/// Sub-kinds for Loc values.
+#define LOC_SVAL(Id, Parent)                                                   \
+  inline constexpr auto Id##Kind = SVal::SValKind::Loc##Id##Kind;
+#include "clang/StaticAnalyzer/Core/PathSensitive/SVals.def"
+} // namespace loc
+
 class UndefinedVal : public SVal {
 public:
   UndefinedVal() : SVal(UndefinedValKind) {}
-  static bool classof(SVal V) { return V.getBaseKind() == UndefinedValKind; }
+  static bool classof(SVal V) { return V.getKind() == UndefinedValKind; }
 };
 
 class DefinedOrUnknownSVal : public SVal {
@@ -223,16 +205,15 @@ class DefinedOrUnknownSVal : public SVal {
   static bool classof(SVal V) { return !V.isUndef(); }
 
 protected:
-  explicit DefinedOrUnknownSVal(const void *d, bool isLoc, unsigned ValKind)
-      : SVal(d, isLoc, ValKind) {}
-  explicit DefinedOrUnknownSVal(BaseKind k, void *D = nullptr) : SVal(k, D) {}
+  explicit DefinedOrUnknownSVal(SValKind Kind, const void *Data = nullptr)
+      : SVal(Kind, Data) {}
 };
 
 class UnknownVal : public DefinedOrUnknownSVal {
 public:
   explicit UnknownVal() : DefinedOrUnknownSVal(UnknownValKind) {}
 
-  static bool classof(SVal V) { return V.getBaseKind() == UnknownValKind; }
+  static bool classof(SVal V) { return V.getKind() == UnknownValKind; }
 };
 
 class DefinedSVal : public DefinedOrUnknownSVal {
@@ -246,22 +227,21 @@ class DefinedSVal : public DefinedOrUnknownSVal {
   static bool classof(SVal V) { return !V.isUnknownOrUndef(); }
 
 protected:
-  explicit DefinedSVal(const void *d, bool isLoc, unsigned ValKind)
-      : DefinedOrUnknownSVal(d, isLoc, ValKind) {}
+  explicit DefinedSVal(SValKind Kind, const void *Data)
+      : DefinedOrUnknownSVal(Kind, Data) {}
 };
 
 /// Represents an SVal that is guaranteed to not be UnknownVal.
 class KnownSVal : public SVal {
 public:
-  KnownSVal(const DefinedSVal &V) : SVal(V) {}
-  KnownSVal(const UndefinedVal &V) : SVal(V) {}
+  /*implicit*/ KnownSVal(DefinedSVal V) : SVal(V) {}
+  /*implicit*/ KnownSVal(UndefinedVal V) : SVal(V) {}
   static bool classof(SVal V) { return !V.isUnknown(); }
 };
 
 class NonLoc : public DefinedSVal {
 protected:
-  explicit NonLoc(unsigned SubKind, const void *d)
-      : DefinedSVal(d, false, SubKind) {}
+  NonLoc(SValKind Kind, const void *Data) : DefinedSVal(Kind, Data) {}
 
 public:
   void dumpToStream(raw_ostream &Out) const;
@@ -271,13 +251,14 @@ class NonLoc : public DefinedSVal {
            T->isAnyComplexType() || T->isVectorType();
   }
 
-  static bool classof(SVal V) { return V.getBaseKind() == NonLocKind; }
+  static bool classof(SVal V) {
+    return BEGIN_NonLoc <= V.getKind() && V.getKind() <= END_NonLoc;
+  }
 };
 
 class Loc : public DefinedSVal {
 protected:
-  explicit Loc(unsigned SubKind, const void *D)
-      : DefinedSVal(const_cast<void *>(D), true, SubKind) {}
+  Loc(SValKind Kind, const void *Data) : DefinedSVal(Kind, Data) {}
 
 public:
   void dumpToStream(raw_ostream &Out) const;
@@ -287,7 +268,9 @@ class Loc : public DefinedSVal {
            T->isReferenceType() || T->isNullPtrType();
   }
 
-  static bool classof(SVal V) { return V.getBaseKind() == LocKind; }
+  static bool classof(SVal V) {
+    return BEGIN_Loc <= V.getKind() && V.getKind() <= END_Loc;
+  }
 };
 
 //==------------------------------------------------------------------------==//
@@ -300,9 +283,9 @@ namespace nonloc {
 class SymbolVal : public NonLoc {
 public:
   SymbolVal() = delete;
-  SymbolVal(SymbolRef sym) : NonLoc(SymbolValKind, sym) {
-    assert(sym);
-    assert(!Loc::isLocType(sym->getType()));
+  explicit SymbolVal(SymbolRef Sym) : NonLoc(SymbolValKind, Sym) {
+    assert(Sym);
+    assert(!Loc::isLocType(Sym->getType()));
   }
 
   LLVM_ATTRIBUTE_RETURNS_NONNULL
@@ -314,27 +297,17 @@ class SymbolVal : public NonLoc {
     return !isa<SymbolData>(getSymbol());
   }
 
-  static bool classof(SVal V) {
-    return V.getBaseKind() == NonLocKind && V.getSubKind() == SymbolValKind;
-  }
-
-  static bool classof(NonLoc V) { return V.getSubKind() == SymbolValKind; }
+  static bool classof(SVal V) { return V.getKind() == SymbolValKind; }
 };
 
 /// Value representing integer constant.
 class ConcreteInt : public NonLoc {
 public:
-  explicit ConcreteInt(const llvm::APSInt& V) : NonLoc(ConcreteIntKind, &V) {}
-
-  const llvm::APSInt& getValue() const {
-    return *static_cast<const llvm::APSInt *>(Data);
-  }
+  explicit ConcreteInt(const llvm::APSInt &V) : NonLoc(ConcreteIntKind, &V) {}
 
-  static bool classof(SVal V) {
-    return V.getBaseKind() == NonLocKind && V.getSubKind() == ConcreteIntKind;
-  }
+  const llvm::APSInt &getValue() const { return *castDataAs<llvm::APSInt>(); }
 
-  static bool classof(NonLoc V) { return V.getSubKind() == ConcreteIntKind; }
+  static bool classof(SVal V) { return V.getKind() == ConcreteIntKind; }
 };
 
 class LocAsInteger : public NonLoc {
@@ -344,29 +317,20 @@ class LocAsInteger : public NonLoc {
       : NonLoc(LocAsIntegerKind, &data) {
     // We do not need to represent loc::ConcreteInt as LocAsInteger,
     // as it'd collapse into a nonloc::ConcreteInt instead.
-    assert(data.first.getBaseKind() == LocKind &&
-           (data.first.getSubKind() == loc::MemRegionValKind ||
-            data.first.getSubKind() == loc::GotoLabelKind));
+    SValKind K = data.first.getKind();
+    assert(K == loc::MemRegionValKind || K == loc::GotoLabelKind);
   }
 
 public:
   Loc getLoc() const {
-    const std::pair<SVal, uintptr_t> *D =
-      static_cast<const std::pair<SVal, uintptr_t> *>(Data);
-    return D->first.castAs<Loc>();
+    return castDataAs<std::pair<SVal, uintptr_t>>()->first.castAs<Loc>();
   }
 
   unsigned getNumBits() const {
-    const std::pair<SVal, uintptr_t> *D =
-      static_cast<const std::pair<SVal, uintptr_t> *>(Data);
-    return D->second;
+    return castDataAs<std::pair<SVal, uintptr_t>>()->second;
   }
 
-  static bool classof(SVal V) {
-    return V.getBaseKind() == NonLocKind && V.getSubKind() == LocAsIntegerKind;
-  }
-
-  static bool classof(NonLoc V) { return V.getSubKind() == LocAsIntegerKind; }
+  static bool classof(SVal V) { return V.getKind() == LocAsIntegerKind; }
 };
 
 class CompoundVal : public NonLoc {
@@ -379,19 +343,14 @@ class CompoundVal : public NonLoc {
 public:
   LLVM_ATTRIBUTE_RETURNS_NONNULL
   const CompoundValData* getValue() const {
-    return static_cast<const CompoundValData *>(Data);
+    return castDataAs<CompoundValData>();
   }
 
   using iterator = llvm::ImmutableList<SVal>::iterator;
-
   iterator begin() const;
   iterator end() const;
 
-  static bool classof(SVal V) {
-    return V.getBaseKind() == NonLocKind ...
[truncated]

@steakhal
Copy link
Contributor Author

steakhal commented Nov 2, 2023

This PR relates to #69835 (comment).

Copy link
Contributor

@Endilll Endilll left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Converting packed unsigned Kind into non-packed SValKind Kind is definitely going to help debuggers to display the value correctly.

But I have to point out that this patch doesn't address the fact that const void* Data is not friendly to debuggers, especially with type information encoded in another member. So even with this patch applied, someone would still have to write (and maintain) a custom formatter on debugger side to display Data correctly.

If you refactor const void* Data to be a llvm::PointerUnion, then it will be picked up automatically (PointerUnion is a popular type, so I've already written a formatter for it.) Together with removing BaseBits from Kind and making the latter non-packed, SVal will have a debugger-friendly layout.

That said, debugger-friendliness is probably not among top priorities, so I'm not going to block this patch, if this is the direction you and reviewers want to take.

@steakhal
Copy link
Contributor Author

steakhal commented Nov 2, 2023

But I have to point out that this patch doesn't address the fact that const void* Data is not friendly to debuggers, especially with type information encoded in another member. So even with this patch applied, someone would still have to write (and maintain) a custom formatter on debugger side to display Data correctly.

That's correct. I'll think about how to resolve that as well.

If you refactor const void* Data to be a llvm::PointerUnion, then it will be picked up automatically (PointerUnion is a popular type, so I've already written a formatter for it.) Together with removing BaseBits from Kind and making the latter non-packed, SVal will have a debugger-friendly layout.

I've considered this but I found the number of alternatives too large to make it feasible. Consider that we have 11 possible SValKinds, which would require 4 bits to encode. Requiring all Data pointers to be aligned as such seems rough - although not impossible.

That said, debugger-friendliness is probably not among top priorities, so I'm not going to block this patch, if this is the direction you and reviewers want to take.

I'd say it's one baby step in that direction, but not the end of the journey. How about looking at this like that?

@Endilll
Copy link
Contributor

Endilll commented Nov 2, 2023

I've considered this but I found the number of alternatives too large to make it feasible. Consider that we have 11 possible SValKinds, which would require 4 bits to encode. Requiring all Data pointers to be aligned as such seems rough - although not impossible.

Sorry, I missed the part that you can have 11 different types there. alignas(16) might not be unreasonable, but it depends and requires consideration.

I'd say it's one baby step in that direction, but not the end of the journey. How about looking at this like that?

No objections to this point of view. I just wanted to make sure debugger struggles are brought up in the discussion and understood.

@steakhal
Copy link
Contributor Author

steakhal commented Nov 3, 2023

But I have to point out that this patch doesn't address the fact that const void* Data is not friendly to debuggers, especially with type information encoded in another member. So even with this patch applied, someone would still have to write (and maintain) a custom formatter on debugger side to display Data correctly.

If you refactor const void* Data to be a llvm::PointerUnion, then it will be picked up automatically (PointerUnion is a popular type, so I've already written a formatter for it.) Together with removing BaseBits from Kind and making the latter non-packed, SVal will have a debugger-friendly layout.

What other ways are to make a const void * debugger-friendly - other than llvm::PointerUnion?
I've considered using plain-old unions, but I found that approach pretty ugly, and I'm not even sure if it would help debugging experience.

@Endilll
Copy link
Contributor

Endilll commented Nov 3, 2023

What other ways are to make a const void * debugger-friendly - other than llvm::PointerUnion?

I'm not aware of any that wouldn't require writing a custom formatter. Ultimately it boils to how can I extract type information. For llvm::PointerUnion I'm matching discriminator against template argument list after jumping through some hoops to get discriminator as a number. If this patch allows me to extract type information based on prefix or suffix of Kind enumerator, then it's probably as easy as you can get it without a radical change. But read on, there's more to this.

I've considered using plain-old unions, but I found that approach pretty ugly, and I'm not even sure if it would help debugging experience.

Given I extracted type information from Kind, now I need a handle to type to cast the pointer to.

In order to declare a union, you have to specify type of every member. It's possible to extract type from such declaration, saving debugger the time to run global search for a type by name, which in my experience is about two orders of magnitudes more expensive operation than anything else formatter usually does (anecdotal evidence of 100 ms per search). So in order to keep debugger responsive with formatters enabled, I'm avoiding global searches as much as I can.

For debuggers having union with all possible types listed is strictly an improvement over status quo, but I get why this is ugly.

The goal of this patch is to refine how the `SVal` base and sub-kinds
are represented by forming one unified enum describing the possible SVals.
This means that the `unsigned SVal::Kind` and the attached bit-packing
semantics would be replaced by a single unified enum.
This is more conventional and leads to a better debugging experience by default.
This eases the need of using debug pretty-printers, or the use of
runtime functions doing the printing for us like we do today by calling
`Val.dump()` whenever we inspect the values.

Previously, the first 2 bits of the `unsigned SVal::Kind` discriminated
the following quartet: `UndefinedVal`, `UnknownVal`, `Loc`, or `NonLoc`.
The rest of the upper bits represented the sub-kind, where the value
represented the index among only the `Loc`s or `NonLoc`s, effectively
attaching 2 meanings of the upper bits depending on the base-kind.
We don't need to pack these bits, as we have plenty even if we would use
just a plan-old `unsigned char`.

Consequently, in this patch, I propose to lay out all the (non-abstract)
`SVal` kinds into a single enum, along with some metadata (`BEGIN_Loc`,
`END_Loc`, `BEGIN_NonLoc`, `END_NonLoc`) artificial enum values, similar
how we do with the `MemRegions`.

Note that in the unified `SVal::Kind` enum, to differentiate
`nonloc::ConcreteInt` from `loc::ConcreteInt`, I had to prefix them with
`Loc` and `NonLoc` to resolve this ambiguity.
This should not surface in general, because I'm replacing the
`nonloc::Kind` enum items with `inline constexpr` global constants to
mimic the original behavior - and offer nicer spelling to these enum
values.

Some `SVal` constructors were not marked explicit, which I now mark as
such to follow best practices, and marked others as `/*implicit*/` to
clarify the intent.
During refactoring, I also found at least one function not marked
`LLVM_ATTRIBUTE_RETURNS_NONNULL`, so I did that.

The `TypeRetrievingVisitor` visitor had some accidental dead code,
namely: `VisitNonLocConcreteInt` and `VisitLocConcreteInt`.

Previously, the `SValVisitor` expected visit handlers of
`VisitNonLocXXXXX(nonloc::XXXXX)` and `VisitLocXXXXX(loc::XXXXX)`, where
I felt that envoding `NonLoc` and `Loc` in the name is not necessary as
the type of the parameter would select the right overload anyways, so I
simplified the naming of those visit functions.

The rest of the diff is a lot of times just formatting, because
`getKind()` by nature, frequently appears in switches, which means that
the whole switch gets automatically reformatted. I could probably undo
the formatting, but I didn't want to deviate from the rule unless
explicitly requested.
@steakhal steakhal merged commit bde5717 into llvm:main Nov 4, 2023
@steakhal steakhal deleted the remove-sval-kind-separation branch January 19, 2024 15:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
clang:static analyzer clang Clang issues not falling into any other category
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants