Skip to content

[analyzer] Refactor CallDescription match mode (NFC) #83432

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Mar 4, 2024

Conversation

NagyDonat
Copy link
Contributor

@NagyDonat NagyDonat commented Feb 29, 2024

The class CallDescription is used to define patterns that are used for matching CallEvents. For example, a CallDescription{{"std", "find_if"}, 3} matches a call to std::find_if with 3 arguments.

However, these patterns are somewhat fuzzy, so this pattern could also match something like std::__1::find_if (with an additional namespace layer), or, unfortunately, a CallDescription for the well-known function free() can match a C++ method named free(): #81597

To prevent this kind of ambiguity this commit introduces the enum CallDescription::Mode which can limit the pattern matching to non-method function calls (or method calls etc.). After this NFC change, one or more follow-up commits will apply the right pattern matching modes in the ~30 checkers that use CallDescriptions.

Note that CallDescription previously had a Flags field which had only two supported values:

  • CDF_None was the default "match anything" mode,
  • CDF_MaybeBuiltin was a "match only C library functions and accept some inexact matches" mode. This commit preserves CDF_MaybeBuiltin under the more descriptive name CallDescription::Mode::CLibrary (or CDM::CLibrary).

Instead of this "Flags" model I'm switching to a plain enumeration becasue I don't think that there is a natural usecase to combine the different matching modes. (Except for the default "match anything" mode, which is currently kept for compatibility, but will be phased out in the follow-up commits.)

The class `CallDescription` is used to define patterns that are used for
matching `CallEvent`s. For example, a `CallEvent{{"std", "find_if"}, 3}`
matches a call to `std::find_if` with 3 arguments.

However, these patterns are somewhat fuzzy, so this pattern could also
match something like `std::__1::find_if` (with an additional namespace
layer), or, unfortunately, a `CallDescription` for the well-known
function `free()` can match a C++ method named `free()`:
llvm#81597

To prevent this kind of ambiguity this commit introduces the enum
`CallDescription::Mode` which can limit the pattern matching to
non-method function calls (or method calls etc.). After this NFC change,
one or more follow-up commits will apply the right pattern matching
modes in the ~30 checkers that use `CallDescription`s.

Note that `CallDescription` previously had a `Flags` field which had
only two supported values:
 - CDF_None was the default "match anything" mode,
 - CDF_MaybeBuiltin was a "match only C library functions and accept
   some inexact matches" mode.
This commit preserves `CDF_MaybeBuiltin` under the more descriptive name
`CallDescription::Mode::CLibrary` (or `CDM::CLibrary`).

Instead of this "Flags" model I'm switching to a plain enumeration
becasue I don't think that there is a natural usecase to combine the
different matching modes. (Except for the default "match anything" mode,
which is currently kept for compatibility, but will be phased out in the
follow-up commits.)
@llvmbot llvmbot added clang Clang issues not falling into any other category clang:static analyzer labels Feb 29, 2024
@llvmbot
Copy link
Member

llvmbot commented Feb 29, 2024

@llvm/pr-subscribers-clang

@llvm/pr-subscribers-clang-static-analyzer-1

Author: None (NagyDonat)

Changes

The class CallDescription is used to define patterns that are used for matching CallEvents. For example, a CallEvent{{"std", "find_if"}, 3} matches a call to std::find_if with 3 arguments.

However, these patterns are somewhat fuzzy, so this pattern could also match something like std::__1::find_if (with an additional namespace layer), or, unfortunately, a CallDescription for the well-known function free() can match a C++ method named free(): #81597

To prevent this kind of ambiguity this commit introduces the enum CallDescription::Mode which can limit the pattern matching to non-method function calls (or method calls etc.). After this NFC change, one or more follow-up commits will apply the right pattern matching modes in the ~30 checkers that use CallDescriptions.

Note that CallDescription previously had a Flags field which had only two supported values:

  • CDF_None was the default "match anything" mode,
  • CDF_MaybeBuiltin was a "match only C library functions and accept some inexact matches" mode. This commit preserves CDF_MaybeBuiltin under the more descriptive name CallDescription::Mode::CLibrary (or CDM::CLibrary).

Instead of this "Flags" model I'm switching to a plain enumeration becasue I don't think that there is a natural usecase to combine the different matching modes. (Except for the default "match anything" mode, which is currently kept for compatibility, but will be phased out in the follow-up commits.)


Patch is 21.37 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/83432.diff

6 Files Affected:

  • (modified) clang/include/clang/StaticAnalyzer/Core/PathSensitive/CallDescription.h (+54-18)
  • (modified) clang/lib/StaticAnalyzer/Checkers/CStringChecker.cpp (+32-32)
  • (modified) clang/lib/StaticAnalyzer/Checkers/GenericTaintChecker.cpp (+22-22)
  • (modified) clang/lib/StaticAnalyzer/Checkers/MallocChecker.cpp (+4-4)
  • (modified) clang/lib/StaticAnalyzer/Core/CallDescription.cpp (+13-6)
  • (modified) clang/unittests/StaticAnalyzer/CallDescriptionTest.cpp (+8-8)
diff --git a/clang/include/clang/StaticAnalyzer/Core/PathSensitive/CallDescription.h b/clang/include/clang/StaticAnalyzer/Core/PathSensitive/CallDescription.h
index 965838a4408c23..d4985238a3c73b 100644
--- a/clang/include/clang/StaticAnalyzer/Core/PathSensitive/CallDescription.h
+++ b/clang/include/clang/StaticAnalyzer/Core/PathSensitive/CallDescription.h
@@ -27,20 +27,46 @@ class IdentifierInfo;
 
 namespace clang {
 namespace ento {
-
-enum CallDescriptionFlags : unsigned {
-  CDF_None = 0,
-
-  /// Describes a C standard function that is sometimes implemented as a macro
-  /// that expands to a compiler builtin with some __builtin prefix.
-  /// The builtin may as well have a few extra arguments on top of the requested
-  /// number of arguments.
-  CDF_MaybeBuiltin = 1 << 0,
-};
-
-/// This class represents a description of a function call using the number of
-/// arguments and the name of the function.
+/// A `CallDescription` is a pattern that can be used to _match_ calls
+/// based on the qualified name and the argument/parameter counts.
 class CallDescription {
+public:
+  enum class Mode {
+    /// Match calls to functions from the C standard library. On some platforms
+    /// some functions may be implemented as macros that expand to calls to
+    /// built-in variants of the given functions, so in this mode we use some
+    /// heuristics to recognize these implementation-defined variants:
+    ///  - We also accept calls where the name is derived from the specified
+    ///    name by adding "__builtin" or similar prefixes/suffixes.
+    ///  - We also accept calls where the number of arguments or parameters is
+    ///    greater than the specified value.
+    /// For the exact heuristics, see CheckerContext::isCLibraryFunction().
+    /// Note that functions whose declaration context is not a TU (e.g.
+    /// methods, functions in namespaces) are not accepted as C library functions.
+    /// FIXME: If I understand it correctly, this discards calls where C++ code
+    /// refers a C library function through the namespace `std::` via headers
+    /// like <cstdlib>.
+    CLibrary,
+
+    /// Matches "simple" functions that are not methods. (Static methods are
+    /// methods.)
+    SimpleFunc,
+
+    /// Matches a C+ method (may be static, may be virtual, may be an
+    /// overloaded operator, a constructor or a destructor).
+    CXXMethod,
+
+    /// Match any CallEvent that is not an ObjCMethodCall.
+    /// FIXME: Previously this was the default behavior of CallDescription, but
+    /// its use should be replaced by a more specific mode almost everywhere.
+    Unspecified,
+
+    /// FIXME: Add support for ObjCMethodCall events (I'm not adding it because
+    /// I'm not familiar with Objective-C). Note that currently an early return
+    /// in `bool matches(const CallEvent &Call) const;` discards all
+    /// Objective-C method calls.
+  };
+private:
   friend class CallEvent;
   using MaybeCount = std::optional<unsigned>;
 
@@ -50,20 +76,26 @@ class CallDescription {
   std::vector<std::string> QualifiedName;
   MaybeCount RequiredArgs;
   MaybeCount RequiredParams;
-  int Flags;
+  Mode MatchAs;
 
 public:
   /// Constructs a CallDescription object.
   ///
+  /// @param MatchAs Specifies the kind of the call that should be matched.
+  ///
   /// @param QualifiedName The list of the name qualifiers of the function that
   /// will be matched. The user is allowed to skip any of the qualifiers.
   /// For example, {"std", "basic_string", "c_str"} would match both
   /// std::basic_string<...>::c_str() and std::__1::basic_string<...>::c_str().
   ///
-  /// @param RequiredArgs The number of arguments that is expected to match a
-  /// call. Omit this parameter to match every occurrence of call with a given
-  /// name regardless the number of arguments.
-  CallDescription(CallDescriptionFlags Flags, ArrayRef<StringRef> QualifiedName,
+  /// @param RequiredArgs The expected number of arguments that are passed to
+  /// the function. Omit this parameter (or pass std::nullopt) to match every
+  /// occurrence without checking the argument count in the call.
+  ///
+  /// @param RequiredParams The expected number of parameters in the function
+  /// definition that is called. Omit this parameter to match every occurrence
+  /// without checking the parameter count in the definition.
+  CallDescription(Mode MatchAs, ArrayRef<StringRef> QualifiedName,
                   MaybeCount RequiredArgs = std::nullopt,
                   MaybeCount RequiredParams = std::nullopt);
 
@@ -222,6 +254,10 @@ template <typename T> class CallDescriptionMap {
   }
 };
 
+/// Enumerators of this enum class are used to construct CallDescription
+/// objects; in that context the fully qualified name is needlessly verbose.
+using CDM = CallDescription::Mode;
+
 /// An immutable set of CallDescriptions.
 /// Checkers can efficiently decide if a given CallEvent matches any
 /// CallDescription in the set.
diff --git a/clang/lib/StaticAnalyzer/Checkers/CStringChecker.cpp b/clang/lib/StaticAnalyzer/Checkers/CStringChecker.cpp
index b7b64c3da4f6c8..cd49a44a748a9c 100644
--- a/clang/lib/StaticAnalyzer/Checkers/CStringChecker.cpp
+++ b/clang/lib/StaticAnalyzer/Checkers/CStringChecker.cpp
@@ -124,48 +124,48 @@ class CStringChecker : public Checker< eval::Call,
                                      const CallEvent &)>;
 
   CallDescriptionMap<FnCheck> Callbacks = {
-      {{CDF_MaybeBuiltin, {"memcpy"}, 3},
+      {{CDM::CLibrary, {"memcpy"}, 3},
        std::bind(&CStringChecker::evalMemcpy, _1, _2, _3, CK_Regular)},
-      {{CDF_MaybeBuiltin, {"wmemcpy"}, 3},
+      {{CDM::CLibrary, {"wmemcpy"}, 3},
        std::bind(&CStringChecker::evalMemcpy, _1, _2, _3, CK_Wide)},
-      {{CDF_MaybeBuiltin, {"mempcpy"}, 3},
+      {{CDM::CLibrary, {"mempcpy"}, 3},
        std::bind(&CStringChecker::evalMempcpy, _1, _2, _3, CK_Regular)},
-      {{CDF_None, {"wmempcpy"}, 3},
+      {{CDM::Unspecified, {"wmempcpy"}, 3},
        std::bind(&CStringChecker::evalMempcpy, _1, _2, _3, CK_Wide)},
-      {{CDF_MaybeBuiltin, {"memcmp"}, 3},
+      {{CDM::CLibrary, {"memcmp"}, 3},
        std::bind(&CStringChecker::evalMemcmp, _1, _2, _3, CK_Regular)},
-      {{CDF_MaybeBuiltin, {"wmemcmp"}, 3},
+      {{CDM::CLibrary, {"wmemcmp"}, 3},
        std::bind(&CStringChecker::evalMemcmp, _1, _2, _3, CK_Wide)},
-      {{CDF_MaybeBuiltin, {"memmove"}, 3},
+      {{CDM::CLibrary, {"memmove"}, 3},
        std::bind(&CStringChecker::evalMemmove, _1, _2, _3, CK_Regular)},
-      {{CDF_MaybeBuiltin, {"wmemmove"}, 3},
+      {{CDM::CLibrary, {"wmemmove"}, 3},
        std::bind(&CStringChecker::evalMemmove, _1, _2, _3, CK_Wide)},
-      {{CDF_MaybeBuiltin, {"memset"}, 3}, &CStringChecker::evalMemset},
-      {{CDF_MaybeBuiltin, {"explicit_memset"}, 3}, &CStringChecker::evalMemset},
-      {{CDF_MaybeBuiltin, {"strcpy"}, 2}, &CStringChecker::evalStrcpy},
-      {{CDF_MaybeBuiltin, {"strncpy"}, 3}, &CStringChecker::evalStrncpy},
-      {{CDF_MaybeBuiltin, {"stpcpy"}, 2}, &CStringChecker::evalStpcpy},
-      {{CDF_MaybeBuiltin, {"strlcpy"}, 3}, &CStringChecker::evalStrlcpy},
-      {{CDF_MaybeBuiltin, {"strcat"}, 2}, &CStringChecker::evalStrcat},
-      {{CDF_MaybeBuiltin, {"strncat"}, 3}, &CStringChecker::evalStrncat},
-      {{CDF_MaybeBuiltin, {"strlcat"}, 3}, &CStringChecker::evalStrlcat},
-      {{CDF_MaybeBuiltin, {"strlen"}, 1}, &CStringChecker::evalstrLength},
-      {{CDF_MaybeBuiltin, {"wcslen"}, 1}, &CStringChecker::evalstrLength},
-      {{CDF_MaybeBuiltin, {"strnlen"}, 2}, &CStringChecker::evalstrnLength},
-      {{CDF_MaybeBuiltin, {"wcsnlen"}, 2}, &CStringChecker::evalstrnLength},
-      {{CDF_MaybeBuiltin, {"strcmp"}, 2}, &CStringChecker::evalStrcmp},
-      {{CDF_MaybeBuiltin, {"strncmp"}, 3}, &CStringChecker::evalStrncmp},
-      {{CDF_MaybeBuiltin, {"strcasecmp"}, 2}, &CStringChecker::evalStrcasecmp},
-      {{CDF_MaybeBuiltin, {"strncasecmp"}, 3},
+      {{CDM::CLibrary, {"memset"}, 3}, &CStringChecker::evalMemset},
+      {{CDM::CLibrary, {"explicit_memset"}, 3}, &CStringChecker::evalMemset},
+      {{CDM::CLibrary, {"strcpy"}, 2}, &CStringChecker::evalStrcpy},
+      {{CDM::CLibrary, {"strncpy"}, 3}, &CStringChecker::evalStrncpy},
+      {{CDM::CLibrary, {"stpcpy"}, 2}, &CStringChecker::evalStpcpy},
+      {{CDM::CLibrary, {"strlcpy"}, 3}, &CStringChecker::evalStrlcpy},
+      {{CDM::CLibrary, {"strcat"}, 2}, &CStringChecker::evalStrcat},
+      {{CDM::CLibrary, {"strncat"}, 3}, &CStringChecker::evalStrncat},
+      {{CDM::CLibrary, {"strlcat"}, 3}, &CStringChecker::evalStrlcat},
+      {{CDM::CLibrary, {"strlen"}, 1}, &CStringChecker::evalstrLength},
+      {{CDM::CLibrary, {"wcslen"}, 1}, &CStringChecker::evalstrLength},
+      {{CDM::CLibrary, {"strnlen"}, 2}, &CStringChecker::evalstrnLength},
+      {{CDM::CLibrary, {"wcsnlen"}, 2}, &CStringChecker::evalstrnLength},
+      {{CDM::CLibrary, {"strcmp"}, 2}, &CStringChecker::evalStrcmp},
+      {{CDM::CLibrary, {"strncmp"}, 3}, &CStringChecker::evalStrncmp},
+      {{CDM::CLibrary, {"strcasecmp"}, 2}, &CStringChecker::evalStrcasecmp},
+      {{CDM::CLibrary, {"strncasecmp"}, 3},
        &CStringChecker::evalStrncasecmp},
-      {{CDF_MaybeBuiltin, {"strsep"}, 2}, &CStringChecker::evalStrsep},
-      {{CDF_MaybeBuiltin, {"bcopy"}, 3}, &CStringChecker::evalBcopy},
-      {{CDF_MaybeBuiltin, {"bcmp"}, 3},
+      {{CDM::CLibrary, {"strsep"}, 2}, &CStringChecker::evalStrsep},
+      {{CDM::CLibrary, {"bcopy"}, 3}, &CStringChecker::evalBcopy},
+      {{CDM::CLibrary, {"bcmp"}, 3},
        std::bind(&CStringChecker::evalMemcmp, _1, _2, _3, CK_Regular)},
-      {{CDF_MaybeBuiltin, {"bzero"}, 2}, &CStringChecker::evalBzero},
-      {{CDF_MaybeBuiltin, {"explicit_bzero"}, 2}, &CStringChecker::evalBzero},
-      {{CDF_MaybeBuiltin, {"sprintf"}, 2}, &CStringChecker::evalSprintf},
-      {{CDF_MaybeBuiltin, {"snprintf"}, 2}, &CStringChecker::evalSnprintf},
+      {{CDM::CLibrary, {"bzero"}, 2}, &CStringChecker::evalBzero},
+      {{CDM::CLibrary, {"explicit_bzero"}, 2}, &CStringChecker::evalBzero},
+      {{CDM::CLibrary, {"sprintf"}, 2}, &CStringChecker::evalSprintf},
+      {{CDM::CLibrary, {"snprintf"}, 2}, &CStringChecker::evalSnprintf},
   };
 
   // These require a bit of special handling.
diff --git a/clang/lib/StaticAnalyzer/Checkers/GenericTaintChecker.cpp b/clang/lib/StaticAnalyzer/Checkers/GenericTaintChecker.cpp
index 4ceaf933d0bfc8..55e5877d694b73 100644
--- a/clang/lib/StaticAnalyzer/Checkers/GenericTaintChecker.cpp
+++ b/clang/lib/StaticAnalyzer/Checkers/GenericTaintChecker.cpp
@@ -718,28 +718,28 @@ void GenericTaintChecker::initTaintRules(CheckerContext &C) const {
       {{{"isupper"}}, TR::Prop({{0}}, {{ReturnValueIndex}})},
       {{{"isxdigit"}}, TR::Prop({{0}}, {{ReturnValueIndex}})},
 
-      {{CDF_MaybeBuiltin, {BI.getName(Builtin::BIstrncat)}},
+      {{CDM::CLibrary, {BI.getName(Builtin::BIstrncat)}},
        TR::Prop({{1, 2}}, {{0, ReturnValueIndex}})},
-      {{CDF_MaybeBuiltin, {BI.getName(Builtin::BIstrlcpy)}},
+      {{CDM::CLibrary, {BI.getName(Builtin::BIstrlcpy)}},
        TR::Prop({{1, 2}}, {{0}})},
-      {{CDF_MaybeBuiltin, {BI.getName(Builtin::BIstrlcat)}},
+      {{CDM::CLibrary, {BI.getName(Builtin::BIstrlcat)}},
        TR::Prop({{1, 2}}, {{0}})},
-      {{CDF_MaybeBuiltin, {{"snprintf"}}},
+      {{CDM::CLibrary, {{"snprintf"}}},
        TR::Prop({{1}, 3}, {{0, ReturnValueIndex}})},
-      {{CDF_MaybeBuiltin, {{"sprintf"}}},
+      {{CDM::CLibrary, {{"sprintf"}}},
        TR::Prop({{1}, 2}, {{0, ReturnValueIndex}})},
-      {{CDF_MaybeBuiltin, {{"strcpy"}}},
+      {{CDM::CLibrary, {{"strcpy"}}},
        TR::Prop({{1}}, {{0, ReturnValueIndex}})},
-      {{CDF_MaybeBuiltin, {{"stpcpy"}}},
+      {{CDM::CLibrary, {{"stpcpy"}}},
        TR::Prop({{1}}, {{0, ReturnValueIndex}})},
-      {{CDF_MaybeBuiltin, {{"strcat"}}},
+      {{CDM::CLibrary, {{"strcat"}}},
        TR::Prop({{1}}, {{0, ReturnValueIndex}})},
-      {{CDF_MaybeBuiltin, {{"wcsncat"}}},
+      {{CDM::CLibrary, {{"wcsncat"}}},
        TR::Prop({{1}}, {{0, ReturnValueIndex}})},
-      {{CDF_MaybeBuiltin, {{"strdup"}}}, TR::Prop({{0}}, {{ReturnValueIndex}})},
-      {{CDF_MaybeBuiltin, {{"strdupa"}}},
+      {{CDM::CLibrary, {{"strdup"}}}, TR::Prop({{0}}, {{ReturnValueIndex}})},
+      {{CDM::CLibrary, {{"strdupa"}}},
        TR::Prop({{0}}, {{ReturnValueIndex}})},
-      {{CDF_MaybeBuiltin, {{"wcsdup"}}}, TR::Prop({{0}}, {{ReturnValueIndex}})},
+      {{CDM::CLibrary, {{"wcsdup"}}}, TR::Prop({{0}}, {{ReturnValueIndex}})},
 
       // Sinks
       {{{"system"}}, TR::Sink({{0}}, MsgSanitizeSystemArgs)},
@@ -753,31 +753,31 @@ void GenericTaintChecker::initTaintRules(CheckerContext &C) const {
       {{{"execvp"}}, TR::Sink({{0, 1}}, MsgSanitizeSystemArgs)},
       {{{"execvpe"}}, TR::Sink({{0, 1, 2}}, MsgSanitizeSystemArgs)},
       {{{"dlopen"}}, TR::Sink({{0}}, MsgSanitizeSystemArgs)},
-      {{CDF_MaybeBuiltin, {{"malloc"}}}, TR::Sink({{0}}, MsgTaintedBufferSize)},
-      {{CDF_MaybeBuiltin, {{"calloc"}}}, TR::Sink({{0}}, MsgTaintedBufferSize)},
-      {{CDF_MaybeBuiltin, {{"alloca"}}}, TR::Sink({{0}}, MsgTaintedBufferSize)},
-      {{CDF_MaybeBuiltin, {{"memccpy"}}},
+      {{CDM::CLibrary, {{"malloc"}}}, TR::Sink({{0}}, MsgTaintedBufferSize)},
+      {{CDM::CLibrary, {{"calloc"}}}, TR::Sink({{0}}, MsgTaintedBufferSize)},
+      {{CDM::CLibrary, {{"alloca"}}}, TR::Sink({{0}}, MsgTaintedBufferSize)},
+      {{CDM::CLibrary, {{"memccpy"}}},
        TR::Sink({{3}}, MsgTaintedBufferSize)},
-      {{CDF_MaybeBuiltin, {{"realloc"}}},
+      {{CDM::CLibrary, {{"realloc"}}},
        TR::Sink({{1}}, MsgTaintedBufferSize)},
       {{{{"setproctitle"}}}, TR::Sink({{0}, 1}, MsgUncontrolledFormatString)},
       {{{{"setproctitle_fast"}}},
        TR::Sink({{0}, 1}, MsgUncontrolledFormatString)},
 
       // SinkProps
-      {{CDF_MaybeBuiltin, BI.getName(Builtin::BImemcpy)},
+      {{CDM::CLibrary, BI.getName(Builtin::BImemcpy)},
        TR::SinkProp({{2}}, {{1, 2}}, {{0, ReturnValueIndex}},
                     MsgTaintedBufferSize)},
-      {{CDF_MaybeBuiltin, {BI.getName(Builtin::BImemmove)}},
+      {{CDM::CLibrary, {BI.getName(Builtin::BImemmove)}},
        TR::SinkProp({{2}}, {{1, 2}}, {{0, ReturnValueIndex}},
                     MsgTaintedBufferSize)},
-      {{CDF_MaybeBuiltin, {BI.getName(Builtin::BIstrncpy)}},
+      {{CDM::CLibrary, {BI.getName(Builtin::BIstrncpy)}},
        TR::SinkProp({{2}}, {{1, 2}}, {{0, ReturnValueIndex}},
                     MsgTaintedBufferSize)},
-      {{CDF_MaybeBuiltin, {BI.getName(Builtin::BIstrndup)}},
+      {{CDM::CLibrary, {BI.getName(Builtin::BIstrndup)}},
        TR::SinkProp({{1}}, {{0, 1}}, {{ReturnValueIndex}},
                     MsgTaintedBufferSize)},
-      {{CDF_MaybeBuiltin, {{"bcopy"}}},
+      {{CDM::CLibrary, {{"bcopy"}}},
        TR::SinkProp({{2}}, {{0, 2}}, {{1}}, MsgTaintedBufferSize)}};
 
   // `getenv` returns taint only in untrusted environments.
diff --git a/clang/lib/StaticAnalyzer/Checkers/MallocChecker.cpp b/clang/lib/StaticAnalyzer/Checkers/MallocChecker.cpp
index 79ab05f2c7866a..b27ca6a4495974 100644
--- a/clang/lib/StaticAnalyzer/Checkers/MallocChecker.cpp
+++ b/clang/lib/StaticAnalyzer/Checkers/MallocChecker.cpp
@@ -410,13 +410,13 @@ class MallocChecker
       {{{"malloc"}, 3}, &MallocChecker::checkKernelMalloc},
       {{{"calloc"}, 2}, &MallocChecker::checkCalloc},
       {{{"valloc"}, 1}, &MallocChecker::checkBasicAlloc},
-      {{CDF_MaybeBuiltin, {"strndup"}, 2}, &MallocChecker::checkStrdup},
-      {{CDF_MaybeBuiltin, {"strdup"}, 1}, &MallocChecker::checkStrdup},
+      {{CDM::CLibrary, {"strndup"}, 2}, &MallocChecker::checkStrdup},
+      {{CDM::CLibrary, {"strdup"}, 1}, &MallocChecker::checkStrdup},
       {{{"_strdup"}, 1}, &MallocChecker::checkStrdup},
       {{{"kmalloc"}, 2}, &MallocChecker::checkKernelMalloc},
       {{{"if_nameindex"}, 1}, &MallocChecker::checkIfNameIndex},
-      {{CDF_MaybeBuiltin, {"wcsdup"}, 1}, &MallocChecker::checkStrdup},
-      {{CDF_MaybeBuiltin, {"_wcsdup"}, 1}, &MallocChecker::checkStrdup},
+      {{CDM::CLibrary, {"wcsdup"}, 1}, &MallocChecker::checkStrdup},
+      {{CDM::CLibrary, {"_wcsdup"}, 1}, &MallocChecker::checkStrdup},
       {{{"g_malloc"}, 1}, &MallocChecker::checkBasicAlloc},
       {{{"g_malloc0"}, 1}, &MallocChecker::checkGMalloc0},
       {{{"g_try_malloc"}, 1}, &MallocChecker::checkBasicAlloc},
diff --git a/clang/lib/StaticAnalyzer/Core/CallDescription.cpp b/clang/lib/StaticAnalyzer/Core/CallDescription.cpp
index 94b2fde0a6f395..f1b632ea3d53bb 100644
--- a/clang/lib/StaticAnalyzer/Core/CallDescription.cpp
+++ b/clang/lib/StaticAnalyzer/Core/CallDescription.cpp
@@ -35,13 +35,13 @@ static MaybeCount readRequiredParams(MaybeCount RequiredArgs,
   return std::nullopt;
 }
 
-ento::CallDescription::CallDescription(CallDescriptionFlags Flags,
+ento::CallDescription::CallDescription(Mode MatchAs,
                                        ArrayRef<StringRef> QualifiedName,
                                        MaybeCount RequiredArgs /*= None*/,
                                        MaybeCount RequiredParams /*= None*/)
     : RequiredArgs(RequiredArgs),
       RequiredParams(readRequiredParams(RequiredArgs, RequiredParams)),
-      Flags(Flags) {
+      MatchAs(MatchAs) {
   assert(!QualifiedName.empty());
   this->QualifiedName.reserve(QualifiedName.size());
   llvm::transform(QualifiedName, std::back_inserter(this->QualifiedName),
@@ -52,7 +52,7 @@ ento::CallDescription::CallDescription(CallDescriptionFlags Flags,
 ento::CallDescription::CallDescription(ArrayRef<StringRef> QualifiedName,
                                        MaybeCount RequiredArgs /*= None*/,
                                        MaybeCount RequiredParams /*= None*/)
-    : CallDescription(CDF_None, QualifiedName, RequiredArgs, RequiredParams) {}
+    : CallDescription(Mode::Unspecified, QualifiedName, RequiredArgs, RequiredParams) {}
 
 bool ento::CallDescription::matches(const CallEvent &Call) const {
   // FIXME: Add ObjC Message support.
@@ -74,14 +74,21 @@ bool ento::CallDescription::matchesAsWritten(const CallExpr &CE) const {
   return matchesImpl(FD, CE.getNumArgs(), FD->param_size());
 }
 
-bool ento::CallDescription::matchesImpl(const FunctionDecl *Callee,
+bool ento::CallDescription::matchesImpl(const FunctionDecl *FD,
                                         size_t ArgCount,
                                         size_t ParamCount) const {
-  const auto *FD = Callee;
   if (!FD)
     return false;
 
-  if (Flags & CDF_MaybeBuiltin) {
+  const bool isMethod = isa<CXXMethodDecl>(FD);
+
+  if (MatchAs == Mode::SimpleFunc && isMethod)
+    return false;
+
+  if (MatchAs == Mode::CXXMethod && !isMethod)
+    return false;
+
+  if (MatchAs == Mode::CLibrary) {
     return CheckerContext::isCLibraryFunction(FD, getFunctionName()) &&
            (!RequiredArgs || *RequiredArgs <= ArgCount) &&
            (!RequiredParams || *RequiredParams <= ParamCount);
diff --git a/clang/unittests/StaticAnalyzer/CallDescriptionTest.cpp b/clang/unittests/StaticAnalyzer/CallDescriptionTest.cpp
index e8b237b891ca8b..fa1cf72fd75135 100644
--- a/clang/unittests/StaticAnalyzer/CallDescriptionTest.cpp
+++ b/clang/unittests/StaticAnalyzer/CallDescriptionTest.cpp
@@ -488,11 +488,11 @@ TEST(CallDescription, NegativeMatchQualifiedNames) {
 }
 
 TEST(CallDescription, MatchBuiltins) {
-  // Test CDF_MaybeBuiltin - a flag that allows matching weird builtins.
+  // Test CDM::CLibrary - a flag that allows matching weird builtins.
   EXPECT_TRUE(tooling::runToolOnCode(
       std::unique_ptr<FrontendAction>(new CallDescriptionAction<>(
           {{{{"memset"}, 3}, false},
-           {{CDF_MaybeBuiltin, {"memset"}, 3}, true}})),
+           {{CDM::CLibrary, {"memset"}, 3}, true}})),
       "void foo() {"
       "  int x;"
       "  __builtin___memset_chk(&x, 0, sizeof(x),"
@@ -503,8 +503,8 @@ TEST(CallDescription, MatchBuiltins) {
     SCOPED_TRACE("multiple similar builtins");
     EXPECT_TRUE(tooling::runToolOnCode(
         std::unique_ptr<FrontendAction>(new CallDescriptionAction<>(
-            {{{CDF_MaybeBuiltin, {"memcpy"}, 3}, false},
-             {{CDF_MaybeBuiltin, {"wmemcpy"}, 3}, true}})),
+            {{{CDM::CLibrary, {"memcpy"}, 3}, false},
+             {{CDM::CLibrary, {"wmemcpy"}, 3}, true}})),
         R"(void foo(wchar_t *x, wchar_t *y) {
             __builtin_wmemcpy(x, y, sizeof(wchar_t));
  ...
[truncated]

Copy link

github-actions bot commented Feb 29, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

@NagyDonat NagyDonat requested a review from gamesh411 March 1, 2024 09:54
Copy link
Collaborator

@balazske balazske left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The flag approach can probably make a sense for namespace handling (match the exact specified namespace, or allow a prefix before, or even something in between).

/// Note that functions whose declaration context is not a TU (e.g.
/// methods, functions in namespaces) are not accepted as C library
/// functions.
/// FIXME: If I understand it correctly, this discards calls where C++ code
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A FIXME comment should not be a documentation (only // comment).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general, I agree, but in this particular case I think this FIXME note is an important disclaimer and without it the auto-generated documentation would be misleading.

/// - We also accept calls where the name is derived from the specified
/// name by adding "__builtin" or similar prefixes/suffixes.
/// - We also accept calls where the number of arguments or parameters is
/// greater than the specified value.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should check how this looks in doxygen documentation (produces a list?).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume that it does, but I don't know how to generate the doxygen docs and I feel that messing with them is a waste of time.

/// Match any CallEvent that is not an ObjCMethodCall.
/// FIXME: Previously this was the default behavior of CallDescription, but
/// its use should be replaced by a more specific mode almost everywhere.
Unspecified,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could have a better name like "Any" (or an other), "Unspecified" can mean that it works in a default way.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At first I also wanted to name it "Any", but I discarded that idea because it doesn't match the Objective-C method calls. (And AFAIK Objective-C method calls are very weird stuff, whose AST/CallEvent representation is significantly different from the "normal" function/method calls, so I don't want to extend CallDescription support for them.)

I'd say that reading Unspecified as "works in a default way" is also fine for this enumerator, as this is the old default mode that should be eventually eliminated.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My concern was really that we can think that "Unspecified" means one from the other modes is used. But this is likely not a big problem.

Copy link
Contributor

@gamesh411 gamesh411 Mar 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest that we go with the enumerator name 'Default', as the name of the enum is Mode, so it would be easily understood that this is something that comes out of the box.
(Those interested in how or what it matches will look at the docs of this enum anyway. I find this to be a conventional and non-surprising nomenclature.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I picked the somewhat scary name "Unspecified" for the "match any kind of call" enumerator because it's too vague for practical use (if a checker wants to match something, it will practically always know if it's a function or a method) and I hope that I can eliminate its default-ness in follow-up commits.

/// FIXME: If I understand it correctly, this discards calls where C++ code
/// refers a C library function through the namespace `std::` via headers
/// like <cstdlib>.
CLibrary,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CLibrary may not be the best name for this, because many functions that are "C library" calls are not matched with this mode. Probably MaybeBuiltin can remain for this mode. (It is another question if there are similar builtin things in C++ or special STL functions that we want to match.)

Copy link
Contributor Author

@NagyDonat NagyDonat Mar 1, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

many functions that are "C library" calls are not matched with this mode

I'm planning to ensure that:

  • this mode works "out of the box" for matching the C standard library functions and
  • the checkers that are looking for a C standard library function do use it.

This is an NFC commit so I didn't change the behavior of this mode yet, but I'm planning to do so in a followup commit. I picked the name CLibrary because it reflects the name of the method isCLibraryFunction() which is used to implement this mode. I'd say that if isCLibraryFunction() returns false for a C library function, then it's buggy and I'll fix its behavior.

Do you see any difficulties that would hinder or block this path?

By the way I dislike the name MaybeBuiltin because at first I thought that that it clearly meant "this may be a builtin function (with a modified name), but it may be anything else as well", while in fact it always calls isCLibraryFunction (and doesn't accept anything that isn't a C library function).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is good if it can be used for any C library (or like) function. I saw cases at MallocChecker where it is not used, then it should be used in these cases too. If for some reason we want to find C library functions that are not "builtin" another mode (the "Unspecified") can be used. Probably it is OK if later this mode can find functions in the std:: namespace.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I saw cases at MallocChecker where it is not used, then it should be used in these cases too.
In fact that was the original reason why I started to work on CallDescriptions: there was an open source bug report (#81597) where MallocChecker recognized a C++ method named free as if it was the well-known function free.

Now I realize that it would've been possible to resolve that bug by just adding CDF_MaybeBuiltin to the CallDescription for free, but I was misled by the name CDF_MaybeBuiltin and thought that I need to introduce a new matching mode that says "must be function".

After merging this commit, I'll quickly do a followup commit that fixes MallocChecker, but then I'll continue with cleaning up the other checkers as well.

@NagyDonat
Copy link
Contributor Author

The flag approach can probably make a sense for namespace handling (match the exact specified namespace, or allow a prefix before, or even something in between).

Even in that case, I'd prefer a separate second parameter (that's either boolean or a different enum). Squeezing unrelated things into the same flag only makes sense if (1) memory use is strongly limited (2) there would be too many separate parameters.

@gamesh411
Copy link
Contributor

The flag approach can probably make a sense for namespace handling (match the exact specified namespace, or allow a prefix before, or even something in between).

Even in that case, I'd prefer a separate second parameter (that's either boolean or a different enum). Squeezing unrelated things into the same flag only makes sense if (1) memory use is strongly limited (2) there would be too many separate parameters.

I lean towards the separate enum for the namespace handling scenario even if it means adding another enum to the constructor, or maybe packing the two enums into a configuration object because these are separate concerns.
I left a suggestion inline as well, but otherwise, LGTM.

/// Match any CallEvent that is not an ObjCMethodCall.
/// FIXME: Previously this was the default behavior of CallDescription, but
/// its use should be replaced by a more specific mode almost everywhere.
Unspecified,
Copy link
Contributor

@gamesh411 gamesh411 Mar 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest that we go with the enumerator name 'Default', as the name of the enum is Mode, so it would be easily understood that this is something that comes out of the box.
(Those interested in how or what it matches will look at the docs of this enum anyway. I find this to be a conventional and non-surprising nomenclature.)

@NagyDonat NagyDonat merged commit 52a460f into llvm:main Mar 4, 2024
@NagyDonat NagyDonat deleted the CallDescription_NFC_cleanup branch March 4, 2024 14:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
clang:static analyzer clang Clang issues not falling into any other category
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants