Skip to content

[clang] Implement __attribute__((format_matches)) #116708

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 12 commits into from
Feb 25, 2025

Conversation

apple-fcloutier
Copy link
Contributor

This implements __attribute__((format_matches)), as described in the RFC: https://discourse.llvm.org/t/rfc-format-attribute-attribute-format-like/83076

The format attribute only allows the compiler to check that a format string matches its arguments. If the format string is passed independently of its arguments, there is no way to have the compiler check it. format_matches(flavor, fmtidx, example) allows the compiler to check format strings against the example format string instead of against format arguments. See the changes to AttrDocs.td in this diff for more information.

Implementation-wise, this change subclasses CheckPrintfHandler and CheckScanfHandler to allow them to collect specifiers into arrays, and implements comparing that two specifiers are equivalent. checkFormatStringExpr gets a new ReferenceFormatString argument that is piped down when calling a function with the format_matches attribute (and is nullptr otherwise); this is the string that the actual format string is compared against.

Although this change does not enable -Wformat-nonliteral by default, IMO, all the pieces are now in place such that it could be.

@apple-fcloutier apple-fcloutier requested review from erichkeane and removed request for Endilll November 18, 2024 22:53
@llvmbot llvmbot added clang Clang issues not falling into any other category clang:frontend Language frontend issues, e.g. anything involving "Sema" labels Nov 18, 2024
@llvmbot
Copy link
Member

llvmbot commented Nov 18, 2024

@llvm/pr-subscribers-clang

Author: None (apple-fcloutier)

Changes

This implements __attribute__((format_matches)), as described in the RFC: https://discourse.llvm.org/t/rfc-format-attribute-attribute-format-like/83076

The format attribute only allows the compiler to check that a format string matches its arguments. If the format string is passed independently of its arguments, there is no way to have the compiler check it. format_matches(flavor, fmtidx, example) allows the compiler to check format strings against the example format string instead of against format arguments. See the changes to AttrDocs.td in this diff for more information.

Implementation-wise, this change subclasses CheckPrintfHandler and CheckScanfHandler to allow them to collect specifiers into arrays, and implements comparing that two specifiers are equivalent. checkFormatStringExpr gets a new ReferenceFormatString argument that is piped down when calling a function with the format_matches attribute (and is nullptr otherwise); this is the string that the actual format string is compared against.

Although this change does not enable -Wformat-nonliteral by default, IMO, all the pieces are now in place such that it could be.


Patch is 80.51 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/116708.diff

12 Files Affected:

  • (modified) clang/docs/ReleaseNotes.rst (+44)
  • (modified) clang/include/clang/AST/FormatString.h (+2-1)
  • (modified) clang/include/clang/Basic/Attr.td (+10)
  • (modified) clang/include/clang/Basic/AttrDocs.td (+97)
  • (modified) clang/include/clang/Basic/DiagnosticSemaKinds.td (+21)
  • (modified) clang/include/clang/Sema/Sema.h (+27-10)
  • (modified) clang/lib/AST/AttrImpl.cpp (+5)
  • (modified) clang/lib/AST/FormatString.cpp (+91)
  • (modified) clang/lib/Sema/SemaChecking.cpp (+607-147)
  • (modified) clang/lib/Sema/SemaDeclAttr.cpp (+93-27)
  • (modified) clang/lib/Sema/SemaObjC.cpp (+2-1)
  • (added) clang/test/Sema/format-string-matches.c (+122)
diff --git a/clang/docs/ReleaseNotes.rst b/clang/docs/ReleaseNotes.rst
index 0efe62f1804cd0a..66e4758fe89e243 100644
--- a/clang/docs/ReleaseNotes.rst
+++ b/clang/docs/ReleaseNotes.rst
@@ -357,6 +357,50 @@ Non-comprehensive list of changes in this release
 
 - ``__builtin_reduce_add`` function can now be used in constant expressions.
 
+- There is a new ``format_matches`` attribute to complement the existing
+  ``format`` attribute. ``format_matches`` allows the compiler to verify that
+  a format string argument is equivalent to a reference format string: it is
+  useful when a function accepts a format string without its accompanying
+  arguments to format. For instance:
+
+  .. code-block:: c
+
+    static int status_code;
+    static const char *status_string;
+
+    void print_status(const char *fmt) {
+      fprintf(stderr, fmt, status_code, status_string);
+      // ^ warning: format string is not a string literal [-Wformat-nonliteral]
+    }
+
+    void stuff(void) {
+      print_status("%s (%#08x)\n");
+      // order of %s and %x is swapped but there is no diagnostic
+    }
+  
+  Before the introducion of ``format_matches``, this code cannot be verified
+  at compile-time. ``format_matches`` plugs that hole:
+
+  .. code-block:: c
+
+    __attribute__((format_matches(printf, 1, "%x %s")))
+    void print_status(const char *fmt) {
+      fprintf(stderr, fmt, status_code, status_string);
+      // ^ `fmt` verified as if it was "%x %s" here; no longer triggers
+      //   -Wformat-nonliteral, would warn if arguments did not match "%x %s"
+    }
+
+    void stuff(void) {
+      print_status("%s (%#08x)\n");
+      // warning: format specifier 's' is incompatible with 'x'
+      // warning: format specifier 'x' is incompatible with 's'
+    }
+
+  Like with ``format``, the first argument is the format string flavor and the
+  second argument is the index of the format string parameter.
+  ``format_matches`` accepts an example valid format string as its third
+  argument. For more information, see the Clang attributes documentation.
+
 New Compiler Flags
 ------------------
 
diff --git a/clang/include/clang/AST/FormatString.h b/clang/include/clang/AST/FormatString.h
index a074dd23e2ad4c7..3560766433fe220 100644
--- a/clang/include/clang/AST/FormatString.h
+++ b/clang/include/clang/AST/FormatString.h
@@ -292,7 +292,7 @@ class ArgType {
   };
 
 private:
-  const Kind K;
+  Kind K;
   QualType T;
   const char *Name = nullptr;
   bool Ptr = false;
@@ -338,6 +338,7 @@ class ArgType {
   }
 
   MatchKind matchesType(ASTContext &C, QualType argTy) const;
+  MatchKind matchesArgType(ASTContext &C, const ArgType &other) const;
 
   QualType getRepresentativeType(ASTContext &C) const;
 
diff --git a/clang/include/clang/Basic/Attr.td b/clang/include/clang/Basic/Attr.td
index 6035a563d5fce77..7723e631aa2529c 100644
--- a/clang/include/clang/Basic/Attr.td
+++ b/clang/include/clang/Basic/Attr.td
@@ -1807,6 +1807,16 @@ def Format : InheritableAttr {
   let Documentation = [FormatDocs];
 }
 
+def FormatMatches : InheritableAttr {
+  let Spellings = [GCC<"format_matches">];
+  let Args = [IdentifierArgument<"Type">, IntArgument<"FormatIdx">, ExprArgument<"ExpectedFormat">];
+  let AdditionalMembers = [{
+    StringLiteral *getFormatString() const;
+  }];
+  let Subjects = SubjectList<[ObjCMethod, Block, HasFunctionProto]>;
+  let Documentation = [FormatMatchesDocs];
+}
+
 def FormatArg : InheritableAttr {
   let Spellings = [GCC<"format_arg">];
   let Args = [ParamIdxArgument<"FormatIdx">];
diff --git a/clang/include/clang/Basic/AttrDocs.td b/clang/include/clang/Basic/AttrDocs.td
index 2fdceca163ee63b..2ca1bf1a454271a 100644
--- a/clang/include/clang/Basic/AttrDocs.td
+++ b/clang/include/clang/Basic/AttrDocs.td
@@ -3738,6 +3738,103 @@ behavior of the program is undefined.
   }];
 }
 
+def FormatMatchesDocs : Documentation {
+  let Category = DocCatFunction;
+  let Content = [{
+
+The ``format`` attribute is the basis for the enforcement of diagnostics in the
+``-Wformat`` family, but it only handles the case where the format string is
+passed along with the arguments it is going to format. It cannot handle the case
+where the format string and the format arguments are passed separately from each
+other. For instance:
+
+.. code-block:: c
+
+  static const char *first_name;
+  static double todays_temperature;
+  static int wind_speed;
+
+  void say_hi(const char *fmt) {
+    printf(fmt, first_name, todays_temperature); // warning: format string is not a string literal
+    printf(fmt, first_name, wind_speed); // warning: format string is not a string literal
+  }
+
+  int main() {
+    say_hi("hello %s, it is %g degrees outside");
+    say_hi("hello %s, it is %d degrees outside!"); // no diagnostic
+  }
+
+In this example, ``fmt`` is expected to format a ``const char *`` and a
+``double``, but these values are not passed to ``say_hi``. Without the
+``format`` attribute (which cannot apply in this case), the -Wformat-nonliteral
+diagnostic triggers in the body of ``say_hi``, and incorrect ``say_hi`` call
+sites do not trigger a diagnostic.
+
+To complement the ``format`` attribute, Clang also defines the
+``format_matches`` attribute. Its syntax is similar to the ``format``
+attribute's, but instead of taking the index of the first formatted value
+argument, it takes a C string literal with the expected specifiers:
+
+.. code-block:: c
+
+  static const char *first_name;
+  static double todays_temperature;
+  static int wind_speed;
+
+  __attribute__((__format_matches__(printf, 1, "%s %g")))
+  void say_hi(const char *fmt) {
+    printf(fmt, first_name, todays_temperature); // no dignostic
+    printf(fmt, first_name, wind_speed); // warning: format specifies type 'int' but the argument has type 'double'
+  }
+
+  int main() {
+    say_hi("hello %s, it is %g degrees outside");
+    say_hi("it is %g degrees outside, have a good day %s!");
+    // warning: format specifies 'double' where 'const char *' is required
+    // warning: format specifies 'const char *' where 'double' is required
+  }
+
+The third argument to ``format_matches`` is expected to evaluate to a **C string
+literal** even when the format string would normally be a different type for the
+given flavor, like a ``CFStringRef`` or a ``NSString *``.
+
+In the implementation of a function with the ``format_matches`` attribute,
+format verification works as if the format string was identical to the one
+specified in the attribute.
+
+At the call sites of functions with the ``format_matches`` attribute, format
+verification instead compares the two format strings to evaluate their
+equivalence. Each format flavor defines equivalence between format specifiers.
+Generally speaking, two specifiers are equivalent if they format the same type.
+For instance, in the ``printf`` flavor, ``%2i`` and ``%-0.5d`` are compatible.
+When ``-Wformat-signedness`` is disabled, ``%d`` and ``%u`` are compatible. For
+a negative example, ``%ld`` is incompatible with ``%d``.
+
+Do note the following un-obvious cases:
+
+* Passing ``NULL`` as the format string is accepted without diagnostics.
+* When the format string is not NULL, it cannot _miss_ specifiers, even in
+  trailing positions. For instance, ``%d`` is not accepted when the required
+  format is ``%d %d %d``.
+* Specifiers for integers as small or smaller than ``int`` (such as ``%hhd``)
+  are all mutually compatible because standard argument promotion ensures that
+  integers are at least the size of ``int`` when passed as variadic arguments.
+  With ``-Wformat-signedness``, mixing specifier for types with a different
+  signedness still results in a diagnostic.
+* Format strings expecting a variable modifier (such as ``%*s``) are
+  incompatible with format strings that would itemize the variable modifiers
+  (such as ``%i %s``), even if the two specify ABI-compatible argument lists.
+* All pointer specifiers, modifiers aside, are mutually incompatible. For
+  instance, ``%s`` is not compatible with ``%p``, and ``%p`` is not compatible
+  with ``%n``, and ``%hhn`` is incompatible with ``%s``, even if the pointers
+  are ABI-compatible or identical on the selected platform. However, ``%0.5s``
+  is compatible with ``%s``, since the difference only exists in modifier flags.
+  This is not overridable with ``-Wformat-pedantic`` or its inverse, which
+  control similar behavior in ``-Wformat``.
+
+  }];
+}
+
 def FlagEnumDocs : Documentation {
   let Category = DocCatDecl;
   let Content = [{
diff --git a/clang/include/clang/Basic/DiagnosticSemaKinds.td b/clang/include/clang/Basic/DiagnosticSemaKinds.td
index 3caf471d3037f9e..0e7ffc95f8b53ef 100644
--- a/clang/include/clang/Basic/DiagnosticSemaKinds.td
+++ b/clang/include/clang/Basic/DiagnosticSemaKinds.td
@@ -7687,6 +7687,7 @@ def warn_format_nonliteral_noargs : Warning<
 def warn_format_nonliteral : Warning<
   "format string is not a string literal">,
   InGroup<FormatNonLiteral>, DefaultIgnore;
+def err_format_nonliteral : Error<"format string is not a string literal">;
 
 def err_unexpected_interface : Error<
   "unexpected interface name %0: expected expression">;
@@ -9998,6 +9999,8 @@ def note_previous_declaration_as : Note<
 
 def warn_printf_insufficient_data_args : Warning<
   "more '%%' conversions than data arguments">, InGroup<FormatInsufficientArgs>;
+def warn_format_cmp_insufficient_specifiers : Warning<
+  "fewer specifiers in format string than expected">, InGroup<FormatInsufficientArgs>;
 def warn_printf_data_arg_not_used : Warning<
   "data argument not used by format string">, InGroup<FormatExtraArgs>;
 def warn_format_invalid_conversion : Warning<
@@ -10115,6 +10118,24 @@ def note_format_fix_specifier : Note<"did you mean to use '%0'?">;
 def note_printf_c_str: Note<"did you mean to call the %0 method?">;
 def note_format_security_fixit: Note<
   "treat the string as an argument to avoid this">;
+def warn_format_cmp_role_mismatch : Warning<
+  "format argument is %select{a value|an indirect field width|an indirect "
+  "precision|an auxiliary value}0, but it should be %select{a value|an indirect "
+  "field width|an indirect precision|an auxiliary value}1">, InGroup<Format>;
+def warn_format_cmp_modifierfor_mismatch : Warning<
+  "format argument modifies specifier at position %0, but it should modify "
+  "specifier at position %1">, InGroup<Format>;
+def warn_format_cmp_sensitivity_mismatch : Warning<
+  "argument sensitivity is %select{unspecified|private|public|sensitive}0, but "
+  "it should be %select{unspecified|private|public|sensitive}1">, InGroup<Format>;
+def warn_format_cmp_specifier_mismatch : Warning<
+  "format specifier '%0' is incompatible with '%1'">, InGroup<Format>;
+def warn_format_cmp_specifier_sign_mismatch : Warning<
+  "signedness of format specifier '%0' is incompatible with '%1'">, InGroup<Format>;
+def warn_format_cmp_specifier_mismatch_pedantic : Extension<
+  warn_format_cmp_specifier_sign_mismatch.Summary>, InGroup<FormatPedantic>;
+def note_format_cmp_with : Note<
+  "comparing with this %select{specifier|format string}0">;
 
 def warn_null_arg : Warning<
   "null passed to a callee that requires a non-null argument">,
diff --git a/clang/include/clang/Sema/Sema.h b/clang/include/clang/Sema/Sema.h
index d6f3508a5243f36..22df0bc2923db4e 100644
--- a/clang/include/clang/Sema/Sema.h
+++ b/clang/include/clang/Sema/Sema.h
@@ -2150,6 +2150,7 @@ class Sema final : public SemaBase {
     FAPK_Fixed,    // values to format are fixed (no C-style variadic arguments)
     FAPK_Variadic, // values to format are passed as variadic arguments
     FAPK_VAList,   // values to format are passed in a va_list
+    FAPK_Elsewhere, // values to format are not passed to this function
   };
 
   // Used to grab the relevant information from a FormatAttr and a
@@ -2160,12 +2161,15 @@ class Sema final : public SemaBase {
     FormatArgumentPassingKind ArgPassingKind;
   };
 
-  /// Given a FunctionDecl's FormatAttr, attempts to populate the
-  /// FomatStringInfo parameter with the FormatAttr's correct format_idx and
-  /// firstDataArg. Returns true when the format fits the function and the
-  /// FormatStringInfo has been populated.
-  static bool getFormatStringInfo(const FormatAttr *Format, bool IsCXXMember,
-                                  bool IsVariadic, FormatStringInfo *FSI);
+  /// Given a function and its FormatAttr or FormatMatchesAttr info, attempts to
+  /// populate the FomatStringInfo parameter with the attribute's correct
+  /// format_idx and firstDataArg. Returns true when the format fits the
+  /// function and the FormatStringInfo has been populated.
+  static bool getFormatStringInfo(const Decl *Function, unsigned FormatIdx,
+                                  unsigned FirstArg, FormatStringInfo *FSI);
+  static bool getFormatStringInfo(unsigned FormatIdx, unsigned FirstArg,
+                                  bool IsCXXMember, bool IsVariadic,
+                                  FormatStringInfo *FSI);
 
   // Used by C++ template instantiation.
   ExprResult BuiltinShuffleVector(CallExpr *TheCall);
@@ -2188,7 +2192,9 @@ class Sema final : public SemaBase {
     FST_Syslog,
     FST_Unknown
   };
+  static FormatStringType GetFormatStringType(StringRef FormatFlavor);
   static FormatStringType GetFormatStringType(const FormatAttr *Format);
+  static FormatStringType GetFormatStringType(const FormatMatchesAttr *Format);
 
   bool FormatStringHasSArg(const StringLiteral *FExpr);
 
@@ -2535,11 +2541,17 @@ class Sema final : public SemaBase {
                             VariadicCallType CallType, SourceLocation Loc,
                             SourceRange Range,
                             llvm::SmallBitVector &CheckedVarArgs);
+  bool CheckFormatString(const FormatMatchesAttr *Format,
+                         ArrayRef<const Expr *> Args, bool IsCXXMember,
+                         VariadicCallType CallType, SourceLocation Loc,
+                         SourceRange Range,
+                         llvm::SmallBitVector &CheckedVarArgs);
   bool CheckFormatArguments(ArrayRef<const Expr *> Args,
-                            FormatArgumentPassingKind FAPK, unsigned format_idx,
-                            unsigned firstDataArg, FormatStringType Type,
-                            VariadicCallType CallType, SourceLocation Loc,
-                            SourceRange range,
+                            FormatArgumentPassingKind FAPK,
+                            const StringLiteral *ReferenceFormatString,
+                            unsigned format_idx, unsigned firstDataArg,
+                            FormatStringType Type, VariadicCallType CallType,
+                            SourceLocation Loc, SourceRange range,
                             llvm::SmallBitVector &CheckedVarArgs);
 
   void CheckInfNaNFunction(const CallExpr *Call, const FunctionDecl *FDecl);
@@ -4527,6 +4539,11 @@ class Sema final : public SemaBase {
   FormatAttr *mergeFormatAttr(Decl *D, const AttributeCommonInfo &CI,
                               IdentifierInfo *Format, int FormatIdx,
                               int FirstArg);
+  FormatMatchesAttr *mergeFormatMatchesAttr(Decl *D,
+                                            const AttributeCommonInfo &CI,
+                                            IdentifierInfo *Format,
+                                            int FormatIdx,
+                                            StringLiteral *FormatStr);
 
   /// AddAlignedAttr - Adds an aligned attribute to a particular declaration.
   void AddAlignedAttr(Decl *D, const AttributeCommonInfo &CI, Expr *E,
diff --git a/clang/lib/AST/AttrImpl.cpp b/clang/lib/AST/AttrImpl.cpp
index f198a9acf8481fd..5a169f815637f6f 100644
--- a/clang/lib/AST/AttrImpl.cpp
+++ b/clang/lib/AST/AttrImpl.cpp
@@ -270,4 +270,9 @@ unsigned AlignedAttr::getAlignment(ASTContext &Ctx) const {
   return Ctx.getTargetDefaultAlignForAttributeAligned();
 }
 
+StringLiteral *FormatMatchesAttr::getFormatString() const {
+  // This cannot go in headers because StringLiteral and Expr may be incomplete.
+  return cast<StringLiteral>(getExpectedFormat());
+}
+
 #include "clang/AST/AttrImpl.inc"
diff --git a/clang/lib/AST/FormatString.cpp b/clang/lib/AST/FormatString.cpp
index e892c1592df9869..5d3b56fc4e71377 100644
--- a/clang/lib/AST/FormatString.cpp
+++ b/clang/lib/AST/FormatString.cpp
@@ -595,6 +595,97 @@ ArgType::matchesType(ASTContext &C, QualType argTy) const {
   llvm_unreachable("Invalid ArgType Kind!");
 }
 
+static analyze_format_string::ArgType::MatchKind
+integerTypeMatch(ASTContext &C, QualType A, QualType B, bool CheckSign) {
+  using MK = analyze_format_string::ArgType::MatchKind;
+
+  uint64_t IntSize = C.getTypeSize(C.IntTy);
+  uint64_t ASize = C.getTypeSize(A);
+  uint64_t BSize = C.getTypeSize(B);
+  if (std::max(ASize, IntSize) != std::max(BSize, IntSize))
+    return MK::NoMatch;
+  if (CheckSign && A->isSignedIntegerType() != B->isSignedIntegerType())
+    return MK::NoMatchSignedness;
+  if (ASize != BSize)
+    return MK::MatchPromotion;
+  return MK::Match;
+}
+
+analyze_format_string::ArgType::MatchKind
+ArgType::matchesArgType(ASTContext &C, const ArgType &Other) const {
+  using AK = analyze_format_string::ArgType::Kind;
+
+  // Per matchesType.
+  if (K == AK::InvalidTy || Other.K == AK::InvalidTy)
+    return NoMatch;
+  if (K == AK::UnknownTy || Other.K == AK::UnknownTy)
+    return Match;
+
+  // Handle whether either (or both, or neither) sides has Ptr set,
+  // in addition to whether either (or both, or neither) sides is a SpecificTy
+  // that is a pointer.
+  ArgType Left = *this;
+  bool LeftWasPointer = false;
+  ArgType Right = Other;
+  bool RightWasPointer = false;
+  if (Left.Ptr) {
+    Left.Ptr = false;
+    LeftWasPointer = true;
+  } else if (Left.K == AK::SpecificTy && Left.T->isPointerType()) {
+    Left.T = Left.T->getPointeeType();
+    LeftWasPointer = true;
+  }
+  if (Right.Ptr) {
+    Right.Ptr = false;
+    RightWasPointer = true;
+  } else if (Right.K == AK::SpecificTy && Right.T->isPointerType()) {
+    Right.T = Right.T->getPointeeType();
+    RightWasPointer = true;
+  }
+
+  if (LeftWasPointer != RightWasPointer)
+    return NoMatch;
+
+  // Ensure that if at least one side is a SpecificTy, then Left is a
+  // SpecificTy.
+  if (Right.K == AK::SpecificTy)
+    std::swap(Left, Right);
+
+  if (Left.K == AK::SpecificTy) {
+    if (Right.K == AK::SpecificTy) {
+      auto Canon1 = C.getCanonicalType(Left.T);
+      auto Canon2 = C.getCanonicalType(Right.T);
+      if (Canon1 == Canon2)
+        return Match;
+
+      auto *BT1 = QualType(Canon1)->getAs<BuiltinType>();
+      auto *BT2 = QualType(Canon2)->getAs<BuiltinType>();
+      if (BT1 == nullptr || BT2 == nullptr)
+        return NoMatch;
+      if (BT1 == BT2)
+        return Match;
+
+      if (!LeftWasPointer && BT1->isInteger() && BT2->isInteger())
+        return integerTypeMatch(C, Canon1, Canon2, true);
+      return NoMatch;
+    } else if (Right.K == AK::AnyCharTy) {
+      if (!LeftWasPointer && Left.T->isIntegerType())
+        return integerTypeMatch(C, Left.T, C.CharTy, false);
+      return NoMatch;
+    } else if (Right.K == AK::WIntTy) {
+      if (!LeftWasPointer && Left.T->isIntegerType())
+        return integerTypeMatch(C, Left.T, C.WIntTy, true);
+      return NoMatch;
+    }
+    // It's hypothetically possible to create an AK::SpecificTy ArgType
+    // that matches another kind of ArgType, but in practice Clang doesn't
+    // do that, so ignore that case.
+    return NoMatch;
+  }
+
+  return Left.K == Right.K ? Match : NoMatch;
+}
+
 ArgType ArgType::makeVectorType(ASTContext &C, unsigned NumElts) const {
   // Check for valid vector element types.
   if (T.isNull())
diff --git a/clang/lib/Sema/SemaChecking.cpp b/clang/lib/Sema/SemaChecking.cpp
index 2d4a7cd287b70d7..27fd147fda7708b 100644
--- a/clang/lib/Sema/SemaChecking.cpp
+++ b/clang/lib/Sema/SemaChecking.cpp
@@ -3016,17 +3016,33 @@ bool Sema::ValueIsRunOfOnes(CallExpr *TheCall, unsigned ArgNum) {
          << ArgNum << Arg->getSourceRange();
 }
 
-bool Sema::getFormatStringInfo(const FormatAttr *Format, bool IsCXXMember,
-                               bool IsVariadic, FormatStringInfo *FSI) {
-  if (Format->getFirstArg() == 0)
+bool Sema::getFormatStringInfo(const Decl *D, unsigned FormatIdx,
+                               unsigned FirstArg, FormatStringInfo *FSI) {
+  ...
[truncated]

@apple-fcloutier apple-fcloutier force-pushed the apple-fcloutier/84936554 branch 2 times, most recently from e7ce50f to 9e466b8 Compare November 19, 2024 04:22
Copy link
Collaborator

@erichkeane erichkeane left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

High level look... I don't see any huge problems with this. I'm concerned with teh complexity of the SemaChecking changes, that is a huge change for something I don't think needs to be that complicated. It isn't clear where all the complexity is coming from either.

@@ -3738,6 +3738,103 @@ behavior of the program is undefined.
}];
}

def FormatMatchesDocs : Documentation {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm hopeful someone comes along and does a better 'english/documentation-quality' review of this. I don't have the mental acuity at the moment.

@apple-fcloutier
Copy link
Contributor Author

apple-fcloutier commented Dec 2, 2024

There's a handful of drivers for the size of the SemaChecking change:

  • generalize things that were previously specific to FormatAttr, pipe through expected format where applicable;
  • tolerate the new FAPK_External for when the format arguments are invisible to the caller;
  • implement comparing format strings.

The last one is the biggest one. SemaChecking knows how to check that a format string matches a provided list of arguments, but it doesn't know (before this change) how to check that two format strings are equivalent. This is because checking format strings is done conceptually by zipping format specifiers and argument types and seeing if they match with some fuzziness, rather than converting a format string to a list of types and then checking them together. I needed to implement a way to match two format strings "in abstract" for FormatMatches, since we don't see arguments.

The second one causes changes that look large in the diff but that are actually rather simple (for instance, wrapping a large-ish block of code in CheckPrintfHandler::HandlePrintfSpecifier with if (HasFormatArguments()) {).

@apple-fcloutier
Copy link
Contributor Author

Happy new year, @erichkeane! Here's a summary of actions taken in response to RFC feedback and review feedback:

  • We no longer accept different integer sizes (%hhi and %i are now incompatible, for instance), as agreed.
  • Removed support for scanf, as Aaron brought up a few things I wasn't clear about, and I don't need it, and it wasn't really part of the RFC, and it's probably better to leave it alone until somebody who knows scanf well wants to take it over.
  • Documentation was updated to clarify these 2 points.

Notably, I didn't change the handling of signedness mismatches (diagnostics are controlled by -Wformat-signedness): I understood this was OK when you said the RFC was good as it sat, without an agreement on changing this.

I recommend reviewing with ?w=1, as it brings down the size of the SemaChecking.cpp diff a good amount (down to 550ish lines).

@apple-fcloutier apple-fcloutier force-pushed the apple-fcloutier/84936554 branch from b7817c1 to f96aa63 Compare January 2, 2025 22:37
Copy link
Collaborator

@erichkeane erichkeane left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did as deep of a look-through as I could, though this is obviously a big patch. I didn't come up with any real comments, so everythign looks alright to me.

I'd VERY much like to hold off and give others a chance to review this for an extended period of time due to its size, so if you could hold off committing based on my approval until at least Wednesday, that would be appreciated. Else, LGTM.

@apple-fcloutier
Copy link
Contributor Author

I lost commit access during the purge, so either you or @ahatanak will need to press the button for me once we feel good about the change. As a result, there should be no concern that I might merge too early.

@apple-fcloutier
Copy link
Contributor Author

As an update, I've been adding more tests and I found that I can make improvements with follow-up work. Since these aren't stability problems or false positives, and we think the change is already big enough, I am currently planning to submit them as a separate PR once this lands. The things in question that are addressed:

  • Passing a format string with the wrong type (for instance, a NSString format string to a function accepting a printf format string) terminates verification without emitting a diagnostic. This is an issue with the format attribute too (it emits a -Wformat-nonliteral warning for it, which is bad categorization and unspecific about what needs to happen).
  • If you have a function with format_matches that calls another function with format_matches passing its own format argument, Clang doesn't verify that the two format strings are compatible.
  • format_matches diagnostics always point at the format string: for instance, directly at the format string argument in printf("%s", "hello"), but at the fmt variable definition in const char *const fmt = "%s"; printf(fmt, "hello"). This can create situations where the diagnostic shows the problematic format strings, but not the format function call that references them. Format attribute checking uses CheckFormatHandler::EmitFormatDiagnostic to ensure that the format function call is consistently there (either directly where the warning points, or with a note if the call is elsewhere). Diagnostics for format_matches should do that too.
  • Reusing the "data argument not used by format string" diagnostic in the context of format_matches is not a great experience, this should be "more specifiers in format string than expected".

Let me know if you would like me to fold anything from this list into this PR.

@erichkeane
Copy link
Collaborator

As an update, I've been adding more tests and I found that I can make improvements with follow-up work. Since these aren't stability problems or false positives, and we think the change is already big enough, I am currently planning to submit them as a separate PR once this lands. The things in question that are addressed:

* Passing a format string with the wrong type (for instance, a `NSString` format string to a function accepting a `printf` format string) terminates verification without emitting a diagnostic. This is an issue with the `format` attribute too (it emits a -Wformat-nonliteral warning for it, which is bad categorization and unspecific about what needs to happen).

* If you have a function with `format_matches` that calls another function with `format_matches` passing its own format argument, Clang doesn't verify that the two format strings are compatible.

* `format_matches` diagnostics always point at the format string: for instance, directly at the format string argument in `printf("%s", "hello")`, but at the `fmt` variable definition in `const char *const fmt = "%s"; printf(fmt, "hello")`. This can create situations where the diagnostic shows the problematic format strings, but not the format function call that references them. `Format` attribute checking uses `CheckFormatHandler::EmitFormatDiagnostic` to ensure that the format function call is consistently there (either directly where the warning points, or with a note if the call is elsewhere). Diagnostics for `format_matches` should do that too.

* Reusing the "data argument not used by format string" diagnostic in the context of `format_matches` is not a great experience, this should be "more specifiers in format string than expected".

Let me know if you would like me to fold anything from this list into this PR.

2 & 3 are pretty significant to me, but I'd be ok doing 3 as a followup.

4 seems easy enough to just roll into it anyway.

Copy link
Collaborator

@shafik shafik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apologies if I somehow missed it but it looks like we are not really covering the diagnostics fully and we need to in order to prevent regressions over time and to make sure we don't have latent bugs.

@apple-fcloutier apple-fcloutier force-pushed the apple-fcloutier/84936554 branch from dbfef7f to 41d6af3 Compare February 6, 2025 04:36
@apple-fcloutier
Copy link
Contributor Author

apple-fcloutier commented Feb 6, 2025

Happy February everyone! @shafik, I added tests for the situations you called out, which did reveal a number of weaknesses and caused changes as a result. @erichkeane, in January I had plans to make a second PR, but since some of my changes would address Shafik's comments, I folded them in entirely.

I can address any new feedback you have, of course. As last time, I recommend checking the diff with ?w=1.

As I said before, I lost the ability to merge PRs due to my sparse activity. We may need you (or @ahatanak) to click the button when we are ready.

Copy link
Collaborator

@erichkeane erichkeane left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately you caught me right before I'm leaving for the WG21 meeting, so I won't have much chance to do a thorough review on this until I come back.

That said,
There are quite a few places with manual loops that I suspect some work learning the LLVM ADT/STLExtras.h and the STL Algorithms would greatly simplify.

if (Sensitivity != Other.Sensitivity) {
// diagnose and continue
EmitDiagnostic(S,
S.PDiag(diag::warn_format_cmp_sensitivity_mismatch)
<< Sensitivity << Other.Sensitivity,
FmtExpr, InFunctionCall);
S.Diag(Other.ElementLoc, diag::note_format_cmp_with) << 0 << Other.Range;
HadError = true;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a nit: HadError = S.Diag(....) would improve readability IMO, and allow skipping some curleys.

bool HadError = false;
auto ArgIter = Args.begin();
auto ArgEnd = Args.end();
while (ArgIter != ArgEnd) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have time to understand this yet... but this looks like it should be composible with some std/ADT algorithms, and should be.

Copy link
Contributor Author

@apple-fcloutier apple-fcloutier Feb 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At a high level, this groups Args by value of getPosition(), and for each group, we check that all arguments starting from index 1 are compatible with the argument at index 0. Args is already sorted by getPosition(), so the act of grouping is just to find the end iterator of each range. In std::, we have lower_bound/upper_bound that can be used to find the first and the one-past-the-end iterators for any given getPosition() value, but since we're going through the whole array anyways, I started at begin() and used find_if instead of upper_bound.

Now that you challenged me to find another way to write the same thing, I came up with this:

  bool HadError = false;
  auto Iter = Args.begin();
  auto End = Args.end();
  // Group arguments by getPosition() value, and check that each member of the group is
  // compatible with the first member. This verifies that when positional arguments are
  // used multiple times (such as %2$i %2$d), all uses are mutually compatible. As an
  // optimization, don't test the first member against itself.
  while (Iter != End) {
    const auto &First = *Iter;
    for (++Iter; Iter != End && Iter->getPosition() == First.getPosition(); ++Iter) {
      HadError |= !Iter->VerifyCompatible(*this, First, Str, true);
    }
  }

which I think I like better, and if you like it better too, I will go with that.

@@ -145,6 +145,7 @@ void test_multiple_format_strings(const char *fmt1, const char *fmt2) {
expected-warning{{passing 'os_log' format string where 'printf' format string is expected}}
}

// MARK: -
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what does this mean?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is probably an Xcode-ism. Xcode interprets // MARK: ... and #pragma mark ... as a directive to add an entry to the symbol menu. "-" creates a separator. It's harmless but I can remove them if it's not customary.

bool HadError = false;
auto Iter = Args.begin();
auto End = Args.end();
while (Iter != End) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This too looks like it could be better composed of algorithms, right now there is a lot of searching going on...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(FWIW, this is the same thing as the other place you commented, but I moved it around between two commits)

@apple-fcloutier
Copy link
Contributor Author

@erichkeane, congrats on the C++26 milestone. I made changes to address your comment. Appreciate your time reviewing this PR.

Copy link
Collaborator

@erichkeane erichkeane left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like I was ok with it last time (and a quick review this time confirms), so no need to wait for me. Though @shafik had some comments on this, so please let him approve this before submitting.

This implements ``__attribute__((format_matches))``, as described in the RFC:
https://discourse.llvm.org/t/rfc-format-attribute-attribute-format-like/83076

The ``format`` attribute only allows the compiler to check that a format
string matches its arguments. If the format string is passed independently
of its arguments, there is no way to have the compiler check it.
``format_matches(flavor, fmtidx, example)`` allows the compiler to check
format strings against the ``example`` format string instead of against
format arguments. See the changes to AttrDocs.td in this diff for more
information.

Implementation-wise, this change subclasses CheckPrintfHandler to allow
it to collect specifiers into arrays, and implements comparing that two
specifiers are equivalent.

Radar-Id: 84936554
@apple-fcloutier apple-fcloutier force-pushed the apple-fcloutier/84936554 branch from f4291a9 to 8909cd9 Compare February 24, 2025 23:42
@apple-fcloutier
Copy link
Contributor Author

Rebased and added the missing test for os_log specifier sensitivity.

Copy link
Collaborator

@shafik shafik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@hnrklssn hnrklssn merged commit c710118 into llvm:main Feb 25, 2025
12 checks passed
@hnrklssn hnrklssn deleted the apple-fcloutier/84936554 branch February 25, 2025 02:59
apple-fcloutier added a commit to swiftlang/llvm-project that referenced this pull request Feb 25, 2025
This implements ``__attribute__((format_matches))``, as described in the
RFC:
https://discourse.llvm.org/t/rfc-format-attribute-attribute-format-like/83076

The ``format`` attribute only allows the compiler to check that a format
string matches its arguments. If the format string is passed
independently of its arguments, there is no way to have the compiler
check it. ``format_matches(flavor, fmtidx, example)`` allows the
compiler to check format strings against the ``example`` format string
instead of against format arguments. See the changes to AttrDocs.td in
this diff for more information.

Implementation-wise, this change subclasses CheckPrintfHandler to allow it to collect specifiers into arrays, and
implements comparing that two specifiers are equivalent.
`checkFormatStringExpr` gets a new `ReferenceFormatString` argument that
is piped down when calling a function with the `format_matches`
attribute (and is `nullptr` otherwise); this is the string that the
actual format string is compared against.

Although this change does not enable -Wformat-nonliteral by default,
IMO, all the pieces are now in place such that it could be.

rdar://84936554
@thurstond
Copy link
Contributor

The newly added format-string-matches.c has been failing in the sanitizer builds (https://lab.llvm.org/buildbot/#/builders/52/builds/6312/steps/12/logs/stdio). Could you please take a look?

==clang==2272405==ERROR: AddressSanitizer: stack-use-after-scope on address 0x72d4dddb2570 at pc 0x57fb61e308a1 bp 0x7fff837cb2b0 sp 0x7fff837cb2a8
READ of size 8 at 0x72d4dddb2570 thread T0
    #0 0x57fb61e308a0 in EmitFormatDiagnostic<clang::CharSourceRange> /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/clang/lib/Sema/SemaChecking.cpp:6925:43
    #1 0x57fb61e308a0 in (anonymous namespace)::CheckFormatHandler::HandleInvalidConversionSpecifier(unsigned int, clang::SourceLocation, char const*, unsigned int, char const*, unsigned int) /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/clang/lib/Sema/SemaChecking.cpp:6880:3
    #2 0x57fb61e2cf26 in (anonymous namespace)::CheckPrintfHandler::HandleInvalidPrintfConversionSpecifier(clang::analyze_printf::PrintfSpecifier const&, char const*, unsigned int) /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/clang/lib/Sema/SemaChecking.cpp:7152:10
    #3 0x57fb61e2d899 in (anonymous namespace)::CheckPrintfHandler::HandlePrintfSpecifier(clang::analyze_printf::PrintfSpecifier const&, char const*, unsigned int, clang::TargetInfo const&) /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/clang/lib/Sema/SemaChecking.cpp
    #4 0x57fb61e40876 in (anonymous namespace)::DecomposePrintfHandler::HandlePrintfSpecifier(clang::analyze_printf::PrintfSpecifier const&, char const*, unsigned int, clang::TargetInfo const&) /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/clang/lib/Sema/SemaChecking.cpp:7416:28
    #5 0x57fb6496f402 in clang::analyze_format_string::ParsePrintfString(clang::analyze_format_string::FormatStringHandler&, char const*, char const*, clang::LangOptions const&, clang::TargetInfo const&, bool) /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/clang/lib/AST/PrintfFormatString.cpp:449:12
    #6 0x57fb61df40d7 in (anonymous namespace)::DecomposePrintfHandler::GetSpecifiers(clang::Sema&, (anonymous namespace)::FormatStringLiteral const*, clang::Expr const*, clang::Sema::FormatStringType, bool, bool, llvm::SmallVectorImpl<(anonymous namespace)::EquatableFormatArgument>&) /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/clang/lib/Sema/SemaChecking.cpp:7398:8
    #7 0x57fb61df3322 in clang::Sema::CheckFormatStringsCompatible(clang::Sema::FormatStringType, clang::StringLiteral const*, clang::StringLiteral const*, clang::Expr const*) /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/clang/lib/Sema/SemaChecking.cpp:8660:7
    #8 0x57fb61e274b2 in CheckFormatString(clang::Sema&, (anonymous namespace)::FormatStringLiteral const*, clang::StringLiteral const*, clang::Expr const*, llvm::ArrayRef<clang::Expr const*>, clang::Sema::FormatArgumentPassingKind, unsigned int, unsigned int, clang::Sema::FormatStringType, bool, clang::Sema::VariadicCallType, llvm::SmallBitVector&, (anonymous namespace)::UncoveredArgHandler&, bool) /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/clang/lib/Sema/SemaChecking.cpp:8620:9
    #9 0x57fb61df0052 in checkFormatStringExpr(clang::Sema&, clang::StringLiteral const*, clang::Expr const*, llvm::ArrayRef<clang::Expr const*>, clang::Sema::FormatArgumentPassingKind, unsigned int, unsigned int, clang::Sema::FormatStringType, clang::Sema::VariadicCallType, bool, llvm::SmallBitVector&, (anonymous namespace)::UncoveredArgHandler&, llvm::APSInt, bool) /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/clang/lib/Sema/SemaChecking.cpp:6270:7
    #10 0x57fb61deba4b in clang::Sema::CheckFormatArguments(llvm::ArrayRef<clang::Expr const*>, clang::Sema::FormatArgumentPassingKind, clang::StringLiteral const*, unsigned int, unsigned int, clang::Sema::FormatStringType, clang::Sema::VariadicCallType, clang::SourceLocation, clang::SourceRange, llvm::SmallBitVector&) /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/clang/lib/Sema/SemaChecking.cpp:6455:31
    #11 0x57fb61dd4429 in CheckFormatString /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/clang/lib/Sema/SemaChecking.cpp:6418:12
    #12 0x57fb61dd4429 in clang::Sema::checkCall(clang::NamedDecl*, clang::FunctionProtoType const*, clang::Expr const*, llvm::ArrayRef<clang::Expr const*>, bool, clang::SourceLocation, clang::SourceRange, clang::Sema::VariadicCallType) /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/clang/lib/Sema/SemaChecking.cpp:3307:7
    #13 0x57fb61dda4cb in clang::Sema::CheckFunctionCall(clang::FunctionDecl*, clang::CallExpr*, clang::FunctionProtoType const*) /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/clang/lib/Sema/SemaChecking.cpp:3546:3
    #14 0x57fb6234c9b6 in clang::Sema::BuildResolvedCallExpr(clang::Expr*, clang::NamedDecl*, clang::SourceLocation, llvm::ArrayRef<clang::Expr*>, clang::SourceLocation, clang::Expr*, bool, clang::CallExpr::ADLCallKind) /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/clang/lib/Sema/SemaExpr.cpp:7036:9
    #15 0x57fb62310d1d in clang::Sema::BuildCallExpr(clang::Scope*, clang::Expr*, clang::SourceLocation, llvm::MutableArrayRef<clang::Expr*>, clang::SourceLocation, clang::Expr*, bool, bool) /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/clang/lib/Sema/SemaExpr.cpp:6693:10
    #16 0x57fb623474fe in clang::Sema::ActOnCallExpr(clang::Scope*, clang::Expr*, clang::SourceLocation, llvm::MutableArrayRef<clang::Expr*>, clang::SourceLocation, clang::Expr*) /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/clang/lib/Sema/SemaExpr.cpp:6456:7
    #17 0x57fb6194c787 in clang::Parser::ParsePostfixExpressionSuffix(clang::ActionResult<clang::Expr*, true>) /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/clang/lib/Parse/ParseExpr.cpp:2269:23
    #18 0x57fb61953272 in clang::Parser::ParseCastExpression(clang::Parser::CastParseKind, bool, bool&, clang::Parser::TypeCastState, bool, bool*) /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/clang/lib/Parse/ParseExpr.cpp:1962:9
    #19 0x57fb61945e62 in ParseCastExpression /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/clang/lib/Parse/ParseExpr.cpp:729:20
    #20 0x57fb61945e62 in clang::Parser::ParseAssignmentExpression(clang::Parser::TypeCastState) /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/clang/lib/Parse/ParseExpr.cpp:184:20
    #21 0x57fb61945acd in clang::Parser::ParseExpression(clang::Parser::TypeCastState) /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/clang/lib/Parse/ParseExpr.cpp:135:18
    #22 0x57fb61ac41fd in clang::Parser::ParseExprStatement(clang::Parser::ParsedStmtContext) /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/clang/lib/Parse/ParseStmt.cpp:564:19
    #23 0x57fb61abcdfa in clang::Parser::ParseStatementOrDeclarationAfterAttributes(llvm::SmallVector<clang::Stmt*, 32u>&, clang::Parser::ParsedStmtContext, clang::SourceLocation*, clang::ParsedAttributes&, clang::ParsedAttributes&) /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/clang/lib/Parse/ParseStmt.cpp:293:14
    #24 0x57fb61abb924 in clang::Parser::ParseStatementOrDeclaration(llvm::SmallVector<clang::Stmt*, 32u>&, clang::Parser::ParsedStmtContext, clang::SourceLocation*) /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/clang/lib/Parse/ParseStmt.cpp:125:20
    #25 0x57fb61ad775b in clang::Parser::ParseCompoundStatementBody(bool) /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/clang/lib/Parse/ParseStmt.cpp:1267:11
    #26 0x57fb61adae8e in clang::Parser::ParseFunctionStatementBody(clang::Decl*, clang::Parser::ParseScope&) /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/clang/lib/Parse/ParseStmt.cpp:2577:21
    #27 0x57fb618f7866 in clang::Parser::ParseFunctionDefinition(clang::ParsingDeclarator&, clang::Parser::ParsedTemplateInfo const&, clang::Parser::LateParsedAttrList*) /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/clang/lib/Parse/Parser.cpp:1520:10
    #28 0x57fb61a18176 in clang::Parser::ParseDeclGroup(clang::ParsingDeclSpec&, clang::DeclaratorContext, clang::ParsedAttributes&, clang::Parser::ParsedTemplateInfo&, clang::SourceLocation*, clang::Parser::ForRangeInit*) /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/clang/lib/Parse/ParseDecl.cpp:2461:17
    #29 0x57fb618f4a13 in clang::Parser::ParseDeclOrFunctionDefInternal(clang::ParsedAttributes&, clang::ParsedAttributes&, clang::ParsingDeclSpec&, clang::AccessSpecifier) /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/clang/lib/Parse/Parser.cpp:1244:10
    #30 0x57fb618f3529 in clang::Parser::ParseDeclarationOrFunctionDefinition(clang::ParsedAttributes&, clang::ParsedAttributes&, clang::ParsingDeclSpec*, clang::AccessSpecifier) /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/clang/lib/Parse/Parser.cpp:1266:12
    #31 0x57fb618f0528 in clang::Parser::ParseExternalDeclaration(clang::ParsedAttributes&, clang::ParsedAttributes&, clang::ParsingDeclSpec*) /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/clang/lib/Parse/Parser.cpp:1069:14
    #32 0x57fb618eb6ab in clang::Parser::ParseTopLevelDecl(clang::OpaquePtr<clang::DeclGroupRef>&, clang::Sema::ModuleImportState&) /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/clang/lib/Parse/Parser.cpp:758:12
    #33 0x57fb618dd76d in clang::ParseAST(clang::Sema&, bool, bool) /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/clang/lib/Parse/ParseAST.cpp:171:20
    #34 0x57fb5e5f68d8 in clang::FrontendAction::Execute() /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/clang/lib/Frontend/FrontendAction.cpp:1072:3
    #35 0x57fb5e4f8d0f in clang::CompilerInstance::ExecuteAction(clang::FrontendAction&) /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/clang/lib/Frontend/CompilerInstance.cpp:1056:33
    #36 0x57fb5e8488a5 in clang::ExecuteCompilerInvocation(clang::CompilerInstance*) /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/clang/lib/FrontendTool/ExecuteCompilerInvocation.cpp:280:25
    #37 0x57fb556a7754 in cc1_main(llvm::ArrayRef<char const*>, char const*, void*) /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/clang/tools/driver/cc1_main.cpp:290:15
    #38 0x57fb5569f1bf in ExecuteCC1Tool(llvm::SmallVectorImpl<char const*>&, llvm::ToolContext const&) /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/clang/tools/driver/driver.cpp:218:12
    #39 0x57fb5569d657 in clang_main(int, char**, llvm::ToolContext const&) /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/clang/tools/driver/driver.cpp:259:12
    #40 0x57fb556bf501 in main /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm_build_asan/tools/clang/tools/driver/clang-driver.cpp:17:10
    #41 0x76d4dfc2a3b7  (/lib/x86_64-linux-gnu/libc.so.6+0x2a3b7) (BuildId: 91f01b4ad171c80b6303d08d1f08cba8b990413d)
    #42 0x76d4dfc2a47a in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2a47a) (BuildId: 91f01b4ad171c80b6303d08d1f08cba8b990413d)
    #43 0x57fb555b3ae4 in _start (/home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm_build_asan/bin/clang-21+0xc1caae4)
Address 0x72d4dddb2570 is located in stack of thread T0 at offset 368 in frame
    #0 0x57fb61df3d5f in (anonymous namespace)::DecomposePrintfHandler::GetSpecifiers(clang::Sema&, (anonymous namespace)::FormatStringLiteral const*, clang::Expr const*, clang::Sema::FormatStringType, bool, bool, llvm::SmallVectorImpl<(anonymous namespace)::EquatableFormatArgument>&) /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/clang/lib/Sema/SemaChecking.cpp:7388

@thurstond
Copy link
Contributor

Benjamin Kramer fixed it at head (85eb725)

The newly added format-string-matches.c has been failing in the sanitizer builds (https://lab.llvm.org/buildbot/#/builders/52/builds/6312/steps/12/logs/stdio). Could you please take a look?

==clang==2272405==ERROR: AddressSanitizer: stack-use-after-scope on address 0x72d4dddb2570 at pc 0x57fb61e308a1 bp 0x7fff837cb2b0 sp 0x7fff837cb2a8
READ of size 8 at 0x72d4dddb2570 thread T0
    #0 0x57fb61e308a0 in EmitFormatDiagnostic<clang::CharSourceRange> /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/clang/lib/Sema/SemaChecking.cpp:6925:43
    #1 0x57fb61e308a0 in (anonymous namespace)::CheckFormatHandler::HandleInvalidConversionSpecifier(unsigned int, clang::SourceLocation, char const*, unsigned int, char const*, unsigned int) /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/clang/lib/Sema/SemaChecking.cpp:6880:3
    #2 0x57fb61e2cf26 in (anonymous namespace)::CheckPrintfHandler::HandleInvalidPrintfConversionSpecifier(clang::analyze_printf::PrintfSpecifier const&, char const*, unsigned int) /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/clang/lib/Sema/SemaChecking.cpp:7152:10
    #3 0x57fb61e2d899 in (anonymous namespace)::CheckPrintfHandler::HandlePrintfSpecifier(clang::analyze_printf::PrintfSpecifier const&, char const*, unsigned int, clang::TargetInfo const&) /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/clang/lib/Sema/SemaChecking.cpp
    #4 0x57fb61e40876 in (anonymous namespace)::DecomposePrintfHandler::HandlePrintfSpecifier(clang::analyze_printf::PrintfSpecifier const&, char const*, unsigned int, clang::TargetInfo const&) /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/clang/lib/Sema/SemaChecking.cpp:7416:28
    #5 0x57fb6496f402 in clang::analyze_format_string::ParsePrintfString(clang::analyze_format_string::FormatStringHandler&, char const*, char const*, clang::LangOptions const&, clang::TargetInfo const&, bool) /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/clang/lib/AST/PrintfFormatString.cpp:449:12
    #6 0x57fb61df40d7 in (anonymous namespace)::DecomposePrintfHandler::GetSpecifiers(clang::Sema&, (anonymous namespace)::FormatStringLiteral const*, clang::Expr const*, clang::Sema::FormatStringType, bool, bool, llvm::SmallVectorImpl<(anonymous namespace)::EquatableFormatArgument>&) /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/clang/lib/Sema/SemaChecking.cpp:7398:8
    #7 0x57fb61df3322 in clang::Sema::CheckFormatStringsCompatible(clang::Sema::FormatStringType, clang::StringLiteral const*, clang::StringLiteral const*, clang::Expr const*) /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/clang/lib/Sema/SemaChecking.cpp:8660:7
    #8 0x57fb61e274b2 in CheckFormatString(clang::Sema&, (anonymous namespace)::FormatStringLiteral const*, clang::StringLiteral const*, clang::Expr const*, llvm::ArrayRef<clang::Expr const*>, clang::Sema::FormatArgumentPassingKind, unsigned int, unsigned int, clang::Sema::FormatStringType, bool, clang::Sema::VariadicCallType, llvm::SmallBitVector&, (anonymous namespace)::UncoveredArgHandler&, bool) /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/clang/lib/Sema/SemaChecking.cpp:8620:9
    #9 0x57fb61df0052 in checkFormatStringExpr(clang::Sema&, clang::StringLiteral const*, clang::Expr const*, llvm::ArrayRef<clang::Expr const*>, clang::Sema::FormatArgumentPassingKind, unsigned int, unsigned int, clang::Sema::FormatStringType, clang::Sema::VariadicCallType, bool, llvm::SmallBitVector&, (anonymous namespace)::UncoveredArgHandler&, llvm::APSInt, bool) /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/clang/lib/Sema/SemaChecking.cpp:6270:7
    #10 0x57fb61deba4b in clang::Sema::CheckFormatArguments(llvm::ArrayRef<clang::Expr const*>, clang::Sema::FormatArgumentPassingKind, clang::StringLiteral const*, unsigned int, unsigned int, clang::Sema::FormatStringType, clang::Sema::VariadicCallType, clang::SourceLocation, clang::SourceRange, llvm::SmallBitVector&) /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/clang/lib/Sema/SemaChecking.cpp:6455:31
    #11 0x57fb61dd4429 in CheckFormatString /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/clang/lib/Sema/SemaChecking.cpp:6418:12
    #12 0x57fb61dd4429 in clang::Sema::checkCall(clang::NamedDecl*, clang::FunctionProtoType const*, clang::Expr const*, llvm::ArrayRef<clang::Expr const*>, bool, clang::SourceLocation, clang::SourceRange, clang::Sema::VariadicCallType) /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/clang/lib/Sema/SemaChecking.cpp:3307:7
    #13 0x57fb61dda4cb in clang::Sema::CheckFunctionCall(clang::FunctionDecl*, clang::CallExpr*, clang::FunctionProtoType const*) /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/clang/lib/Sema/SemaChecking.cpp:3546:3
    #14 0x57fb6234c9b6 in clang::Sema::BuildResolvedCallExpr(clang::Expr*, clang::NamedDecl*, clang::SourceLocation, llvm::ArrayRef<clang::Expr*>, clang::SourceLocation, clang::Expr*, bool, clang::CallExpr::ADLCallKind) /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/clang/lib/Sema/SemaExpr.cpp:7036:9
    #15 0x57fb62310d1d in clang::Sema::BuildCallExpr(clang::Scope*, clang::Expr*, clang::SourceLocation, llvm::MutableArrayRef<clang::Expr*>, clang::SourceLocation, clang::Expr*, bool, bool) /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/clang/lib/Sema/SemaExpr.cpp:6693:10
    #16 0x57fb623474fe in clang::Sema::ActOnCallExpr(clang::Scope*, clang::Expr*, clang::SourceLocation, llvm::MutableArrayRef<clang::Expr*>, clang::SourceLocation, clang::Expr*) /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/clang/lib/Sema/SemaExpr.cpp:6456:7
    #17 0x57fb6194c787 in clang::Parser::ParsePostfixExpressionSuffix(clang::ActionResult<clang::Expr*, true>) /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/clang/lib/Parse/ParseExpr.cpp:2269:23
    #18 0x57fb61953272 in clang::Parser::ParseCastExpression(clang::Parser::CastParseKind, bool, bool&, clang::Parser::TypeCastState, bool, bool*) /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/clang/lib/Parse/ParseExpr.cpp:1962:9
    #19 0x57fb61945e62 in ParseCastExpression /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/clang/lib/Parse/ParseExpr.cpp:729:20
    #20 0x57fb61945e62 in clang::Parser::ParseAssignmentExpression(clang::Parser::TypeCastState) /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/clang/lib/Parse/ParseExpr.cpp:184:20
    #21 0x57fb61945acd in clang::Parser::ParseExpression(clang::Parser::TypeCastState) /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/clang/lib/Parse/ParseExpr.cpp:135:18
    #22 0x57fb61ac41fd in clang::Parser::ParseExprStatement(clang::Parser::ParsedStmtContext) /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/clang/lib/Parse/ParseStmt.cpp:564:19
    #23 0x57fb61abcdfa in clang::Parser::ParseStatementOrDeclarationAfterAttributes(llvm::SmallVector<clang::Stmt*, 32u>&, clang::Parser::ParsedStmtContext, clang::SourceLocation*, clang::ParsedAttributes&, clang::ParsedAttributes&) /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/clang/lib/Parse/ParseStmt.cpp:293:14
    #24 0x57fb61abb924 in clang::Parser::ParseStatementOrDeclaration(llvm::SmallVector<clang::Stmt*, 32u>&, clang::Parser::ParsedStmtContext, clang::SourceLocation*) /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/clang/lib/Parse/ParseStmt.cpp:125:20
    #25 0x57fb61ad775b in clang::Parser::ParseCompoundStatementBody(bool) /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/clang/lib/Parse/ParseStmt.cpp:1267:11
    #26 0x57fb61adae8e in clang::Parser::ParseFunctionStatementBody(clang::Decl*, clang::Parser::ParseScope&) /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/clang/lib/Parse/ParseStmt.cpp:2577:21
    #27 0x57fb618f7866 in clang::Parser::ParseFunctionDefinition(clang::ParsingDeclarator&, clang::Parser::ParsedTemplateInfo const&, clang::Parser::LateParsedAttrList*) /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/clang/lib/Parse/Parser.cpp:1520:10
    #28 0x57fb61a18176 in clang::Parser::ParseDeclGroup(clang::ParsingDeclSpec&, clang::DeclaratorContext, clang::ParsedAttributes&, clang::Parser::ParsedTemplateInfo&, clang::SourceLocation*, clang::Parser::ForRangeInit*) /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/clang/lib/Parse/ParseDecl.cpp:2461:17
    #29 0x57fb618f4a13 in clang::Parser::ParseDeclOrFunctionDefInternal(clang::ParsedAttributes&, clang::ParsedAttributes&, clang::ParsingDeclSpec&, clang::AccessSpecifier) /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/clang/lib/Parse/Parser.cpp:1244:10
    #30 0x57fb618f3529 in clang::Parser::ParseDeclarationOrFunctionDefinition(clang::ParsedAttributes&, clang::ParsedAttributes&, clang::ParsingDeclSpec*, clang::AccessSpecifier) /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/clang/lib/Parse/Parser.cpp:1266:12
    #31 0x57fb618f0528 in clang::Parser::ParseExternalDeclaration(clang::ParsedAttributes&, clang::ParsedAttributes&, clang::ParsingDeclSpec*) /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/clang/lib/Parse/Parser.cpp:1069:14
    #32 0x57fb618eb6ab in clang::Parser::ParseTopLevelDecl(clang::OpaquePtr<clang::DeclGroupRef>&, clang::Sema::ModuleImportState&) /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/clang/lib/Parse/Parser.cpp:758:12
    #33 0x57fb618dd76d in clang::ParseAST(clang::Sema&, bool, bool) /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/clang/lib/Parse/ParseAST.cpp:171:20
    #34 0x57fb5e5f68d8 in clang::FrontendAction::Execute() /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/clang/lib/Frontend/FrontendAction.cpp:1072:3
    #35 0x57fb5e4f8d0f in clang::CompilerInstance::ExecuteAction(clang::FrontendAction&) /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/clang/lib/Frontend/CompilerInstance.cpp:1056:33
    #36 0x57fb5e8488a5 in clang::ExecuteCompilerInvocation(clang::CompilerInstance*) /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/clang/lib/FrontendTool/ExecuteCompilerInvocation.cpp:280:25
    #37 0x57fb556a7754 in cc1_main(llvm::ArrayRef<char const*>, char const*, void*) /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/clang/tools/driver/cc1_main.cpp:290:15
    #38 0x57fb5569f1bf in ExecuteCC1Tool(llvm::SmallVectorImpl<char const*>&, llvm::ToolContext const&) /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/clang/tools/driver/driver.cpp:218:12
    #39 0x57fb5569d657 in clang_main(int, char**, llvm::ToolContext const&) /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/clang/tools/driver/driver.cpp:259:12
    #40 0x57fb556bf501 in main /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm_build_asan/tools/clang/tools/driver/clang-driver.cpp:17:10
    #41 0x76d4dfc2a3b7  (/lib/x86_64-linux-gnu/libc.so.6+0x2a3b7) (BuildId: 91f01b4ad171c80b6303d08d1f08cba8b990413d)
    #42 0x76d4dfc2a47a in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2a47a) (BuildId: 91f01b4ad171c80b6303d08d1f08cba8b990413d)
    #43 0x57fb555b3ae4 in _start (/home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm_build_asan/bin/clang-21+0xc1caae4)
Address 0x72d4dddb2570 is located in stack of thread T0 at offset 368 in frame
    #0 0x57fb61df3d5f in (anonymous namespace)::DecomposePrintfHandler::GetSpecifiers(clang::Sema&, (anonymous namespace)::FormatStringLiteral const*, clang::Expr const*, clang::Sema::FormatStringType, bool, bool, llvm::SmallVectorImpl<(anonymous namespace)::EquatableFormatArgument>&) /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/clang/lib/Sema/SemaChecking.cpp:7388

apple-fcloutier added a commit to swiftlang/llvm-project that referenced this pull request Feb 25, 2025
This implements ``__attribute__((format_matches))``, as described in the
RFC:
https://discourse.llvm.org/t/rfc-format-attribute-attribute-format-like/83076

The ``format`` attribute only allows the compiler to check that a format
string matches its arguments. If the format string is passed
independently of its arguments, there is no way to have the compiler
check it. ``format_matches(flavor, fmtidx, example)`` allows the
compiler to check format strings against the ``example`` format string
instead of against format arguments. See the changes to AttrDocs.td in
this diff for more information.

Implementation-wise, this change subclasses CheckPrintfHandler to allow it to collect specifiers into arrays, and
implements comparing that two specifiers are equivalent.
`checkFormatStringExpr` gets a new `ReferenceFormatString` argument that
is piped down when calling a function with the `format_matches`
attribute (and is `nullptr` otherwise); this is the string that the
actual format string is compared against.

Although this change does not enable -Wformat-nonliteral by default,
IMO, all the pieces are now in place such that it could be.

rdar://84936554
@apple-fcloutier
Copy link
Contributor Author

apple-fcloutier commented Feb 25, 2025

Thanks Benjamin for taking care of it. I will enable ASAN in development builds going forward.

@bgra8
Copy link
Contributor

bgra8 commented Mar 5, 2025

@apple-fcloutier it seems this patch breaks code that declares functions with multiple different __attibute__((__format__)) specifications.

One such example is the util-linux library (https://www.kernel.org/pub/linux/utils/util-linux/v2.41/) which has such a declaration:

int ul_path_scanff(struct path_cxt *pc, const char *path, va_list ap, const char *fmt, ...)
				__attribute__ ((__format__ (__printf__, 2, 0)))
				__attribute__ ((__format__ (__scanf__, 4, 5)));

The intention is to specify that path is a printf format and fmt is a scanf format.

With the current patch, the compilers wrongfully errors when fmt is passed to vfscanf in the function implementation with the message:

path.c:734:18: error: passing 'printf' format string where 'scanf' format string is expected [-Werror,-Wformat]
  734 |         rc = vfscanf(f, fmt, fmt_ap);

So it looks like the compiler does not correctly identify each format attribute in the function declaration.

Is this intended?

@apple-fcloutier
Copy link
Contributor Author

apple-fcloutier commented Mar 5, 2025

This is not intended. There is a test for a function with both a printf and scan format (https://github.com/swiftlang/llvm-project/blob/8909cd902bf9985185f59b612e478fc096e91308/clang/test/Sema/format-strings.c#L490), but it doesn't cover the implementation of such a function. I'll get on it.

@apple-fcloutier
Copy link
Contributor Author

Fixing in #129954.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
clang:frontend Language frontend issues, e.g. anything involving "Sema" clang Clang issues not falling into any other category
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants