Skip to content

[clang] Catch missing format attributes #105479

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

budimirarandjelovichtec
Copy link
Contributor

Enable flag -Wmissing-format-attribute to catch missing attributes.

Fixes #60718

@llvmbot llvmbot added clang Clang issues not falling into any other category clang:frontend Language frontend issues, e.g. anything involving "Sema" labels Aug 21, 2024
@llvmbot
Copy link
Member

llvmbot commented Aug 21, 2024

@llvm/pr-subscribers-clang

Author: Budimir Aranđelović (budimirarandjelovichtec)

Changes

Enable flag -Wmissing-format-attribute to catch missing attributes.

Fixes #60718


Patch is 40.79 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/105479.diff

9 Files Affected:

  • (modified) clang/docs/ReleaseNotes.rst (+3)
  • (modified) clang/include/clang/Basic/DiagnosticGroups.td (-1)
  • (modified) clang/include/clang/Basic/DiagnosticSemaKinds.td (+3)
  • (modified) clang/include/clang/Sema/Attr.h (+7)
  • (modified) clang/include/clang/Sema/Sema.h (+4)
  • (modified) clang/lib/Sema/SemaDecl.cpp (+2)
  • (modified) clang/lib/Sema/SemaDeclAttr.cpp (+214-2)
  • (added) clang/test/Sema/attr-format-missing.c (+393)
  • (added) clang/test/Sema/attr-format-missing.cpp (+174)
diff --git a/clang/docs/ReleaseNotes.rst b/clang/docs/ReleaseNotes.rst
index 6adf57da42e656..f9fd4a20ea6bf6 100644
--- a/clang/docs/ReleaseNotes.rst
+++ b/clang/docs/ReleaseNotes.rst
@@ -707,6 +707,9 @@ Improvements to Clang's diagnostics
 
 - Clang now diagnoses integer constant expressions that are folded to a constant value as an extension in more circumstances. Fixes #GH59863
 
+- Clang now diagnoses missing format attributes for non-template functions and
+  class/struct/union members. Fixes #GH60718
+
 Improvements to Clang's time-trace
 ----------------------------------
 
diff --git a/clang/include/clang/Basic/DiagnosticGroups.td b/clang/include/clang/Basic/DiagnosticGroups.td
index 2241f8481484e2..da6a3b2fe3571b 100644
--- a/clang/include/clang/Basic/DiagnosticGroups.td
+++ b/clang/include/clang/Basic/DiagnosticGroups.td
@@ -525,7 +525,6 @@ def MainReturnType : DiagGroup<"main-return-type">;
 def MaxUnsignedZero : DiagGroup<"max-unsigned-zero">;
 def MissingBraces : DiagGroup<"missing-braces">;
 def MissingDeclarations: DiagGroup<"missing-declarations">;
-def : DiagGroup<"missing-format-attribute">;
 def MissingIncludeDirs : DiagGroup<"missing-include-dirs">;
 def MissingNoreturn : DiagGroup<"missing-noreturn">;
 def MultiChar : DiagGroup<"multichar">;
diff --git a/clang/include/clang/Basic/DiagnosticSemaKinds.td b/clang/include/clang/Basic/DiagnosticSemaKinds.td
index 940f9ac226ca87..75e51c5aa166d2 100644
--- a/clang/include/clang/Basic/DiagnosticSemaKinds.td
+++ b/clang/include/clang/Basic/DiagnosticSemaKinds.td
@@ -1031,6 +1031,9 @@ def err_opencl_invalid_param : Error<
 def err_opencl_invalid_return : Error<
   "declaring function return value of type %0 is not allowed %select{; did you forget * ?|}1">;
 def warn_enum_value_overflow : Warning<"overflow in enumeration value">;
+def warn_missing_format_attribute : Warning<
+  "diagnostic behavior may be improved by adding the %0 format attribute to the declaration of %1">,
+  InGroup<DiagGroup<"missing-format-attribute">>, DefaultIgnore;
 def warn_pragma_options_align_reset_failed : Warning<
   "#pragma options align=reset failed: %0">,
   InGroup<IgnoredPragmas>;
diff --git a/clang/include/clang/Sema/Attr.h b/clang/include/clang/Sema/Attr.h
index 3f0b10212789a4..37c124ca7b454a 100644
--- a/clang/include/clang/Sema/Attr.h
+++ b/clang/include/clang/Sema/Attr.h
@@ -123,6 +123,13 @@ inline bool isInstanceMethod(const Decl *D) {
   return false;
 }
 
+inline bool checkIfMethodHasImplicitObjectParameter(const Decl *D) {
+  if (const auto *MethodDecl = dyn_cast<CXXMethodDecl>(D))
+    return MethodDecl->isInstance() &&
+           !MethodDecl->hasCXXExplicitFunctionObjectParameter();
+  return false;
+}
+
 /// Diagnose mutually exclusive attributes when present on a given
 /// declaration. Returns true if diagnosed.
 template <typename AttrTy>
diff --git a/clang/include/clang/Sema/Sema.h b/clang/include/clang/Sema/Sema.h
index 57994f4033922c..9022ac86edf817 100644
--- a/clang/include/clang/Sema/Sema.h
+++ b/clang/include/clang/Sema/Sema.h
@@ -4602,6 +4602,10 @@ class Sema final : public SemaBase {
 
   enum class RetainOwnershipKind { NS, CF, OS };
 
+  void DiagnoseMissingFormatAttributes(Stmt *Body, const FunctionDecl *FDecl);
+  std::vector<FormatAttr *>
+  GetMissingFormatAttributes(Stmt *Body, const FunctionDecl *FDecl);
+
   UuidAttr *mergeUuidAttr(Decl *D, const AttributeCommonInfo &CI,
                           StringRef UuidAsWritten, MSGuidDecl *GuidDecl);
 
diff --git a/clang/lib/Sema/SemaDecl.cpp b/clang/lib/Sema/SemaDecl.cpp
index 66eeaa8e6f7777..29c28d4b567513 100644
--- a/clang/lib/Sema/SemaDecl.cpp
+++ b/clang/lib/Sema/SemaDecl.cpp
@@ -15925,6 +15925,8 @@ Decl *Sema::ActOnFinishFunctionBody(Decl *dcl, Stmt *Body,
         }
       }
 
+      DiagnoseMissingFormatAttributes(Body, FD);
+
       // We might not have found a prototype because we didn't wish to warn on
       // the lack of a missing prototype. Try again without the checks for
       // whether we want to warn on the missing prototype.
diff --git a/clang/lib/Sema/SemaDeclAttr.cpp b/clang/lib/Sema/SemaDeclAttr.cpp
index f2cd46d1e7c932..4634dd8b9cce80 100644
--- a/clang/lib/Sema/SemaDeclAttr.cpp
+++ b/clang/lib/Sema/SemaDeclAttr.cpp
@@ -3508,7 +3508,7 @@ static void handleFormatAttr(Sema &S, Decl *D, const ParsedAttr &AL) {
 
   // In C++ the implicit 'this' function parameter also counts, and they are
   // counted from one.
-  bool HasImplicitThisParam = isInstanceMethod(D);
+  bool HasImplicitThisParam = checkIfMethodHasImplicitObjectParameter(D);
   unsigned NumArgs = getFunctionOrMethodNumParams(D) + HasImplicitThisParam;
 
   IdentifierInfo *II = AL.getArgAsIdent(0)->Ident;
@@ -3621,7 +3621,7 @@ static void handleCallbackAttr(Sema &S, Decl *D, const ParsedAttr &AL) {
     return;
   }
 
-  bool HasImplicitThisParam = isInstanceMethod(D);
+  bool HasImplicitThisParam = checkIfMethodHasImplicitObjectParameter(D);
   int32_t NumArgs = getFunctionOrMethodNumParams(D);
 
   FunctionDecl *FD = D->getAsFunction();
@@ -5320,6 +5320,218 @@ static void handlePreferredTypeAttr(Sema &S, Decl *D, const ParsedAttr &AL) {
   D->addAttr(::new (S.Context) PreferredTypeAttr(S.Context, AL, ParmTSI));
 }
 
+// This function is called only if function call is not inside template body.
+// TODO: Add call for function calls inside template body.
+// Emit warnings if parent function misses format attributes.
+void Sema::DiagnoseMissingFormatAttributes(Stmt *Body,
+                                           const FunctionDecl *FDecl) {
+  assert(FDecl);
+
+  // If there are no function body, exit.
+  if (!Body)
+    return;
+
+  // Get missing format attributes
+  std::vector<FormatAttr *> MissingFormatAttributes =
+      GetMissingFormatAttributes(Body, FDecl);
+  if (MissingFormatAttributes.empty())
+    return;
+
+  // Check if there are more than one format type found. In that case do not
+  // emit diagnostic.
+  const clang::IdentifierInfo *AttrType = MissingFormatAttributes[0]->getType();
+  if (llvm::any_of(MissingFormatAttributes, [&](const FormatAttr *Attr) {
+        return AttrType != Attr->getType();
+      }))
+    return;
+
+  for (const FormatAttr *FA : MissingFormatAttributes) {
+    // If format index and first-to-check argument index are negative, it means
+    // that this attribute is only saved for multiple format types checking.
+    if (FA->getFormatIdx() < 0 || FA->getFirstArg() < 0)
+      continue;
+
+    // Emit diagnostic
+    SourceLocation Loc = FDecl->getLocation();
+    Diag(Loc, diag::warn_missing_format_attribute)
+        << FA->getType() << FDecl
+        << FixItHint::CreateInsertion(Loc,
+                                      (llvm::Twine("__attribute__((format(") +
+                                       FA->getType()->getName() + ", " +
+                                       llvm::Twine(FA->getFormatIdx()) + ", " +
+                                       llvm::Twine(FA->getFirstArg()) + ")))")
+                                          .str());
+  }
+}
+
+// Returns vector of format attributes. There are no two attributes with same
+// arguments in returning vector. There can be attributes that effectivelly only
+// store information about format type.
+std::vector<FormatAttr *>
+Sema::GetMissingFormatAttributes(Stmt *Body, const FunctionDecl *FDecl) {
+  unsigned int FunctionFormatArgumentIndexOffset =
+      checkIfMethodHasImplicitObjectParameter(FDecl) ? 2 : 1;
+
+  std::vector<FormatAttr *> MissingAttributes;
+
+  // Iterate over body statements.
+  for (auto *Child : Body->children()) {
+    // If child statement is compound statement, recursively get missing
+    // attributes.
+    if (dyn_cast_or_null<CompoundStmt>(Child)) {
+      std::vector<FormatAttr *> CompoundStmtMissingAttributes =
+          GetMissingFormatAttributes(Child, FDecl);
+
+      // If there are already missing attributes with same arguments, do not add
+      // duplicates.
+      for (FormatAttr *FA : CompoundStmtMissingAttributes) {
+        if (!llvm::any_of(MissingAttributes, [&](const FormatAttr *Attr) {
+              return FA->getType() == Attr->getType() &&
+                     FA->getFormatIdx() == Attr->getFormatIdx() &&
+                     FA->getFirstArg() == Attr->getFirstArg();
+            }))
+          MissingAttributes.push_back(FA);
+      }
+
+      continue;
+    }
+
+    ValueStmt *VS = dyn_cast_or_null<ValueStmt>(Child);
+    if (!VS)
+      continue;
+    Expr *TheExpr = VS->getExprStmt();
+    if (!TheExpr)
+      continue;
+    CallExpr *TheCall = dyn_cast_or_null<CallExpr>(TheExpr);
+    if (!TheCall)
+      continue;
+    const FunctionDecl *ChildFunction =
+        dyn_cast_or_null<FunctionDecl>(TheCall->getCalleeDecl());
+    if (!ChildFunction)
+      continue;
+
+    Expr **Args = TheCall->getArgs();
+    unsigned int NumArgs = TheCall->getNumArgs();
+
+    // If child expression is function, check if it is format function.
+    // If it is, check if parent function misses format attributes.
+
+    // If child function is format function and format arguments are not
+    // relevant to emit diagnostic, save only information about format type
+    // (format index and first-to-check argument index are set to -1).
+    // Information about format type is later used to determine if there are
+    // more than one format type found.
+
+    unsigned int ChildFunctionFormatArgumentIndexOffset =
+        checkIfMethodHasImplicitObjectParameter(ChildFunction) ? 2 : 1;
+
+    // Check if function has format attribute with forwarded format string.
+    IdentifierInfo *AttrType;
+    const ParmVarDecl *FormatArg;
+    if (llvm::any_of(ChildFunction->specific_attrs<FormatAttr>(),
+                     [&](const FormatAttr *Attr) {
+                       AttrType = Attr->getType();
+
+                       int OffsetFormatIndex =
+                           Attr->getFormatIdx() -
+                           ChildFunctionFormatArgumentIndexOffset;
+                       if (OffsetFormatIndex < 0 ||
+                           (unsigned)OffsetFormatIndex >= NumArgs)
+                         return true;
+
+                       const auto *FormatArgExpr = dyn_cast<DeclRefExpr>(
+                           Args[OffsetFormatIndex]->IgnoreParenCasts());
+                       if (!FormatArgExpr)
+                         return true;
+
+                       FormatArg = dyn_cast_or_null<ParmVarDecl>(
+                           FormatArgExpr->getReferencedDeclOfCallee());
+                       if (!FormatArg)
+                         return true;
+
+                       return false;
+                     })) {
+      MissingAttributes.push_back(
+          FormatAttr::CreateImplicit(getASTContext(), AttrType, -1, -1));
+      continue;
+    }
+
+    // Do not add in a vector format attributes whose type is different than
+    // parent function attribute type.
+    if (llvm::any_of(FDecl->specific_attrs<FormatAttr>(),
+                     [&](const FormatAttr *FunctionAttr) {
+                       return AttrType != FunctionAttr->getType();
+                     }))
+      continue;
+
+    // Check if format string argument is parent function parameter.
+    int StringIndex = 0;
+    if (!llvm::any_of(FDecl->parameters(), [&](const ParmVarDecl *Param) {
+          if (Param != FormatArg)
+            return false;
+
+          StringIndex = Param->getFunctionScopeIndex() +
+                        FunctionFormatArgumentIndexOffset;
+
+          return true;
+        })) {
+      MissingAttributes.push_back(
+          FormatAttr::CreateImplicit(getASTContext(), AttrType, -1, -1));
+      continue;
+    }
+
+    unsigned NumOfParentFunctionParams = FDecl->getNumParams();
+
+    // Compare parent and calling function format attribute arguments (archetype
+    // and format string).
+    if (llvm::any_of(
+            FDecl->specific_attrs<FormatAttr>(), [&](const FormatAttr *Attr) {
+              if (Attr->getType() != AttrType)
+                return false;
+              int OffsetFormatIndex =
+                  Attr->getFormatIdx() - FunctionFormatArgumentIndexOffset;
+
+              if (OffsetFormatIndex < 0 ||
+                  (unsigned)OffsetFormatIndex >= NumOfParentFunctionParams)
+                return false;
+
+              if (FDecl->parameters()[OffsetFormatIndex] != FormatArg)
+                return false;
+
+              return true;
+            })) {
+      MissingAttributes.push_back(
+          FormatAttr::CreateImplicit(getASTContext(), AttrType, -1, -1));
+      continue;
+    }
+
+    // Get first argument index
+    int FirstToCheck = [&]() -> unsigned {
+      if (!FDecl->isVariadic())
+        return 0;
+      const auto *FirstToCheckArg = Args[NumArgs - 1]->IgnoreParenCasts();
+      if (!FirstToCheckArg ||
+          FirstToCheckArg->getType().getCanonicalType() !=
+              Context.getBuiltinVaListType().getCanonicalType())
+        return 0;
+      return NumOfParentFunctionParams + FunctionFormatArgumentIndexOffset;
+    }();
+
+    // If there are already attributes which arguments matches arguments
+    // detected in this iteration, do not add new attribute as it would be
+    // duplicate.
+    if (!llvm::any_of(MissingAttributes, [&](const FormatAttr *Attr) {
+          return Attr->getType() == AttrType &&
+                 Attr->getFormatIdx() == StringIndex &&
+                 Attr->getFirstArg() == FirstToCheck;
+        }))
+      MissingAttributes.push_back(FormatAttr::CreateImplicit(
+          getASTContext(), AttrType, StringIndex, FirstToCheck));
+  }
+
+  return MissingAttributes;
+}
+
 //===----------------------------------------------------------------------===//
 // Microsoft specific attribute handlers.
 //===----------------------------------------------------------------------===//
diff --git a/clang/test/Sema/attr-format-missing.c b/clang/test/Sema/attr-format-missing.c
new file mode 100644
index 00000000000000..e887e11d767040
--- /dev/null
+++ b/clang/test/Sema/attr-format-missing.c
@@ -0,0 +1,393 @@
+// RUN: %clang_cc1 -fsyntax-only -verify=expected,c_diagnostics -Wmissing-format-attribute %s
+// RUN: %clang_cc1 -fsyntax-only -Wmissing-format-attribute -fdiagnostics-parseable-fixits %s 2>&1 | FileCheck %s --check-prefixes=CHECK,C-CHECK
+// RUN: %clang_cc1 -fsyntax-only -x c++ -verify=expected,cpp_diagnostics -Wmissing-format-attribute %s
+// RUN: %clang_cc1 -fsyntax-only -x c++ -verify=expected,cpp_diagnostics -std=c++2b -Wmissing-format-attribute %s
+// RUN: %clang_cc1 -fsyntax-only -x c++ -verify=expected,cpp_diagnostics -std=c++23 -Wmissing-format-attribute %s
+// RUN: not %clang_cc1 -fsyntax-only -x c++ -Wmissing-format-attribute -fdiagnostics-parseable-fixits %s 2>&1 | FileCheck %s
+
+#ifndef __cplusplus
+typedef __CHAR16_TYPE__ char16_t;
+typedef __CHAR32_TYPE__ char32_t;
+typedef __WCHAR_TYPE__ wchar_t;
+#endif
+
+typedef __SIZE_TYPE__ size_t;
+typedef __builtin_va_list va_list;
+
+__attribute__((__format__(__printf__, 1, 2)))
+int printf(const char *, ...); // #printf
+
+__attribute__((__format__(__scanf__, 1, 2)))
+int scanf(const char *, ...); // #scanf
+
+__attribute__((__format__(__printf__, 1, 0)))
+int vprintf(const char *, va_list); // #vprintf
+
+__attribute__((__format__(__scanf__, 1, 0)))
+int vscanf(const char *, va_list); // #vscanf
+
+__attribute__((__format__(__printf__, 2, 0)))
+int vsprintf(char *, const char *, va_list); // #vsprintf
+
+__attribute__((__format__(__printf__, 3, 0)))
+int vsnprintf(char *ch, size_t, const char *, va_list); // #vsnprintf
+
+__attribute__((__format__(__scanf__, 1, 4)))
+void f1(char *out, const size_t len, const char *format, ... /* args */) // #f1
+{
+    va_list args;
+    vsnprintf(out, len, format, args); // expected-no-warning@#f1
+}
+
+__attribute__((__format__(__printf__, 1, 4)))
+void f2(char *out, const size_t len, const char *format, ... /* args */) // #f2
+{
+    va_list args;
+    vsnprintf(out, len, format, args); // expected-warning@#f2 {{diagnostic behavior may be improved by adding the 'printf' format attribute to the declaration of 'f2'}}
+                                       // CHECK: fix-it:"{{.*}}":{[[@LINE-4]]:6-[[@LINE-4]]:6}:"__attribute__((format(printf, 3, 4)))"
+}
+
+void f3(char *out, va_list args) // #f3
+{
+    vprintf(out, args); // expected-warning@#f3 {{diagnostic behavior may be improved by adding the 'printf' format attribute to the declaration of 'f3'}}
+                        // CHECK: fix-it:"{{.*}}":{[[@LINE-3]]:6-[[@LINE-3]]:6}:"__attribute__((format(printf, 1, 0)))"
+}
+
+void f4(char* out, ... /* args */) // #f4
+{
+    va_list args;
+    vprintf("test", args); // expected-no-warning@#f4
+
+    const char *ch;
+    vprintf(ch, args); // expected-no-warning@#f4
+}
+
+void f5(va_list args) // #f5
+{
+    char *ch;
+    vscanf(ch, args); // expected-no-warning@#f5
+}
+
+void f6(char *out, va_list args) // #f6
+{
+    char *ch;
+    vprintf(ch, args); // expected-no-warning@#f6
+    vprintf("test", args); // expected-no-warning@#f6
+    vprintf(out, args); // expected-warning@#f6 {{diagnostic behavior may be improved by adding the 'printf' format attribute to the declaration of 'f6'}}
+                        // CHECK: fix-it:"{{.*}}":{[[@LINE-6]]:6-[[@LINE-6]]:6}:"__attribute__((format(printf, 1, 0)))"
+}
+
+void f7(const char *out, ... /* args */) // #f7
+{
+    va_list args;
+
+    vscanf(out, args); // expected-warning@#f7 {{diagnostic behavior may be improved by adding the 'scanf' format attribute to the declaration of 'f7'}}
+                       // CHECK: fix-it:"{{.*}}":{[[@LINE-5]]:6-[[@LINE-5]]:6}:"__attribute__((format(scanf, 1, 2)))"
+}
+
+void f8(const char *out, ... /* args */) // #f8
+{
+    va_list args;
+
+    vscanf(out, args); // expected-no-warning@#f8
+    vprintf(out, args); // expected-no-warning@#f8
+}
+
+void f9(const char out[], ... /* args */) // #f9
+{
+    va_list args;
+    char *ch;
+    vprintf(ch, args); // expected-no-warning
+    vsprintf(ch, out, args); // expected-warning@#f9 {{diagnostic behavior may be improved by adding the 'printf' format attribute to the declaration of 'f9'}}
+                             // CHECK: fix-it:"{{.*}}":{[[@LINE-6]]:6-[[@LINE-6]]:6}:"__attribute__((format(printf, 1, 2)))"
+}
+
+void f10(const wchar_t *out, ... /* args */) // #f10
+{
+    va_list args;
+    vscanf(out, args);
+#if (defined(__aarch64__) && !defined(_WIN64)) || (defined(__arm__) && !defined(_WIN32))
+                        // c_diagnostics-warning@-2 {{incompatible pointer types passing 'const wchar_t *' (aka 'const unsigned int *') to parameter of type 'const char *'}}
+#elif __SIZEOF_WCHAR_T__ == 4
+                        // c_diagnostics-warning@-4 {{incompatible pointer types passing 'const wchar_t *' (aka 'const int *') to parameter of type 'const char *'}}
+#else
+                        // c_diagnostics-warning@-6 {{incompatible pointer types passing 'const wchar_t *' (aka 'const unsigned short *') to parameter of type 'const char *'}}
+#endif
+                        // c_diagnostics-note@#vscanf {{passing argument to parameter here}}
+                        // c_diagnostics-warning@#f10 {{diagnostic behavior may be improved by adding the 'scanf' format attribute to the declaration of 'f10'}}
+                        // C-CHECK: fix-it:"{{.*}}":{[[@LINE-13]]:6-[[@LINE-13]]:6}:"__attribute__((format(scanf, 1, 2)))"
+                        // cpp_diagnostics-error@-11 {{no matching function for call to 'vscanf'}}
+                        // cpp_diagnostics-note@#vscanf {{candidate function not viable: no known conversion from 'const wchar_t *' to 'const char *' for 1st argument}}
+}
+
+void f11(const wchar_t *out, ... /* args */) // #f11
+{
+    va_list args;
+    vscanf((const char *) out, args); // expected-warning@#f11 {{diagnostic behavior may be improved by adding the 'scanf' format attribute to the declaration of 'f11'}}
+                                      // CHECK: fix-it:"{{.*}}":{[[@LINE-4]]:6-[[@LINE-4]]:6}:"__attribute__((format(scanf, 1, 2)))"
+}
+
+void f12(const wchar_t *out, ... /* args */) // #f12
+{
+    va_list args;
+    vscanf((char *) out, args); // expected-warning@#f12 {{diagnostic behavior may be improved by adding the 'scanf' format attribute to the declaration of 'f12'}}
+                                // CHECK: fix-it:"{{.*}}":{[[@LINE-4]]:6-[[@LINE-4]]:6}:"__attribute__((format(sc...
[truncated]

@budimirarandjelovichtec
Copy link
Contributor Author

@AaronBallman
Copy link
Collaborator

Can you explain what changes were made to address the various issues which caused the earlier revert? I've tried diffing the PRs to see what the changes are, but my git-fu is insufficient to the task. :-D

@vitalybuka
Copy link
Collaborator

Can you explain what changes were made to address the various issues which caused the earlier revert? I've tried diffing the PRs to see what the changes are, but my git-fu is insufficient to the task. :-D

Just for convenience I extracted update diff into #106649

@budimirarandjelovichtec
Copy link
Contributor Author

Can you explain what changes were made to address the various issues which caused the earlier revert? I've tried diffing the PRs to see what the changes are, but my git-fu is insufficient to the task. :-D

I made changes in several places. Here are major changes:

  1. determining first argument index. Here I removed dynamic cast to DeclRefAttr of last function argument. In C++ mode casting to DeclRefAttr resulted in unrecognizing argument type as va_list for several architectures and OS. So, some architectures gave 0 as output and others gave non-zero value. Unrecognizing argument type as va_list was a result of defining canonical va_list type differently depending on architecture.
  2. added check if calling function has format attribute before iterating through its format attributes. This check was added because one pointer (AttrType) was initialized in iteration through format attributes and later used to get name if missing attribute was caught. In previous merge this was resulting in throwing error that there were use of uninitialized variable for some architectures.
  3. edited some parts of C code where passed arguments were not valid. Examples are passing &args[0] or char * as va_list argument (functions f8() and f41()).

Relevant parts of code are reviewed in #106649 .

@budimirarandjelovichtec budimirarandjelovichtec force-pushed the missing-format-attribute branch 3 times, most recently from e57e2fb to 11aef7c Compare August 30, 2024 14:53
Copy link
Collaborator

@AaronBallman AaronBallman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@budimirarandjelovichtec
Copy link
Contributor Author

Let's wait few days if there are any opinions. Then, someone from my company will merge it.

Comment on lines 5513 to 5567
if (FDecl->parameters()[OffsetFormatIndex] != FormatArg)
return false;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Haven't we already found FormatArg in FDecl->parameters() in the previous any_of? I presume that parameters have to be unique.

Couldn't we just compare Attr->getFormatIdx() with StringIndex? The attribute that we'll be suggesting below contains exactly that index.

In fact I wonder if we couldn't also rely on the check for duplicates here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In previous, any_of it was found callee format argument. In this any_of, caller format argument is checked with callee one. If they mismatch, it means that caller misses format attribute.

@budimirarandjelovichtec budimirarandjelovichtec force-pushed the missing-format-attribute branch 2 times, most recently from d845def to d04f97b Compare October 31, 2024 10:22
@budimirarandjelovichtec budimirarandjelovichtec force-pushed the missing-format-attribute branch 5 times, most recently from 33135f7 to e4faa01 Compare November 8, 2024 11:50
@budimirarandjelovichtec budimirarandjelovichtec force-pushed the missing-format-attribute branch 3 times, most recently from 38d9734 to 3ffb989 Compare November 20, 2024 15:35
@budimirarandjelovichtec budimirarandjelovichtec force-pushed the missing-format-attribute branch 3 times, most recently from 848ebca to 1a096f6 Compare December 24, 2024 11:34
@Endilll Endilll removed their request for review January 15, 2025 18:48
@budimirarandjelovichtec
Copy link
Contributor Author

Before looking into details, may I ask you to:

1. Re-base

2. address or reply to existing comments

3. click re-request review.

Thank you!

  1. Rebased to the last commit as of February 14, 2025.

  2. Replied on existing comments.

  3. Re-requested review for all reviewers.

@vitalybuka vitalybuka removed their request for review February 27, 2025 05:30
@budimirarandjelovichtec
Copy link
Contributor Author

Ping @vitalybuka @aaronpuchert

@cor3ntin cor3ntin requested a review from apple-fcloutier May 16, 2025 09:43
Copy link
Contributor

@apple-fcloutier apple-fcloutier left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I understand your change, it can only trip when -Wformat-nonliteral also trips. I think that you should implement this as a note/fixit on -Wformat-nonliteral. You should be able to leverage checkFormatString's value traversal to have the same precision as -Wformat checking. I don't think that your change correctly identifies this case, for instance:

void foo(const char *a, ...) {
	va_list ap;
	const char *const b = a;
	va_start(ap, a);
	vprintf(b, ap);
	va_end(ap);
}

whereas -Wformat will dig that deep.


// va_list is not intended to be passed to variadic function.
if (Callee->isVariadic())
return;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that there's just one code base in the world that does it, but Clang supports multiple format attributes on the same function, where one of them has a non-zero format index and any others pass a va_list. See #129954. In that context, bailing out because the function is variadic is too eager.

if (const auto *FormatArgExpr = dyn_cast<DeclRefExpr>(
Args[OffsetFormatIndex]->IgnoreParenCasts()))
if (FormatStringArg = dyn_cast_or_null<ParmVarDecl>(
FormatArgExpr->getReferencedDeclOfCallee()))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is correct for what you're trying to do, but doesn't clang emit a warning for un-parenthesized assignments in if conditions?

// vector. Vector could contains attributes that only store information about
// format type (format string and first to check argument are set to -1).
namespace {
std::vector<FormatAttr *> MissingAttributes;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Clang does not use global mutable state. You need to find some other way to keep that around.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we need to keep it around at all.

The patch wants to avoid emitting the warning when the same parameter is used in different format functions, so as to avoid suggesting to add multiple incompatible attributes. But since different format functions tend to use a different language, this is probably extremely rare. We should of course try to avoid false positives, but in this case it seems rather hypothetical, and it seems to come with a high cost.

If we stop worrying about that, we can directly emit the warning when checking a call.

Copy link
Member

@aaronpuchert aaronpuchert left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I understand your change, it can only trip when -Wformat-nonliteral also trips. I think that you should implement this as a note/fixit on -Wformat-nonliteral.

Good observation, and I agree. It's even documented that this is one way to fix the warning:

Clang implements this mostly the same way as GCC, but there is a difference for functions that accept a va_list argument (for example, vprintf). GCC does not emit -Wformat-nonliteral warning for calls to such functions. Clang does not warn if the format string comes from a function parameter, where the function is annotated with a compatible attribute, otherwise it warns.

An interesting question is whether we should have a separate warning about forwarding that doesn't warn about other nonliterals. But I don't see why one would want that. After all the idea of these warnings is to follow format strings all the way up the stack until we reach a constant string that we can check.

This could also simplify the change a lot.

// vector. Vector could contains attributes that only store information about
// format type (format string and first to check argument are set to -1).
namespace {
std::vector<FormatAttr *> MissingAttributes;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we need to keep it around at all.

The patch wants to avoid emitting the warning when the same parameter is used in different format functions, so as to avoid suggesting to add multiple incompatible attributes. But since different format functions tend to use a different language, this is probably extremely rare. We should of course try to avoid false positives, but in this case it seems rather hypothetical, and it seems to come with a high cost.

If we stop worrying about that, we can directly emit the warning when checking a call.

diagnoseArgDependentDiagnoseIfAttrs(FD, ThisArg, Args, Loc);
DetectMissingFormatAttributes(FD, Args, Loc);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we put this into the block up there with the comment "Printf and scanf checking"? We care about the same case: calling a function with a FormatAttr. Up there is a loop going over these attributes. Why don't we add this call to the loop, processing a single attribute at a time?

The reason is to avoid adding complexity in the common case of no such attribute. Up there we already have the check, so we're not adding anything.

And I don't think we lose anything. If anything, this should make the check simpler.

  • Callees with multiple attributes are probably rare, but in that case we could still forward multiple format strings as @apple-fcloutier pointed out.
  • If we treat these attributes independently and suggest contradictory attributes, this is probably an issue anyway.

Although in theory I don't see a reason why this is contradictory at all, at least without argument checking: it would simply constrain the format string to strings accepted by multiple formatting languages.

Or, as @apple-fcloutier suggested, just put it into CheckFormatArguments.

Comment on lines +5955 to +5966
// Check if va_list is passed to callee function.
// If va_list is not passed, return.
bool hasVaList = false;
for (const auto *Param : Callee->parameters()) {
if (Param->getOriginalType().getCanonicalType() ==
getASTContext().getBuiltinVaListType().getCanonicalType()) {
hasVaList = true;
break;
}
}
if (!hasVaList)
return;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The attribute is also about checking the format string. So I don't think we should bail out here either. Just forwarding a parameter as format string should be enough to trigger the warning. Only for the fix-it we have to figure out what happens to the arguments.

IdentifierInfo *AttrType;
const ParmVarDecl *FormatStringArg;
if (!llvm::any_of(
Callee->specific_attrs<FormatAttr>(), [&](const FormatAttr *Attr) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd suggest to call this from the loop in checkCall that goes through FDecl->specific_attrs<FormatAttr>(). Then we'd be looking at a single attribute here.

Comment on lines +5987 to +5988
if (OffsetFormatIndex < 0 || (unsigned)OffsetFormatIndex >= NumArgs)
return false;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this actually happen? I would have thought we get an error when the index is out of bounds and don't even get here.

Comment on lines +6038 to +6045
// Do not add duplicate in a vector of missing format attributes.
if (!llvm::any_of(MissingAttributes, [&](const FormatAttr *Attr) {
return Attr->getType() == AttrType &&
Attr->getFormatIdx() == FormatStringIndex &&
Attr->getFirstArg() == FirstToCheck;
}))
MissingAttributes.push_back(FormatAttr::CreateImplicit(
getASTContext(), AttrType, FormatStringIndex, FirstToCheck, Loc));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should just not bother doing this and emit the warning right here.

void f2(char *out, const size_t len, const char *format, ... /* args */) // #f2
{
va_list args;
vsnprintf(out, len, format, args); // expected-warning@#f2 {{diagnostic behavior may be improved by adding the 'printf' format attribute to the declaration of 'f2'}}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should warn on the call where we found out. Yes, the declaration is where the fix should be, but this is typically indicated with a note. The warning should be where we see the issue. The issue can not be seen in the declaration, but in the call forwarding the parameter.

Another reason is that the attribute should be added to the declaration in a header file, not to the definition in the source file.

The reason why for example -Wmissing-noreturn behaves different is that there is no place in the function body where we notice that the function never returns. It's a control flow graph walk that finds no paths to a return. On the other hand:

bool b;
[[noreturn]] void f() {
  if (b) return;
}

produces

<stdin>:3:10: warning: function 'f' declared 'noreturn' should not return [-Winvalid-noreturn]
    3 |   if (b) return;
      |          ^
<stdin>:4:1: warning: function declared 'noreturn' should not return [-Winvalid-noreturn]
    4 | }
      | ^

GCC emits the warning on the forwarded parameter. In the printf test case:

    vprintf(out, args);
            ^

Comment on lines +99 to +100
vscanf(out, args);
vprintf(out, args);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here I don't see why we shouldn't warn on each of these calls. If the first call triggers a warning, why does the second call make it go away? This might be seen as leading to contradictory suggestions, but this function doesn't really make sense: the same arguments can't be both arguments to printf and scanf, unless the list is empty.


void f16(char *out, va_list args) // #f16
{
{
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The blocks seem to be leftovers from an earlier version where you walked the function body and unpacked CompoundStmts. Now this doesn't make so much sense anymore, so I'd drop them and then drop duplicate test cases.

Comment on lines 5382 to 5479
// va_list is not intended to be passed to variadic function.
if (CalleeFunction->isVariadic())
continue;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your second example is what I meant. The attribute can have first-to-check = 0, in which case only the format string is checked. Such an attribute can be on a non-variadic function. Since we're forwarding the parameter, we should still suggest adding the attribute so that the format string can be checked when f1X is called.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
clang:frontend Language frontend issues, e.g. anything involving "Sema" clang Clang issues not falling into any other category
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Clang flag -Wmissing-format-attribute doesn't catch missing attributes
7 participants