Skip to content

Add bugprone-sprintf-argument-overlap #114244

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 23 commits into
base: main
Choose a base branch
from
Open

Conversation

ccotter
Copy link
Contributor

@ccotter ccotter commented Oct 30, 2024

No description provided.

@llvmbot
Copy link
Member

llvmbot commented Oct 30, 2024

@llvm/pr-subscribers-clang-tools-extra

@llvm/pr-subscribers-clang-tidy

Author: Chris Cotter (ccotter)

Changes

Full diff: https://github.com/llvm/llvm-project/pull/114244.diff

8 Files Affected:

  • (modified) clang-tools-extra/clang-tidy/bugprone/BugproneTidyModule.cpp (+3)
  • (modified) clang-tools-extra/clang-tidy/bugprone/CMakeLists.txt (+1)
  • (added) clang-tools-extra/clang-tidy/bugprone/UndefinedSprintfOverlapCheck.cpp (+52)
  • (added) clang-tools-extra/clang-tidy/bugprone/UndefinedSprintfOverlapCheck.h (+33)
  • (modified) clang-tools-extra/docs/ReleaseNotes.rst (+6)
  • (added) clang-tools-extra/docs/clang-tidy/checks/bugprone/undefined-sprintf-overlap.rst (+15)
  • (modified) clang-tools-extra/docs/clang-tidy/checks/list.rst (+2-1)
  • (added) clang-tools-extra/test/clang-tidy/checkers/bugprone/undefined-sprintf-overlap.cpp (+29)
diff --git a/clang-tools-extra/clang-tidy/bugprone/BugproneTidyModule.cpp b/clang-tools-extra/clang-tidy/bugprone/BugproneTidyModule.cpp
index 33ac65e715ce81..ac3a08c80d7ae6 100644
--- a/clang-tools-extra/clang-tidy/bugprone/BugproneTidyModule.cpp
+++ b/clang-tools-extra/clang-tidy/bugprone/BugproneTidyModule.cpp
@@ -86,6 +86,7 @@
 #include "TooSmallLoopVariableCheck.h"
 #include "UncheckedOptionalAccessCheck.h"
 #include "UndefinedMemoryManipulationCheck.h"
+#include "UndefinedSprintfOverlapCheck.h"
 #include "UndelegatedConstructorCheck.h"
 #include "UnhandledExceptionAtNewCheck.h"
 #include "UnhandledSelfAssignmentCheck.h"
@@ -248,6 +249,8 @@ class BugproneModule : public ClangTidyModule {
         "bugprone-unchecked-optional-access");
     CheckFactories.registerCheck<UndefinedMemoryManipulationCheck>(
         "bugprone-undefined-memory-manipulation");
+    CheckFactories.registerCheck<UndefinedSprintfOverlapCheck>(
+        "bugprone-undefined-sprintf-overlap");
     CheckFactories.registerCheck<UndelegatedConstructorCheck>(
         "bugprone-undelegated-constructor");
     CheckFactories.registerCheck<UnhandledSelfAssignmentCheck>(
diff --git a/clang-tools-extra/clang-tidy/bugprone/CMakeLists.txt b/clang-tools-extra/clang-tidy/bugprone/CMakeLists.txt
index b0a2318acc0597..2403ed665fd5c7 100644
--- a/clang-tools-extra/clang-tidy/bugprone/CMakeLists.txt
+++ b/clang-tools-extra/clang-tidy/bugprone/CMakeLists.txt
@@ -81,6 +81,7 @@ add_clang_library(clangTidyBugproneModule STATIC
   TooSmallLoopVariableCheck.cpp
   UncheckedOptionalAccessCheck.cpp
   UndefinedMemoryManipulationCheck.cpp
+  UndefinedSprintfOverlapCheck.cpp
   UndelegatedConstructorCheck.cpp
   UnhandledExceptionAtNewCheck.cpp
   UnhandledSelfAssignmentCheck.cpp
diff --git a/clang-tools-extra/clang-tidy/bugprone/UndefinedSprintfOverlapCheck.cpp b/clang-tools-extra/clang-tidy/bugprone/UndefinedSprintfOverlapCheck.cpp
new file mode 100644
index 00000000000000..5b5507c6263224
--- /dev/null
+++ b/clang-tools-extra/clang-tidy/bugprone/UndefinedSprintfOverlapCheck.cpp
@@ -0,0 +1,52 @@
+//===--- UndefinedSprintfOverlapCheck.cpp - clang-tidy --------------------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#include "UndefinedSprintfOverlapCheck.h"
+#include "clang/ASTMatchers/ASTMatchFinder.h"
+
+using namespace clang::ast_matchers;
+
+namespace clang::tidy::bugprone {
+
+AST_MATCHER_P(CallExpr, hasAnyOtherArgument,
+              ast_matchers::internal::Matcher<Expr>, InnerMatcher) {
+  for (const auto *Arg : llvm::drop_begin(Node.arguments())) {
+    ast_matchers::internal::BoundNodesTreeBuilder Result(*Builder);
+    if (InnerMatcher.matches(*Arg, Finder, &Result)) {
+      *Builder = std::move(Result);
+      return true;
+    }
+  }
+  return false;
+}
+
+void UndefinedSprintfOverlapCheck::registerMatchers(MatchFinder *Finder) {
+  Finder->addMatcher(
+      callExpr(
+          callee(
+              functionDecl(matchesName("(::std)?::(sn?printf)")).bind("decl")),
+          hasArgument(0, ignoringParenImpCasts(
+                             declRefExpr(to(varDecl().bind("firstArg"))))),
+          hasAnyOtherArgument(
+              ignoringParenImpCasts(declRefExpr().bind("overlappingArg")))),
+      this);
+}
+
+void UndefinedSprintfOverlapCheck::check(
+    const MatchFinder::MatchResult &Result) {
+  const auto *OverlappingArg =
+      Result.Nodes.getNodeAs<DeclRefExpr>("overlappingArg");
+  const auto *FirstArg = Result.Nodes.getNodeAs<VarDecl>("firstArg");
+  const auto *FnDecl = Result.Nodes.getNodeAs<FunctionDecl>("decl");
+
+  diag(OverlappingArg->getLocation(), "argument %0 overlaps the first argument "
+                                      "in %1, which is undefined behavior")
+      << FirstArg << FnDecl;
+}
+
+} // namespace clang::tidy::bugprone
diff --git a/clang-tools-extra/clang-tidy/bugprone/UndefinedSprintfOverlapCheck.h b/clang-tools-extra/clang-tidy/bugprone/UndefinedSprintfOverlapCheck.h
new file mode 100644
index 00000000000000..0e100aa1a5bfdf
--- /dev/null
+++ b/clang-tools-extra/clang-tidy/bugprone/UndefinedSprintfOverlapCheck.h
@@ -0,0 +1,33 @@
+//===--- UndefinedSprintfOverlapCheck.h - clang-tidy ------------*- C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#ifndef LLVM_CLANG_TOOLS_EXTRA_CLANG_TIDY_BUGPRONE_UNDEFINEDSPRINTFOVERLAPCHECK_H
+#define LLVM_CLANG_TOOLS_EXTRA_CLANG_TIDY_BUGPRONE_UNDEFINEDSPRINTFOVERLAPCHECK_H
+
+#include "../ClangTidyCheck.h"
+
+namespace clang::tidy::bugprone {
+
+/// FIXME: Write a short description.
+///
+/// For the user-facing documentation see:
+/// http://clang.llvm.org/extra/clang-tidy/checks/bugprone/undefined-sprintf-overlap.html
+class UndefinedSprintfOverlapCheck : public ClangTidyCheck {
+public:
+  UndefinedSprintfOverlapCheck(StringRef Name, ClangTidyContext *Context)
+      : ClangTidyCheck(Name, Context) {}
+  void registerMatchers(ast_matchers::MatchFinder *Finder) override;
+  void check(const ast_matchers::MatchFinder::MatchResult &Result) override;
+  bool isLanguageVersionSupported(const LangOptions &LangOpts) const override {
+    return LangOpts.CPlusPlus;
+  }
+};
+
+} // namespace clang::tidy::bugprone
+
+#endif // LLVM_CLANG_TOOLS_EXTRA_CLANG_TIDY_BUGPRONE_UNDEFINEDSPRINTFOVERLAPCHECK_H
diff --git a/clang-tools-extra/docs/ReleaseNotes.rst b/clang-tools-extra/docs/ReleaseNotes.rst
index ccebf74e8a67e7..7285343971189f 100644
--- a/clang-tools-extra/docs/ReleaseNotes.rst
+++ b/clang-tools-extra/docs/ReleaseNotes.rst
@@ -131,6 +131,12 @@ New checks
   Gives warnings for tagged unions, where the number of tags is
   different from the number of data members inside the union.
 
+- New :doc:`bugprone-undefined-sprintf-overlap
+  <clang-tidy/checks/bugprone/undefined-sprintf-overlap>` check.
+
+  Finds calls to sprintf family of functions whose first argument
+  overlaps with a subsequent argument.
+
 - New :doc:`portability-template-virtual-member-function
   <clang-tidy/checks/portability/template-virtual-member-function>` check.
 
diff --git a/clang-tools-extra/docs/clang-tidy/checks/bugprone/undefined-sprintf-overlap.rst b/clang-tools-extra/docs/clang-tidy/checks/bugprone/undefined-sprintf-overlap.rst
new file mode 100644
index 00000000000000..020d18a0802198
--- /dev/null
+++ b/clang-tools-extra/docs/clang-tidy/checks/bugprone/undefined-sprintf-overlap.rst
@@ -0,0 +1,15 @@
+.. title:: clang-tidy - bugprone-undefined-sprintf-overlap
+
+bugprone-undefined-sprintf-overlap
+==================================
+
+Warns if any arguments to the sprintf family of functions overlap with the
+first argument.
+
+.. code-block:: c++
+
+    char buf[20] = {"hi"};
+    sprintf(buf, "%s%d", buf, 0);
+
+C99 and POSIX.1-2001 states that if copying were to take place between objects
+that overlap, the result is undefined.
diff --git a/clang-tools-extra/docs/clang-tidy/checks/list.rst b/clang-tools-extra/docs/clang-tidy/checks/list.rst
index d731b13fc0df44..41b7d92f94f90c 100644
--- a/clang-tools-extra/docs/clang-tidy/checks/list.rst
+++ b/clang-tools-extra/docs/clang-tidy/checks/list.rst
@@ -115,8 +115,8 @@ Clang-Tidy Checks
    :doc:`bugprone-multiple-new-in-one-expression <bugprone/multiple-new-in-one-expression>`,
    :doc:`bugprone-multiple-statement-macro <bugprone/multiple-statement-macro>`,
    :doc:`bugprone-no-escape <bugprone/no-escape>`,
-   :doc:`bugprone-nondeterministic-pointer-iteration-order <bugprone/nondeterministic-pointer-iteration-order>`,
    :doc:`bugprone-non-zero-enum-to-bool-conversion <bugprone/non-zero-enum-to-bool-conversion>`,
+   :doc:`bugprone-nondeterministic-pointer-iteration-order <bugprone/nondeterministic-pointer-iteration-order>`,
    :doc:`bugprone-not-null-terminated-result <bugprone/not-null-terminated-result>`, "Yes"
    :doc:`bugprone-optional-value-conversion <bugprone/optional-value-conversion>`, "Yes"
    :doc:`bugprone-parent-virtual-call <bugprone/parent-virtual-call>`, "Yes"
@@ -153,6 +153,7 @@ Clang-Tidy Checks
    :doc:`bugprone-too-small-loop-variable <bugprone/too-small-loop-variable>`,
    :doc:`bugprone-unchecked-optional-access <bugprone/unchecked-optional-access>`,
    :doc:`bugprone-undefined-memory-manipulation <bugprone/undefined-memory-manipulation>`,
+   :doc:`bugprone-undefined-sprintf-overlap <bugprone/undefined-sprintf-overlap>`, "Yes"
    :doc:`bugprone-undelegated-constructor <bugprone/undelegated-constructor>`,
    :doc:`bugprone-unhandled-exception-at-new <bugprone/unhandled-exception-at-new>`,
    :doc:`bugprone-unhandled-self-assignment <bugprone/unhandled-self-assignment>`,
diff --git a/clang-tools-extra/test/clang-tidy/checkers/bugprone/undefined-sprintf-overlap.cpp b/clang-tools-extra/test/clang-tidy/checkers/bugprone/undefined-sprintf-overlap.cpp
new file mode 100644
index 00000000000000..24d2939aee2c74
--- /dev/null
+++ b/clang-tools-extra/test/clang-tidy/checkers/bugprone/undefined-sprintf-overlap.cpp
@@ -0,0 +1,29 @@
+// RUN: %check_clang_tidy %s bugprone-undefined-sprintf-overlap %t
+
+using size_t = decltype(sizeof(int));
+
+extern "C" int sprintf(char *s, const char *format, ...);
+extern "C" int snprintf(char *s, size_t n, const char *format, ...);
+
+namespace std {
+  int snprintf(char *s, size_t n, const char *format, ...);
+}
+
+void first_arg_overlaps() {
+  char buf[10];
+  sprintf(buf, "%s", buf);
+  // CHECK-MESSAGES: :[[@LINE-1]]:22: warning: argument 'buf' overlaps the first argument in 'sprintf', which is undefined behavior [bugprone-undefined-sprintf-overlap]
+  snprintf(buf, sizeof(buf), "%s", buf);
+  // CHECK-MESSAGES: :[[@LINE-1]]:36: warning: argument 'buf' overlaps the first argument in 'snprintf', which is undefined behavior [bugprone-undefined-sprintf-overlap]
+  std::snprintf(buf, sizeof(buf), "%s", buf);
+  // CHECK-MESSAGES: :[[@LINE-1]]:41: warning: argument 'buf' overlaps the first argument in 'snprintf', which is undefined behavior [bugprone-undefined-sprintf-overlap]
+
+  char* c = &buf[0];
+  sprintf(c, "%s", c);
+  // CHECK-MESSAGES: :[[@LINE-1]]:20: warning: argument 'c' overlaps the first argument in 'sprintf', which is undefined behavior [bugprone-undefined-sprintf-overlap]
+  snprintf(c, sizeof(buf), "%s", c);
+  // CHECK-MESSAGES: :[[@LINE-1]]:34: warning: argument 'c' overlaps the first argument in 'snprintf', which is undefined behavior [bugprone-undefined-sprintf-overlap]
+
+  snprintf(c, sizeof(buf), "%s%s", c, c);
+  // CHECK-MESSAGES: :[[@LINE-1]]:36: warning: argument 'c' overlaps the first argument in 'snprintf', which is undefined behavior [bugprone-undefined-sprintf-overlap]
+}

Copy link

github-actions bot commented Oct 30, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

@ccotter ccotter changed the title Add bugprone-sprintf-overlap Add bugprone-undefined-sprintf-overlap Oct 31, 2024
@ccotter
Copy link
Contributor Author

ccotter commented Nov 9, 2024

@PiotrZSL @5chmidti mind giving this a review?

Copy link
Contributor

@5chmidti 5chmidti left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are some false-negatives that exist for this check:

E.g.,

  // from tests
  char bufss[10][10][10];
  sprintf(bufss[0][1], "%s", bufss[0][1]);

or *( *(bufss + 0) + 1). Though these can border on requiring, e.g., symbolic execution due to the potential complexity, and there is IMO no need to support expressions that are arbitrarily complex, but two levels of pointer indirections sounds like a solid basis to me. (the note on complexity also applies to the other points)

Also, obj.bufs[1].

The check is also only considering subscript operators that are using integer literals as offsets, and not expressions that evaluate to a constant at compile time (1+1), or plain variables. If the expression inside [] has no side effects, they will be the same and the access is overlapping: sprintf(bufs[n], "%s". bufs[n]).

WDYT? It would be nice to have some of those cases detectable

Copy link
Contributor

@nicovank nicovank left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have questions regarding the motivation behind adding this check:

  1. How common is this bug?
  2. What happens in practice at runtime?
  3. What are existing ways of detecting this, if any?

@nicovank
Copy link
Contributor

Oops, crossed signals with @5chmidti. Hopefully not too much overlap sorry.

@ccotter
Copy link
Contributor Author

ccotter commented Jan 5, 2025

  1. How common is this bug?

In one of our internal codebases, a sample of about 1500 files found just over 2% of the files had this bug. As we migrated platforms/standard libraries, the behavior which was previously correct (by chance) changed to be incorrect.

  1. What happens in practice at runtime?

In practice, the formatted string can be incorrect (but no crash etc). One example is https://godbolt.org/z/3GdnPrsYj

char buf[10];
sprintf(buf, "%s", "12");
sprintf(buf, "%s%s", "34", buf);
printf("/%s/\n", buf); // 3434
  1. What are existing ways of detecting this, if any?

None to my knowledge. After finding a couple occurrences of this bug internally, I realized clang-query/clang-tidy could easily catch this.

@ccotter
Copy link
Contributor Author

ccotter commented Jan 5, 2025

Thanks both @nicovank @5chmidti for the overlapping reviews. I got sidetracked, but am coming back to hopefully finish this up.

Re: supporting more use cases, I found nearly all bugs in our codebase were of the simple form sprintf(a, "", ..., a, ...) with a simple DeclRefExpr. None were of the form sprintf(a[0][0], "", ..., a[0][0], ...) (at least, none that I couple find with a more naive grep and visual inspection), or anything else suggested above. And, since matching on other expressions more generically like more deeply nested arrays or (buf+10) led to more complex matchers, I opted for the simpler cases my check can find in its current form.

Let me look at utils::areStatementsIdentical to see if this could help expand the forms that the tool can find.

@ccotter
Copy link
Contributor Author

ccotter commented Jan 6, 2025

Ok, all comments should be addressed. The check now uses areStatementsIdentical to more generically find buggy patterns (thanks for that suggestion!).

@ccotter ccotter changed the title Add bugprone-undefined-sprintf-overlap Add bugprone-sprintf-argument-overlap Jan 6, 2025
@carlosgalvezp
Copy link
Contributor

carlosgalvezp commented Jan 6, 2025

Sorry for the late review, but isn't this already covered by -Wall since GCC 9? At least the example from the check documentation:

https://godbolt.org/z/es37ozqe9

warning: 'sprintf' argument 3 overlaps destination object 'buf'

If this check is not redundant given the above, it would be good document what the differences are and what failure modes it detects that GCC doesn't.

@ccotter
Copy link
Contributor Author

ccotter commented Jan 7, 2025

Oh, you are entirely right. I completely missed that gcc had this check. All of my test cases are covered by gcc's check, and gcc's check is a bit smarter than mine (sprintf(stp->buf, "%s", st1.buf) is caught by gcc, but not my check).

Is this still worth adding here? I didn't see an equivalent in clang's -Wall.

@carlosgalvezp
Copy link
Contributor

Is this still worth adding here? I didn't see an equivalent in clang's -Wall.

Indeed Clang is missing this check in -Wall. For parity with GCC it probably makes sense to add it to -Wall as well instead of being a clang-tidy check. What do you think @AaronBallman ?

@AaronBallman
Copy link
Collaborator

Is this still worth adding here? I didn't see an equivalent in clang's -Wall.

Indeed Clang is missing this check in -Wall. For parity with GCC it probably makes sense to add it to -Wall as well instead of being a clang-tidy check. What do you think @AaronBallman ?

In general, this requires dataflow analysis to get a low false-positive rate, which is typically handled via static analysis or a CFG-based analysis (which are expensive to construct for -Wall and thus usually opt-in, but we do have some such checks that have crept in so there is precedence).

Ideally, I think this check would either live in Clang under an opt-in analysis warning flag (similar to implicit fallthrough checks) or in the Clang static analyzer (CC @steakhal @haoNoQ @Xazax-hun for additional opinions). If we did that, I think this check should be generalized a bit more. 1) We could have a restrict-based check for any APIs with pointers marked restrict. 2) We could handle C library functions specially: sprintf and friends, mbstowcs, wcstombs, memcpy, memccpy, strcpy, etc.

sprintf((stp->buf), "%s", stp->buf);
// CHECK-MESSAGES: :[[@LINE-1]]:29: warning: the 3rd argument in 'sprintf' overlaps the 1st argument, which is undefined behavior [bugprone-sprintf-argument-overlap]
stp = &st1;
sprintf(stp->buf, "%s", st1.buf);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be nice to add comments for false negatives.

@Xazax-hun
Copy link
Collaborator

this requires dataflow analysis to get a low false-positive rate

I think it might be possible to have low false positive rates without dataflow analysis. Currently, it looks like the check is looking for syntactically identical subexpressions. Those tend to overlap in most cases. That being said, I can imagine some cases where it is not really the case, like:

sprintf(st1.buf, return_format_string_and_modify_buf("%s", &st1.buf), st1.buf);

To filter these, the check would need to ensure that the pointer cannot be modified while the arguments are evaluated.

Are there any other false positives you anticipate?

That being said, we'd absolutely need dataflow analysis to reduce the number of false negatives when the arguments are not syntactically equivalent.

I think it would be OK to have a fast, syntactic check in -Wall and have a smarted dataflow-based check as opt-in somewhere else (can be compiler, tidy, or the clang static analyzer). I think implementing this check in the clang static analyzer would be relatively straightforward.

I think this check should be generalized a bit more

Huge +1. I like the generalization ideas. No matter in what form we land this, I think people would benefit a lot from that.

@AaronBallman
Copy link
Collaborator

this requires dataflow analysis to get a low false-positive rate

I think it might be possible to have low false positive rates without dataflow analysis. Currently, it looks like the check is looking for syntactically identical subexpressions. Those tend to overlap in most cases. That being said, I can imagine some cases where it is not really the case, like:

sprintf(st1.buf, return_format_string_and_modify_buf("%s", &st1.buf), st1.buf);

To filter these, the check would need to ensure that the pointer cannot be modified while the arguments are evaluated.

Are there any other false positives you anticipate?

The biggest false positive I'm worried about are successive uses of sprintf (or similar) where the buffer pointer is incremented by the return value and then used in a subsequent call. Basically, any pointer arithmetic on the buffer will lead to potential false positives.

That being said, we'd absolutely need dataflow analysis to reduce the number of false negatives when the arguments are not syntactically equivalent.

100% agreed.

I think it would be OK to have a fast, syntactic check in -Wall and have a smarted dataflow-based check as opt-in somewhere else (can be compiler, tidy, or the clang static analyzer). I think implementing this check in the clang static analyzer would be relatively straightforward.

That's an interesting thought! I think that might be plausible, depending on how good of a true positive rate we find with the -Wall check.

Copy link
Member

@PiotrZSL PiotrZSL left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

return Matched;
}

AST_MATCHER_P(Stmt, identicalTo, std::string, ID) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there is isStatementIdenticalToBoundNode in utils/Matchers.h

}

private:
const std::string SprintfRegex;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: llvm::StringRef should be fine here

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants