Skip to content

[analyzer] Split TaintPropagation checker into reporting and modeling checkers #98157

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 5 commits into from
Jul 10, 2024

Conversation

dkrupp
Copy link
Contributor

@dkrupp dkrupp commented Jul 9, 2024

Taint propagation is a a generic modeling feature of the Clang Static Analyzer which many other checkers depend on. Therefore GenericTaintChecker is split into a TaintPropagation modeling checker and a GenericTaint reporting checker.

Other checkers, which report taint related warnings, should set the TaintPropagation checker as their dependency.

@dkrupp dkrupp requested review from NagyDonat and steakhal July 9, 2024 13:28
@llvmbot llvmbot added clang Clang issues not falling into any other category clang:static analyzer labels Jul 9, 2024
@llvmbot
Copy link
Member

llvmbot commented Jul 9, 2024

@llvm/pr-subscribers-clang-static-analyzer-1

@llvm/pr-subscribers-clang

Author: Daniel Krupp (dkrupp)

Changes

…ling checkers

Taint propagation is a a generic modeling feature of the Clang Static Analyzer which many other checkers depend on. Therefore GenericTaintChecker is split into a TaintPropagation modeling checker and a GenericTaint reporting checker.

Other checkers, which report taint related warnings, should set the TaintPropagation checker as their dependency.


Full diff: https://github.com/llvm/llvm-project/pull/98157.diff

4 Files Affected:

  • (modified) clang/docs/analyzer/checkers.rst (+3-3)
  • (modified) clang/docs/analyzer/user-docs/TaintAnalysisConfiguration.rst (+7-4)
  • (modified) clang/include/clang/StaticAnalyzer/Checkers/Checkers.td (+8-4)
  • (modified) clang/lib/StaticAnalyzer/Checkers/GenericTaintChecker.cpp (+23-5)
diff --git a/clang/docs/analyzer/checkers.rst b/clang/docs/analyzer/checkers.rst
index 42c097d973d53..b7a3a4ebd927c 100644
--- a/clang/docs/analyzer/checkers.rst
+++ b/clang/docs/analyzer/checkers.rst
@@ -3011,10 +3011,10 @@ alpha.security.taint
 Checkers implementing
 `taint analysis <https://en.wikipedia.org/wiki/Taint_checking>`_.
 
-.. _alpha-security-taint-TaintPropagation:
+.. _alpha-security-taint-GenericTaint:
 
-alpha.security.taint.TaintPropagation (C, C++)
-""""""""""""""""""""""""""""""""""""""""""""""
+alpha.security.taint.GenericTaint (C, C++)
+""""""""""""""""""""""""""""""""""""""""""
 
 Taint analysis identifies potential security vulnerabilities where the
 attacker can inject malicious data to the program to execute an attack
diff --git a/clang/docs/analyzer/user-docs/TaintAnalysisConfiguration.rst b/clang/docs/analyzer/user-docs/TaintAnalysisConfiguration.rst
index 94db84494e00b..f0117cbfe02ad 100644
--- a/clang/docs/analyzer/user-docs/TaintAnalysisConfiguration.rst
+++ b/clang/docs/analyzer/user-docs/TaintAnalysisConfiguration.rst
@@ -2,10 +2,13 @@
 Taint Analysis Configuration
 ============================
 
-The Clang Static Analyzer uses taint analysis to detect security-related issues in code.
-The backbone of taint analysis in the Clang SA is the `GenericTaintChecker`, which the user can access via the :ref:`alpha-security-taint-TaintPropagation` checker alias and this checker has a default taint-related configuration.
-The built-in default settings are defined in code, and they are always in effect once the checker is enabled, either directly or via the alias.
-The checker also provides a configuration interface for extending the default settings by providing a configuration file in `YAML <http://llvm.org/docs/YamlIO.html#introduction-to-yaml>`_ format.
+The Clang Static Analyzer uses taint analysis to detect injection vulnerability related issues in code.
+The backbone of taint analysis in the Clang SA is the `TaintPropagation` modeling checker.
+The reports are emitted via the :ref:`alpha-security-taint-GenericTaint` checker.
+The `TaintPropagation` checker has a default taint-related configuration.
+The built-in default settings are defined in code, and they are always in effect.
+The checker also provides a configuration interface for extending the default settings via the ``alpha.security.taint.TaintPropagation:Config`` checker config parameter
+by providing a configuration file to the in `YAML <http://llvm.org/docs/YamlIO.html#introduction-to-yaml>`_ format.
 This documentation describes the syntax of the configuration file and gives the informal semantics of the configuration options.
 
 .. contents::
diff --git a/clang/include/clang/StaticAnalyzer/Checkers/Checkers.td b/clang/include/clang/StaticAnalyzer/Checkers/Checkers.td
index 6e224a4e098ad..ec5dbd28a5272 100644
--- a/clang/include/clang/StaticAnalyzer/Checkers/Checkers.td
+++ b/clang/include/clang/StaticAnalyzer/Checkers/Checkers.td
@@ -1071,7 +1071,7 @@ def ReturnPointerRangeChecker : Checker<"ReturnPtrRange">,
 
 let ParentPackage = Taint in {
 
-def GenericTaintChecker : Checker<"TaintPropagation">,
+def TaintPropagationChecker : Checker<"TaintPropagation">, // Modelling checker
   HelpText<"Generate taint information used by other checkers">,
   CheckerOptions<[
     CmdLineOption<String,
@@ -1080,6 +1080,12 @@ def GenericTaintChecker : Checker<"TaintPropagation">,
                   "",
                   InAlpha>,
   ]>,
+  Documentation<NotDocumented>,
+  Hidden;
+
+def GenericTaintChecker : Checker<"GenericTaint">,
+  HelpText<"Reports potential injection vulnerabilities">,
+  Dependencies<[TaintPropagationChecker]>,
   Documentation<HasDocumentation>;
 
 } // end "alpha.security.taint"
@@ -1717,9 +1723,7 @@ let ParentPackage = TaintOptIn in {
 def TaintedAllocChecker: Checker<"TaintedAlloc">,
   HelpText<"Check for memory allocations, where the size parameter "
            "might be a tainted (attacker controlled) value.">,
-  Dependencies<[DynamicMemoryModeling]>,
-  //FIXME: GenericTaintChecker should be a dependency, but only after it
-  //is transformed into a modeling checker
+  Dependencies<[DynamicMemoryModeling, TaintPropagationChecker]>,
   Documentation<HasDocumentation>;
 
 } // end "optin.taint"
diff --git a/clang/lib/StaticAnalyzer/Checkers/GenericTaintChecker.cpp b/clang/lib/StaticAnalyzer/Checkers/GenericTaintChecker.cpp
index eb40a4cd3d902..578db0dcea277 100644
--- a/clang/lib/StaticAnalyzer/Checkers/GenericTaintChecker.cpp
+++ b/clang/lib/StaticAnalyzer/Checkers/GenericTaintChecker.cpp
@@ -27,6 +27,7 @@
 #include "clang/StaticAnalyzer/Core/PathSensitive/CheckerContext.h"
 #include "clang/StaticAnalyzer/Core/PathSensitive/ProgramStateTrait.h"
 #include "llvm/ADT/StringExtras.h"
+#include "llvm/ADT/StringRef.h"
 #include "llvm/Support/YAMLTraits.h"
 
 #include <limits>
@@ -391,8 +392,10 @@ class GenericTaintChecker : public Checker<check::PreCall, check::PostCall> {
   bool generateReportIfTainted(const Expr *E, StringRef Msg,
                                CheckerContext &C) const;
 
+  bool isTaintReporterCheckerEnabled = false;
+  CheckerNameRef reporterCheckerName;
 private:
-  const BugType BT{this, "Use of Untrusted Data", categories::TaintedData};
+  mutable std::unique_ptr<BugType>  BT;
 
   bool checkUncontrolledFormatString(const CallEvent &Call,
                                      CheckerContext &C) const;
@@ -1033,6 +1036,8 @@ bool GenericTaintRule::UntrustedEnv(CheckerContext &C) {
 bool GenericTaintChecker::generateReportIfTainted(const Expr *E, StringRef Msg,
                                                   CheckerContext &C) const {
   assert(E);
+  if (!isTaintReporterCheckerEnabled)
+    return false;
   std::optional<SVal> TaintedSVal =
       getTaintedPointeeOrPointer(C.getState(), C.getSVal(E));
 
@@ -1040,13 +1045,16 @@ bool GenericTaintChecker::generateReportIfTainted(const Expr *E, StringRef Msg,
     return false;
 
   // Generate diagnostic.
-  if (ExplodedNode *N = C.generateNonFatalErrorNode()) {
-    auto report = std::make_unique<PathSensitiveBugReport>(BT, Msg, N);
+  if (!BT)
+    BT.reset(new BugType(
+          reporterCheckerName, "Use of Untrusted Data", categories::TaintedData));
+  static CheckerProgramPointTag Tag(reporterCheckerName, Msg);
+  if (ExplodedNode *N = C.generateNonFatalErrorNode(C.getState(), &Tag)) {
+    auto report = std::make_unique<PathSensitiveBugReport>(*BT, Msg, N);
     report->addRange(E->getSourceRange());
     for (auto TaintedSym : getTaintedSymbols(C.getState(), *TaintedSVal)) {
       report->markInteresting(TaintedSym);
     }
-
     C.emitReport(std::move(report));
     return true;
   }
@@ -1122,10 +1130,20 @@ void GenericTaintChecker::taintUnsafeSocketProtocol(const CallEvent &Call,
 }
 
 /// Checker registration
-void ento::registerGenericTaintChecker(CheckerManager &Mgr) {
+void ento::registerTaintPropagationChecker(CheckerManager &Mgr) {
   Mgr.registerChecker<GenericTaintChecker>();
 }
 
+bool ento::shouldRegisterTaintPropagationChecker(const CheckerManager &mgr) {
+  return true;
+}
+
+void ento::registerGenericTaintChecker(CheckerManager &Mgr) {
+  GenericTaintChecker *checker = Mgr.getChecker<GenericTaintChecker>();
+  checker->isTaintReporterCheckerEnabled = true;
+  checker->reporterCheckerName = Mgr.getCurrentCheckerName();
+}
+
 bool ento::shouldRegisterGenericTaintChecker(const CheckerManager &mgr) {
   return true;
 }

Copy link

github-actions bot commented Jul 9, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

…ling checkers

Taint propagation is a a generic modeling feature of the
Clang Static Analyzer which many other checkers depend on.
Therefore GenericTaintChecker is split into a TaintPropagation modeling
checker and a GenericTaint reporting checker.

Other checkers, which report taint related warnings, should
set the TaintPropagation checker as their dependency.
Copy link
Contributor

@NagyDonat NagyDonat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall LGTM, I added some minor remarks in inline comments.

Also note that with this change we can finally remove the note

The ``alpha.security.taint.TaintPropagation`` checker also needs to be enabled for
this checker to give warnings.

from the documentation of optin.taint.TaintedAlloc (because the TaintPropagation modeling checker is now a dependency of it).

private:
const BugType BT{this, "Use of Untrusted Data", categories::TaintedData};
mutable std::unique_ptr<BugType> BT;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a bit sad that this mutable unique_ptr trickery reappears in this checker, but I don't see a better alternative ☹️

void ento::registerGenericTaintChecker(CheckerManager &Mgr) {
GenericTaintChecker *checker = Mgr.getChecker<GenericTaintChecker>();
checker->isTaintReporterCheckerEnabled = true;
checker->reporterCheckerName = Mgr.getCurrentCheckerName();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Move the initialization of BT to this point -- then it doesn't need to be a mutable and you won't need the string data member reporterCheckerName.

Notes:

  • You may turn BT into a public data member -- its visibility is still limited to this one source file.
  • BugType is a very simple type (essentially a plain struct of three strings), there is no reason to postpone its initialization once you have all the data.
  • You can use BugType::getCheckerName() to get the checker name to construct the CheckerProgramPointTag.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would this make it possible to eliminate the unique_ptr? I really liked the look without them :)
In other words, what forces us to use unique_ptrs?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We only use that unique_ptr has a null / empty state in addition to representing the "real" BugType values. However, std::optional would be completely sufficient for this goal (here and AFAIK in every checker that uses unique_ptrs to juggle multiple BugTypes).

If we really want, in this checker we could use a "bare" BugType data member if we leave it uninitialized in the constructor of the checker class (or initialize it with dummy values) and then unconditionally overwrite it with the "real" BugType in this register... function. (Other checkers have lazy initialization patterns that check the nullness of the BugType.)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why can't we initialize it regardless if the given checker is enabled or not?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because this way we can access the checker name specified in Checkers.td via the function Mgr.getCurrentCheckerName();.

When the checker class corresponds to just one checker defined in Checkers.td we can use the alternative constructor of BugType that takes the checker object (this) as a first argument and then queries the name from the checker object (which can save a single name automatically).

When a single class implements multiple checkers, we need to explicitly pass the right name to the BugType constructor, and so we need to either (1) postpone the construction of the BugType to a point after the registration or (2) duplicate the checker name between Checkers.td and the source file.

AFAIK we always choose option (1), but I didn't check this systematically.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. This is really unfortunate.
WDYT about hard-coding the checker name, as a string, and asserting that it's the same as the one in checkers td inside the register checker call, with a comment stating that you should also update the BugType accordingly.

Copy link
Contributor

@NagyDonat NagyDonat Jul 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that your hardcode-and-assert plan is acceptable, but the current status of the PR (which uses std::optional and emplace()) is slightly better.

Overall, I think this is a systemic issue that should be solved by implementing a clear framework for supporting all the "one class -- multiple checkers" situations.

For example introduce a virtual base class that looks like

// CheckKind is an enum type
template <typename CheckKind>
class MultiChecker : public CheckerBase {
    constexpr size_t NumKinds = static_cast<size_t>(CheckKind::NumKinds);
    bool IsEnabled[NumKinds];
    StringRef CheckerNames[NumKinds];
  public:
    void enableCheck(CheckKind CK, StringRef CheckerName);
    StringRef getCheckerName(CheckKind CK) const;   
    StringRef getCheckerName(int CK) const {
      return getCheckerName(static_cast<CheckKind>(CK));
    }
    bool isEnabled(CheckKind CK) const;
};

and then extend the definition of BugType with

  private:
    int CK;  // We can use a single integer to store all the different CheckKind enum types.
  public:
   template <typename CheckKind>
   BugType(const MultiChecker<CheckKind> *Checker, CheckKind Kind, StringRef Desc, StringRef Cat, bool SuppressOnSink)
     : Checker(Checker), CK(static_cast<int>(Kind)), /*initialize other fields as in other constructors */ {}
    /* in getCheckerName() use Checker and CK to get the correct checker name if these are present */

and with this framework a the implementation of multiple checkers in a single class could look like

enum class CheckKind {
  TaintPropagation,
  GenericTaint,
  NumKinds
};
class GenericTaintChecker: public MultiChecker<CheckKind> /*other base classes*/ {
  BugType BT{this, CheckKind::GenericTaint, "Use of Untrusted Data", categories::TaintedData};
  // Here the other part is a modeling checker so it doesn't have a bug type,
  // but in general there would be multiple declarations like this.
  
  // ...
};
// use isEnabled(CheckKind::GenericTaint) for checking whether that part is enabled
// call enableCheck(CheckKind::/*something*/, Mgr.getCurrentCheckerName())
//    within the `register` functions

Note that this is a very rough draft and probably has some ugly parts and mistakes, but I hope that it demonstrates the core idea.

The built-in default settings are defined in code, and they are always in effect once the checker is enabled, either directly or via the alias.
The checker also provides a configuration interface for extending the default settings by providing a configuration file in `YAML <http://llvm.org/docs/YamlIO.html#introduction-to-yaml>`_ format.
The Clang Static Analyzer uses taint analysis to detect injection vulnerability related issues in code.
The backbone of taint analysis in the Clang SA is the ``TaintPropagation`` modeling checker.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is a bit strange that the TaintPropagation checker is hidden from the user and e.g. it doesn't have a documentation, but it still mentioned on this page and it still appears in the name of alpha.security.taint.TaintPropagation:Config.

I don't think that this is a blocking issue for this commit, but I think that it would be important to resolve this in the future. I can imagine two main solutions:

  • We double down on the idea that "modeling checkers are hidden internal parts of the analyzer engine", eliminate all references to TaintPropagation and convert alpha.security.taint.TaintPropagation:Config into an option of the analyzer itself.
  • We document the modeling checkers (probably in a separate section, as the alpha and debug checkers are already separate sections within the checker list). This would allow us to have modeling checkers that can be customized/configured or even disabled by the user.

@haoNoQ @Xazax-hun @steakhal What do you think about this? Which direction would be better?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Personally, I'd prefer accepting modeling checkers as a thing and have configs for them. I'd rather not make this as an option to the analyzer itself.

Copy link
Contributor

@steakhal steakhal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the separation. Thanks.

@dkrupp
Copy link
Contributor Author

dkrupp commented Jul 10, 2024

Thanks for the review. I updated the patch with your suggestions.
-std::unique_pointer changed to std::optional
-I fixed documentation related grammatical and refernce errors.

Copy link
Contributor

@NagyDonat NagyDonat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks for the updates.

@steakhal Is it OK for you if we merge this?

@NagyDonat
Copy link
Contributor

(By the way, this change doesn't have significant user-facing parts, so I don't think that we need to mention it in the release notes.)

@NagyDonat NagyDonat requested a review from steakhal July 10, 2024 13:03
@@ -1046,10 +1044,7 @@ bool GenericTaintChecker::generateReportIfTainted(const Expr *E, StringRef Msg,
return false;

// Generate diagnostic.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤔 Perhaps add an assert(BT) here for the sake of paranoia?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we are there, maybe we should also assert that inside the register call, before emplacing it was empty.

@steakhal steakhal changed the title [analyzer] Splitting TaintPropagation checker into reporting and mode… [analyzer] Split TaintPropagation checker into reporting and modeling checkers Jul 10, 2024
Copy link
Contributor

@steakhal steakhal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@dkrupp dkrupp merged commit 6002e2f into llvm:main Jul 10, 2024
5 of 7 checks passed
aaryanshukla pushed a commit to aaryanshukla/llvm-project that referenced this pull request Jul 14, 2024
… checkers (llvm#98157)

Taint propagation is a a generic modeling feature of the Clang Static
Analyzer which many other checkers depend on. Therefore
GenericTaintChecker is split into a TaintPropagation modeling checker
and a GenericTaint reporting checker.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
clang:static analyzer clang Clang issues not falling into any other category
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants