-
Notifications
You must be signed in to change notification settings - Fork 14.3k
[analyzer] Split TaintPropagation checker into reporting and modeling checkers #98157
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@llvm/pr-subscribers-clang-static-analyzer-1 @llvm/pr-subscribers-clang Author: Daniel Krupp (dkrupp) Changes…ling checkers Taint propagation is a a generic modeling feature of the Clang Static Analyzer which many other checkers depend on. Therefore GenericTaintChecker is split into a TaintPropagation modeling checker and a GenericTaint reporting checker. Other checkers, which report taint related warnings, should set the TaintPropagation checker as their dependency. Full diff: https://github.com/llvm/llvm-project/pull/98157.diff 4 Files Affected:
diff --git a/clang/docs/analyzer/checkers.rst b/clang/docs/analyzer/checkers.rst
index 42c097d973d53..b7a3a4ebd927c 100644
--- a/clang/docs/analyzer/checkers.rst
+++ b/clang/docs/analyzer/checkers.rst
@@ -3011,10 +3011,10 @@ alpha.security.taint
Checkers implementing
`taint analysis <https://en.wikipedia.org/wiki/Taint_checking>`_.
-.. _alpha-security-taint-TaintPropagation:
+.. _alpha-security-taint-GenericTaint:
-alpha.security.taint.TaintPropagation (C, C++)
-""""""""""""""""""""""""""""""""""""""""""""""
+alpha.security.taint.GenericTaint (C, C++)
+""""""""""""""""""""""""""""""""""""""""""
Taint analysis identifies potential security vulnerabilities where the
attacker can inject malicious data to the program to execute an attack
diff --git a/clang/docs/analyzer/user-docs/TaintAnalysisConfiguration.rst b/clang/docs/analyzer/user-docs/TaintAnalysisConfiguration.rst
index 94db84494e00b..f0117cbfe02ad 100644
--- a/clang/docs/analyzer/user-docs/TaintAnalysisConfiguration.rst
+++ b/clang/docs/analyzer/user-docs/TaintAnalysisConfiguration.rst
@@ -2,10 +2,13 @@
Taint Analysis Configuration
============================
-The Clang Static Analyzer uses taint analysis to detect security-related issues in code.
-The backbone of taint analysis in the Clang SA is the `GenericTaintChecker`, which the user can access via the :ref:`alpha-security-taint-TaintPropagation` checker alias and this checker has a default taint-related configuration.
-The built-in default settings are defined in code, and they are always in effect once the checker is enabled, either directly or via the alias.
-The checker also provides a configuration interface for extending the default settings by providing a configuration file in `YAML <http://llvm.org/docs/YamlIO.html#introduction-to-yaml>`_ format.
+The Clang Static Analyzer uses taint analysis to detect injection vulnerability related issues in code.
+The backbone of taint analysis in the Clang SA is the `TaintPropagation` modeling checker.
+The reports are emitted via the :ref:`alpha-security-taint-GenericTaint` checker.
+The `TaintPropagation` checker has a default taint-related configuration.
+The built-in default settings are defined in code, and they are always in effect.
+The checker also provides a configuration interface for extending the default settings via the ``alpha.security.taint.TaintPropagation:Config`` checker config parameter
+by providing a configuration file to the in `YAML <http://llvm.org/docs/YamlIO.html#introduction-to-yaml>`_ format.
This documentation describes the syntax of the configuration file and gives the informal semantics of the configuration options.
.. contents::
diff --git a/clang/include/clang/StaticAnalyzer/Checkers/Checkers.td b/clang/include/clang/StaticAnalyzer/Checkers/Checkers.td
index 6e224a4e098ad..ec5dbd28a5272 100644
--- a/clang/include/clang/StaticAnalyzer/Checkers/Checkers.td
+++ b/clang/include/clang/StaticAnalyzer/Checkers/Checkers.td
@@ -1071,7 +1071,7 @@ def ReturnPointerRangeChecker : Checker<"ReturnPtrRange">,
let ParentPackage = Taint in {
-def GenericTaintChecker : Checker<"TaintPropagation">,
+def TaintPropagationChecker : Checker<"TaintPropagation">, // Modelling checker
HelpText<"Generate taint information used by other checkers">,
CheckerOptions<[
CmdLineOption<String,
@@ -1080,6 +1080,12 @@ def GenericTaintChecker : Checker<"TaintPropagation">,
"",
InAlpha>,
]>,
+ Documentation<NotDocumented>,
+ Hidden;
+
+def GenericTaintChecker : Checker<"GenericTaint">,
+ HelpText<"Reports potential injection vulnerabilities">,
+ Dependencies<[TaintPropagationChecker]>,
Documentation<HasDocumentation>;
} // end "alpha.security.taint"
@@ -1717,9 +1723,7 @@ let ParentPackage = TaintOptIn in {
def TaintedAllocChecker: Checker<"TaintedAlloc">,
HelpText<"Check for memory allocations, where the size parameter "
"might be a tainted (attacker controlled) value.">,
- Dependencies<[DynamicMemoryModeling]>,
- //FIXME: GenericTaintChecker should be a dependency, but only after it
- //is transformed into a modeling checker
+ Dependencies<[DynamicMemoryModeling, TaintPropagationChecker]>,
Documentation<HasDocumentation>;
} // end "optin.taint"
diff --git a/clang/lib/StaticAnalyzer/Checkers/GenericTaintChecker.cpp b/clang/lib/StaticAnalyzer/Checkers/GenericTaintChecker.cpp
index eb40a4cd3d902..578db0dcea277 100644
--- a/clang/lib/StaticAnalyzer/Checkers/GenericTaintChecker.cpp
+++ b/clang/lib/StaticAnalyzer/Checkers/GenericTaintChecker.cpp
@@ -27,6 +27,7 @@
#include "clang/StaticAnalyzer/Core/PathSensitive/CheckerContext.h"
#include "clang/StaticAnalyzer/Core/PathSensitive/ProgramStateTrait.h"
#include "llvm/ADT/StringExtras.h"
+#include "llvm/ADT/StringRef.h"
#include "llvm/Support/YAMLTraits.h"
#include <limits>
@@ -391,8 +392,10 @@ class GenericTaintChecker : public Checker<check::PreCall, check::PostCall> {
bool generateReportIfTainted(const Expr *E, StringRef Msg,
CheckerContext &C) const;
+ bool isTaintReporterCheckerEnabled = false;
+ CheckerNameRef reporterCheckerName;
private:
- const BugType BT{this, "Use of Untrusted Data", categories::TaintedData};
+ mutable std::unique_ptr<BugType> BT;
bool checkUncontrolledFormatString(const CallEvent &Call,
CheckerContext &C) const;
@@ -1033,6 +1036,8 @@ bool GenericTaintRule::UntrustedEnv(CheckerContext &C) {
bool GenericTaintChecker::generateReportIfTainted(const Expr *E, StringRef Msg,
CheckerContext &C) const {
assert(E);
+ if (!isTaintReporterCheckerEnabled)
+ return false;
std::optional<SVal> TaintedSVal =
getTaintedPointeeOrPointer(C.getState(), C.getSVal(E));
@@ -1040,13 +1045,16 @@ bool GenericTaintChecker::generateReportIfTainted(const Expr *E, StringRef Msg,
return false;
// Generate diagnostic.
- if (ExplodedNode *N = C.generateNonFatalErrorNode()) {
- auto report = std::make_unique<PathSensitiveBugReport>(BT, Msg, N);
+ if (!BT)
+ BT.reset(new BugType(
+ reporterCheckerName, "Use of Untrusted Data", categories::TaintedData));
+ static CheckerProgramPointTag Tag(reporterCheckerName, Msg);
+ if (ExplodedNode *N = C.generateNonFatalErrorNode(C.getState(), &Tag)) {
+ auto report = std::make_unique<PathSensitiveBugReport>(*BT, Msg, N);
report->addRange(E->getSourceRange());
for (auto TaintedSym : getTaintedSymbols(C.getState(), *TaintedSVal)) {
report->markInteresting(TaintedSym);
}
-
C.emitReport(std::move(report));
return true;
}
@@ -1122,10 +1130,20 @@ void GenericTaintChecker::taintUnsafeSocketProtocol(const CallEvent &Call,
}
/// Checker registration
-void ento::registerGenericTaintChecker(CheckerManager &Mgr) {
+void ento::registerTaintPropagationChecker(CheckerManager &Mgr) {
Mgr.registerChecker<GenericTaintChecker>();
}
+bool ento::shouldRegisterTaintPropagationChecker(const CheckerManager &mgr) {
+ return true;
+}
+
+void ento::registerGenericTaintChecker(CheckerManager &Mgr) {
+ GenericTaintChecker *checker = Mgr.getChecker<GenericTaintChecker>();
+ checker->isTaintReporterCheckerEnabled = true;
+ checker->reporterCheckerName = Mgr.getCurrentCheckerName();
+}
+
bool ento::shouldRegisterGenericTaintChecker(const CheckerManager &mgr) {
return true;
}
|
✅ With the latest revision this PR passed the C/C++ code formatter. |
…ling checkers Taint propagation is a a generic modeling feature of the Clang Static Analyzer which many other checkers depend on. Therefore GenericTaintChecker is split into a TaintPropagation modeling checker and a GenericTaint reporting checker. Other checkers, which report taint related warnings, should set the TaintPropagation checker as their dependency.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall LGTM, I added some minor remarks in inline comments.
Also note that with this change we can finally remove the note
The ``alpha.security.taint.TaintPropagation`` checker also needs to be enabled for
this checker to give warnings.
from the documentation of optin.taint.TaintedAlloc
(because the TaintPropagation
modeling checker is now a dependency of it).
private: | ||
const BugType BT{this, "Use of Untrusted Data", categories::TaintedData}; | ||
mutable std::unique_ptr<BugType> BT; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a bit sad that this mutable unique_ptr trickery reappears in this checker, but I don't see a better alternative
void ento::registerGenericTaintChecker(CheckerManager &Mgr) { | ||
GenericTaintChecker *checker = Mgr.getChecker<GenericTaintChecker>(); | ||
checker->isTaintReporterCheckerEnabled = true; | ||
checker->reporterCheckerName = Mgr.getCurrentCheckerName(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Move the initialization of BT
to this point -- then it doesn't need to be a mutable
and you won't need the string data member reporterCheckerName
.
Notes:
- You may turn
BT
into a public data member -- its visibility is still limited to this one source file. BugType
is a very simple type (essentially a plain struct of three strings), there is no reason to postpone its initialization once you have all the data.- You can use
BugType::getCheckerName()
to get the checker name to construct theCheckerProgramPointTag
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would this make it possible to eliminate the unique_ptr
? I really liked the look without them :)
In other words, what forces us to use unique_ptrs
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We only use that unique_ptr
has a null / empty state in addition to representing the "real" BugType
values. However, std::optional
would be completely sufficient for this goal (here and AFAIK in every checker that uses unique_ptr
s to juggle multiple BugType
s).
If we really want, in this checker we could use a "bare" BugType
data member if we leave it uninitialized in the constructor of the checker class (or initialize it with dummy values) and then unconditionally overwrite it with the "real" BugType
in this register...
function. (Other checkers have lazy initialization patterns that check the nullness of the BugType
.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why can't we initialize it regardless if the given checker is enabled or not?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because this way we can access the checker name specified in Checkers.td
via the function Mgr.getCurrentCheckerName();
.
When the checker class corresponds to just one checker defined in Checkers.td
we can use the alternative constructor of BugType
that takes the checker object (this
) as a first argument and then queries the name from the checker object (which can save a single name automatically).
When a single class implements multiple checkers, we need to explicitly pass the right name to the BugType
constructor, and so we need to either (1) postpone the construction of the BugType
to a point after the registration or (2) duplicate the checker name between Checkers.td
and the source file.
AFAIK we always choose option (1), but I didn't check this systematically.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. This is really unfortunate.
WDYT about hard-coding the checker name, as a string, and asserting that it's the same as the one in checkers td inside the register checker call, with a comment stating that you should also update the BugType accordingly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that your hardcode-and-assert plan is acceptable, but the current status of the PR (which uses std::optional
and emplace()
) is slightly better.
Overall, I think this is a systemic issue that should be solved by implementing a clear framework for supporting all the "one class -- multiple checkers" situations.
For example introduce a virtual base class that looks like
// CheckKind is an enum type
template <typename CheckKind>
class MultiChecker : public CheckerBase {
constexpr size_t NumKinds = static_cast<size_t>(CheckKind::NumKinds);
bool IsEnabled[NumKinds];
StringRef CheckerNames[NumKinds];
public:
void enableCheck(CheckKind CK, StringRef CheckerName);
StringRef getCheckerName(CheckKind CK) const;
StringRef getCheckerName(int CK) const {
return getCheckerName(static_cast<CheckKind>(CK));
}
bool isEnabled(CheckKind CK) const;
};
and then extend the definition of BugType with
private:
int CK; // We can use a single integer to store all the different CheckKind enum types.
public:
template <typename CheckKind>
BugType(const MultiChecker<CheckKind> *Checker, CheckKind Kind, StringRef Desc, StringRef Cat, bool SuppressOnSink)
: Checker(Checker), CK(static_cast<int>(Kind)), /*initialize other fields as in other constructors */ {}
/* in getCheckerName() use Checker and CK to get the correct checker name if these are present */
and with this framework a the implementation of multiple checkers in a single class could look like
enum class CheckKind {
TaintPropagation,
GenericTaint,
NumKinds
};
class GenericTaintChecker: public MultiChecker<CheckKind> /*other base classes*/ {
BugType BT{this, CheckKind::GenericTaint, "Use of Untrusted Data", categories::TaintedData};
// Here the other part is a modeling checker so it doesn't have a bug type,
// but in general there would be multiple declarations like this.
// ...
};
// use isEnabled(CheckKind::GenericTaint) for checking whether that part is enabled
// call enableCheck(CheckKind::/*something*/, Mgr.getCurrentCheckerName())
// within the `register` functions
Note that this is a very rough draft and probably has some ugly parts and mistakes, but I hope that it demonstrates the core idea.
The built-in default settings are defined in code, and they are always in effect once the checker is enabled, either directly or via the alias. | ||
The checker also provides a configuration interface for extending the default settings by providing a configuration file in `YAML <http://llvm.org/docs/YamlIO.html#introduction-to-yaml>`_ format. | ||
The Clang Static Analyzer uses taint analysis to detect injection vulnerability related issues in code. | ||
The backbone of taint analysis in the Clang SA is the ``TaintPropagation`` modeling checker. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is a bit strange that the TaintPropagation
checker is hidden from the user and e.g. it doesn't have a documentation, but it still mentioned on this page and it still appears in the name of alpha.security.taint.TaintPropagation:Config
.
I don't think that this is a blocking issue for this commit, but I think that it would be important to resolve this in the future. I can imagine two main solutions:
- We double down on the idea that "modeling checkers are hidden internal parts of the analyzer engine", eliminate all references to
TaintPropagation
and convertalpha.security.taint.TaintPropagation:Config
into an option of the analyzer itself. - We document the modeling checkers (probably in a separate section, as the alpha and debug checkers are already separate sections within the checker list). This would allow us to have modeling checkers that can be customized/configured or even disabled by the user.
@haoNoQ @Xazax-hun @steakhal What do you think about this? Which direction would be better?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Personally, I'd prefer accepting modeling checkers as a thing and have configs for them. I'd rather not make this as an option to the analyzer itself.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the separation. Thanks.
Thanks for the review. I updated the patch with your suggestions. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks for the updates.
@steakhal Is it OK for you if we merge this?
(By the way, this change doesn't have significant user-facing parts, so I don't think that we need to mention it in the release notes.) |
@@ -1046,10 +1044,7 @@ bool GenericTaintChecker::generateReportIfTainted(const Expr *E, StringRef Msg, | |||
return false; | |||
|
|||
// Generate diagnostic. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🤔 Perhaps add an assert(BT)
here for the sake of paranoia?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we are there, maybe we should also assert that inside the register call, before emplacing it was empty.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
… checkers (llvm#98157) Taint propagation is a a generic modeling feature of the Clang Static Analyzer which many other checkers depend on. Therefore GenericTaintChecker is split into a TaintPropagation modeling checker and a GenericTaint reporting checker.
Taint propagation is a a generic modeling feature of the Clang Static Analyzer which many other checkers depend on. Therefore GenericTaintChecker is split into a TaintPropagation modeling checker and a GenericTaint reporting checker.
Other checkers, which report taint related warnings, should set the TaintPropagation checker as their dependency.