Skip to content

Commit 744745a

Browse files
author
Balazs Benics
committed
[analyzer] Add failing test case demonstrating buggy taint propagation
Recently we uncovered a serious bug in the `GenericTaintChecker`. It was already flawed before D116025, but that was the patch that turned this silent bug into a crash. It happens if the `GenericTaintChecker` has a rule for a function, which also has a definition. char *fgets(char *s, int n, FILE *fp) { nested_call(); // no parameters! return (char *)0; } // Within some function: fgets(..., tainted_fd); When the engine inlines the definition and finds a function call within that, the `PostCall` event for the call will get triggered sooner than the `PostCall` for the original function. This mismatch violates the assumption of the `GenericTaintChecker` which wants to propagate taint information from the `PreCall` event to the `PostCall` event, where it can actually bind taint to the return value **of the same call**. Let's get back to the example and go through step-by-step. The `GenericTaintChecker` will see the `PreCall<fgets(..., tainted_fd)>` event, so it would 'remember' that it needs to taint the return value and the buffer, from the `PostCall` handler, where it has access to the return value symbol. However, the engine will inline fgets and the `nested_call()` gets evaluated subsequently, which produces an unimportant `PreCall<nested_call()>`, then a `PostCall<nested_call()>` event, which is observed by the `GenericTaintChecker`, which will unconditionally mark tainted the 'remembered' arg indexes, trying to access a non-existing argument, resulting in a crash. If it doesn't crash, it will behave completely unintuitively, by marking completely unrelated memory regions tainted, which is even worse. The resulting assertion is something like this: Expr.h: const Expr *CallExpr::getArg(unsigned int) const: Assertion `Arg < getNumArgs() && "Arg access out of range!"' failed. The gist of the backtrace: CallExpr::getArg(unsigned int) const SimpleFunctionCall::getArgExpr(unsigned int) CallEvent::getArgSVal(unsigned int) const GenericTaintChecker::checkPostCall(const CallEvent &, CheckerContext&) const Prior to D116025, there was a check for the argument count before it applied taint, however, it still suffered from the same underlying issue/bug regarding propagation. This path does not intend to fix the bug, rather start a discussion on how to fix this. --- Let me elaborate on how I see this problem. This pre-call, post-call juggling is just a workaround. The engine should by itself propagate taint where necessary right where it invalidates regions. For the tracked values, which potentially escape, we need to erase the information we know about them; and this is exactly what is done by invalidation. However, in the case of taint, we basically want to approximate from the opposite side of the spectrum. We want to preserve taint in most cases, rather than cleansing them. Now, we basically sanitize all escaping tainted regions implicitly, since invalidation binds a fresh conjured symbol for the given region, and that has not been associated with taint. IMO this is a bad default behavior, we should be more aggressive about preserving taint if not further spreading taint to the reachable regions. We have a couple of options for dealing with it (let's call it //tainting policy//): 1) Taint only the parameters which were tainted prior to the call. 2) Taint the return value of the call, since it likely depends on the tainted input - if any arguments were tainted. 3) Taint all escaped regions - (maybe transitively using the cluster algorithm) - if any arguments were tainted. 4) Not taint anything - this is what we do right now :D The `ExprEngine` should not deal with taint on its own. It should be done by a checker, such as the `GenericTaintChecker`. However, the `Pre`-`PostCall` checker callbacks are not designed for this. `RegionChanges` would be a much better fit for modeling taint propagation. What we would need in the `RegionChanges` callback is the `State` prior invalidation, the `State` after the invalidation, and a `CheckerContext` in which the checker can create transitions, where it would place `NoteTags` for the modeled taint propagations and report errors if a taint sink rule gets violated. In this callback, we could query from the prior State, if the given value was tainted; then act and taint if necessary according to the checker's tainting policy. By using RegionChanges for this, we would 'fix' the mentioned propagation bug 'by-design'. Reviewed By: Szelethus Differential Revision: https://reviews.llvm.org/D118987
1 parent ae8b638 commit 744745a

File tree

3 files changed

+98
-3
lines changed

3 files changed

+98
-3
lines changed

clang/lib/StaticAnalyzer/Checkers/GenericTaintChecker.cpp

Lines changed: 22 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -32,6 +32,8 @@
3232
#include <memory>
3333
#include <utility>
3434

35+
#define DEBUG_TYPE "taint-checker"
36+
3537
using namespace clang;
3638
using namespace ento;
3739
using namespace taint;
@@ -691,6 +693,13 @@ void GenericTaintChecker::checkPostCall(const CallEvent &Call,
691693
if (TaintArgs.isEmpty())
692694
return;
693695

696+
LLVM_DEBUG(for (ArgIdxTy I
697+
: TaintArgs) {
698+
llvm::dbgs() << "PostCall<";
699+
Call.dump(llvm::dbgs());
700+
llvm::dbgs() << "> actually wants to taint arg index: " << I << '\n';
701+
});
702+
694703
for (ArgIdxTy ArgNum : TaintArgs) {
695704
// Special handling for the tainted return value.
696705
if (ArgNum == ReturnValueIndex) {
@@ -768,15 +777,25 @@ void GenericTaintRule::process(const GenericTaintChecker &Checker,
768777

769778
/// Propagate taint where it is necessary.
770779
ForEachCallArg(
771-
[this, &State, WouldEscape](ArgIdxTy I, const Expr *E, SVal V) {
772-
if (PropDstArgs.contains(I))
780+
[this, &State, WouldEscape, &Call](ArgIdxTy I, const Expr *E, SVal V) {
781+
if (PropDstArgs.contains(I)) {
782+
LLVM_DEBUG(llvm::dbgs() << "PreCall<"; Call.dump(llvm::dbgs());
783+
llvm::dbgs()
784+
<< "> prepares tainting arg index: " << I << '\n';);
773785
State = State->add<TaintArgsOnPostVisit>(I);
786+
}
774787

775788
// TODO: We should traverse all reachable memory regions via the
776789
// escaping parameter. Instead of doing that we simply mark only the
777790
// referred memory region as tainted.
778-
if (WouldEscape(V, E->getType()))
791+
if (WouldEscape(V, E->getType())) {
792+
LLVM_DEBUG(if (!State->contains<TaintArgsOnPostVisit>(I)) {
793+
llvm::dbgs() << "PreCall<";
794+
Call.dump(llvm::dbgs());
795+
llvm::dbgs() << "> prepares tainting arg index: " << I << '\n';
796+
});
779797
State = State->add<TaintArgsOnPostVisit>(I);
798+
}
780799
});
781800

782801
C.addTransition(State);
Lines changed: 42 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,42 @@
1+
// RUN: %clang_analyze_cc1 %s \
2+
// RUN: -analyzer-checker=core,alpha.security.taint \
3+
// RUN: -mllvm -debug-only=taint-checker \
4+
// RUN: 2>&1 | FileCheck %s
5+
6+
// FIXME: We should not crash.
7+
// XFAIL: *
8+
9+
struct _IO_FILE;
10+
typedef struct _IO_FILE FILE;
11+
FILE *fopen(const char *fname, const char *mode);
12+
13+
void nested_call(void) {}
14+
15+
char *fgets(char *s, int n, FILE *fp) {
16+
nested_call(); // no-crash: we should not try adding taint to a non-existent argument.
17+
return (char *)0;
18+
}
19+
20+
void top(const char *fname, char *buf) {
21+
FILE *fp = fopen(fname, "r");
22+
// CHECK: PreCall<fopen(fname, "r")> prepares tainting arg index: -1
23+
// CHECK-NEXT: PostCall<fopen(fname, "r")> actually wants to taint arg index: -1
24+
25+
if (!fp)
26+
return;
27+
28+
(void)fgets(buf, 42, fp); // Trigger taint propagation.
29+
// CHECK-NEXT: PreCall<fgets(buf, 42, fp)> prepares tainting arg index: -1
30+
// CHECK-NEXT: PreCall<fgets(buf, 42, fp)> prepares tainting arg index: 0
31+
// CHECK-NEXT: PreCall<fgets(buf, 42, fp)> prepares tainting arg index: 1
32+
// CHECK-NEXT: PreCall<fgets(buf, 42, fp)> prepares tainting arg index: 2
33+
34+
// FIXME: We should propagate taint from PreCall<fgets> -> PostCall<fgets>.
35+
// CHECK-NEXT: PostCall<nested_call()> actually wants to taint arg index: -1
36+
// CHECK-NEXT: PostCall<nested_call()> actually wants to taint arg index: 0
37+
// CHECK-NEXT: PostCall<nested_call()> actually wants to taint arg index: 1
38+
// CHECK-NEXT: PostCall<nested_call()> actually wants to taint arg index: 2
39+
40+
// FIXME: We should not crash.
41+
// CHECK: PLEASE submit a bug report
42+
}
Lines changed: 34 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,34 @@
1+
// RUN: %clang_analyze_cc1 %s \
2+
// RUN: -analyzer-checker=core,alpha.security.taint \
3+
// RUN: -mllvm -debug-only=taint-checker \
4+
// RUN: 2>&1 | FileCheck %s
5+
6+
struct _IO_FILE;
7+
typedef struct _IO_FILE FILE;
8+
FILE *fopen(const char *fname, const char *mode);
9+
10+
char *fgets(char *s, int n, FILE *fp); // no-definition
11+
12+
void top(const char *fname, char *buf) {
13+
FILE *fp = fopen(fname, "r"); // Introduce taint.
14+
// CHECK: PreCall<fopen(fname, "r")> prepares tainting arg index: -1
15+
// CHECK-NEXT: PostCall<fopen(fname, "r")> actually wants to taint arg index: -1
16+
17+
if (!fp)
18+
return;
19+
20+
(void)fgets(buf, 42, fp); // Trigger taint propagation.
21+
22+
// FIXME: Why is the arg index 1 prepared for taint?
23+
// Before the call it wasn't tainted, and it also shouldn't be tainted after the call.
24+
25+
// CHECK-NEXT: PreCall<fgets(buf, 42, fp)> prepares tainting arg index: -1
26+
// CHECK-NEXT: PreCall<fgets(buf, 42, fp)> prepares tainting arg index: 0
27+
// CHECK-NEXT: PreCall<fgets(buf, 42, fp)> prepares tainting arg index: 1
28+
// CHECK-NEXT: PreCall<fgets(buf, 42, fp)> prepares tainting arg index: 2
29+
//
30+
// CHECK-NEXT: PostCall<fgets(buf, 42, fp)> actually wants to taint arg index: -1
31+
// CHECK-NEXT: PostCall<fgets(buf, 42, fp)> actually wants to taint arg index: 0
32+
// CHECK-NEXT: PostCall<fgets(buf, 42, fp)> actually wants to taint arg index: 1
33+
// CHECK-NEXT: PostCall<fgets(buf, 42, fp)> actually wants to taint arg index: 2
34+
}

0 commit comments

Comments
 (0)