-
Notifications
You must be signed in to change notification settings - Fork 14.3k
[analyzer] Improve bug report hashing, merge similar reports #98621
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[analyzer] Improve bug report hashing, merge similar reports #98621
Conversation
Previously alpha.security.ArrayBoundV2 displayed the (negative) offset value when it reported an underflow, but this produced lots of very similar and redundant reports in certain situations. After this commit the offset won't be printed so the usual deduplication will handle these reports as equivalent (and print only one of them). See llvm#86969 for background.
@llvm/pr-subscribers-clang Author: Donát Nagy (NagyDonat) ChangesPreviously alpha.security.ArrayBoundV2 displayed the (negative) offset value when it reported an underflow, but this produced lots of very similar and redundant reports in certain situations. After this commit the offset won't be printed so the usual deduplication will handle these reports as equivalent (and print only one of them). See #86969 for background. Full diff: https://github.com/llvm/llvm-project/pull/98621.diff 2 Files Affected:
diff --git a/clang/lib/StaticAnalyzer/Checkers/ArrayBoundCheckerV2.cpp b/clang/lib/StaticAnalyzer/Checkers/ArrayBoundCheckerV2.cpp
index f82288f1099e8..f177041abc411 100644
--- a/clang/lib/StaticAnalyzer/Checkers/ArrayBoundCheckerV2.cpp
+++ b/clang/lib/StaticAnalyzer/Checkers/ArrayBoundCheckerV2.cpp
@@ -374,13 +374,14 @@ static std::optional<int64_t> getConcreteValue(std::optional<NonLoc> SV) {
static Messages getPrecedesMsgs(const SubRegion *Region, NonLoc Offset) {
std::string RegName = getRegionName(Region);
- SmallString<128> Buf;
- llvm::raw_svector_ostream Out(Buf);
- Out << "Access of " << RegName << " at negative byte offset";
- if (auto ConcreteIdx = Offset.getAs<nonloc::ConcreteInt>())
- Out << ' ' << ConcreteIdx->getValue();
+
+ // We're not reporting the Offset, because we don't want to spam the user
+ // with similar reports that differ only in different offset values.
+ // See https://github.com/llvm/llvm-project/issues/86969 for details.
+ (void)Offset;
+
return {formatv("Out of bound access to memory preceding {0}", RegName),
- std::string(Buf)};
+ formatv("Access of {0} at negative byte offset", RegName)};
}
/// Try to divide `Val1` and `Val2` (in place) by `Divisor` and return true if
@@ -609,7 +610,7 @@ void ArrayBoundCheckerV2::performCheck(const Expr *E, CheckerContext &C) const {
// CHECK UPPER BOUND
DefinedOrUnknownSVal Size = getDynamicExtent(State, Reg, SVB);
if (auto KnownSize = Size.getAs<NonLoc>()) {
- // In a situation where both overflow and overflow are possible (but the
+ // In a situation where both underflow and overflow are possible (but the
// index is either tainted or known to be invalid), the logic of this
// checker will first assume that the offset is non-negative, and then
// (with this additional assumption) it will detect an overflow error.
diff --git a/clang/test/Analysis/out-of-bounds-diagnostics.c b/clang/test/Analysis/out-of-bounds-diagnostics.c
index 92f983d8b1561..10cfe95a79a55 100644
--- a/clang/test/Analysis/out-of-bounds-diagnostics.c
+++ b/clang/test/Analysis/out-of-bounds-diagnostics.c
@@ -6,7 +6,7 @@ int TenElements[10];
void arrayUnderflow(void) {
TenElements[-3] = 5;
// expected-warning@-1 {{Out of bound access to memory preceding 'TenElements'}}
- // expected-note@-2 {{Access of 'TenElements' at negative byte offset -12}}
+ // expected-note@-2 {{Access of 'TenElements' at negative byte offset}}
}
int underflowWithDeref(void) {
@@ -14,7 +14,29 @@ int underflowWithDeref(void) {
--p;
return *p;
// expected-warning@-1 {{Out of bound access to memory preceding 'TenElements'}}
- // expected-note@-2 {{Access of 'TenElements' at negative byte offset -4}}
+ // expected-note@-2 {{Access of 'TenElements' at negative byte offset}}
+}
+
+int rng(void);
+int getIndex(void) {
+ switch (rng()) {
+ case 1: return -152;
+ case 2: return -160;
+ case 3: return -168;
+ default: return -172;
+ }
+}
+
+void gh86959(void) {
+ // Previously code like this produced many almost-identical bug reports that
+ // only differed in the byte offset value (which was reported by the checker
+ // at that time). Verify that now we only see one report.
+
+ // expected-note@+1 {{Entering loop body}}
+ while (rng())
+ TenElements[getIndex()] = 10;
+ // expected-warning@-1 {{Out of bound access to memory preceding 'TenElements'}}
+ // expected-note@-2 {{Access of 'TenElements' at negative byte offset}}
}
int scanf(const char *restrict fmt, ...);
|
@llvm/pr-subscribers-clang-static-analyzer-1 Author: Donát Nagy (NagyDonat) ChangesPreviously alpha.security.ArrayBoundV2 displayed the (negative) offset value when it reported an underflow, but this produced lots of very similar and redundant reports in certain situations. After this commit the offset won't be printed so the usual deduplication will handle these reports as equivalent (and print only one of them). See #86969 for background. Full diff: https://github.com/llvm/llvm-project/pull/98621.diff 2 Files Affected:
diff --git a/clang/lib/StaticAnalyzer/Checkers/ArrayBoundCheckerV2.cpp b/clang/lib/StaticAnalyzer/Checkers/ArrayBoundCheckerV2.cpp
index f82288f1099e8..f177041abc411 100644
--- a/clang/lib/StaticAnalyzer/Checkers/ArrayBoundCheckerV2.cpp
+++ b/clang/lib/StaticAnalyzer/Checkers/ArrayBoundCheckerV2.cpp
@@ -374,13 +374,14 @@ static std::optional<int64_t> getConcreteValue(std::optional<NonLoc> SV) {
static Messages getPrecedesMsgs(const SubRegion *Region, NonLoc Offset) {
std::string RegName = getRegionName(Region);
- SmallString<128> Buf;
- llvm::raw_svector_ostream Out(Buf);
- Out << "Access of " << RegName << " at negative byte offset";
- if (auto ConcreteIdx = Offset.getAs<nonloc::ConcreteInt>())
- Out << ' ' << ConcreteIdx->getValue();
+
+ // We're not reporting the Offset, because we don't want to spam the user
+ // with similar reports that differ only in different offset values.
+ // See https://github.com/llvm/llvm-project/issues/86969 for details.
+ (void)Offset;
+
return {formatv("Out of bound access to memory preceding {0}", RegName),
- std::string(Buf)};
+ formatv("Access of {0} at negative byte offset", RegName)};
}
/// Try to divide `Val1` and `Val2` (in place) by `Divisor` and return true if
@@ -609,7 +610,7 @@ void ArrayBoundCheckerV2::performCheck(const Expr *E, CheckerContext &C) const {
// CHECK UPPER BOUND
DefinedOrUnknownSVal Size = getDynamicExtent(State, Reg, SVB);
if (auto KnownSize = Size.getAs<NonLoc>()) {
- // In a situation where both overflow and overflow are possible (but the
+ // In a situation where both underflow and overflow are possible (but the
// index is either tainted or known to be invalid), the logic of this
// checker will first assume that the offset is non-negative, and then
// (with this additional assumption) it will detect an overflow error.
diff --git a/clang/test/Analysis/out-of-bounds-diagnostics.c b/clang/test/Analysis/out-of-bounds-diagnostics.c
index 92f983d8b1561..10cfe95a79a55 100644
--- a/clang/test/Analysis/out-of-bounds-diagnostics.c
+++ b/clang/test/Analysis/out-of-bounds-diagnostics.c
@@ -6,7 +6,7 @@ int TenElements[10];
void arrayUnderflow(void) {
TenElements[-3] = 5;
// expected-warning@-1 {{Out of bound access to memory preceding 'TenElements'}}
- // expected-note@-2 {{Access of 'TenElements' at negative byte offset -12}}
+ // expected-note@-2 {{Access of 'TenElements' at negative byte offset}}
}
int underflowWithDeref(void) {
@@ -14,7 +14,29 @@ int underflowWithDeref(void) {
--p;
return *p;
// expected-warning@-1 {{Out of bound access to memory preceding 'TenElements'}}
- // expected-note@-2 {{Access of 'TenElements' at negative byte offset -4}}
+ // expected-note@-2 {{Access of 'TenElements' at negative byte offset}}
+}
+
+int rng(void);
+int getIndex(void) {
+ switch (rng()) {
+ case 1: return -152;
+ case 2: return -160;
+ case 3: return -168;
+ default: return -172;
+ }
+}
+
+void gh86959(void) {
+ // Previously code like this produced many almost-identical bug reports that
+ // only differed in the byte offset value (which was reported by the checker
+ // at that time). Verify that now we only see one report.
+
+ // expected-note@+1 {{Entering loop body}}
+ while (rng())
+ TenElements[getIndex()] = 10;
+ // expected-warning@-1 {{Out of bound access to memory preceding 'TenElements'}}
+ // expected-note@-2 {{Access of 'TenElements' at negative byte offset}}
}
int scanf(const char *restrict fmt, ...);
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This makes sense.
I wonder if we could have something in between. I'm thinking of having the concrete offset as a separate note, instead of having it part of the primary message.
That way after BR selection, we would still deterministically pick the shortest parh, and also have the offset in the path notes.
I find this better than dropping the concrete offset you worked so hard acquiring. WDYT?
This commit re-adds the concrete offset value at the end of the (long) `Description` of the underflow report, but ensures that the `Profile()` method of `PathSensitiveBugReport` only uses the _short_ description (as returned by `getShortDescription()`) instead of the the `Description` field. For the sake of consistency, the same modification is also applied to `BasicBugReport::Profile()`. This modification of `Profile()` is a no-op for most of the checkers, because there are very few checkers that set a separate short description for their bug reports and `getShortDescription()` defaults to returning the long `Description` when the field `ShortDescription` is an empty string (i.e. unspecified). I'd say that it was a bug that the short description (which is arguably _the_ human-readable "hash" of the report) wasn't included in the hash calculations performed by `Profile()`. On the other hand, I think it'll be useful that `Profile()` ignores the long description, because then we'll have a nice place where we can print the "nice to mention, but not enough to create a fundamentally different report" secondary information. (Note that the long description is essentially the "final note on the bug path" and the other notes are also ignored by `Profile()` -- because they are calculated later by the visitors.)
Actually I had a better solution for achieving this: during my earlier experimentation I noticed that the I'd guess that (For more details see the commit message of the commit that I just pushed onto this PR.) |
(Technical detail: I'll be on vacation during the next week, so I won't see updates on this PR until the 22nd of July. If you want to merge this PR, feel free to do so.) |
No need to rush. Have a nice one! |
I'm back from the vacation, so I would like to restart and conclude this review process. @steakhal or anybody else please have a look at the PR when it's convenient for you. |
(Btw it seems that I accidentally added everyone as an "assignee" instead of a reviewer...) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. I only had one minor question inline.
hash.AddString(Description); | ||
hash.AddString(getShortDescription()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it intentional that you use a member function here instead of the member variable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it's intentional, because the method call getShortDescription()
is equivalent to ShortDescripton.empty() ? Description : ShortDescription
. (The common case when the short and full descriptions are identical is represented internally by an empty ShortDescription
and the shared value stored in Description
.)
Summary: Previously there were certain situations where alpha.security.ArrayBoundV2 produced lots of very similar and redundant reports that only differed in their full `Description` that contained the (negative) byte offset value. (See #86969 for details.) This change updates the `Profile()` method of `PathSensitiveBugReport` to ensure that it uses `getShortDescription()` instead of the full `Description` so the standard report deduplication eliminates most of these redundant reports. Note that the effects of this change are very limited because there are very few checkers that specify a separate short description, and so `getShortDescription()` practically always defaults to returning the full `Description`. For the sake of consistency `BasicBugReport::Profile()` is also updated to use the short description. (Right now there are no checkers that use `BasicBugReport` with separate long and short descriptions.) This commit also includes some small code quality improvements in `ArrayBoundV2` that are IMO too trivial to be moved into a separate commit. Test Plan: Reviewers: Subscribers: Tasks: Tags: Differential Revision: https://phabricator.intern.facebook.com/D60251171
Previously there were certain situations where alpha.security.ArrayBoundV2 produced lots of very similar and redundant reports that only differed in their full
Description
that contained the (negative) byte offset value. (See #86969 for details.)This change updates the
Profile()
method ofPathSensitiveBugReport
to ensure that it usesgetShortDescription()
instead of the fullDescription
so the standard report deduplication eliminates most of these redundant reports.Note that the effects of this change are very limited because there are very few checkers that specify a separate short description, and so
getShortDescription()
practically always defaults to returning the fullDescription
.For the sake of consistency
BasicBugReport::Profile()
is also updated to use the short description. (If I recall correctly there are no checkers that useBasicBugReport
with separate short and long descriptions.)This commit also includes some small code quality improvements in
ArrayBoundV2
that are IMO too trivial to be moved into a separate commit.