
Commit 57e3641

necto and steakhal authored
[analyzer] Introduce per-entry-point statistics (#131175)
So far, the Clang Static Analyzer (CSA) has relied on the LLVM Statistic package, which lets us gather data about the analysis of an entire translation unit. However, a translation unit consists of a collection of loosely related entry points, and aggregating data across multiple such entry points is often counterproductive.

This change introduces a new lightweight, always-on facility to collect Boolean or numerical statistics for each entry point and dump them in CSV format. This format makes it easy to aggregate data across multiple translation units and analyze it with common data-processing tools.

We break down the existing statistics that were collected on a per-TU basis into per-entry-point values. Additionally, we enable the statistics unconditionally (STATISTIC -> ALWAYS_ENABLED_STATISTIC) to facilitate their use: you can gather the data with a simple run-time flag rather than having to recompile the analyzer. These statistics are very lightweight and add virtually no overhead.

Co-authored-by: Balazs Benics <[email protected]>
CPP-6160
1 parent c3f6d2c commit 57e3641
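The commit message notes that CSV output makes it easy to aggregate per-entry-point data across translation units. As a minimal sketch of that kind of post-processing, the hypothetical helper `sumColumn` below totals one named column over several CSV texts (for example, the contents of files written by the `dump-entry-point-stats-to-csv` analyzer option, one per translation unit; reading the files is left to the caller). It assumes a header row naming the columns and uses naive comma splitting, so it would misread any field that itself contains a comma, such as a quoted function signature. The actual column layout produced by the analyzer is not shown on this page.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

// Splits one CSV line on commas. Naive: quoted fields are not supported.
static std::vector<std::string> splitNaive(const std::string &Line) {
  std::vector<std::string> Fields;
  std::istringstream In(Line);
  std::string Field;
  while (std::getline(In, Field, ','))
    Fields.push_back(Field);
  return Fields;
}

// Parses a non-empty string of decimal digits; overflow is not checked.
static bool parseUnsigned(const std::string &S, std::uint64_t &Out) {
  if (S.empty())
    return false;
  std::uint64_t V = 0;
  for (char C : S) {
    if (C < '0' || C > '9')
      return false;
    V = V * 10 + static_cast<std::uint64_t>(C - '0');
  }
  Out = V;
  return true;
}

// Hypothetical helper: sums the column named Column over several CSV texts.
// Texts without that column and fields that are not plain decimal numbers
// are skipped.
std::uint64_t sumColumn(const std::vector<std::string> &CSVTexts,
                        const std::string &Column) {
  std::uint64_t Total = 0;
  for (const std::string &Text : CSVTexts) {
    std::istringstream In(Text);
    std::string Line;
    if (!std::getline(In, Line))
      continue; // empty input: no header row
    const std::vector<std::string> Header = splitNaive(Line);
    std::size_t Idx = 0;
    while (Idx < Header.size() && Header[Idx] != Column)
      ++Idx;
    if (Idx == Header.size())
      continue; // column not present in this text
    while (std::getline(In, Line)) {
      const std::vector<std::string> Fields = splitNaive(Line);
      std::uint64_t V = 0;
      if (Idx < Fields.size() && parseUnsigned(Fields[Idx], V))
        Total += V;
    }
  }
  return Total;
}
```

Because every row is one entry point, the same loop could just as easily compute a maximum, a histogram, or a filter over entry points instead of a sum.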


18 files changed (+699, -78 lines)


clang/docs/analyzer/developer-docs.rst

Lines changed: 1 addition & 0 deletions
@@ -12,3 +12,4 @@ Contents:
 developer-docs/nullability
 developer-docs/RegionStore
 developer-docs/PerformanceInvestigation
+developer-docs/Statistics
clang/docs/analyzer/developer-docs/Statistics.rst

Lines changed: 33 additions & 0 deletions
@@ -0,0 +1,33 @@
+===================
+Analysis Statistics
+===================
+
+The Clang Static Analyzer has two facilities for collecting statistics: per translation unit and per entry point.
+We use `llvm/ADT/Statistic.h`_ for numbers describing the entire translation unit.
+We use `clang/StaticAnalyzer/Core/PathSensitive/EntryPointStats.h`_ to collect data for each symbolic-execution entry point.
+
+.. _llvm/ADT/Statistic.h: https://github.com/llvm/llvm-project/blob/main/llvm/include/llvm/ADT/Statistic.h#L171
+.. _clang/StaticAnalyzer/Core/PathSensitive/EntryPointStats.h: https://github.com/llvm/llvm-project/blob/main/clang/include/clang/StaticAnalyzer/Core/PathSensitive/EntryPointStats.h
+
+In many cases, it makes sense to collect statistics at both the translation-unit level and the entry-point level. You can use the two macros defined in EntryPointStats.h for that:
+
+- ``STAT_COUNTER`` for additive statistics, for example, "the number of steps executed" or "the number of functions inlined".
+- ``STAT_MAX`` for maximizing statistics, for example, "the maximum environment size" or "the longest execution path".
+
+If you want to define a statistic that makes sense only for the entire translation unit, for example, "the number of entry points", Statistic.h defines two macros: ``STATISTIC`` and ``ALWAYS_ENABLED_STATISTIC``.
+You should prefer ``ALWAYS_ENABLED_STATISTIC`` unless you have a good reason not to.
+``STATISTIC`` is controlled by ``LLVM_ENABLE_STATS`` / ``LLVM_FORCE_ENABLE_STATS``.
+However, note that with ``LLVM_ENABLE_STATS`` disabled, only the storage of the values is disabled; the computations producing those values still run unless you explicitly make them conditional too.
+
+If you want to define a statistic only per entry point, EntryPointStats.h has four classes at your disposal:
+
+
+- ``BoolEPStat`` - a Boolean value assigned at most once per entry point. For example: "has the inline limit been reached".
+- ``UnsignedEPStat`` - an unsigned value assigned at most once per entry point. For example: "the number of source characters in an entry-point body".
+- ``CounterEPStat`` - an additive statistic. It starts with 0 and you can add to it as many times as needed. For example: "the number of bugs discovered".
+- ``UnsignedMaxEPStat`` - a maximizing statistic. It starts with 0, and when you join it with a value, it keeps the maximum of the previous value and the new one. For example: "the longest execution path of a bug".
+
+To produce a CSV file with all the statistics collected per entry point, use the ``dump-entry-point-stats-to-csv=<file>.csv`` analyzer option (passed via ``-analyzer-config``).
+
+Note that EntryPointStats.h is not meant to be complete; if you feel it is lacking a certain kind of statistic, odds are that it is.
+Feel free to extend it!
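The per-entry-point workflow described above (register statistics, accumulate while one entry point is analyzed, snapshot and reset, then dump one CSV row per entry point) can be illustrated with a minimal standalone sketch. It does not use LLVM and is not the EntryPointStats.cpp implementation, which is not shown on this page; the class `StatRegistrySketch`, its methods, and the CSV layout (an `EntryPoint` column followed by one column per statistic in registration order) are all hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical, simplified model of a per-entry-point statistics registry.
class StatRegistrySketch {
public:
  // Registers a counter and returns its column index.
  std::size_t registerCounter(const std::string &Name) {
    Names.push_back(Name);
    Current.push_back(0);
    return Names.size() - 1;
  }

  void add(std::size_t Id, unsigned Inc) { Current[Id] += Inc; }

  // Records the values for one entry point, then resets them for the next.
  void takeSnapshot(const std::string &EntryPoint) {
    Rows.push_back({EntryPoint, Current});
    std::fill(Current.begin(), Current.end(), 0u);
  }

  // Writes a header row, then one row per analyzed entry point.
  void dumpAsCSV(std::ostream &OS) const {
    OS << "EntryPoint";
    for (const std::string &N : Names)
      OS << ',' << N;
    OS << '\n';
    for (const Row &R : Rows) {
      OS << R.EntryPoint;
      for (unsigned V : R.Values)
        OS << ',' << V;
      OS << '\n';
    }
  }

private:
  struct Row {
    std::string EntryPoint;
    std::vector<unsigned> Values;
  };
  std::vector<std::string> Names;
  std::vector<unsigned> Current;
  std::vector<Row> Rows;
};
```

Resetting after each snapshot is what turns translation-unit-wide accumulation into per-entry-point rows: if one entry point takes 10 steps and the next takes 3, the second row reports 3 rather than 13.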

clang/include/clang/StaticAnalyzer/Core/AnalyzerOptions.def

Lines changed: 6 additions & 0 deletions
@@ -353,6 +353,12 @@ ANALYZER_OPTION(bool, DisplayCTUProgress, "display-ctu-progress",
                 "the analyzer's progress related to ctu.",
                 false)
 
+ANALYZER_OPTION(
+    StringRef, DumpEntryPointStatsToCSV, "dump-entry-point-stats-to-csv",
+    "If provided, the analyzer will dump statistics per entry point "
+    "into the specified CSV file.",
+    "")
+
 ANALYZER_OPTION(bool, ShouldTrackConditions, "track-conditions",
                 "Whether to track conditions that are a control dependency of "
                 "an already tracked variable.",
clang/include/clang/StaticAnalyzer/Core/PathSensitive/EntryPointStats.h

Lines changed: 162 additions & 0 deletions
@@ -0,0 +1,162 @@
+// EntryPointStats.h - Tracking statistics per entry point ------*- C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#ifndef CLANG_INCLUDE_CLANG_STATICANALYZER_CORE_PATHSENSITIVE_ENTRYPOINTSTATS_H
+#define CLANG_INCLUDE_CLANG_STATICANALYZER_CORE_PATHSENSITIVE_ENTRYPOINTSTATS_H
+
+#include "llvm/ADT/Statistic.h"
+#include "llvm/ADT/StringRef.h"
+
+namespace llvm {
+class raw_ostream;
+} // namespace llvm
+
+namespace clang {
+class Decl;
+
+namespace ento {
+
+class EntryPointStat {
+public:
+  llvm::StringLiteral name() const { return Name; }
+
+  static void lockRegistry();
+
+  static void takeSnapshot(const Decl *EntryPoint);
+  static void dumpStatsAsCSV(llvm::raw_ostream &OS);
+  static void dumpStatsAsCSV(llvm::StringRef FileName);
+
+protected:
+  explicit EntryPointStat(llvm::StringLiteral Name) : Name{Name} {}
+  EntryPointStat(const EntryPointStat &) = delete;
+  EntryPointStat(EntryPointStat &&) = delete;
+  EntryPointStat &operator=(EntryPointStat &) = delete;
+  EntryPointStat &operator=(EntryPointStat &&) = delete;
+
+private:
+  llvm::StringLiteral Name;
+};
+
+class BoolEPStat : public EntryPointStat {
+  std::optional<bool> Value = {};
+
+public:
+  explicit BoolEPStat(llvm::StringLiteral Name);
+  unsigned value() const { return Value && *Value; }
+  void set(bool V) {
+    assert(!Value.has_value());
+    Value = V;
+  }
+  void reset() { Value = {}; }
+};
+
+// used by CounterEntryPointTranslationUnitStat
+class CounterEPStat : public EntryPointStat {
+  using EntryPointStat::EntryPointStat;
+  unsigned Value = {};
+
+public:
+  explicit CounterEPStat(llvm::StringLiteral Name);
+  unsigned value() const { return Value; }
+  void reset() { Value = {}; }
+  CounterEPStat &operator++() {
+    ++Value;
+    return *this;
+  }
+
+  CounterEPStat &operator++(int) {
+    // No difference as you can't extract the value
+    return ++(*this);
+  }
+
+  CounterEPStat &operator+=(unsigned Inc) {
+    Value += Inc;
+    return *this;
+  }
+};
+
+// used by UnsignedMaxEntryPointTranslationUnitStatistic
+class UnsignedMaxEPStat : public EntryPointStat {
+  using EntryPointStat::EntryPointStat;
+  unsigned Value = {};
+
+public:
+  explicit UnsignedMaxEPStat(llvm::StringLiteral Name);
+  unsigned value() const { return Value; }
+  void reset() { Value = {}; }
+  void updateMax(unsigned X) { Value = std::max(Value, X); }
+};
+
+class UnsignedEPStat : public EntryPointStat {
+  using EntryPointStat::EntryPointStat;
+  std::optional<unsigned> Value = {};
+
+public:
+  explicit UnsignedEPStat(llvm::StringLiteral Name);
+  unsigned value() const { return Value.value_or(0); }
+  void reset() { Value.reset(); }
+  void set(unsigned V) {
+    assert(!Value.has_value());
+    Value = V;
+  }
+};
+
+class CounterEntryPointTranslationUnitStat {
+  CounterEPStat M;
+  llvm::TrackingStatistic S;
+
+public:
+  CounterEntryPointTranslationUnitStat(const char *DebugType,
+                                       llvm::StringLiteral Name,
+                                       llvm::StringLiteral Desc)
+      : M(Name), S(DebugType, Name.data(), Desc.data()) {}
+  CounterEntryPointTranslationUnitStat &operator++() {
+    ++M;
+    ++S;
+    return *this;
+  }
+
+  CounterEntryPointTranslationUnitStat &operator++(int) {
+    // No difference with prefix as the value is not observable.
+    return ++(*this);
+  }
+
+  CounterEntryPointTranslationUnitStat &operator+=(unsigned Inc) {
+    M += Inc;
+    S += Inc;
+    return *this;
+  }
+};
+
+class UnsignedMaxEntryPointTranslationUnitStatistic {
+  UnsignedMaxEPStat M;
+  llvm::TrackingStatistic S;
+
+public:
+  UnsignedMaxEntryPointTranslationUnitStatistic(const char *DebugType,
+                                                llvm::StringLiteral Name,
+                                                llvm::StringLiteral Desc)
+      : M(Name), S(DebugType, Name.data(), Desc.data()) {}
+  void updateMax(uint64_t Value) {
+    M.updateMax(static_cast<unsigned>(Value));
+    S.updateMax(Value);
+  }
+};
+
+#define STAT_COUNTER(VARNAME, DESC) \
+  static clang::ento::CounterEntryPointTranslationUnitStat VARNAME = { \
+      DEBUG_TYPE, #VARNAME, DESC}
+
+#define STAT_MAX(VARNAME, DESC) \
+  static clang::ento::UnsignedMaxEntryPointTranslationUnitStatistic VARNAME = \
+      {DEBUG_TYPE, #VARNAME, DESC}
+
+} // namespace ento
+} // namespace clang
+
+#endif // CLANG_INCLUDE_CLANG_STATICANALYZER_CORE_PATHSENSITIVE_ENTRYPOINTSTATS_H
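The two `*TranslationUnitStat*` wrappers in EntryPointStats.h forward every update both to a per-entry-point statistic and to an `llvm::TrackingStatistic`. The standalone sketch below mimics that split for a counter without depending on LLVM: the hypothetical class `DualCounterSketch` keeps `TUTotal` in place of the `TrackingStatistic` and `PerEP` in place of `CounterEPStat`, and its `endEntryPoint` method stands in for taking a snapshot and resetting. These names are illustrative only.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for CounterEntryPointTranslationUnitStat: every
// increment goes to a per-entry-point value and to a translation-unit total.
class DualCounterSketch {
public:
  DualCounterSketch &operator++() { return *this += 1; }

  DualCounterSketch &operator+=(unsigned Inc) {
    PerEP += Inc;   // plays the role of CounterEPStat
    TUTotal += Inc; // plays the role of llvm::TrackingStatistic
    return *this;
  }

  // Records the per-entry-point value and resets it; the TU total is kept.
  void endEntryPoint() {
    Snapshots.push_back(PerEP);
    PerEP = 0;
  }

  unsigned tuTotal() const { return TUTotal; }
  const std::vector<unsigned> &snapshots() const { return Snapshots; }

private:
  unsigned PerEP = 0;
  unsigned TUTotal = 0;
  std::vector<unsigned> Snapshots;
};
```

For a counter, the translation-unit total is exactly the sum of the per-entry-point values, so the existing per-TU numbers are preserved while the per-entry-point rows additionally show how the work was distributed.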

clang/lib/StaticAnalyzer/Checkers/AnalyzerStatsChecker.cpp

Lines changed: 4 additions & 5 deletions
@@ -13,12 +13,12 @@
 #include "clang/StaticAnalyzer/Core/BugReporter/BugReporter.h"
 #include "clang/StaticAnalyzer/Core/Checker.h"
 #include "clang/StaticAnalyzer/Core/CheckerManager.h"
+#include "clang/StaticAnalyzer/Core/PathSensitive/EntryPointStats.h"
 #include "clang/StaticAnalyzer/Core/PathSensitive/ExplodedGraph.h"
 #include "clang/StaticAnalyzer/Core/PathSensitive/ExprEngine.h"
 #include "llvm/ADT/STLExtras.h"
 #include "llvm/ADT/SmallPtrSet.h"
 #include "llvm/ADT/SmallString.h"
-#include "llvm/ADT/Statistic.h"
 #include "llvm/Support/raw_ostream.h"
 #include <optional>
 
@@ -27,10 +27,9 @@ using namespace ento;
 
 #define DEBUG_TYPE "StatsChecker"
 
-STATISTIC(NumBlocks,
-          "The # of blocks in top level functions");
-STATISTIC(NumBlocksUnreachable,
-          "The # of unreachable blocks in analyzing top level functions");
+STAT_COUNTER(NumBlocks, "The # of blocks in top level functions");
+STAT_COUNTER(NumBlocksUnreachable,
+             "The # of unreachable blocks in analyzing top level functions");
 
 namespace {
 class AnalyzerStatsChecker : public Checker<check::EndAnalysis> {
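Each converted file in this commit defines `DEBUG_TYPE` before its first `STAT_COUNTER` or `STAT_MAX`, because those macros expand to an initializer that names `DEBUG_TYPE`, just as LLVM's `STATISTIC` does. The standalone sketch below reproduces that expansion pattern with a hypothetical `SKETCH_STAT_COUNTER` macro and a simplified `SketchCounterStat` type so that it compiles without LLVM; neither name exists in the analyzer.

```cpp
#include <cassert>
#include <string>

// Simplified stand-in for CounterEntryPointTranslationUnitStat.
struct SketchCounterStat {
  const char *DebugType;
  const char *Name;
  const char *Desc;
  unsigned Value; // omitted from the initializer below, so it starts at 0
};

// Same shape as STAT_COUNTER in EntryPointStats.h: the expansion refers to
// DEBUG_TYPE, so DEBUG_TYPE must already be defined where the macro is used.
#define SKETCH_STAT_COUNTER(VARNAME, DESC)                                     \
  static SketchCounterStat VARNAME = {DEBUG_TYPE, #VARNAME, DESC}

#define DEBUG_TYPE "StatsChecker"
SKETCH_STAT_COUNTER(NumBlocks, "The # of blocks in top level functions");
```

If the `#define DEBUG_TYPE` line were moved below the macro use, the initializer would refer to an undeclared identifier and the file would fail to compile, which is why the definition precedes the statistics in each diff.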

clang/lib/StaticAnalyzer/Core/BugReporter.cpp

Lines changed: 14 additions & 14 deletions
@@ -39,6 +39,7 @@
 #include "clang/StaticAnalyzer/Core/Checker.h"
 #include "clang/StaticAnalyzer/Core/CheckerManager.h"
 #include "clang/StaticAnalyzer/Core/CheckerRegistryData.h"
+#include "clang/StaticAnalyzer/Core/PathSensitive/EntryPointStats.h"
 #include "clang/StaticAnalyzer/Core/PathSensitive/ExplodedGraph.h"
 #include "clang/StaticAnalyzer/Core/PathSensitive/ExprEngine.h"
 #include "clang/StaticAnalyzer/Core/PathSensitive/MemRegion.h"
@@ -54,7 +55,6 @@
 #include "llvm/ADT/SmallPtrSet.h"
 #include "llvm/ADT/SmallString.h"
 #include "llvm/ADT/SmallVector.h"
-#include "llvm/ADT/Statistic.h"
 #include "llvm/ADT/StringExtras.h"
 #include "llvm/ADT/StringRef.h"
 #include "llvm/ADT/iterator_range.h"
@@ -82,19 +82,19 @@ using namespace llvm;
 
 #define DEBUG_TYPE "BugReporter"
 
-STATISTIC(MaxBugClassSize,
-          "The maximum number of bug reports in the same equivalence class");
-STATISTIC(MaxValidBugClassSize,
-          "The maximum number of bug reports in the same equivalence class "
-          "where at least one report is valid (not suppressed)");
-
-STATISTIC(NumTimesReportPassesZ3, "Number of reports passed Z3");
-STATISTIC(NumTimesReportRefuted, "Number of reports refuted by Z3");
-STATISTIC(NumTimesReportEQClassAborted,
-          "Number of times a report equivalence class was aborted by the Z3 "
-          "oracle heuristic");
-STATISTIC(NumTimesReportEQClassWasExhausted,
-          "Number of times all reports of an equivalence class was refuted");
+STAT_MAX(MaxBugClassSize,
+         "The maximum number of bug reports in the same equivalence class");
+STAT_MAX(MaxValidBugClassSize,
+         "The maximum number of bug reports in the same equivalence class "
+         "where at least one report is valid (not suppressed)");
+
+STAT_COUNTER(NumTimesReportPassesZ3, "Number of reports passed Z3");
+STAT_COUNTER(NumTimesReportRefuted, "Number of reports refuted by Z3");
+STAT_COUNTER(NumTimesReportEQClassAborted,
+             "Number of times a report equivalence class was aborted by the Z3 "
+             "oracle heuristic");
+STAT_COUNTER(NumTimesReportEQClassWasExhausted,
+             "Number of times all reports of an equivalence class was refuted");
 
 BugReporterVisitor::~BugReporterVisitor() = default;
 
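BugReporter now records `MaxBugClassSize` and `MaxValidBugClassSize` with `STAT_MAX`, which, per EntryPointStats.h, passes each `updateMax` call to both an `UnsignedMaxEPStat` and a `TrackingStatistic`. Unlike a counter, a translation-unit maximum cannot be split back into per-entry-point values: it keeps only the largest one. The sketch below uses a hypothetical `DualMaxSketch` class and made-up equivalence-class sizes to show what the per-entry-point column adds; it is not the analyzer's code.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical stand-in for UnsignedMaxEntryPointTranslationUnitStatistic.
class DualMaxSketch {
public:
  void updateMax(unsigned X) {
    PerEP = std::max(PerEP, X); // plays the role of UnsignedMaxEPStat
    TUMax = std::max(TUMax, X); // plays the role of the TU-wide statistic
  }

  // Records the per-entry-point maximum; it starts again from 0 afterwards.
  void endEntryPoint() {
    Snapshots.push_back(PerEP);
    PerEP = 0;
  }

  unsigned tuMax() const { return TUMax; }
  const std::vector<unsigned> &snapshots() const { return Snapshots; }

private:
  unsigned PerEP = 0;
  unsigned TUMax = 0;
  std::vector<unsigned> Snapshots;
};
```

With class sizes 3 and 1 in the first entry point and 2 in the second, the translation-unit value is 3 and says nothing about the second entry point, whereas the per-entry-point rows report 3 and 2.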

clang/lib/StaticAnalyzer/Core/CMakeLists.txt

Lines changed: 1 addition & 0 deletions
@@ -24,6 +24,7 @@ add_clang_library(clangStaticAnalyzerCore
   CoreEngine.cpp
   DynamicExtent.cpp
   DynamicType.cpp
+  EntryPointStats.cpp
   Environment.cpp
   ExplodedGraph.cpp
   ExprEngine.cpp

clang/lib/StaticAnalyzer/Core/CoreEngine.cpp

Lines changed: 7 additions & 9 deletions
@@ -22,12 +22,12 @@
 #include "clang/Basic/LLVM.h"
 #include "clang/StaticAnalyzer/Core/AnalyzerOptions.h"
 #include "clang/StaticAnalyzer/Core/PathSensitive/BlockCounter.h"
+#include "clang/StaticAnalyzer/Core/PathSensitive/EntryPointStats.h"
 #include "clang/StaticAnalyzer/Core/PathSensitive/ExplodedGraph.h"
 #include "clang/StaticAnalyzer/Core/PathSensitive/ExprEngine.h"
 #include "clang/StaticAnalyzer/Core/PathSensitive/FunctionSummary.h"
 #include "clang/StaticAnalyzer/Core/PathSensitive/WorkList.h"
 #include "llvm/ADT/STLExtras.h"
-#include "llvm/ADT/Statistic.h"
 #include "llvm/Support/Casting.h"
 #include "llvm/Support/ErrorHandling.h"
 #include "llvm/Support/FormatVariadic.h"
@@ -43,14 +43,12 @@ using namespace ento;
 
 #define DEBUG_TYPE "CoreEngine"
 
-STATISTIC(NumSteps,
-            "The # of steps executed.");
-STATISTIC(NumSTUSteps, "The # of STU steps executed.");
-STATISTIC(NumCTUSteps, "The # of CTU steps executed.");
-STATISTIC(NumReachedMaxSteps,
-            "The # of times we reached the max number of steps.");
-STATISTIC(NumPathsExplored,
-            "The # of paths explored by the analyzer.");
+STAT_COUNTER(NumSteps, "The # of steps executed.");
+STAT_COUNTER(NumSTUSteps, "The # of STU steps executed.");
+STAT_COUNTER(NumCTUSteps, "The # of CTU steps executed.");
+ALWAYS_ENABLED_STATISTIC(NumReachedMaxSteps,
+                         "The # of times we reached the max number of steps.");
+STAT_COUNTER(NumPathsExplored, "The # of paths explored by the analyzer.");
 
 //===----------------------------------------------------------------------===//
 // Core analysis engine.
