[analyzer] Introduce per-entry-point statistics #131175
So far CSA has relied on the LLVM Statistic package, which lets us gather data about the analysis of an entire translation unit. However, a translation unit consists of a collection of loosely related entry points, and aggregating data across multiple such entry points is often counterproductive.

This change introduces a new lightweight, always-on facility to collect Boolean or numerical statistics for each entry point and dump them in CSV format. This format makes it easy to aggregate data across multiple translation units and analyze it with common data-processing tools.

We break down the existing statistics that were collected on a per-TU basis into values per entry point.

Additionally, we enable the statistics unconditionally (STATISTIC -> ALWAYS_ENABLED_STATISTIC) to facilitate their use: you can gather the data with a simple run-time flag rather than having to recompile the analyzer. These statistics are very light and add virtually no overhead.

@steakhal (Balázs Benics) started this design and I picked up the baton.

---

CPP-6160
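To make the per-entry-point idea concrete, here is a minimal, LLVM-free sketch of the mechanism described above: counters register themselves in a global registry, a snapshot is taken (and the counters reset) after each entry point, and the snapshots are dumped as CSV. All names in it (`Counter`, `Registry`, `takeSnapshot`, `dumpCSV`, `runDemo`) are hypothetical and only model the idea; this is not the patch's API, and unlike the patch it does not sort the rows or escape the entry-point names.

```cpp
// Simplified model of per-entry-point statistics (hypothetical names).
#include <cassert>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

class Counter;

// Global registry: every Counter registers itself on construction.
struct Registry {
  std::vector<Counter *> Counters;
  // One row per analyzed entry point: its name plus the counter values.
  std::vector<std::pair<std::string, std::vector<unsigned>>> Snapshots;
};

inline Registry &registry() {
  static Registry R; // function-local static avoids init-order problems
  return R;
}

class Counter {
  std::string Name;
  unsigned Value = 0;

public:
  explicit Counter(std::string N) : Name(std::move(N)) {
    registry().Counters.push_back(this);
  }
  Counter(const Counter &) = delete;
  Counter &operator=(const Counter &) = delete;

  const std::string &name() const { return Name; }
  unsigned value() const { return Value; }
  void reset() { Value = 0; }
  Counter &operator++() {
    ++Value;
    return *this;
  }
};

// Record the current values for one entry point, then reset the counters so
// that the next entry point starts from zero.
inline void takeSnapshot(const std::string &EntryPoint) {
  assert(!EntryPoint.empty() && "an entry point needs a name");
  std::vector<unsigned> Values;
  for (Counter *C : registry().Counters) {
    Values.push_back(C->value());
    C->reset();
  }
  registry().Snapshots.emplace_back(EntryPoint, std::move(Values));
}

// Emit a header row followed by one row per snapshot.
inline std::string dumpCSV() {
  std::ostringstream OS;
  OS << "EntryPoint";
  for (const Counter *C : registry().Counters)
    OS << ", " << C->name();
  OS << "\n";
  for (const auto &Row : registry().Snapshots) {
    OS << '"' << Row.first << '"';
    for (unsigned V : Row.second)
      OS << ", " << V;
    OS << "\n";
  }
  return OS.str();
}

// Simulates analyzing two entry points of one translation unit.
inline std::string runDemo() {
  static Counter NumSteps("NumSteps");
  static Counter NumPaths("NumPaths");
  ++NumSteps;
  ++NumSteps;
  ++NumPaths; // work attributed to foo()
  takeSnapshot("foo()");
  ++NumSteps; // work attributed to bar()
  takeSnapshot("bar()");
  return dumpCSV();
}
```

Calling `runDemo()` once yields a header line and one row per entry point, so the counts for `foo()` and `bar()` stay separate instead of being summed over the whole translation unit.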
@llvm/pr-subscribers-clang-static-analyzer-1 @llvm/pr-subscribers-clang
Author: Arseniy Zaostrovnykh (necto)
Patch is 35.92 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/131175.diff
17 Files Affected:
diff --git a/clang/docs/analyzer/developer-docs.rst b/clang/docs/analyzer/developer-docs.rst
index 60c0e71ad847c..a925cf7ca02e1 100644
--- a/clang/docs/analyzer/developer-docs.rst
+++ b/clang/docs/analyzer/developer-docs.rst
@@ -12,3 +12,4 @@ Contents:
developer-docs/nullability
developer-docs/RegionStore
developer-docs/PerformanceInvestigation
+ developer-docs/Statistics
diff --git a/clang/docs/analyzer/developer-docs/Statistics.rst b/clang/docs/analyzer/developer-docs/Statistics.rst
new file mode 100644
index 0000000000000..d352bb6f01ebc
--- /dev/null
+++ b/clang/docs/analyzer/developer-docs/Statistics.rst
@@ -0,0 +1,21 @@
+======================
+Metrics and Statistics
+======================
+
+TODO: write this once the design is settled (@reviewer, don't look here yet)
+
+CSA has two facilities for collecting statistics: one per translation unit and one per entry point.
+
+Mention the following tools:
+- STATISTIC macro
+- ALWAYS_ENABLED_STATISTIC macro
+
+- STAT_COUNTER macro
+- STAT_MAX macro
+
+- BoolEPStat
+- UnsignedEPStat
+- CounterEPStat
+- UnsignedMaxEPStat
+
+- dump-se-stats-to-csv="%t.csv"
diff --git a/clang/include/clang/StaticAnalyzer/Core/AnalyzerOptions.def b/clang/include/clang/StaticAnalyzer/Core/AnalyzerOptions.def
index 2aa00db411844..b88bce5e262a7 100644
--- a/clang/include/clang/StaticAnalyzer/Core/AnalyzerOptions.def
+++ b/clang/include/clang/StaticAnalyzer/Core/AnalyzerOptions.def
@@ -353,6 +353,12 @@ ANALYZER_OPTION(bool, DisplayCTUProgress, "display-ctu-progress",
"the analyzer's progress related to ctu.",
false)
+ANALYZER_OPTION(
+ StringRef, DumpSEStatsToCSV, "dump-se-stats-to-csv",
+ "If provided, the analyzer will dump statistics per entry point "
+ "into the specified CSV file.",
+ "")
+
ANALYZER_OPTION(bool, ShouldTrackConditions, "track-conditions",
"Whether to track conditions that are a control dependency of "
"an already tracked variable.",
diff --git a/clang/include/clang/StaticAnalyzer/Core/PathSensitive/EntryPointStats.h b/clang/include/clang/StaticAnalyzer/Core/PathSensitive/EntryPointStats.h
new file mode 100644
index 0000000000000..16c9fdf97fc30
--- /dev/null
+++ b/clang/include/clang/StaticAnalyzer/Core/PathSensitive/EntryPointStats.h
@@ -0,0 +1,162 @@
+// EntryPointStats.h - Tracking statistics per entry point -*- C++ -*-//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===---------------------------------------------------------------===//
+
+#ifndef CLANG_INCLUDE_CLANG_STATICANALYZER_CORE_PATHSENSITIVE_ENTRYPOINTSTATS_H
+#define CLANG_INCLUDE_CLANG_STATICANALYZER_CORE_PATHSENSITIVE_ENTRYPOINTSTATS_H
+
+#include "llvm/ADT/Statistic.h"
+#include "llvm/ADT/StringRef.h"
+
+namespace llvm {
+class raw_ostream;
+} // namespace llvm
+
+namespace clang {
+class Decl;
+
+namespace ento {
+
+class EntryPointStat {
+public:
+ llvm::StringLiteral name() const { return Name; }
+
+ static void lockRegistry();
+
+ static void takeSnapshot(const Decl *EntryPoint);
+ static void dumpStatsAsCSV(llvm::raw_ostream &OS);
+ static void dumpStatsAsCSV(llvm::StringRef FileName);
+
+protected:
+ explicit EntryPointStat(llvm::StringLiteral Name) : Name{Name} {}
+ EntryPointStat(const EntryPointStat &) = delete;
+ EntryPointStat(EntryPointStat &&) = delete;
+ EntryPointStat &operator=(const EntryPointStat &) = delete;
+ EntryPointStat &operator=(EntryPointStat &&) = delete;
+
+private:
+ llvm::StringLiteral Name;
+};
+
+class BoolEPStat : public EntryPointStat {
+ std::optional<bool> Value = {};
+
+public:
+ explicit BoolEPStat(llvm::StringLiteral Name);
+ bool value() const { return Value && *Value; }
+ void set(bool V) {
+ assert(!Value.has_value());
+ Value = V;
+ }
+ void reset() { Value = {}; }
+};
+
+// used by CounterEntryPointTranslationUnitStat
+class CounterEPStat : public EntryPointStat {
+ using EntryPointStat::EntryPointStat;
+ unsigned Value = {};
+
+public:
+ explicit CounterEPStat(llvm::StringLiteral Name);
+ unsigned value() const { return Value; }
+ void reset() { Value = {}; }
+ CounterEPStat &operator++() {
+ ++Value;
+ return *this;
+ }
+
+ CounterEPStat &operator++(int) {
+ // No difference as you can't extract the value
+ return ++(*this);
+ }
+
+ CounterEPStat &operator+=(unsigned Inc) {
+ Value += Inc;
+ return *this;
+ }
+};
+
+// used by the wrapper class behind STAT_MAX
+class UnsignedMaxEPStat : public EntryPointStat {
+ using EntryPointStat::EntryPointStat;
+ unsigned Value = {};
+
+public:
+ explicit UnsignedMaxEPStat(llvm::StringLiteral Name);
+ unsigned value() const { return Value; }
+ void reset() { Value = {}; }
+ void updateMax(unsigned X) { Value = std::max(Value, X); }
+};
+
+class UnsignedEPStat : public EntryPointStat {
+ using EntryPointStat::EntryPointStat;
+ std::optional<unsigned> Value = {};
+
+public:
+ explicit UnsignedEPStat(llvm::StringLiteral Name);
+ unsigned value() const { return Value.value_or(0); }
+ void reset() { Value.reset(); }
+ void set(unsigned V) {
+ assert(!Value.has_value());
+ Value = V;
+ }
+};
+
+class CounterEntryPointTranslationUnitStat {
+ CounterEPStat M;
+ llvm::TrackingStatistic S;
+
+public:
+ CounterEntryPointTranslationUnitStat(const char *DebugType,
+ llvm::StringLiteral Name,
+ llvm::StringLiteral Desc)
+ : M(Name), S(DebugType, Name.data(), Desc.data()) {}
+ CounterEntryPointTranslationUnitStat &operator++() {
+ ++M;
+ ++S;
+ return *this;
+ }
+
+ CounterEntryPointTranslationUnitStat &operator++(int) {
+ // No difference with prefix as the value is not observable.
+ return ++(*this);
+ }
+
+ CounterEntryPointTranslationUnitStat &operator+=(unsigned Inc) {
+ M += Inc;
+ S += Inc;
+ return *this;
+ }
+};
+
+class UnsignedMaxEntryPointTranslationUnitStatistic {
+ UnsignedMaxEPStat M;
+ llvm::TrackingStatistic S;
+
+public:
+ UnsignedMaxEntryPointTranslationUnitStatistic(const char *DebugType,
+ llvm::StringLiteral Name,
+ llvm::StringLiteral Desc)
+ : M(Name), S(DebugType, Name.data(), Desc.data()) {}
+ void updateMax(uint64_t Value) {
+ M.updateMax(static_cast<unsigned>(Value));
+ S.updateMax(Value);
+ }
+};
+
+#define STAT_COUNTER(VARNAME, DESC) \
+ static clang::ento::CounterEntryPointTranslationUnitStat VARNAME = { \
+ DEBUG_TYPE, #VARNAME, DESC}
+
+#define STAT_MAX(VARNAME, DESC) \
+ static clang::ento::UnsignedMaxEntryPointTranslationUnitStatistic VARNAME = { \
+ DEBUG_TYPE, #VARNAME, DESC}
+
+} // namespace ento
+} // namespace clang
+
+#endif // CLANG_INCLUDE_CLANG_STATICANALYZER_CORE_PATHSENSITIVE_ENTRYPOINTSTATS_H
diff --git a/clang/lib/StaticAnalyzer/Checkers/AnalyzerStatsChecker.cpp b/clang/lib/StaticAnalyzer/Checkers/AnalyzerStatsChecker.cpp
index a54f1b1e71d47..d030e69a2a6e0 100644
--- a/clang/lib/StaticAnalyzer/Checkers/AnalyzerStatsChecker.cpp
+++ b/clang/lib/StaticAnalyzer/Checkers/AnalyzerStatsChecker.cpp
@@ -13,12 +13,12 @@
#include "clang/StaticAnalyzer/Core/BugReporter/BugReporter.h"
#include "clang/StaticAnalyzer/Core/Checker.h"
#include "clang/StaticAnalyzer/Core/CheckerManager.h"
+#include "clang/StaticAnalyzer/Core/PathSensitive/EntryPointStats.h"
#include "clang/StaticAnalyzer/Core/PathSensitive/ExplodedGraph.h"
#include "clang/StaticAnalyzer/Core/PathSensitive/ExprEngine.h"
#include "llvm/ADT/STLExtras.h"
#include "llvm/ADT/SmallPtrSet.h"
#include "llvm/ADT/SmallString.h"
-#include "llvm/ADT/Statistic.h"
#include "llvm/Support/raw_ostream.h"
#include <optional>
@@ -27,10 +27,9 @@ using namespace ento;
#define DEBUG_TYPE "StatsChecker"
-STATISTIC(NumBlocks,
- "The # of blocks in top level functions");
-STATISTIC(NumBlocksUnreachable,
- "The # of unreachable blocks in analyzing top level functions");
+STAT_COUNTER(NumBlocks, "The # of blocks in top level functions");
+STAT_COUNTER(NumBlocksUnreachable,
+ "The # of unreachable blocks in analyzing top level functions");
namespace {
class AnalyzerStatsChecker : public Checker<check::EndAnalysis> {
diff --git a/clang/lib/StaticAnalyzer/Core/BugReporter.cpp b/clang/lib/StaticAnalyzer/Core/BugReporter.cpp
index a4f9e092e8205..5f78fc433275d 100644
--- a/clang/lib/StaticAnalyzer/Core/BugReporter.cpp
+++ b/clang/lib/StaticAnalyzer/Core/BugReporter.cpp
@@ -39,6 +39,7 @@
#include "clang/StaticAnalyzer/Core/Checker.h"
#include "clang/StaticAnalyzer/Core/CheckerManager.h"
#include "clang/StaticAnalyzer/Core/CheckerRegistryData.h"
+#include "clang/StaticAnalyzer/Core/PathSensitive/EntryPointStats.h"
#include "clang/StaticAnalyzer/Core/PathSensitive/ExplodedGraph.h"
#include "clang/StaticAnalyzer/Core/PathSensitive/ExprEngine.h"
#include "clang/StaticAnalyzer/Core/PathSensitive/MemRegion.h"
@@ -54,7 +55,6 @@
#include "llvm/ADT/SmallPtrSet.h"
#include "llvm/ADT/SmallString.h"
#include "llvm/ADT/SmallVector.h"
-#include "llvm/ADT/Statistic.h"
#include "llvm/ADT/StringExtras.h"
#include "llvm/ADT/StringRef.h"
#include "llvm/ADT/iterator_range.h"
@@ -82,19 +82,19 @@ using namespace llvm;
#define DEBUG_TYPE "BugReporter"
-STATISTIC(MaxBugClassSize,
- "The maximum number of bug reports in the same equivalence class");
-STATISTIC(MaxValidBugClassSize,
- "The maximum number of bug reports in the same equivalence class "
- "where at least one report is valid (not suppressed)");
-
-STATISTIC(NumTimesReportPassesZ3, "Number of reports passed Z3");
-STATISTIC(NumTimesReportRefuted, "Number of reports refuted by Z3");
-STATISTIC(NumTimesReportEQClassAborted,
- "Number of times a report equivalence class was aborted by the Z3 "
- "oracle heuristic");
-STATISTIC(NumTimesReportEQClassWasExhausted,
- "Number of times all reports of an equivalence class was refuted");
+STAT_MAX(MaxBugClassSize,
+ "The maximum number of bug reports in the same equivalence class");
+STAT_MAX(MaxValidBugClassSize,
+ "The maximum number of bug reports in the same equivalence class "
+ "where at least one report is valid (not suppressed)");
+
+STAT_COUNTER(NumTimesReportPassesZ3, "Number of reports passed Z3");
+STAT_COUNTER(NumTimesReportRefuted, "Number of reports refuted by Z3");
+STAT_COUNTER(NumTimesReportEQClassAborted,
+ "Number of times a report equivalence class was aborted by the Z3 "
+ "oracle heuristic");
+STAT_COUNTER(NumTimesReportEQClassWasExhausted,
+ "Number of times all reports of an equivalence class was refuted");
BugReporterVisitor::~BugReporterVisitor() = default;
diff --git a/clang/lib/StaticAnalyzer/Core/CMakeLists.txt b/clang/lib/StaticAnalyzer/Core/CMakeLists.txt
index fb9394a519eb7..d0a9b202f9c52 100644
--- a/clang/lib/StaticAnalyzer/Core/CMakeLists.txt
+++ b/clang/lib/StaticAnalyzer/Core/CMakeLists.txt
@@ -24,6 +24,7 @@ add_clang_library(clangStaticAnalyzerCore
CoreEngine.cpp
DynamicExtent.cpp
DynamicType.cpp
+ EntryPointStats.cpp
Environment.cpp
ExplodedGraph.cpp
ExprEngine.cpp
diff --git a/clang/lib/StaticAnalyzer/Core/CoreEngine.cpp b/clang/lib/StaticAnalyzer/Core/CoreEngine.cpp
index d96211c3a6635..5c05c9c87f124 100644
--- a/clang/lib/StaticAnalyzer/Core/CoreEngine.cpp
+++ b/clang/lib/StaticAnalyzer/Core/CoreEngine.cpp
@@ -22,12 +22,12 @@
#include "clang/Basic/LLVM.h"
#include "clang/StaticAnalyzer/Core/AnalyzerOptions.h"
#include "clang/StaticAnalyzer/Core/PathSensitive/BlockCounter.h"
+#include "clang/StaticAnalyzer/Core/PathSensitive/EntryPointStats.h"
#include "clang/StaticAnalyzer/Core/PathSensitive/ExplodedGraph.h"
#include "clang/StaticAnalyzer/Core/PathSensitive/ExprEngine.h"
#include "clang/StaticAnalyzer/Core/PathSensitive/FunctionSummary.h"
#include "clang/StaticAnalyzer/Core/PathSensitive/WorkList.h"
#include "llvm/ADT/STLExtras.h"
-#include "llvm/ADT/Statistic.h"
#include "llvm/Support/Casting.h"
#include "llvm/Support/ErrorHandling.h"
#include "llvm/Support/FormatVariadic.h"
@@ -43,14 +43,12 @@ using namespace ento;
#define DEBUG_TYPE "CoreEngine"
-STATISTIC(NumSteps,
- "The # of steps executed.");
-STATISTIC(NumSTUSteps, "The # of STU steps executed.");
-STATISTIC(NumCTUSteps, "The # of CTU steps executed.");
-STATISTIC(NumReachedMaxSteps,
- "The # of times we reached the max number of steps.");
-STATISTIC(NumPathsExplored,
- "The # of paths explored by the analyzer.");
+STAT_COUNTER(NumSteps, "The # of steps executed.");
+STAT_COUNTER(NumSTUSteps, "The # of STU steps executed.");
+STAT_COUNTER(NumCTUSteps, "The # of CTU steps executed.");
+ALWAYS_ENABLED_STATISTIC(NumReachedMaxSteps,
+ "The # of times we reached the max number of steps.");
+STAT_COUNTER(NumPathsExplored, "The # of paths explored by the analyzer.");
//===----------------------------------------------------------------------===//
// Core analysis engine.
diff --git a/clang/lib/StaticAnalyzer/Core/EntryPointStats.cpp b/clang/lib/StaticAnalyzer/Core/EntryPointStats.cpp
new file mode 100644
index 0000000000000..f17d0522f983a
--- /dev/null
+++ b/clang/lib/StaticAnalyzer/Core/EntryPointStats.cpp
@@ -0,0 +1,201 @@
+//===- EntryPointStats.cpp ----------------------------------------------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===--------------------------------------------------------------------===//
+
+#include "clang/StaticAnalyzer/Core/PathSensitive/EntryPointStats.h"
+#include "clang/AST/DeclBase.h"
+#include "clang/Analysis/AnalysisDeclContext.h"
+#include "llvm/ADT/STLExtras.h"
+#include "llvm/ADT/StringExtras.h"
+#include "llvm/ADT/StringRef.h"
+#include "llvm/Support/FileSystem.h"
+#include "llvm/Support/ManagedStatic.h"
+#include "llvm/Support/raw_ostream.h"
+#include <iterator>
+
+using namespace clang;
+using namespace ento;
+
+namespace {
+struct Registry {
+ std::vector<BoolEPStat *> BoolStats;
+ std::vector<CounterEPStat *> CounterStats;
+ std::vector<UnsignedMaxEPStat *> UnsignedMaxStats;
+ std::vector<UnsignedEPStat *> UnsignedStats;
+
+ bool IsLocked = false;
+
+ struct Snapshot {
+ const Decl *EntryPoint;
+ std::vector<bool> BoolStatValues;
+ std::vector<unsigned> UnsignedStatValues;
+
+ void dumpDynamicStatsAsCSV(llvm::raw_ostream &OS) const;
+ };
+
+ std::vector<Snapshot> Snapshots;
+};
+} // namespace
+
+static llvm::ManagedStatic<Registry> StatsRegistry;
+
+namespace {
+template <typename Callback> void enumerateStatVectors(const Callback &Fn) {
+ Fn(StatsRegistry->BoolStats);
+ Fn(StatsRegistry->CounterStats);
+ Fn(StatsRegistry->UnsignedMaxStats);
+ Fn(StatsRegistry->UnsignedStats);
+}
+} // namespace
+
+static void checkStatName(const EntryPointStat *M) {
+#ifdef NDEBUG
+ return;
+#endif // NDEBUG
+ constexpr std::array AllowedSpecialChars = {
+ '+', '-', '_', '=', ':', '(', ')', '@', '!', '~',
+ '$', '%', '^', '&', '*', '\'', ';', '<', '>', '/'};
+ for (unsigned char C : M->name()) {
+ if (!std::isalnum(C) && !llvm::is_contained(AllowedSpecialChars, C)) {
+ llvm::errs() << "Stat name \"" << M->name() << "\" contains character '"
+ << C << "' (" << static_cast<int>(C)
+ << ") that is not allowed.\n";
+ assert(false && "The Stat name contains a disallowed character");
+ }
+ }
+}
+
+void EntryPointStat::lockRegistry() {
+ auto CmpByNames = [](const EntryPointStat *L, const EntryPointStat *R) {
+ return L->name() < R->name();
+ };
+ enumerateStatVectors(
+ [CmpByNames](auto &Stats) { llvm::sort(Stats, CmpByNames); });
+ enumerateStatVectors(
+ [](const auto &Stats) { llvm::for_each(Stats, checkStatName); });
+ StatsRegistry->IsLocked = true;
+}
+
+static bool isRegistered(llvm::StringLiteral Name) {
+ auto ByName = [Name](const EntryPointStat *M) { return M->name() == Name; };
+ bool Result = false;
+ enumerateStatVectors([ByName, &Result](const auto &Stats) {
+ Result = Result || llvm::any_of(Stats, ByName);
+ });
+ return Result;
+}
+
+BoolEPStat::BoolEPStat(llvm::StringLiteral Name) : EntryPointStat(Name) {
+ assert(!StatsRegistry->IsLocked);
+ assert(!isRegistered(Name));
+ StatsRegistry->BoolStats.push_back(this);
+}
+
+CounterEPStat::CounterEPStat(llvm::StringLiteral Name) : EntryPointStat(Name) {
+ assert(!StatsRegistry->IsLocked);
+ assert(!isRegistered(Name));
+ StatsRegistry->CounterStats.push_back(this);
+}
+
+UnsignedMaxEPStat::UnsignedMaxEPStat(llvm::StringLiteral Name)
+ : EntryPointStat(Name) {
+ assert(!StatsRegistry->IsLocked);
+ assert(!isRegistered(Name));
+ StatsRegistry->UnsignedMaxStats.push_back(this);
+}
+
+UnsignedEPStat::UnsignedEPStat(llvm::StringLiteral Name)
+ : EntryPointStat(Name) {
+ assert(!StatsRegistry->IsLocked);
+ assert(!isRegistered(Name));
+ StatsRegistry->UnsignedStats.push_back(this);
+}
+
+static std::vector<unsigned> consumeUnsignedStats() {
+ std::vector<unsigned> Result;
+ Result.reserve(StatsRegistry->CounterStats.size() +
+ StatsRegistry->UnsignedMaxStats.size() +
+ StatsRegistry->UnsignedStats.size());
+ for (auto *M : StatsRegistry->CounterStats) {
+ Result.push_back(M->value());
+ M->reset();
+ }
+ for (auto *M : StatsRegistry->UnsignedMaxStats) {
+ Result.push_back(M->value());
+ M->reset();
+ }
+ for (auto *M : StatsRegistry->UnsignedStats) {
+ Result.push_back(M->value());
+ M->reset();
+ }
+ return Result;
+}
+
+static std::vector<llvm::StringLiteral> getStatNames() {
+ std::vector<llvm::StringLiteral> Ret;
+ auto GetName = [](const EntryPointStat *M) { return M->name(); };
+ enumerateStatVectors([GetName, &Ret](const auto &Stats) {
+ llvm::transform(Stats, std::back_inserter(Ret), GetName);
+ });
+ return Ret;
+}
+
+void Registry::Snapshot::dumpDynamicStatsAsCSV(llvm::raw_ostream &OS) const {
+ OS << '"';
+ llvm::printEscapedString(
+ clang::AnalysisDeclContext::getFunctionName(EntryPoint), OS);
+ OS << "\", ";
+ auto PrintAsBool = [&OS](bool B) { OS << (B ? "true" : "false"); };
+ llvm::interleaveComma(BoolStatValues, OS, PrintAsBool);
+ OS << ((BoolStatValues.empty() || UnsignedStatValues.empty()) ? "" : ", ");
+ llvm::interleaveComma(UnsignedStatValues, OS);
+}
+
+static std::vector<bool> consumeBoolStats() {
+ std::vector<bool> Result;
+ Result.reserve(StatsRegistry->BoolStats.size());
+ for (auto *M : StatsRegistry->BoolStats) {
+ Result.push_back(M->value());
+ M->reset();
+ }
+ return Result;
+}
+
+void EntryPointStat::takeSnapshot(const Decl *EntryPoint) {
+ auto BoolValues = consumeBoolStats();
+ auto UnsignedValues = consumeUnsignedStats();
+ StatsRegistry->Snapshots.push_back(
+ {EntryPoint, std::move(BoolValues), std::move(UnsignedValues)});
+}
+
+void EntryPointStat::dumpStatsAsCSV(llvm::StringRef FileName) {
+ std::error_code EC;
+ llvm::raw_fd_ostream File(FileName, EC, llvm::sys::fs::OF_Text);
+ if (EC)
+ return;
+ dumpStatsAsCSV(File);
+}
+
+void EntryPointStat::dumpStatsAsCSV(llvm::raw_ostream &OS) {
+ OS << "EntryPoint, ";
+ llvm::interleaveComma(getStatNames(), OS);
+ OS << "\n";
+
+ std::vector<std::string> Rows;
+ Rows.reserve(StatsRegistry->Snapshots.size());
+ for (const auto &Snapshot : StatsRegistry->Snapshots) {
+ std::string Row;
+ llvm::raw_string_ostream RowOs(Row);
+ Snapshot.dumpDynamicStatsAsCSV(RowOs);
+ RowOs << "\n";
+ Rows.push_back(RowOs.str());
+ }
+ llvm::sort(Rows);
+ for (const auto &Row : Rows) {
+ OS << Row;
+ }
+}
diff --git a/clang/lib/StaticAnalyzer/Core/ExprEngine.cpp b/clang/lib/StaticAnalyzer/Core/ExprEngine.cpp
index 914eb0f4ef6bd..12a5b248c843f 100644
--- a/clang/lib/StaticAnalyzer/Core/ExprEngine.cpp
+++ b/clang/lib/StaticAnalyzer/Core/ExprEngine.cpp
@@ -49,6 +49,7 @@
#include "clang/StaticAnalyzer/Core/PathSensitive/ConstraintManager.h"
#include "clang/StaticAnalyz...
[truncated]
✅ With the latest revision this PR passed the Python code formatter.
Moving out the attribution, to not get spammed by different forks when picking this commit or rebasing.
Thanks for contributing infrastructural improvements like this; these will be helpful in the analyzer development. Overall I'm satisfied with the idea of this commit; but I marked a few typos and stylistic suggestions in inline comments.
Co-authored-by: Donát Nagy <[email protected]>
…ntStats.h Co-authored-by: Donát Nagy <[email protected]>
I only found a couple of minor points. Otherwise looks good.
LGTM