Skip to content

Commit 011008b

Browse files
committed
[analyzer] Workaround for unintended slowdown (scope increase)
Recently some users reported that the runtime of the static analyzer drastically increased on their codebase when they upgraded to a more recent (sligthly patched, internal) clang version: some of their translation units got more than +600% runtime (yes, with two zeroes) and IIUC the average increase was also high. Bisection revealed that the bulk of the increase was caused by my earlier commit bb27d5e ("Don't assume third iteration in loops") and I verified that the slowdown affects the downstream clang. (I used tmux as an arbitrary easy-to-analyze open source project and measured that reverting this commit reduces the total runtime by more than 50% compared to a recent main revision.) Further profiling and investigation proved that this increase is caused by an _increase of analysis scope_ because there was an arbitrary heuristic that placed functions on a "don't inline this" blacklist if they reached the `-analyzer-max-loop` limit (anywhere, on any one execution path) -- which became significantly rarer when my commit ensured the analyzer no longer "just assumes" four iterations. (With more inlining significantly more entry points use up their allocated budgets, which leads to the increased runtime.) I feel that this heuristic for the "don't inline" blacklist is unjustified, because reaching the "retry without inlining" limit on one path does not imply that inlining the function won't be valuable on other paths -- so I hope that we can eventually replace it with more "natural" limits of the analysis scope. However, the brutal jump of analysis runtime significantly degrades the user experience, so I created this quick workaround commit that approximates the "don't inline" blacklist effects of opaque loops (where the analyzer doesn't understand the loop condition) without fully reverting the "Don't assume third iteration" commit (to avoid reintroducing the false positives that were eliminated by it).
1 parent a84a6f7 commit 011008b

File tree

6 files changed

+285
-22
lines changed

6 files changed

+285
-22
lines changed

clang/include/clang/StaticAnalyzer/Core/AnalyzerOptions.def

Lines changed: 14 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -414,6 +414,20 @@ ANALYZER_OPTION(
414414
"any target.",
415415
true)
416416

417+
ANALYZER_OPTION(
418+
bool, LegacyInliningPrevention, "legacy-inlining-prevention",
419+
"If enabled, the analyzer puts functions on a \"do not inline this\" list "
420+
"if it finds an execution path within that function that may potentially "
421+
"perform 'analyzer-max-loop' (= 4 by default) iterations in a loop. "
422+
"Note that functions that _definitely_ reach the loop limit on some "
423+
"execution path are currently marked as \"do not inline\" even if this "
424+
"option is disabled (but this may change in future versions). This option "
425+
"is a dumb and arbitrary restriction on inlining, but disabling it would "
426+
"significantly increase the analysis workload (and the time taken) "
427+
"compared to older clang versions because more top-level functions can "
428+
"use up their 'max-nodes' limit if inlining is not restricted.",
429+
true)
430+
417431
//===----------------------------------------------------------------------===//
418432
// Unsigned analyzer options.
419433
//===----------------------------------------------------------------------===//

clang/include/clang/StaticAnalyzer/Core/PathSensitive/FunctionSummary.h

Lines changed: 0 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -81,10 +81,6 @@ class FunctionSummariesTy {
8181
I->second.MayInline = 0;
8282
}
8383

84-
void markReachedMaxBlockCount(const Decl *D) {
85-
markShouldNotInline(D);
86-
}
87-
8884
std::optional<bool> mayInline(const Decl *D) {
8985
MapTy::const_iterator I = Map.find(D);
9086
if (I != Map.end() && I->second.InlineChecked)

clang/lib/StaticAnalyzer/Core/ExprEngine.cpp

Lines changed: 49 additions & 11 deletions
Original file line numberDiff line numberDiff line change
@@ -2523,6 +2523,20 @@ bool ExprEngine::replayWithoutInlining(ExplodedNode *N,
25232523
return true;
25242524
}
25252525

2526+
/// Return the innermost location context which is inlined at `Node`, unless
2527+
/// it's the top-level (entry point) location context.
2528+
static const LocationContext *getInlinedLocationContext(ExplodedNode *Node,
2529+
ExplodedGraph &G) {
2530+
const LocationContext *CalleeLC = Node->getLocation().getLocationContext();
2531+
const LocationContext *RootLC =
2532+
(*G.roots_begin())->getLocation().getLocationContext();
2533+
2534+
if (CalleeLC->getStackFrame() == RootLC->getStackFrame())
2535+
return nullptr;
2536+
2537+
return CalleeLC;
2538+
}
2539+
25262540
/// Block entrance. (Update counters).
25272541
void ExprEngine::processCFGBlockEntrance(const BlockEdge &L,
25282542
NodeBuilderWithSinks &nodeBuilder,
@@ -2570,21 +2584,24 @@ void ExprEngine::processCFGBlockEntrance(const BlockEdge &L,
25702584
const ExplodedNode *Sink =
25712585
nodeBuilder.generateSink(Pred->getState(), Pred, &tag);
25722586

2573-
// Check if we stopped at the top level function or not.
2574-
// Root node should have the location context of the top most function.
2575-
const LocationContext *CalleeLC = Pred->getLocation().getLocationContext();
2576-
const LocationContext *CalleeSF = CalleeLC->getStackFrame();
2577-
const LocationContext *RootLC =
2578-
(*G.roots_begin())->getLocation().getLocationContext();
2579-
if (RootLC->getStackFrame() != CalleeSF) {
2580-
Engine.FunctionSummaries->markReachedMaxBlockCount(CalleeSF->getDecl());
2587+
if (const LocationContext *LC = getInlinedLocationContext(Pred, G)) {
2588+
// FIXME: This will unconditionally prevent inlining this function (even
2589+
// from other entrypoints), which is not a reasonable heuristic: even if
2590+
// we reached max block count on this particular execution path, there
2591+
// may be other execution paths (especially with other parametrizations)
2592+
// where the analyzer can reach the end of the function (so there is no
2593+
// natural reason to avoid inlining it). However, disabling this would
2594+
// significantly increase the analysis time (because more entrypoints
2595+
// would exhaust their allocated budget), so it must be compensated by a
2596+
// different (more reasonable) reduction of analysis scope.
2597+
Engine.FunctionSummaries->markShouldNotInline(
2598+
LC->getStackFrame()->getDecl());
25812599

25822600
// Re-run the call evaluation without inlining it, by storing the
25832601
// no-inlining policy in the state and enqueuing the new work item on
25842602
// the list. Replay should almost never fail. Use the stats to catch it
25852603
// if it does.
2586-
if ((!AMgr.options.NoRetryExhausted &&
2587-
replayWithoutInlining(Pred, CalleeLC)))
2604+
if ((!AMgr.options.NoRetryExhausted && replayWithoutInlining(Pred, LC)))
25882605
return;
25892606
NumMaxBlockCountReachedInInlined++;
25902607
} else
@@ -2856,8 +2873,29 @@ void ExprEngine::processBranch(
28562873
// conflicts with the widen-loop analysis option (which is off by
28572874
// default). If we intend to support and stabilize the loop widening,
28582875
// we must ensure that it 'plays nicely' with this logic.
2859-
if (!SkipTrueBranch || AMgr.options.ShouldWidenLoops)
2876+
if (!SkipTrueBranch || AMgr.options.ShouldWidenLoops) {
28602877
Builder.generateNode(StTrue, true, PredN);
2878+
} else if (AMgr.options.LegacyInliningPrevention) {
2879+
// FIXME: There is an ancient and very arbitrary heuristic in
2880+
// `ExprEngine::processCFGBlockEntrance` which prevents all further
2881+
// inlining of a function if it finds an execution path within that
2882+
// function which reaches the `MaxBlockVisitOnPath` limit (a/k/a
2883+
// `analyzer-max-loop`, by default four iterations in a loop). Adding
2884+
// this "don't assume third iteration" logic significantly increased
2885+
// the analysis runtime on some inputs because less functions were
2886+
// arbitrarily excluded from being inlined, so more entrypoints used
2887+
// up their full allocated budget. As a hacky compensation for this,
2888+
// here we apply the "should not inline" mark in cases when the loop
2889+
// could potentially reach the `MaxBlockVisitOnPath` limit without the
2890+
// "don't assume third iteration" logic. This slightly overcompensates
2891+
// (activates if the third iteration can be entered, and will not
2892+
// recognize cases where the fourth iteration would't be completed), but
2893+
// should be good enough for practical purposes.
2894+
if (const LocationContext *LC = getInlinedLocationContext(Pred, G)) {
2895+
Engine.FunctionSummaries->markShouldNotInline(
2896+
LC->getStackFrame()->getDecl());
2897+
}
2898+
}
28612899
}
28622900

28632901
if (StFalse) {

clang/test/Analysis/analyzer-config.c

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -92,6 +92,7 @@
9292
// CHECK-NEXT: inline-lambdas = true
9393
// CHECK-NEXT: ipa = dynamic-bifurcate
9494
// CHECK-NEXT: ipa-always-inline-size = 3
95+
// CHECK-NEXT: legacy-inlining-prevention = true
9596
// CHECK-NEXT: max-inlinable-size = 100
9697
// CHECK-NEXT: max-nodes = 225000
9798
// CHECK-NEXT: max-symbol-complexity = 35
Lines changed: 198 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,198 @@
1+
// RUN: %clang_analyze_cc1 -analyzer-checker=core,debug.ExprInspection -verify=expected,default %s
2+
// RUN: %clang_analyze_cc1 -analyzer-checker=core,debug.ExprInspection -analyzer-config legacy-inlining-prevention=false -verify=expected,disabled %s
3+
4+
int getNum(void); // Get an opaque number.
5+
6+
void clang_analyzer_numTimesReached(void);
7+
void clang_analyzer_dump(int arg);
8+
9+
//-----------------------------------------------------------------------------
10+
// Simple case: inlined function never reaches `analyzer-max-loop`.
11+
12+
int inner_simple(void) {
13+
clang_analyzer_numTimesReached(); // expected-warning {{2}}
14+
return 42;
15+
}
16+
17+
int outer_simple(void) {
18+
int x = inner_simple();
19+
int y = inner_simple();
20+
return 53 / (x - y); // expected-warning {{Division by zero}}
21+
}
22+
23+
//-----------------------------------------------------------------------------
24+
// Inlined function always reaches `analyzer-max-loop`.
25+
26+
int inner_fixed_loop_1(void) {
27+
int i;
28+
clang_analyzer_numTimesReached(); // expected-warning {{1}}
29+
for (i = 0; i < 10; i++);
30+
clang_analyzer_numTimesReached(); // no-warning
31+
return 42;
32+
}
33+
34+
int outer_fixed_loop_1(void) {
35+
int x = inner_fixed_loop_1();
36+
int y = inner_fixed_loop_1();
37+
return 53 / (x - y); // no-warning
38+
}
39+
40+
//-----------------------------------------------------------------------------
41+
// Inlined function always reaches `analyzer-max-loop`; inlining is prevented
42+
// even for different entry points.
43+
// This test uses `clang_analyzer_dump` and distinct `arg` values because
44+
// `clang_analyzer_numTimesReached` only counts the paths reaching that node
45+
// during the analysis of one particular entry point, so it cannot distinguish
46+
// "two entry points reached this, both with one path" (where the two reports
47+
// are unified as duplicates by the generic report postprocessing) and "one
48+
// entry point reached this with one path" (where naturally nothing shows that
49+
// the second entry point _tried_ to reach it).
50+
51+
int inner_fixed_loop_2(int arg) {
52+
// Identical copy of inner_fixed_loop_1
53+
int i;
54+
clang_analyzer_dump(arg); // expected-warning {{2}}
55+
for (i = 0; i < 10; i++);
56+
clang_analyzer_dump(arg); // no-warning
57+
return 42;
58+
}
59+
60+
int outer_1_fixed_loop_2(void) {
61+
return inner_fixed_loop_2(1);
62+
}
63+
64+
int outer_2_fixed_loop_2(void) {
65+
return inner_fixed_loop_2(2);
66+
}
67+
68+
//-----------------------------------------------------------------------------
69+
// Inlined function reaches `analyzer-max-loop` only in its second call. The
70+
// function is inlined twice but the second call doesn't finish and ends up
71+
// being conservatively evaluated.
72+
73+
int inner_parametrized_loop_1(int count) {
74+
int i;
75+
clang_analyzer_numTimesReached(); // expected-warning {{2}}
76+
for (i = 0; i < count; i++);
77+
clang_analyzer_numTimesReached(); // expected-warning {{1}}
78+
return 42;
79+
}
80+
81+
int outer_parametrized_loop_1(void) {
82+
int x = inner_parametrized_loop_1(2);
83+
int y = inner_parametrized_loop_1(10);
84+
return 53 / (x - y); // no-warning
85+
}
86+
87+
//-----------------------------------------------------------------------------
88+
// Inlined function reaches `analyzer-max-loop` on its first call, so the
89+
// second call isn't inlined (although it could be fully evaluated).
90+
91+
int inner_parametrized_loop_2(int count) {
92+
int i;
93+
clang_analyzer_numTimesReached(); // expected-warning {{1}}
94+
for (i = 0; i < count; i++);
95+
clang_analyzer_numTimesReached(); // no-warning
96+
return 42;
97+
}
98+
99+
int outer_parametrized_loop_2(void) {
100+
int y = inner_parametrized_loop_2(10);
101+
int x = inner_parametrized_loop_2(2);
102+
return 53 / (x - y); // no-warning
103+
}
104+
105+
//-----------------------------------------------------------------------------
106+
// Inlined function may or may not reach `analyzer-max-loop` depending on an
107+
// opaque check before the loop. This is very similar to the "fixed loop"
108+
// cases: the function is placed on the "don't inline" list when any execution
109+
// path reaches `analyzer-max-loop` (even if other execution paths reach the
110+
// end of the function).
111+
112+
int inner_conditional_loop(void) {
113+
int i;
114+
clang_analyzer_numTimesReached(); // expected-warning {{1}}
115+
if (getNum() == 777) {
116+
for (i = 0; i < 10; i++);
117+
}
118+
clang_analyzer_numTimesReached(); // expected-warning {{1}}
119+
return 42;
120+
}
121+
122+
int outer_1_conditional_loop(void) {
123+
return inner_conditional_loop();
124+
}
125+
126+
int outer_2_conditional_loop(void) {
127+
return inner_conditional_loop();
128+
}
129+
130+
//-----------------------------------------------------------------------------
131+
// Inlined function executes an opaque loop that may or may not reach
132+
// `analyzer-max-loop`. Historically, before the "don't assume third iteration"
133+
// commit (bb27d5e5c6b194a1440b8ac4e5ace68d0ee2a849) this worked like the
134+
// `conditional_loop` cases: the analyzer was able to find a path reaching
135+
// `analyzer-max-loop` so inlining was disabled. After that commit the analyzer
136+
// does not _assume_ a third (or later) iteration (i.e. does not enter those
137+
// iterations if the loop condition is an unknown value), so e.g. this test
138+
// function does not reach `analyzer-max-loop` iterations and the inlining is
139+
// not disabled.
140+
// Unfortunately this change significantly increased the workload and
141+
// runtime of the analyzer (more entry points used up their budget), so the
142+
// option `legacy-inlining-prevention` was introduced and enabled by default to
143+
// suppress the inlining in situations where the "don't assume third iteration"
144+
// logic activates.
145+
// This testcase demonstrate that the inlining is prevented with the default
146+
// `legacy-inlining-prevention=true` config, but is not prevented when this
147+
// option is disabled (set to false).
148+
149+
int inner_opaque_loop_1(void) {
150+
int i;
151+
clang_analyzer_numTimesReached(); // default-warning {{1}} disabled-warning {{2}}
152+
for (i = 0; i < getNum(); i++);
153+
return i;
154+
}
155+
156+
int outer_opaque_loop_1(void) {
157+
int iterCount = inner_opaque_loop_1();
158+
159+
// The first call to `inner_opaque_loop_1()` splits three execution paths that
160+
// differ in the number of performed iterations (0, 1 or 2). The function
161+
// `inner_opaque_loop_1` is added to the "do not inline this" list when the
162+
// path that performed two iterations tries to enter the third iteration (and
163+
// the "don't assume third iteration" logic prevents this) -- but the other
164+
// two paths (which performed 0 and 1 iterations) would reach and inline the
165+
// second `inner_opaque_loop_1()` before this would happen (because the
166+
// default traversal is a complex heuristic that happens to prefer this). The
167+
// following `if` will discard these "early exit" paths to highlight the
168+
// difference between the default and disabled state:
169+
if (iterCount < 2)
170+
return 0;
171+
172+
return inner_opaque_loop_1();
173+
}
174+
175+
//-----------------------------------------------------------------------------
176+
// Another less contrived testcase that demonstrates the difference between the
177+
// enabled (default) and disabled state of `legacy-inlining-prevention`.
178+
// Here the two calls to `inner_opaque_loop_2()` are in different entry points
179+
// so the first call is fully analyzed (and can put the function on the "do
180+
// not inline" list) before reaching the second call.
181+
// This test uses `clang_analyzer_dump` because (as explained in an earlier
182+
// comment block) `clang_analyzer_numTimesReached` is not suitable for counting
183+
// visits from separate entry points.
184+
185+
int inner_opaque_loop_2(int arg) {
186+
int i;
187+
clang_analyzer_dump(arg); // default-warning {{2}}
188+
// disabled-warning@-1 {{1}} disabled-warning@-1 {{2}}
189+
for (i = 0; i < getNum(); i++);
190+
return i;
191+
}
192+
193+
int outer_1_opaque_loop_2(void) {
194+
return inner_opaque_loop_2(1);
195+
}
196+
int outer_2_opaque_loop(void) {
197+
return inner_opaque_loop_2(2);
198+
}

clang/test/Analysis/loop-unrolling.cpp

Lines changed: 23 additions & 7 deletions
Original file line numberDiff line numberDiff line change
@@ -1,5 +1,5 @@
1-
// RUN: %clang_analyze_cc1 -analyzer-checker=core,debug.ExprInspection -analyzer-config unroll-loops=true,cfg-loopexit=true -verify -std=c++14 -analyzer-config exploration_strategy=unexplored_first_queue %s
2-
// RUN: %clang_analyze_cc1 -analyzer-checker=core,debug.ExprInspection -analyzer-config unroll-loops=true,cfg-loopexit=true,exploration_strategy=dfs -verify -std=c++14 -DDFS=1 %s
1+
// RUN: %clang_analyze_cc1 -analyzer-checker=core,debug.ExprInspection -analyzer-config unroll-loops=true,cfg-loopexit=true -verify=expected,default -std=c++14 -analyzer-config exploration_strategy=unexplored_first_queue %s
2+
// RUN: %clang_analyze_cc1 -analyzer-checker=core,debug.ExprInspection -analyzer-config unroll-loops=true,cfg-loopexit=true,exploration_strategy=dfs -verify=expected,dfs -std=c++14 %s
33

44
void clang_analyzer_numTimesReached();
55
void clang_analyzer_warnIfReached();
@@ -337,6 +337,7 @@ int nested_both_unrolled() {
337337
}
338338

339339
int simple_known_bound_loop() {
340+
// Iteration count visible: can be unrolled and fully executed.
340341
for (int i = 2; i < 12; i++) {
341342
// This function is inlined in nested_inlined_unroll1()
342343
clang_analyzer_numTimesReached(); // expected-warning {{90}}
@@ -345,27 +346,42 @@ int simple_known_bound_loop() {
345346
}
346347

347348
int simple_unknown_bound_loop() {
349+
// Iteration count unknown: unrolling won't happen and the execution will be
350+
// split two times:
351+
// (1) split between skipped loop (immediate exit) and entering the loop
352+
// (2) split between exit after 1 iteration and entering the second iteration
353+
// After these there is no third state split because the "don't assume third
354+
// iteration" logic in `ExprEngine::processBranch` prevents it; but the
355+
// `legacy-inlining-prevention` logic will put this function onto the list of
356+
// functions that may not be inlined in the future.
357+
// The exploration strategy apparently influences the number of times this
358+
// function can be inlined before it's placed on the "don't inline" list.
348359
for (int i = 2; i < getNum(); i++) {
349-
clang_analyzer_numTimesReached(); // expected-warning {{8}}
360+
clang_analyzer_numTimesReached(); // default-warning {{4}} dfs-warning {{8}}
350361
}
351362
return 0;
352363
}
353364

354365
int nested_inlined_unroll1() {
366+
// Here the analyzer can unroll and fully execute both the outer loop and the
367+
// inner loop within simple_known_bound_loop().
355368
int k;
356369
for (int i = 0; i < 9; i++) {
357370
clang_analyzer_numTimesReached(); // expected-warning {{9}}
358-
k = simple_known_bound_loop(); // no reevaluation without inlining
371+
k = simple_known_bound_loop();
359372
}
360373
int a = 22 / k; // expected-warning {{Division by zero}}
361374
return 0;
362375
}
363376

364377
int nested_inlined_no_unroll1() {
378+
// Here no unrolling happens and we only run `analyzer-max-loop` (= 4)
379+
// iterations of the loop within this function, but some state splits happen
380+
// in `simple_unknown_bound_loop()` calls.
365381
int k;
366-
for (int i = 0; i < 9; i++) {
367-
clang_analyzer_numTimesReached(); // expected-warning {{10}}
368-
k = simple_unknown_bound_loop(); // reevaluation without inlining, splits the state as well
382+
for (int i = 0; i < 40; i++) {
383+
clang_analyzer_numTimesReached(); // default-warning {{9}} dfs-warning {{12}}
384+
k = simple_unknown_bound_loop();
369385
}
370386
int a = 22 / k; // no-warning
371387
return 0;

0 commit comments

Comments
 (0)