Commit 680d125

Kan Liang authored and acmel committed
perf script: Add option to enable the LBR stitching approach

With the LBR stitching approach, the reconstructed LBR call stack can
break the HW limitation. However, it may reconstruct invalid call stacks
in some cases, e.g. exception handing such as setjmp/longjmp. Also, it
may impact the processing time especially when the number of samples
with stitched LBRs are huge.

Add an option to enable the approach.

Committer testing:

Using the same perf.data as with the latest cset committer testing
section:

  $ perf script --stitch-lbr
  <SNIP>
  tchain_edit 11131 15164.984292:     437491 cycles:u:
                    401106 f43+0x0 (/wb/tchain_edit)
                    40114c f42+0x18 (/wb/tchain_edit)
                    401172 f41+0xe (/wb/tchain_edit)
                    401194 f40+0x0 (/wb/tchain_edit)
                    40119b f39+0x0 (/wb/tchain_edit)
                    4011a2 f38+0x0 (/wb/tchain_edit)
                    4011a9 f37+0x0 (/wb/tchain_edit)
                    4011b0 f36+0x0 (/wb/tchain_edit)
                    4011b7 f35+0x0 (/wb/tchain_edit)
                    4011be f34+0x0 (/wb/tchain_edit)
                    4011c5 f33+0x0 (/wb/tchain_edit)
                    4011cc f32+0x0 (/wb/tchain_edit)
                    401207 f31+0x34 (/wb/tchain_edit)
                    401212 f30+0x0 (/wb/tchain_edit)
                    401219 f29+0x0 (/wb/tchain_edit)
                    401220 f28+0x0 (/wb/tchain_edit)
                    401227 f27+0x0 (/wb/tchain_edit)
                    40122e f26+0x0 (/wb/tchain_edit)
                    401235 f25+0x0 (/wb/tchain_edit)
                    40123c f24+0x0 (/wb/tchain_edit)
                    401243 f23+0x0 (/wb/tchain_edit)
                    40124a f22+0x0 (/wb/tchain_edit)
                    401251 f21+0x0 (/wb/tchain_edit)
                    401258 f20+0x0 (/wb/tchain_edit)
                    40125f f19+0x0 (/wb/tchain_edit)
                    401266 f18+0x0 (/wb/tchain_edit)
                    40126d f17+0x0 (/wb/tchain_edit)
                    401274 f16+0x0 (/wb/tchain_edit)
                    40127b f15+0x0 (/wb/tchain_edit)
                    401282 f14+0x0 (/wb/tchain_edit)
                    401289 f13+0x0 (/wb/tchain_edit)
                    401290 f12+0x0 (/wb/tchain_edit)
                    401297 f11+0x0 (/wb/tchain_edit)
                    40129e f10+0x0 (/wb/tchain_edit)
                    4012a5 f9+0x0 (/wb/tchain_edit)
                    4012ac f8+0x0 (/wb/tchain_edit)
                    4012b3 f7+0x0 (/wb/tchain_edit)
                    4012ba f6+0x0 (/wb/tchain_edit)
                    4012c1 f5+0x0 (/wb/tchain_edit)
                    4012c8 f4+0x0 (/wb/tchain_edit)
                    4012cf f3+0x0 (/wb/tchain_edit)
                    4012d6 f2+0x0 (/wb/tchain_edit)
                    4012dd f1+0x0 (/wb/tchain_edit)
                    4012e4 main+0x0 (/wb/tchain_edit)
              7f41a5016f41 __libc_start_main+0xf1 (/usr/lib64/libc-2.29.so)
  <SNIP>
  $

Signed-off-by: Kan Liang <[email protected]>
Reviewed-by: Andi Kleen <[email protected]>
Acked-by: Jiri Olsa <[email protected]>
Tested-by: Arnaldo Carvalho de Melo <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Alexey Budankov <[email protected]>
Cc: Mathieu Poirier <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Pavel Gerasimov <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Ravi Bangoria <[email protected]>
Cc: Stephane Eranian <[email protected]>
Cc: Vitaly Slobodskoy <[email protected]>
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
1 parent b1d1429 commit 680d125

2 files changed: +23 -0 lines changed


tools/perf/Documentation/perf-script.txt

Lines changed: 11 additions & 0 deletions
@@ -440,6 +440,17 @@ include::itrace.txt[]
 --show-on-off-events::
 	Show the --switch-on/off events too.
 
+--stitch-lbr::
+	Show callgraph with stitched LBRs, which may have more complete
+	callgraph. The perf.data file must have been obtained using
+	perf record --call-graph lbr.
+	Disabled by default. In common cases with call stack overflows,
+	it can recreate better call stacks than the default lbr call stack
+	output. But this approach is not full proof. There can be cases
+	where it creates incorrect call stacks from incorrect matches.
+	The known limitations include exception handing such as
+	setjmp/longjmp will have calls/returns not match.
+
 SEE ALSO
 --------
 linkperf:perf-record[1], linkperf:perf-script-perl[1],
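
The "incorrect matches" mentioned in the documentation text above are easier to picture with a small model of what stitching does. The following is a minimal sketch, not perf's implementation: `struct lbr_entry`, `lbr_entry_equal()` and `lbr_stitch()` are hypothetical names, the stacks are assumed to be ordered newest entry first, and overlap is detected simply by looking for the current sample's oldest entry in the previous sample's stack. The real code is more involved.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified LBR entry: one recorded call (from -> to). */
struct lbr_entry {
	uint64_t from;
	uint64_t to;
};

static bool lbr_entry_equal(const struct lbr_entry *a,
			    const struct lbr_entry *b)
{
	return a->from == b->from && a->to == b->to;
}

/*
 * Stitch the current stack (cur) with the previous sample's stack (prev).
 * Both arrays are ordered newest entry first. If the oldest entry of cur
 * is found in prev, the entries of prev that are older than the match are
 * appended after cur.
 *
 * Returns the number of entries written to out, or 0 if nothing could be
 * stitched (no overlap, empty input, or out too small).
 */
size_t lbr_stitch(const struct lbr_entry *prev, size_t nr_prev,
		  const struct lbr_entry *cur, size_t nr_cur,
		  struct lbr_entry *out, size_t out_cap)
{
	size_t i, k, n = 0;

	if (nr_prev == 0 || nr_cur == 0)
		return 0;

	/* Look for the oldest current entry in the previous stack. */
	for (i = 0; i < nr_prev; i++) {
		if (lbr_entry_equal(&prev[i], &cur[nr_cur - 1]))
			break;
	}
	if (i == nr_prev)
		return 0;	/* no overlap, nothing to stitch */

	if (nr_cur + (nr_prev - i - 1) > out_cap)
		return 0;	/* result would not fit */

	/* Current entries first, then the older part of the previous stack. */
	for (k = 0; k < nr_cur; k++)
		out[n++] = cur[k];
	for (k = i + 1; k < nr_prev; k++)
		out[n++] = prev[k];

	return n;
}
```

In this model, a current stack limited to three entries whose oldest entry also appears in the previous sample grows by the older entries of that previous sample. A recursive program can record the same (from, to) pair several times, so the first match found may be the wrong one, which is one way a stitched call stack can end up incorrect.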

tools/perf/builtin-script.c

Lines changed: 12 additions & 0 deletions
@@ -1697,6 +1697,7 @@ struct perf_script {
 	bool show_cgroup_events;
 	bool allocated;
 	bool per_event_dump;
+	bool stitch_lbr;
 	struct evswitch evswitch;
 	struct perf_cpu_map *cpus;
 	struct perf_thread_map *threads;
@@ -1923,6 +1924,9 @@ static void process_event(struct perf_script *script,
 	if (PRINT_FIELD(IP)) {
 		struct callchain_cursor *cursor = NULL;
 
+		if (script->stitch_lbr)
+			al->thread->lbr_stitch_enable = true;
+
 		if (symbol_conf.use_callchain && sample->callchain &&
 		    thread__resolve_callchain(al->thread, &callchain_cursor, evsel,
 					      sample, NULL, NULL, scripting_max_stack) == 0)
@@ -3170,6 +3174,12 @@ static void script__setup_sample_type(struct perf_script *script)
 		else
 			callchain_param.record_mode = CALLCHAIN_FP;
 	}
+
+	if (script->stitch_lbr && (callchain_param.record_mode != CALLCHAIN_LBR)) {
+		pr_warning("Can't find LBR callchain. Switch off --stitch-lbr.\n"
+			   "Please apply --call-graph lbr when recording.\n");
+		script->stitch_lbr = false;
+	}
 }
 
 static int process_stat_round_event(struct perf_session *session,
@@ -3481,6 +3491,8 @@ int cmd_script(int argc, const char **argv)
 		    "file", "file saving guest os /proc/kallsyms"),
 	OPT_STRING(0, "guestmodules", &symbol_conf.default_guest_modules,
 		    "file", "file saving guest os /proc/modules"),
+	OPT_BOOLEAN('\0', "stitch-lbr", &script.stitch_lbr,
+		    "Enable LBR callgraph stitching approach"),
 	OPTS_EVSWITCH(&script.evswitch),
 	OPT_END()
 	};
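
The fall-back added to script__setup_sample_type() can be tried in isolation with a small stand-alone sketch. The enum, struct and function names below are hypothetical stand-ins rather than perf's own types; only the warning text and the decision itself are taken from the hunk above.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-ins for the callchain record modes. */
enum record_mode { MODE_NONE, MODE_FP, MODE_DWARF, MODE_LBR };

/* Hypothetical stand-in for the option storage in struct perf_script. */
struct script_opts {
	bool stitch_lbr;
};

/*
 * Keep --stitch-lbr only when the data was recorded with LBR call graphs.
 * Otherwise warn and switch the option off, as the hunk above does.
 * Returns the resulting state of the option.
 */
bool validate_stitch_lbr(struct script_opts *opts, enum record_mode mode)
{
	if (opts->stitch_lbr && mode != MODE_LBR) {
		fprintf(stderr,
			"Can't find LBR callchain. Switch off --stitch-lbr.\n"
			"Please apply --call-graph lbr when recording.\n");
		opts->stitch_lbr = false;
	}
	return opts->stitch_lbr;
}
```

The request is downgraded to a warning instead of an error, so the command still produces output from a file recorded without LBR call graphs, just without stitching.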
