Commit c53e14f

Kan Liang authored and Peter Zijlstra committed
perf: Extend per event callchain limit to branch stack
The commit 97c79a3 ("perf core: Per event callchain limit") introduced a per-event term that allows finer tuning of callchain depth to save space.

The same limit should apply to the branch stack. For example, autoFDO collections require the maximum number of LBR entries, while other system-wide LBR users may only be interested in the latest few LBRs. A per-event LBR depth saves space in the perf output buffer.

The patch simply drops the branches that are not of interest; the HW still collects the maximum number of branches. A model-specific optimization could reduce the HW depth in some cases to lower the overhead further, but it is not included in this patch set because it is not useful in all cases. For example, ARCH LBR can use PEBS and XSAVE to collect LBRs, so the depth should have less impact on the collection overhead. The model-specific optimization may be implemented separately later.

Signed-off-by: Kan Liang <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
1 parent c96fff3 commit c53e14f

2 files changed, 5 additions and 0 deletions

include/linux/perf_event.h

Lines changed: 3 additions & 0 deletions
@@ -1347,6 +1347,9 @@ static inline void perf_sample_save_brstack(struct perf_sample_data *data,
 
 	if (branch_sample_hw_index(event))
 		size += sizeof(u64);
+
+	brs->nr = min_t(u16, event->attr.sample_max_stack, brs->nr);
+
 	size += brs->nr * sizeof(struct perf_branch_entry);
 
 	/*
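
The effect of the added line can be modeled outside the kernel. The following is a minimal user-space sketch, not kernel code: `branch_entry_model`, `clamp_branch_nr`, and `brstack_bytes` are hypothetical names, and the entry layout is assumed to be three 64-bit words. It only mirrors the lines visible in the hunk above (the optional hw_index word plus the clamped entries).

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for struct perf_branch_entry (assumed 3 x 64-bit words). */
struct branch_entry_model {
	uint64_t from;
	uint64_t to;
	uint64_t flags;
};

/* Models min_t(u16, sample_max_stack, nr): both operands are cast to u16. */
static uint64_t clamp_branch_nr(uint64_t hw_nr, uint16_t sample_max_stack)
{
	uint16_t nr = (uint16_t)hw_nr;

	return nr < sample_max_stack ? nr : sample_max_stack;
}

/* Bytes added by the lines visible in the hunk: optional hw_index plus entries. */
static size_t brstack_bytes(uint64_t hw_nr, uint16_t sample_max_stack,
			    int has_hw_index)
{
	size_t size = 0;

	if (has_hw_index)
		size += sizeof(uint64_t);
	size += clamp_branch_nr(hw_nr, sample_max_stack) *
		sizeof(struct branch_entry_model);
	return size;
}
```

Under these assumptions, a 32-entry hardware stack with a per-event limit of 8 contributes 8 entries to the sample rather than 32, while a stack that holds fewer entries than the limit is left unchanged.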

include/uapi/linux/perf_event.h

Lines changed: 2 additions & 0 deletions
@@ -385,6 +385,8 @@ enum perf_event_read_format {
  *
  * @sample_max_stack: Max number of frame pointers in a callchain,
  *		      should be < /proc/sys/kernel/perf_event_max_stack
+ *		      Max number of entries of branch stack
+ *		      should be < hardware limit
  */
 struct perf_event_attr {
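
From user space, the documented field would be set when preparing the attribute for a branch-stack sampling event. The sketch below is illustrative only: `init_lbr_attr` is a hypothetical helper, and the event type, sample period, and branch filter are arbitrary choices rather than anything prescribed by this commit. It fills the structure but does not call `perf_event_open`.

```c
#include <linux/perf_event.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: prepare an attr that samples the branch stack and
 * asks for at most max_entries branch entries per sample. */
static void init_lbr_attr(struct perf_event_attr *attr, uint16_t max_entries)
{
	memset(attr, 0, sizeof(*attr));
	attr->size = sizeof(*attr);
	attr->type = PERF_TYPE_HARDWARE;
	attr->config = PERF_COUNT_HW_CPU_CYCLES;
	attr->sample_period = 100000;	/* arbitrary illustrative value */
	attr->sample_type = PERF_SAMPLE_BRANCH_STACK;
	attr->branch_sample_type = PERF_SAMPLE_BRANCH_ANY |
				   PERF_SAMPLE_BRANCH_USER;
	/* Per-event limit; with this commit it also caps branch stack entries. */
	attr->sample_max_stack = max_entries;
}
```

A caller would pass the prepared structure to the `perf_event_open` system call; a collector that needs the full LBR depth would choose a larger `max_entries` than one that only wants the most recent few branches.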
