
Commit 71eb9ee

Stephane Eranian authored and Ingo Molnar committed
perf/x86/intel: Fix linear IP of PEBS real_ip on Haswell and later CPUs
This patch fixes a bug in how pebs->real_ip is handled in the PEBS handler. real_ip only exists on Haswell and later processors. It is the eventing IP, i.e., where the event occurred, as opposed to pebs->ip, which is the PEBS interrupt IP and is always off by one.

The problem is that real_ip, just like the IP, needs to be fixed up because PEBS does not record all the machine state registers, in particular the code segment (cs). This is why we have the set_linear_ip() function. The problem was that set_linear_ip() was only used on pebs->ip and not on pebs->real_ip.

We have profiles which ran into invalid callstacks because of this. Here is an example:

 ..... 0: ffffffffffffff80 recent entry, marker kernel v
 ..... 1: 000000000040044d <= user address in kernel space!
 ..... 2: fffffffffffffe00 marker enter user v
 ..... 3: 000000000040044d
 ..... 4: 00000000004004b6 oldest entry

Debugging output in get_perf_callchain():

 [ 857.769909] CALLCHAIN: CPU8 ip=40044d regs->cs=10 user_mode(regs)=0

The problem is that the kernel entry in 1: points to a user level address. How can that be?

The reason is that with PEBS sampling, the instruction that caused the event to occur and the instruction where the CPU was when the interrupt was posted may be far apart. Sometime during that time window, the privilege level may change. This happens, for instance, when the PEBS sample is taken close to a kernel entry point. Here the PEBS eventing IP (real_ip) captured a user level instruction, but by the time the PMU interrupt fired, the processor had already entered kernel space. This is why the debug output shows a user address with user_mode() false.

The problem comes from PEBS not recording the code segment (cs) register. The register is used on x86_64 to determine whether execution is in kernel or user space. This is okay because the kernel has a software workaround called set_linear_ip(). But the issue in setup_pebs_sample_data() is that set_linear_ip() is never called on the real_ip value when it is available (Haswell and later) and precise_ip > 1.

This patch fixes this problem and eliminates the callchain discrepancy.

The patch restructures the code around set_linear_ip() to minimize the number of times the IP has to be set.

Signed-off-by: Stephane Eranian <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Vince Weaver <[email protected]>
Cc: [email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
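To illustrate the mismatch described in the commit message, here is a minimal user-space sketch of the cs fixup. It is not the kernel implementation: `struct model_regs`, the `MODEL_*` selector values, and every `model_*` function are hypothetical stand-ins, and the sketch assumes x86_64, where kernel addresses have the top bit set and the low two bits of cs carry the privilege level.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel/user code segment selectors. */
#define MODEL_KERNEL_CS 0x10u
#define MODEL_USER_CS   0x33u

/* Hypothetical, minimal stand-in for struct pt_regs. */
struct model_regs {
	uint64_t ip;
	uint64_t cs;
};

/* Assumption: on x86_64, kernel addresses sit in the upper half (top bit set). */
bool model_kernel_ip(uint64_t ip)
{
	return (ip >> 63) != 0;
}

/* Simplified privilege check: non-zero low two bits of cs mean user mode. */
bool model_user_mode(const struct model_regs *regs)
{
	assert(regs != NULL);
	return (regs->cs & 3) != 0;
}

/* Pre-patch behaviour for real_ip: only the IP changes, cs stays stale. */
void model_set_ip_only(struct model_regs *regs, uint64_t ip)
{
	assert(regs != NULL);
	regs->ip = ip;
}

/* Post-patch behaviour: derive cs from the address so IP and cs agree. */
void model_set_linear_ip(struct model_regs *regs, uint64_t ip)
{
	assert(regs != NULL);
	regs->cs = model_kernel_ip(ip) ? MODEL_KERNEL_CS : MODEL_USER_CS;
	regs->ip = ip;
}
```

Starting from interrupt-time registers with a kernel cs, calling `model_set_ip_only()` with the user address 0x40044d leaves `model_user_mode()` false, which mirrors the `ip=40044d regs->cs=10 user_mode(regs)=0` debug line. Calling `model_set_linear_ip()` with the same address makes the two consistent.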
1 parent 3eb2ce8 commit 71eb9ee

1 file changed: +17 -8 lines changed

arch/x86/events/intel/ds.c

Lines changed: 17 additions & 8 deletions
@@ -1153,6 +1153,7 @@ static void setup_pebs_sample_data(struct perf_event *event,
 	if (pebs == NULL)
 		return;
 
+	regs->flags &= ~PERF_EFLAGS_EXACT;
 	sample_type = event->attr.sample_type;
 	dsrc = sample_type & PERF_SAMPLE_DATA_SRC;
 
@@ -1197,7 +1198,6 @@ static void setup_pebs_sample_data(struct perf_event *event,
 	 */
 	*regs = *iregs;
 	regs->flags = pebs->flags;
-	set_linear_ip(regs, pebs->ip);
 
 	if (sample_type & PERF_SAMPLE_REGS_INTR) {
 		regs->ax = pebs->ax;
@@ -1233,13 +1233,22 @@ static void setup_pebs_sample_data(struct perf_event *event,
 #endif
 	}
 
-	if (event->attr.precise_ip > 1 && x86_pmu.intel_cap.pebs_format >= 2) {
-		regs->ip = pebs->real_ip;
-		regs->flags |= PERF_EFLAGS_EXACT;
-	} else if (event->attr.precise_ip > 1 && intel_pmu_pebs_fixup_ip(regs))
-		regs->flags |= PERF_EFLAGS_EXACT;
-	else
-		regs->flags &= ~PERF_EFLAGS_EXACT;
+	if (event->attr.precise_ip > 1) {
+		/* Haswell and later have the eventing IP, so use it: */
+		if (x86_pmu.intel_cap.pebs_format >= 2) {
+			set_linear_ip(regs, pebs->real_ip);
+			regs->flags |= PERF_EFLAGS_EXACT;
+		} else {
+			/* Otherwise use PEBS off-by-1 IP: */
+			set_linear_ip(regs, pebs->ip);
+
+			/* ... and try to fix it up using the LBR entries: */
+			if (intel_pmu_pebs_fixup_ip(regs))
+				regs->flags |= PERF_EFLAGS_EXACT;
+		}
+	} else
+		set_linear_ip(regs, pebs->ip);
+
 
 	if ((sample_type & (PERF_SAMPLE_ADDR | PERF_SAMPLE_PHYS_ADDR)) &&
 	    x86_pmu.intel_cap.pebs_format >= 1)
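
The restructured branch in the last hunk can be read as a small decision table. The sketch below models only that selection logic in plain C so the cases can be checked in isolation. The names `ip_source`, `ip_choice`, and `choose_sample_ip()` are illustrative and do not exist in the kernel. The `lbr_fixup_ok` parameter is an assumed stand-in for the return value of intel_pmu_pebs_fixup_ip(), and `exact` stands in for PERF_EFLAGS_EXACT.

```c
#include <stdbool.h>

/* Which IP the handler ends up reporting (illustrative names only). */
enum ip_source {
	IP_FROM_PEBS_IP,	/* off-by-one PEBS interrupt IP */
	IP_FROM_REAL_IP,	/* eventing IP, PEBS format >= 2 */
	IP_FROM_LBR_FIXUP	/* PEBS IP corrected via LBR entries */
};

struct ip_choice {
	enum ip_source source;
	bool exact;		/* models PERF_EFLAGS_EXACT */
};

/*
 * Mirrors the post-patch branch structure: the exact flag starts cleared
 * and is only set when an exact IP is actually available.
 */
struct ip_choice choose_sample_ip(unsigned int precise_ip,
				  unsigned int pebs_format,
				  bool lbr_fixup_ok)
{
	struct ip_choice choice = { IP_FROM_PEBS_IP, false };

	if (precise_ip > 1) {
		if (pebs_format >= 2) {
			/* Hardware supplies the eventing IP directly. */
			choice.source = IP_FROM_REAL_IP;
			choice.exact = true;
		} else if (lbr_fixup_ok) {
			/* Start from the PEBS IP, then correct it. */
			choice.source = IP_FROM_LBR_FIXUP;
			choice.exact = true;
		}
	}
	return choice;
}
```

In every path of the patched kernel code the IP is installed through set_linear_ip(), so cs is always derived from the address that is finally reported. The model leaves that step out and only tracks which address is chosen and whether it is flagged exact.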
