
Commit 28b166a

aristeu authored and Ingo Molnar committed
x86, NMI watchdog: when booting with reset_devices, clear the performance counters
P4s have a quirk that makes it necessary to clear the P4_CCCR_OVF bit in the CCCR every time the PMI is triggered. When the kernel boots with reset_devices (more specifically, in the kdump case), the counters reach zero and the PMI is generated. This is not a problem on other processors, but on P4s the CPU keeps generating NMIs until that bit is cleared.

Since there may be other users of the performance counters, clear and disable all of them when booting with the reset_devices option. We have a P4 box here that crashes because of this problem.

Since the kdump kernel usually boots with only one processor active, the second logical unit won't be set up. Therefore MSR_P4_IQ_CCCR1 (and other performance counter registers) won't be cleared, and P4_CCCR_OVF may still be set because the previous kernel was using this register. An NMI is triggered by MSR_P4_IQ_CCCR1 right after NMI delivery is enabled, triggering the race fixed in my previous email.

Signed-off-by: Aristeu Rozanski <[email protected]>
Acked-by: Don Zickus <[email protected]>
Acked-by: Prarit Bhargava <[email protected]>
Acked-by: Vivek Goyal <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
1 parent 72d3105 commit 28b166a


1 file changed: +41 -0 lines


arch/x86/kernel/cpu/perfctr-watchdog.c

Lines changed: 41 additions & 0 deletions
```diff
@@ -432,6 +432,27 @@ static const struct wd_ops p6_wd_ops = {
 #define P4_CCCR_ENABLE (1 << 12)
 #define P4_CCCR_OVF (1 << 31)
 
+#define P4_CONTROLS 18
+static unsigned int p4_controls[18] = {
+	MSR_P4_BPU_CCCR0,
+	MSR_P4_BPU_CCCR1,
+	MSR_P4_BPU_CCCR2,
+	MSR_P4_BPU_CCCR3,
+	MSR_P4_MS_CCCR0,
+	MSR_P4_MS_CCCR1,
+	MSR_P4_MS_CCCR2,
+	MSR_P4_MS_CCCR3,
+	MSR_P4_FLAME_CCCR0,
+	MSR_P4_FLAME_CCCR1,
+	MSR_P4_FLAME_CCCR2,
+	MSR_P4_FLAME_CCCR3,
+	MSR_P4_IQ_CCCR0,
+	MSR_P4_IQ_CCCR1,
+	MSR_P4_IQ_CCCR2,
+	MSR_P4_IQ_CCCR3,
+	MSR_P4_IQ_CCCR4,
+	MSR_P4_IQ_CCCR5,
+};
 /*
  * Set up IQ_COUNTER0 to behave like a clock, by having IQ_CCCR0 filter
  * CRU_ESCR0 (with any non-null event selector) through a complemented
```
```diff
@@ -473,6 +494,26 @@ static int setup_p4_watchdog(unsigned nmi_hz)
 		evntsel_msr = MSR_P4_CRU_ESCR0;
 		cccr_msr = MSR_P4_IQ_CCCR0;
 		cccr_val = P4_CCCR_OVF_PMI0 | P4_CCCR_ESCR_SELECT(4);
+
+		/*
+		 * If we're on the kdump kernel or other situation, we may
+		 * still have other performance counter registers set to
+		 * interrupt and they'll keep interrupting forever because
+		 * of the P4_CCCR_OVF quirk. So we need to ACK all the
+		 * pending interrupts and disable all the registers here,
+		 * before reenabling the NMI delivery. Refer to p4_rearm()
+		 * about the P4_CCCR_OVF quirk.
+		 */
+		if (reset_devices) {
+			unsigned int low, high;
+			int i;
+
+			for (i = 0; i < P4_CONTROLS; i++) {
+				rdmsr(p4_controls[i], low, high);
+				low &= ~(P4_CCCR_ENABLE | P4_CCCR_OVF);
+				wrmsr(p4_controls[i], low, high);
+			}
+		}
 	} else {
 		/* logical cpu 1 */
 		perfctr_msr = MSR_P4_IQ_PERFCTR1;
```