Commit bcfba4f

rh-ulrich-o authored and torvalds committed
watchdog: implement error handling for failure to set up hardware perf events
If watchdog_nmi_enable() fails to set up the hardware perf event of one CPU, the entire hard lockup detector is deemed unreliable. Hence, disable the hard lockup detector and shut down the hardware perf events on all CPUs.

[[email protected]: update comments to explain some code]
Signed-off-by: Ulrich Obergfell <[email protected]>
Signed-off-by: Don Zickus <[email protected]>
Cc: Ingo Molnar <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
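
The propagation described above (one CPU fails, a shared bit is cleared, and every CPU then shuts down its own event on its next watchdog pass) can be modelled in user space. The following is only a sketch under stated assumptions: `NR_CPUS`, the bit value, and the `model_*` and `simulate` names are hypothetical, a plain array stands in for the per-CPU perf events, and C11 atomics replace the kernel primitives.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define NR_CPUS 4                        /* hypothetical CPU count for the model */
#define NMI_WATCHDOG_ENABLED (1UL << 0)  /* bit value assumed for illustration */

static atomic_ulong watchdog_enabled = NMI_WATCHDOG_ENABLED;
static bool perf_event_active[NR_CPUS];  /* stands in for the per-CPU perf event */

/* Model of the enable path: a failure clears the shared bit for all CPUs. */
static int model_nmi_enable(int cpu, bool setup_fails)
{
	if (setup_fails) {
		atomic_fetch_and(&watchdog_enabled, ~NMI_WATCHDOG_ENABLED);
		return -1;
	}
	perf_event_active[cpu] = true;
	return 0;
}

/* Model of the disable path: release this CPU's event. */
static void model_nmi_disable(int cpu)
{
	perf_event_active[cpu] = false;
}

/* Model of the periodic check that this commit adds to watchdog(). */
static void model_watchdog(int cpu)
{
	if (!(atomic_load(&watchdog_enabled) & NMI_WATCHDOG_ENABLED))
		model_nmi_disable(cpu);
}

/*
 * Bring up all CPUs, letting failing_cpu (or none, if -1) fail, then run
 * one watchdog pass per CPU. Returns how many events are still active.
 */
static int simulate(int failing_cpu)
{
	int active = 0;

	atomic_store(&watchdog_enabled, NMI_WATCHDOG_ENABLED);
	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		perf_event_active[cpu] = false;

	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		model_nmi_enable(cpu, cpu == failing_cpu);
	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		model_watchdog(cpu);
	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		active += perf_event_active[cpu];
	return active;
}
```

In this model, `simulate(-1)` leaves all four events active, while a failure on any single CPU leaves none active after the next pass, which mirrors the all-or-nothing policy stated in the commit message.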
1 parent 83a80a3 commit bcfba4f

1 file changed: +30 -0 lines changed

kernel/watchdog.c

Lines changed: 30 additions & 0 deletions
@@ -502,6 +502,21 @@ static void watchdog(unsigned int cpu)
 	__this_cpu_write(soft_lockup_hrtimer_cnt,
 			 __this_cpu_read(hrtimer_interrupts));
 	__touch_watchdog();
+
+	/*
+	 * watchdog_nmi_enable() clears the NMI_WATCHDOG_ENABLED bit in the
+	 * failure path. Check for failures that can occur asynchronously -
+	 * for example, when CPUs are on-lined - and shut down the hardware
+	 * perf event on each CPU accordingly.
+	 *
+	 * The only non-obvious place this bit can be cleared is through
+	 * watchdog_nmi_enable(), so a pr_info() is placed there. Placing a
+	 * pr_info here would be too noisy as it would result in a message
+	 * every few seconds if the hardlockup was disabled but the softlockup
+	 * enabled.
+	 */
+	if (!(watchdog_enabled & NMI_WATCHDOG_ENABLED))
+		watchdog_nmi_disable(cpu);
 }
 
 #ifdef CONFIG_HARDLOCKUP_DETECTOR
@@ -552,6 +567,18 @@ static int watchdog_nmi_enable(unsigned int cpu)
 		goto out_save;
 	}
 
+	/*
+	 * Disable the hard lockup detector if _any_ CPU fails to set up
+	 * the hardware perf event. The watchdog() function checks
+	 * the NMI_WATCHDOG_ENABLED bit periodically.
+	 *
+	 * The barriers are for syncing up watchdog_enabled across all the
+	 * cpus, as clear_bit() does not use barriers.
+	 */
+	smp_mb__before_atomic();
+	clear_bit(NMI_WATCHDOG_ENABLED_BIT, &watchdog_enabled);
+	smp_mb__after_atomic();
+
 	/* skip displaying the same error again */
 	if (cpu > 0 && (PTR_ERR(event) == cpu0_err))
 		return PTR_ERR(event);
@@ -565,6 +592,9 @@ static int watchdog_nmi_enable(unsigned int cpu)
 	else
 		pr_err("disabled (cpu%i): unable to create perf event: %ld\n",
 			cpu, PTR_ERR(event));
+
+	pr_info("Shutting down hard lockup detector on all cpus\n");
+
 	return PTR_ERR(event);
 
 	/* success path */
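
The failure path in watchdog_nmi_enable() wraps clear_bit() in explicit barriers because, as its comment notes, clear_bit() does not use barriers itself. A rough user-space analogue is sketched below, assuming C11 atomics: a relaxed atomic AND stands in for clear_bit(), and sequentially consistent fences stand in for smp_mb__before_atomic() and smp_mb__after_atomic(). The helper name `clear_flag_with_fences` is hypothetical, and the C11 memory model is not identical to the kernel's, so this shows the shape of the pattern rather than equivalent semantics.

```c
#include <stdatomic.h>

/*
 * Clear one bit in a shared flag word. The atomic operation itself is
 * relaxed (no ordering), so full fences are placed on both sides, in the
 * same arrangement as the barriers around clear_bit() in the diff.
 */
static void clear_flag_with_fences(atomic_ulong *word, unsigned int bit)
{
	atomic_thread_fence(memory_order_seq_cst);   /* order earlier accesses */
	atomic_fetch_and_explicit(word, ~(1UL << bit), memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst);   /* order later accesses */
}
```

Clearing a bit that is already clear leaves the word unchanged, so repeated failures on several CPUs are harmless in this sketch.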
