Commit 195daf6

rh-ulrich-o authored and torvalds committed
watchdog: enable the new user interface of the watchdog mechanism
With the current user interface of the watchdog mechanism it is only
possible to disable or enable both lockup detectors at the same time.
This series introduces new kernel parameters and changes the semantics of
some existing kernel parameters, so that the hard lockup detector and the
soft lockup detector can be disabled or enabled individually. With this
series applied, the user interface is as follows.

- parameters in /proc/sys/kernel

  . soft_watchdog
    This is a new parameter to control and examine the run state of
    the soft lockup detector.

  . nmi_watchdog
    The semantics of this parameter have changed. It can now be used
    to control and examine the run state of the hard lockup detector.

  . watchdog
    This parameter is still available to control the run state of both
    lockup detectors at the same time. If this parameter is examined,
    it shows the logical OR of soft_watchdog and nmi_watchdog.

  . watchdog_thresh
    The semantics of this parameter are not affected by the patch.

- kernel command line parameters

  . nosoftlockup
    The semantics of this parameter have changed. It can now be used
    to disable the soft lockup detector at boot time.

  . nmi_watchdog=0 or nmi_watchdog=1
    Disable or enable the hard lockup detector at boot time. The patch
    introduces '=1' as a new option.

  . nowatchdog
    The semantics of this parameter are not affected by the patch. It
    is still available to disable both lockup detectors at boot time.

Also, remove the proc_dowatchdog() function which is no longer needed.

[[email protected]: wrote changelog]
[[email protected]: update documentation for kernel params and sysctl]
Signed-off-by: Ulrich Obergfell <[email protected]>
Signed-off-by: Don Zickus <[email protected]>
Cc: Ingo Molnar <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
1 parent bcfba4f commit 195daf6

5 files changed, +97 -89 lines changed

Documentation/kernel-parameters.txt

Lines changed: 4 additions & 2 deletions
@@ -2236,8 +2236,9 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
 
 	nmi_watchdog=	[KNL,BUGS=X86] Debugging features for SMP kernels
 			Format: [panic,][nopanic,][num]
-			Valid num: 0
+			Valid num: 0 or 1
 			0 - turn nmi_watchdog off
+			1 - turn nmi_watchdog on
 			When panic is specified, panic when an NMI watchdog
 			timeout occurs (or 'nopanic' to override the opposite
 			default).
@@ -2464,7 +2465,8 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
 
 	nousb		[USB] Disable the USB subsystem
 
-	nowatchdog	[KNL] Disable the lockup detector (NMI watchdog).
+	nowatchdog	[KNL] Disable both lockup detectors, i.e.
+			soft-lockup and NMI watchdog (hard-lockup).
 
 	nowb		[ARM]
 

Documentation/sysctl/kernel.txt

Lines changed: 53 additions & 9 deletions
@@ -77,12 +77,14 @@ show up in /proc/sys/kernel:
 - shmmax                      [ sysv ipc ]
 - shmmni
 - softlockup_all_cpu_backtrace
+- soft_watchdog
 - stop-a                      [ SPARC only ]
 - sysrq                       ==> Documentation/sysrq.txt
 - sysctl_writes_strict
 - tainted
 - threads-max
 - unknown_nmi_panic
+- watchdog
 - watchdog_thresh
 - version
 
@@ -417,16 +419,23 @@ successful IPC object allocation.
 
 nmi_watchdog:
 
-Enables/Disables the NMI watchdog on x86 systems. When the value is
-non-zero the NMI watchdog is enabled and will continuously test all
-online cpus to determine whether or not they are still functioning
-properly. Currently, passing "nmi_watchdog=" parameter at boot time is
-required for this function to work.
+This parameter can be used to control the NMI watchdog
+(i.e. the hard lockup detector) on x86 systems.
 
-If LAPIC NMI watchdog method is in use (nmi_watchdog=2 kernel
-parameter), the NMI watchdog shares registers with oprofile. By
-disabling the NMI watchdog, oprofile may have more registers to
-utilize.
+   0 - disable the hard lockup detector
+   1 - enable the hard lockup detector
+
+The hard lockup detector monitors each CPU for its ability to respond to
+timer interrupts. The mechanism utilizes CPU performance counter registers
+that are programmed to generate Non-Maskable Interrupts (NMIs) periodically
+while a CPU is busy. Hence, the alternative name 'NMI watchdog'.
+
+The NMI watchdog is disabled by default if the kernel is running as a guest
+in a KVM virtual machine. This default can be overridden by adding
+
+   nmi_watchdog=1
+
+to the guest kernel command line (see Documentation/kernel-parameters.txt).
 
 ==============================================================
 
@@ -816,6 +825,22 @@ NMI.
 
 ==============================================================
 
+soft_watchdog
+
+This parameter can be used to control the soft lockup detector.
+
+   0 - disable the soft lockup detector
+   1 - enable the soft lockup detector
+
+The soft lockup detector monitors CPUs for threads that are hogging the CPUs
+without rescheduling voluntarily, and thus prevent the 'watchdog/N' threads
+from running. The mechanism depends on the CPUs ability to respond to timer
+interrupts which are needed for the 'watchdog/N' threads to be woken up by
+the watchdog timer function, otherwise the NMI watchdog - if enabled - can
+detect a hard lockup condition.
+
+==============================================================
+
 tainted:
 
 Non-zero if the kernel has been tainted. Numeric values, which
@@ -858,6 +883,25 @@ example. If a system hangs up, try pressing the NMI switch.
 
 ==============================================================
 
+watchdog:
+
+This parameter can be used to disable or enable the soft lockup detector
+_and_ the NMI watchdog (i.e. the hard lockup detector) at the same time.
+
+   0 - disable both lockup detectors
+   1 - enable both lockup detectors
+
+The soft lockup detector and the NMI watchdog can also be disabled or
+enabled individually, using the soft_watchdog and nmi_watchdog parameters.
+If the watchdog parameter is read, for example by executing
+
+   cat /proc/sys/kernel/watchdog
+
+the output of this command (0 or 1) shows the logical OR of soft_watchdog
+and nmi_watchdog.
+
+==============================================================
+
 watchdog_thresh:
 
 This value can be used to control the frequency of hrtimer and NMI

include/linux/nmi.h

Lines changed: 0 additions & 2 deletions
@@ -82,8 +82,6 @@ extern int proc_soft_watchdog(struct ctl_table *, int ,
 			      void __user *, size_t *, loff_t *);
 extern int proc_watchdog_thresh(struct ctl_table *, int ,
 				void __user *, size_t *, loff_t *);
-extern int proc_dowatchdog(struct ctl_table *, int ,
-			   void __user *, size_t *, loff_t *);
 #endif
 
 #ifdef CONFIG_HAVE_ACPI_APEI_NMI

kernel/sysctl.c

Lines changed: 24 additions & 11 deletions
@@ -846,7 +846,7 @@ static struct ctl_table kern_table[] = {
 		.data = &watchdog_user_enabled,
 		.maxlen = sizeof (int),
 		.mode = 0644,
-		.proc_handler = proc_dowatchdog,
+		.proc_handler = proc_watchdog,
 		.extra1 = &zero,
 		.extra2 = &one,
 	},
@@ -855,10 +855,32 @@ static struct ctl_table kern_table[] = {
 		.data = &watchdog_thresh,
 		.maxlen = sizeof(int),
 		.mode = 0644,
-		.proc_handler = proc_dowatchdog,
+		.proc_handler = proc_watchdog_thresh,
 		.extra1 = &zero,
 		.extra2 = &sixty,
 	},
+	{
+		.procname = "nmi_watchdog",
+		.data = &nmi_watchdog_enabled,
+		.maxlen = sizeof (int),
+		.mode = 0644,
+		.proc_handler = proc_nmi_watchdog,
+		.extra1 = &zero,
+#if defined(CONFIG_HAVE_NMI_WATCHDOG) || defined(CONFIG_HARDLOCKUP_DETECTOR)
+		.extra2 = &one,
+#else
+		.extra2 = &zero,
+#endif
+	},
+	{
+		.procname = "soft_watchdog",
+		.data = &soft_watchdog_enabled,
+		.maxlen = sizeof (int),
+		.mode = 0644,
+		.proc_handler = proc_soft_watchdog,
+		.extra1 = &zero,
+		.extra2 = &one,
+	},
 	{
 		.procname = "softlockup_panic",
 		.data = &softlockup_panic,
@@ -879,15 +901,6 @@ static struct ctl_table kern_table[] = {
 		.extra2 = &one,
 	},
 #endif /* CONFIG_SMP */
-	{
-		.procname = "nmi_watchdog",
-		.data = &watchdog_user_enabled,
-		.maxlen = sizeof (int),
-		.mode = 0644,
-		.proc_handler = proc_dowatchdog,
-		.extra1 = &zero,
-		.extra2 = &one,
-	},
 #endif
 #if defined(CONFIG_X86_LOCAL_APIC) && defined(CONFIG_X86)
 	{

kernel/watchdog.c

Lines changed: 16 additions & 65 deletions
@@ -110,15 +110,9 @@ static int __init hardlockup_panic_setup(char *str)
 	else if (!strncmp(str, "nopanic", 7))
 		hardlockup_panic = 0;
 	else if (!strncmp(str, "0", 1))
-		watchdog_user_enabled = 0;
-	else if (!strncmp(str, "1", 1) || !strncmp(str, "2", 1)) {
-		/*
-		 * Setting 'nmi_watchdog=1' or 'nmi_watchdog=2' (legacy option)
-		 * has the same effect.
-		 */
-		watchdog_user_enabled = 1;
-		watchdog_enable_hardlockup_detector(true);
-	}
+		watchdog_enabled &= ~NMI_WATCHDOG_ENABLED;
+	else if (!strncmp(str, "1", 1))
+		watchdog_enabled |= NMI_WATCHDOG_ENABLED;
 	return 1;
 }
 __setup("nmi_watchdog=", hardlockup_panic_setup);
@@ -137,19 +131,18 @@ __setup("softlockup_panic=", softlockup_panic_setup);
 
 static int __init nowatchdog_setup(char *str)
 {
-	watchdog_user_enabled = 0;
+	watchdog_enabled = 0;
 	return 1;
 }
 __setup("nowatchdog", nowatchdog_setup);
 
-/* deprecated */
 static int __init nosoftlockup_setup(char *str)
 {
-	watchdog_user_enabled = 0;
+	watchdog_enabled &= ~SOFT_WATCHDOG_ENABLED;
 	return 1;
 }
 __setup("nosoftlockup", nosoftlockup_setup);
-/*  */
+
 #ifdef CONFIG_SMP
 static int __init softlockup_all_cpu_backtrace_setup(char *str)
 {
@@ -264,10 +257,11 @@ static int is_softlockup(unsigned long touch_ts)
 {
 	unsigned long now = get_timestamp();
 
-	/* Warn about unreasonable delays: */
-	if (time_after(now, touch_ts + get_softlockup_thresh()))
-		return now - touch_ts;
-
+	if (watchdog_enabled & SOFT_WATCHDOG_ENABLED) {
+		/* Warn about unreasonable delays. */
+		if (time_after(now, touch_ts + get_softlockup_thresh()))
+			return now - touch_ts;
+	}
 	return 0;
 }
 

@@ -532,6 +526,10 @@ static int watchdog_nmi_enable(unsigned int cpu)
 	struct perf_event_attr *wd_attr;
 	struct perf_event *event = per_cpu(watchdog_ev, cpu);
 
+	/* nothing to do if the hard lockup detector is disabled */
+	if (!(watchdog_enabled & NMI_WATCHDOG_ENABLED))
+		goto out;
+
 	/*
 	 * Some kernels need to default hard lockup detection to
 	 * 'disabled', for example a guest on a hypervisor.
@@ -856,59 +854,12 @@ int proc_watchdog_thresh(struct ctl_table *table, int write,
 	mutex_unlock(&watchdog_proc_mutex);
 	return err;
 }
-
-/*
- * proc handler for /proc/sys/kernel/nmi_watchdog,watchdog_thresh
- */
-
-int proc_dowatchdog(struct ctl_table *table, int write,
-		    void __user *buffer, size_t *lenp, loff_t *ppos)
-{
-	int err, old_thresh, old_enabled;
-	bool old_hardlockup;
-
-	mutex_lock(&watchdog_proc_mutex);
-	old_thresh = ACCESS_ONCE(watchdog_thresh);
-	old_enabled = ACCESS_ONCE(watchdog_user_enabled);
-	old_hardlockup = watchdog_hardlockup_detector_is_enabled();
-
-	err = proc_dointvec_minmax(table, write, buffer, lenp, ppos);
-	if (err || !write)
-		goto out;
-
-	set_sample_period();
-	/*
-	 * Watchdog threads shouldn't be enabled if they are
-	 * disabled. The 'watchdog_running' variable check in
-	 * watchdog_*_all_cpus() function takes care of this.
-	 */
-	if (watchdog_user_enabled && watchdog_thresh) {
-		/*
-		 * Prevent a change in watchdog_thresh accidentally overriding
-		 * the enablement of the hardlockup detector.
-		 */
-		if (watchdog_user_enabled != old_enabled)
-			watchdog_enable_hardlockup_detector(true);
-		err = watchdog_enable_all_cpus(old_thresh != watchdog_thresh);
-	} else
-		watchdog_disable_all_cpus();
-
-	/* Restore old values on failure */
-	if (err) {
-		watchdog_thresh = old_thresh;
-		watchdog_user_enabled = old_enabled;
-		watchdog_enable_hardlockup_detector(old_hardlockup);
-	}
-out:
-	mutex_unlock(&watchdog_proc_mutex);
-	return err;
-}
 #endif /* CONFIG_SYSCTL */
 
 void __init lockup_detector_init(void)
 {
 	set_sample_period();
 
-	if (watchdog_user_enabled)
+	if (watchdog_enabled)
 		watchdog_enable_all_cpus(false);
 }
