Commit 5097cbc

oleg-nesterov authored and Ingo Molnar committed
sched/isolation: Prevent boot crash when the boot CPU is nohz_full
Documentation/timers/no_hz.rst states that the "nohz_full=" mask must not
include the boot CPU, which is no longer true after:

  08ae95f ("nohz_full: Allow the boot CPU to be nohz_full").

However after:

  aae17eb ("workqueue: Avoid using isolated cpus' timers on queue_delayed_work")

the kernel will crash at boot time in this case; housekeeping_any_cpu()
returns an invalid CPU number until smp_init() brings the first
housekeeping CPU up.

Change housekeeping_any_cpu() to check the result of cpumask_any_and() and
return smp_processor_id() in this case.

This is just the simple and backportable workaround which fixes the
symptom, but smp_processor_id() at boot time should be safe at least for
type == HK_TYPE_TIMER, this more or less matches the tick_do_timer_boot_cpu
logic.

There is no worry about cpu_down(); tick_nohz_cpu_down() will not allow to
offline tick_do_timer_cpu (the 1st online housekeeping CPU).

Fixes: aae17eb ("workqueue: Avoid using isolated cpus' timers on queue_delayed_work")
Reported-by: Chris von Recklinghausen <[email protected]>
Signed-off-by: Oleg Nesterov <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Reviewed-by: Phil Auld <[email protected]>
Acked-by: Frederic Weisbecker <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Closes: https://lore.kernel.org/all/[email protected]/
1 parent 1560d1f commit 5097cbc

2 files changed: +12 -6 lines changed

Documentation/timers/no_hz.rst

Lines changed: 2 additions & 5 deletions
@@ -129,11 +129,8 @@ adaptive-tick CPUs: At least one non-adaptive-tick CPU must remain
 online to handle timekeeping tasks in order to ensure that system
 calls like gettimeofday() returns accurate values on adaptive-tick CPUs.
 (This is not an issue for CONFIG_NO_HZ_IDLE=y because there are no running
-user processes to observe slight drifts in clock rate.) Therefore, the
-boot CPU is prohibited from entering adaptive-ticks mode. Specifying a
-"nohz_full=" mask that includes the boot CPU will result in a boot-time
-error message, and the boot CPU will be removed from the mask. Note that
-this means that your system must have at least two CPUs in order for
+user processes to observe slight drifts in clock rate.) Note that this
+means that your system must have at least two CPUs in order for
 CONFIG_NO_HZ_FULL=y to do anything for you.
 
 Finally, adaptive-ticks CPUs must have their RCU callbacks offloaded.

kernel/sched/isolation.c

Lines changed: 10 additions & 1 deletion
@@ -46,7 +46,16 @@ int housekeeping_any_cpu(enum hk_type type)
 			if (cpu < nr_cpu_ids)
 				return cpu;
 
-			return cpumask_any_and(housekeeping.cpumasks[type], cpu_online_mask);
+			cpu = cpumask_any_and(housekeeping.cpumasks[type], cpu_online_mask);
+			if (likely(cpu < nr_cpu_ids))
+				return cpu;
+			/*
+			 * Unless we have another problem this can only happen
+			 * at boot time before start_secondary() brings the 1st
+			 * housekeeping CPU up.
+			 */
+			WARN_ON_ONCE(system_state == SYSTEM_RUNNING ||
+				     type != HK_TYPE_TIMER);
 		}
 	}
 	return smp_processor_id();
