Commit f2748bd

npiggin authored and mpe committed
powerpc/powernv: Always stop secondaries before reboot/shutdown

Currently powernv reboot and shutdown requests just leave secondaries
to do their own things. This is undesirable because they can trigger
any number of watchdogs while waiting for reboot, but also we don't
know what else they might be doing -- they might be causing trouble,
trampling memory, etc.

The opal scheduled flash update code already ran into watchdog problems
due to flashing taking a long time, and it was fixed with 2196c6f
("powerpc/powernv: Return secondary CPUs to firmware before FW
update"), which returns secondaries to opal. It's been found that
regular reboots can take over 10 seconds, which can result in the hard
lockup watchdog firing,

  reboot: Restarting system
  [ 360.038896709,5] OPAL: Reboot request...
  Watchdog CPU:0 Hard LOCKUP
  Watchdog CPU:44 detected Hard LOCKUP other CPUS:16
  Watchdog CPU:16 Hard LOCKUP
  watchdog: BUG: soft lockup - CPU#16 stuck for 3s! [swapper/16:0]

This patch removes the special case for flash update, and calls
smp_send_stop in all cases before calling reboot/shutdown.

smp_send_stop could return CPUs to OPAL, the main reason not to is
that the request could come from a NMI that interrupts OPAL code,
so re-entry to OPAL can cause a number of problems. Putting
secondaries into simple spin loops improves the chances of a
successful reboot.

Signed-off-by: Nicholas Piggin <[email protected]>
Reviewed-by: Vasant Hegde <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>

1 parent 855bfe0 commit f2748bd

3 files changed: +7 -38 lines changed

arch/powerpc/include/asm/opal.h

Lines changed: 1 addition & 1 deletion
@@ -325,7 +325,7 @@ struct rtc_time;
 extern unsigned long opal_get_boot_time(void);
 extern void opal_nvram_init(void);
 extern void opal_flash_update_init(void);
-extern void opal_flash_term_callback(void);
+extern void opal_flash_update_print_message(void);
 extern int opal_elog_init(void);
 extern void opal_platform_dump_init(void);
 extern void opal_sys_param_init(void);

arch/powerpc/platforms/powernv/opal-flash.c

Lines changed: 1 addition & 27 deletions
@@ -303,26 +303,9 @@ static int opal_flash_update(int op)
 	return rc;
 }
 
-/* Return CPUs to OPAL before starting FW update */
-static void flash_return_cpu(void *info)
-{
-	int cpu = smp_processor_id();
-
-	if (!cpu_online(cpu))
-		return;
-
-	/* Disable IRQ */
-	hard_irq_disable();
-
-	/* Return the CPU to OPAL */
-	opal_return_cpu();
-}
-
 /* This gets called just before system reboots */
-void opal_flash_term_callback(void)
+void opal_flash_update_print_message(void)
 {
-	struct cpumask mask;
-
 	if (update_flash_data.status != FLASH_IMG_READY)
 		return;
 
@@ -333,15 +316,6 @@ void opal_flash_term_callback(void)
 
 	/* Small delay to help getting the above message out */
 	msleep(500);
-
-	/* Return secondary CPUs to firmware */
-	cpumask_copy(&mask, cpu_online_mask);
-	cpumask_clear_cpu(smp_processor_id(), &mask);
-	if (!cpumask_empty(&mask))
-		smp_call_function_many(&mask,
-				       flash_return_cpu, NULL, false);
-	/* Hard disable interrupts */
-	hard_irq_disable();
 }
 
 /*

arch/powerpc/platforms/powernv/setup.c

Lines changed: 5 additions & 10 deletions
@@ -201,17 +201,12 @@ static void pnv_prepare_going_down(void)
 	 */
 	opal_event_shutdown();
 
-	/* Soft disable interrupts */
-	local_irq_disable();
+	/* Print flash update message if one is scheduled. */
+	opal_flash_update_print_message();
 
-	/*
-	 * Return secondary CPUs to firwmare if a flash update
-	 * is pending otherwise we will get all sort of error
-	 * messages about CPU being stuck etc.. This will also
-	 * have the side effect of hard disabling interrupts so
-	 * past this point, the kernel is effectively dead.
-	 */
-	opal_flash_term_callback();
+	smp_send_stop();
+
+	hard_irq_disable();
 }
 
 static void __noreturn pnv_restart(char *cmd)
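
For readers who want to check the resulting call order without reading the three hunks side by side, here is a minimal user-space sketch. It is not kernel code: the four functions are stubs that reuse the names from the diff and only record that they were called, and call_log, record(), step_is() and model_prepare_going_down() are hypothetical helpers invented for this illustration. It models only the tail of pnv_prepare_going_down() that is visible in the hunk above, and assumes nothing about the code that precedes it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical user-space model: records the order of shutdown steps. */
#define MAX_STEPS 8

static const char *call_log[MAX_STEPS];
static size_t call_count;

static void record(const char *step)
{
	assert(call_count < MAX_STEPS);
	call_log[call_count++] = step;
}

/* Returns nonzero if the i-th recorded step has the given name. */
static int step_is(size_t i, const char *name)
{
	return i < call_count && strcmp(call_log[i], name) == 0;
}

/* Stubs standing in for the kernel functions named in the diff. */
static void opal_event_shutdown(void)
{
	record("opal_event_shutdown");
}

static void opal_flash_update_print_message(void)
{
	record("opal_flash_update_print_message");
}

static void smp_send_stop(void)
{
	record("smp_send_stop");
}

static void hard_irq_disable(void)
{
	record("hard_irq_disable");
}

/* Mirrors the visible tail of pnv_prepare_going_down() after this patch. */
static void model_prepare_going_down(void)
{
	opal_event_shutdown();

	/* Print flash update message if one is scheduled. */
	opal_flash_update_print_message();

	/* Now unconditional: secondaries are stopped on every reboot/shutdown. */
	smp_send_stop();

	hard_irq_disable();
}
```

Calling model_prepare_going_down() once leaves four entries in call_log, with smp_send_stop recorded after the flash message and before hard_irq_disable. In the pre-patch code the secondaries were only returned to firmware from inside opal_flash_term_callback(), and only when a flash update was pending.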
