Commit e2a8f20

Baoquan He authored and akpm00 committed
Crash: add lock to serialize crash hotplug handling
Eric reported that handling of the corresponding crash hotplug event can easily fail when many memory hotplug events are notified in a short period. The handling fails because it cannot take __kexec_lock.

=======
[ 78.714569] Fallback order for Node 0: 0
[ 78.714575] Built 1 zonelists, mobility grouping on. Total pages: 1817886
[ 78.717133] Policy zone: Normal
[ 78.724423] crash hp: kexec_trylock() failed, elfcorehdr may be inaccurate
[ 78.727207] crash hp: kexec_trylock() failed, elfcorehdr may be inaccurate
[ 80.056643] PEFILE: Unsigned PE binary
=======

Memory hotplug events arrive in large numbers and in quick succession, while crash hotplug handling is comparatively slow. The atomic variable __kexec_lock and kexec_trylock() therefore cannot guarantee that crash hotplug handling is serialized.

Add a new mutex, __crash_hotplug_lock, specifically to serialize crash hotplug handling. This does not affect the usage of __kexec_lock.

Link: https://lkml.kernel.org/r/[email protected]
Fixes: 2472627 ("crash: add generic infrastructure for crash hotplug support")
Signed-off-by: Baoquan He <[email protected]>
Tested-by: Eric DeVolder <[email protected]>
Reviewed-by: Eric DeVolder <[email protected]>
Reviewed-by: Valentin Schneider <[email protected]>
Cc: Sourabh Jain <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
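
The failure mode described in the commit message can be reproduced in user space. The following is a minimal sketch, not kernel code: it assumes POSIX threads, uses pthread_mutex_trylock() as a stand-in for kexec_trylock() (which the message describes as built on an atomic variable), and uses a second pthread mutex as a stand-in for __crash_hotplug_lock. The names state_lock, hotplug_lock, handle_event, run_burst and NUM_EVENTS are hypothetical and exist only for this illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

#define NUM_EVENTS 8	/* hypothetical size of one hotplug burst */

/* Stand-in for __kexec_lock: only ever taken with trylock here. */
static pthread_mutex_t state_lock = PTHREAD_MUTEX_INITIALIZER;
/* Stand-in for __crash_hotplug_lock: a blocking lock taken first. */
static pthread_mutex_t hotplug_lock = PTHREAD_MUTEX_INITIALIZER;

static bool serialize;	/* take hotplug_lock around the handler? */
static int handled;	/* events fully processed; guarded by state_lock */

/* One hotplug event: try to update shared crash state, or give up. */
static void *handle_event(void *arg)
{
	struct timespec slow = { .tv_sec = 0, .tv_nsec = 2000000 };

	(void)arg;
	if (serialize)
		pthread_mutex_lock(&hotplug_lock);

	if (pthread_mutex_trylock(&state_lock) != 0) {
		/* Same shape as the early return in the patch: the event is dropped. */
		if (serialize)
			pthread_mutex_unlock(&hotplug_lock);
		return NULL;
	}

	nanosleep(&slow, NULL);	/* the update is slow relative to event arrival */
	handled++;

	pthread_mutex_unlock(&state_lock);
	if (serialize)
		pthread_mutex_unlock(&hotplug_lock);
	return NULL;
}

/* Fire NUM_EVENTS concurrent events and return how many were handled. */
static int run_burst(bool use_outer_lock)
{
	pthread_t threads[NUM_EVENTS];
	int i, rc;

	serialize = use_outer_lock;
	handled = 0;

	for (i = 0; i < NUM_EVENTS; i++) {
		rc = pthread_create(&threads[i], NULL, handle_event, NULL);
		assert(rc == 0);
	}
	for (i = 0; i < NUM_EVENTS; i++)
		pthread_join(threads[i], NULL);

	return handled;
}
```

With the outer lock enabled, run_burst(true) handles every event, because each handler waits its turn before attempting the trylock. With run_burst(false), any event that arrives while another handler holds state_lock is dropped, so the count is at least one but can fall well below NUM_EVENTS depending on scheduling. In the kernel, other paths can still hold __kexec_lock, so the trylock failure path remains in the patch; the sketch only models contention between hotplug handlers. On older toolchains the sketch may need to be built with -pthread.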
1 parent bbe246f commit e2a8f20

1 file changed: +17 -0 lines changed

kernel/crash_core.c

Lines changed: 17 additions & 0 deletions
@@ -739,6 +739,17 @@ subsys_initcall(crash_notes_memory_init);
 #undef pr_fmt
 #define pr_fmt(fmt) "crash hp: " fmt
 
+/*
+ * Different than kexec/kdump loading/unloading/jumping/shrinking which
+ * usually rarely happen, there will be many crash hotplug events notified
+ * during one short period, e.g one memory board is hot added and memory
+ * regions are online. So mutex lock __crash_hotplug_lock is used to
+ * serialize the crash hotplug handling specifically.
+ */
+DEFINE_MUTEX(__crash_hotplug_lock);
+#define crash_hotplug_lock() mutex_lock(&__crash_hotplug_lock)
+#define crash_hotplug_unlock() mutex_unlock(&__crash_hotplug_lock)
+
 /*
  * This routine utilized when the crash_hotplug sysfs node is read.
  * It reflects the kernel's ability/permission to update the crash
@@ -748,9 +759,11 @@ int crash_check_update_elfcorehdr(void)
 {
 	int rc = 0;
 
+	crash_hotplug_lock();
 	/* Obtain lock while reading crash information */
 	if (!kexec_trylock()) {
 		pr_info("kexec_trylock() failed, elfcorehdr may be inaccurate\n");
+		crash_hotplug_unlock();
 		return 0;
 	}
 	if (kexec_crash_image) {
@@ -761,6 +774,7 @@ int crash_check_update_elfcorehdr(void)
 	}
 	/* Release lock now that update complete */
 	kexec_unlock();
+	crash_hotplug_unlock();
 
 	return rc;
 }
@@ -783,9 +797,11 @@ static void crash_handle_hotplug_event(unsigned int hp_action, unsigned int cpu)
 {
 	struct kimage *image;
 
+	crash_hotplug_lock();
 	/* Obtain lock while changing crash information */
 	if (!kexec_trylock()) {
 		pr_info("kexec_trylock() failed, elfcorehdr may be inaccurate\n");
+		crash_hotplug_unlock();
 		return;
 	}
 
@@ -852,6 +868,7 @@ static void crash_handle_hotplug_event(unsigned int hp_action, unsigned int cpu)
 out:
 	/* Release lock now that update complete */
 	kexec_unlock();
+	crash_hotplug_unlock();
 }
 
 static int crash_memhp_notifier(struct notifier_block *nb, unsigned long val, void *v)
