Commit 9d22c96

x86/topology: Handle bogus ACPI tables correctly

The ACPI specification clearly states how the processors should be
enumerated in the MADT:

  "To ensure that the boot processor is supported post initialization,
   two guidelines should be followed. The first is that OSPM should
   initialize processors in the order that they appear in the MADT. The
   second is that platform firmware should list the boot processor as
   the first processor entry in the MADT.
   ...
   Failure of OSPM implementations and platform firmware to abide by
   these guidelines can result in both unpredictable and non optimal
   platform operation."

The kernel relies on that ordering to detect the real BSP on crash
kernels, which is important to avoid sending an INIT IPI to it as that
would cause a full machine reset.

On a Dell XPS 16 9640 the BIOS ignores this rule and enumerates the CPUs
in the wrong order. As a consequence the kernel falsely detects a crash
kernel and disables the corresponding CPU.

Prevent this by checking the IA32_APICBASE MSR for the BSP bit on the
boot CPU. If that bit is set, then the MADT based BSP detection can be
safely ignored. If the kernel detects a mismatch between the BSP bit and
the first enumerated MADT entry then emit a firmware bug message.

This obviously also has to be taken into account when the boot APIC ID
and the first enumerated APIC ID match. If the boot CPU does not have the
BSP bit set in the APICBASE MSR then there is no way for the boot CPU to
determine which of the CPUs is the real BSP. Sending an INIT to the real
BSP would reset the machine so the only sane way to deal with that is to
limit the number of CPUs to one and emit a corresponding warning message.

Fixes: 5c5682b ("x86/cpu: Detect real BSP on crash kernels")
Reported-by: Carsten Tolkmit <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Tested-by: Carsten Tolkmit <[email protected]>
Cc: [email protected]
Link: https://lore.kernel.org/r/87le48jycb.ffs@tglx
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=218837
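
The BSP check described in the commit message comes down to testing one bit of the IA32_APICBASE MSR. The following is a minimal userspace sketch, not kernel code: it takes an already-read raw MSR value rather than executing RDMSR. The helper name `apicbase_is_bsp` and the macro `APICBASE_BSP_BIT` are hypothetical names chosen for illustration; the bit position (bit 8) is the BSP flag as documented for IA32_APIC_BASE in the Intel SDM.

```c
#include <assert.h>   /* only needed for the checks in the usage note */
#include <stdbool.h>
#include <stdint.h>

/* Bit 8 of IA32_APICBASE (MSR 0x1B) is the BSP flag. */
#define APICBASE_BSP_BIT (UINT64_C(1) << 8)

/*
 * Hypothetical helper: report whether a raw IA32_APICBASE value has
 * the BSP flag set. The caller supplies the value; reading the MSR
 * itself requires ring 0 and is out of scope for this sketch.
 */
static bool apicbase_is_bsp(uint64_t apicbase_msr)
{
	return (apicbase_msr & APICBASE_BSP_BIT) != 0;
}
```

As a usage illustration, a value of `0xFEE00900` (default APIC base, global enable bit 11, BSP bit 8) satisfies `assert(apicbase_is_bsp(0xFEE00900))`, while `0xFEE00800` (same value with the BSP bit clear) satisfies `assert(!apicbase_is_bsp(0xFEE00800))`.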
1 parent 66ee363 commit 9d22c96

1 file changed: +50 -3 lines changed


arch/x86/kernel/cpu/topology.c

Lines changed: 50 additions & 3 deletions
@@ -128,6 +128,9 @@ static void topo_set_cpuids(unsigned int cpu, u32 apic_id, u32 acpi_id)
 
 static __init bool check_for_real_bsp(u32 apic_id)
 {
+	bool is_bsp = false, has_apic_base = boot_cpu_data.x86 >= 6;
+	u64 msr;
+
 	/*
 	 * There is no real good way to detect whether this a kdump()
 	 * kernel, but except on the Voyager SMP monstrosity which is not
@@ -144,17 +147,61 @@ static __init bool check_for_real_bsp(u32 apic_id)
 	if (topo_info.real_bsp_apic_id != BAD_APICID)
 		return false;
 
+	/*
+	 * Check whether the enumeration order is broken by evaluating the
+	 * BSP bit in the APICBASE MSR. If the CPU does not have the
+	 * APICBASE MSR then the BSP detection is not possible and the
+	 * kernel must rely on the firmware enumeration order.
+	 */
+	if (has_apic_base) {
+		rdmsrl(MSR_IA32_APICBASE, msr);
+		is_bsp = !!(msr & MSR_IA32_APICBASE_BSP);
+	}
+
 	if (apic_id == topo_info.boot_cpu_apic_id) {
-		topo_info.real_bsp_apic_id = apic_id;
-		return false;
+		/*
+		 * If the boot CPU has the APIC BSP bit set then the
+		 * firmware enumeration is agreeing. If the CPU does not
+		 * have the APICBASE MSR then the only choice is to trust
+		 * the enumeration order.
+		 */
+		if (is_bsp || !has_apic_base) {
+			topo_info.real_bsp_apic_id = apic_id;
+			return false;
+		}
+		/*
+		 * If the boot APIC is enumerated first, but the APICBASE
+		 * MSR does not have the BSP bit set, then there is no way
+		 * to discover the real BSP here. Assume a crash kernel and
+		 * limit the number of CPUs to 1 as an INIT to the real BSP
+		 * would reset the machine.
+		 */
+		pr_warn("Enumerated BSP APIC %x is not marked in APICBASE MSR\n", apic_id);
+		pr_warn("Assuming crash kernel. Limiting to one CPU to prevent machine INIT\n");
+		set_nr_cpu_ids(1);
+		goto fwbug;
 	}
 
-	pr_warn("Boot CPU APIC ID not the first enumerated APIC ID: %x > %x\n",
+	pr_warn("Boot CPU APIC ID not the first enumerated APIC ID: %x != %x\n",
 		topo_info.boot_cpu_apic_id, apic_id);
+
+	if (is_bsp) {
+		/*
+		 * The boot CPU has the APIC BSP bit set. Use it and complain
+		 * about the broken firmware enumeration.
+		 */
+		topo_info.real_bsp_apic_id = topo_info.boot_cpu_apic_id;
+		goto fwbug;
+	}
+
 	pr_warn("Crash kernel detected. Disabling real BSP to prevent machine INIT\n");
 
 	topo_info.real_bsp_apic_id = apic_id;
 	return true;
+
+fwbug:
+	pr_warn(FW_BUG "APIC enumeration order not specification compliant\n");
+	return false;
 }
 
 static unsigned int topo_unit_count(u32 lvlid, enum x86_topology_domains at_level,
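
The four outcomes of the patched function for the first enumerated APIC ID can be summarized as a pure decision function. This is a userspace sketch under stated assumptions, not the kernel implementation: the enum `bsp_verdict`, its members, and `classify_first_entry` are hypothetical names, the `topo_info` state is replaced by plain parameters, and only the path shown in the second hunk is modeled (the earlier parts of `check_for_real_bsp()` that the diff does not show are left out).

```c
#include <assert.h>   /* only needed for the checks in the usage note */
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical labels for the four outcomes visible in the diff. */
enum bsp_verdict {
	BSP_TRUST_ENUMERATION,  /* boot CPU listed first and accepted as BSP */
	BSP_LIMIT_TO_ONE_CPU,   /* boot CPU listed first, BSP bit clear: FW_BUG, one CPU */
	BSP_USE_BOOT_CPU_FWBUG, /* order broken, BSP bit set: boot CPU is BSP, FW_BUG */
	BSP_CRASH_KERNEL,       /* order differs, no BSP bit: first entry is the real BSP */
};

/*
 * Sketch of the decision made for the first enumerated APIC ID.
 * has_apic_base models "boot_cpu_data.x86 >= 6"; bsp_bit models the
 * BSP flag read from IA32_APICBASE and is ignored without the MSR,
 * mirroring how the patch leaves is_bsp false in that case.
 */
static enum bsp_verdict classify_first_entry(uint32_t first_apic_id,
					     uint32_t boot_apic_id,
					     bool has_apic_base,
					     bool bsp_bit)
{
	bool is_bsp = has_apic_base && bsp_bit;

	if (first_apic_id == boot_apic_id) {
		if (is_bsp || !has_apic_base)
			return BSP_TRUST_ENUMERATION;
		return BSP_LIMIT_TO_ONE_CPU;
	}

	if (is_bsp)
		return BSP_USE_BOOT_CPU_FWBUG;

	return BSP_CRASH_KERNEL;
}
```

In this model, the misordered-MADT case from the commit message (boot CPU not listed first, but its BSP bit set) gives `classify_first_entry(0x10, 0x0, true, true) == BSP_USE_BOOT_CPU_FWBUG`, whereas a genuine crash kernel running on a non-BSP CPU gives `classify_first_entry(0x10, 0x0, true, false) == BSP_CRASH_KERNEL`. The APIC ID values here are made up for illustration.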
