Linux kernel bugs

Brice Goglin edited this page Dec 13, 2018 · 26 revisions

The following hwloc error messages are caused by the Linux kernel reporting invalid topology information. Recent errors are listed first.

Invalid L3 cpuset on 24-core AMD EPYC processor

****************************************************************************
* hwloc 1.11.8 has encountered what looks like an error from the operating system.
*
* L3 (cpuset 0x60000060) intersects with NUMANode (P#0 cpuset 0x3f00003f nodeset 0x00000001) without inclusion!
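
To see what "intersects without inclusion" means for the two cpusets in this message, the following minimal sketch treats each cpuset as a plain integer bitmask (bit i set means CPU i belongs to the object). The function name `intersects_without_inclusion` is hypothetical and is not part of hwloc; it only illustrates the condition the message describes.

```python
def intersects_without_inclusion(a: int, b: int) -> bool:
    """Return True if bitmasks a and b overlap but neither contains the other."""
    common = a & b
    # Non-empty overlap, and the overlap is neither all of a nor all of b.
    return common != 0 and common != a and common != b


l3 = 0x60000060    # L3 cpuset taken from the message above
numa = 0x3F00003F  # NUMANode P#0 cpuset taken from the message above

print(hex(l3 & numa))   # CPUs that are in both objects
print(hex(l3 & ~numa))  # CPUs of the L3 that fall outside the NUMA node
print(intersects_without_inclusion(l3, numa))
```

Here the L3 covers CPUs 5, 6, 29 and 30, while the NUMA node covers CPUs 0-5 and 24-29. They share CPUs 5 and 29, but CPUs 6 and 30 lie outside the node, so the L3 cannot be placed cleanly inside it in the topology tree.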

Fixed in Linux 4.14 in this commit (and backported in 4.13.16):

commit 2b83809a5e6d619a780876fcaf68cdc42b50d28c
Author: Suravee Suthikulpanit <[email protected]>
Date:   Mon Jul 31 10:51:59 2017 +0200

    x86/cpu/amd: Derive L3 shared_cpu_map from cpu_llc_shared_mask
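
A rough way to check whether a given kernel release should already contain this fix is to compare its version against 4.14 and the 4.13.16 backport mentioned above. The sketch below is only a heuristic: the helper names `parse_release` and `has_epyc_l3_fix` are hypothetical, and distribution kernels may carry the patch under an older version number, which this check cannot detect.

```python
import re


def parse_release(release: str) -> tuple:
    """Extract (major, minor, patch) from a release string such as '4.13.16-generic'."""
    m = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if m is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    # A missing patch level (e.g. '4.14') is treated as 0.
    return tuple(int(g) if g is not None else 0 for g in m.groups())


def has_epyc_l3_fix(release: str) -> bool:
    """Heuristic: True for 4.14 and later, or for 4.13.16 and later 4.13.x releases."""
    major, minor, patch = parse_release(release)
    if (major, minor) >= (4, 14):
        return True
    return (major, minor) == (4, 13) and patch >= 16
```

On a Linux machine, `has_epyc_l3_fix(platform.release())` (after `import platform`) applies the check to the running kernel.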

Packages Cut in Halves on Intel Xeon E5 v3/v4 with Cluster-on-Die

Each dual-NUMA package is reported as two single-NUMA packages.

Fixed in Linux 3.18 in this commit:

commit cebf15eb09a2fd2fa73ee4faa9c4d2f813cf0f09
Author: Dave Hansen <[email protected]>
Date:   Thu Sep 18 12:33:34 2014 -0700

    x86, sched: Add new topology for multi-NUMA-node CPUs

Invalid L3 cpuset on AMD 12-core Opteron 6200/6300 (Bulldozer and Piledriver)

****************************************************************************
* Hwloc has encountered what looks like an error from the operating system.
*
* object (L3 cpuset 0x000003f0) intersection without inclusion!

The fix was NEVER pushed to Linux.

Use hwloc >=1.11.2 and set HWLOC_COMPONENTS=x86 in your environment to work around the issue.
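
When hwloc is invoked from a script rather than an interactive shell, the variable can be set for the child process only. The sketch below assumes that the program being launched uses hwloc >=1.11.2 and that `lstopo` (hwloc's topology display tool) is on the PATH; the helper names `x86_backend_env` and `run_with_x86_backend` are hypothetical. Setting HWLOC_COMPONENTS=x86 is understood to make hwloc discover the topology through its x86 backend instead of trusting what the kernel reports.

```python
import os
import subprocess


def x86_backend_env(base=None):
    """Return a copy of the environment with HWLOC_COMPONENTS=x86 set."""
    env = dict(os.environ if base is None else base)
    env["HWLOC_COMPONENTS"] = "x86"
    return env


def run_with_x86_backend(cmd):
    """Run cmd (a list of arguments) with the workaround applied to that process only."""
    return subprocess.run(cmd, env=x86_backend_env(), check=True)
```

For example, `run_with_x86_backend(["lstopo"])` should print the topology without the warning if the workaround applies to the machine.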

Invalid NUMA cpuset on AMD Opteron 6200/6300 (Bulldozer and Piledriver)

****************************************************************************
* Hwloc has encountered what looks like an error from the operating system.
*
* Socket (P#2 cpuset 0x0000ffff,0x0) intersects with NUMANode (P#3 cpuset 0x0000ff00,0xff000000) without inclusion!
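
The cpusets in this message span more than 32 CPUs, so they are printed as several comma-separated words. The sketch below decodes them under the assumption that each word is a 32-bit hexadecimal block and that the most significant block comes first; `parse_cpuset` and `cpus` are hypothetical helpers, not hwloc functions.

```python
def parse_cpuset(text: str) -> int:
    """Decode a comma-separated list of 32-bit hex words, most significant first."""
    value = 0
    for word in text.split(","):
        value = (value << 32) | int(word, 16)
    return value


def cpus(mask: int) -> list:
    """List the indexes of the bits set in mask."""
    return [i for i in range(mask.bit_length()) if (mask >> i) & 1]


socket = parse_cpuset("0x0000ffff,0x0")        # Socket P#2 from the message above
numa = parse_cpuset("0x0000ff00,0xff000000")   # NUMANode P#3 from the message above

print(cpus(socket & numa))   # CPUs in both objects
print(cpus(numa & ~socket))  # CPUs of the NUMA node outside the socket
print(cpus(socket & ~numa))  # CPUs of the socket outside the NUMA node
```

Under that assumption the socket covers CPUs 32-47 and the NUMA node covers CPUs 24-31 and 40-47. Only CPUs 40-47 are shared, so neither object contains the other.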

This is likely not a kernel bug but rather a BIOS reporting invalid SRAT information.

Upgrading the BIOS is the only way to get a proper fix. Otherwise, use hwloc >=1.11.2 and set HWLOC_COMPONENTS=x86 in your environment to work around the issue.
