
Commit 4f3fa57

Muralidhara M K authored and bp3tk0v committed
EDAC/amd64: Document heterogeneous system enumeration
Document High Bandwidth Memory (HBM) and AMD heterogeneous system
topology and enumeration.

  [ bp: Simplify and de-marketize, unify, massage. ]

Signed-off-by: Muralidhara M K <[email protected]>
Co-developed-by: Naveen Krishna Chatradhi <[email protected]>
Signed-off-by: Naveen Krishna Chatradhi <[email protected]>
Signed-off-by: Yazen Ghannam <[email protected]>
Signed-off-by: Borislav Petkov (AMD) <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
1 parent c35977b commit 4f3fa57


Documentation/driver-api/edac.rst

Lines changed: 120 additions & 0 deletions
@@ -106,6 +106,16 @@ will occupy those chip-select rows.
  This term is avoided because it is unclear when needing to distinguish
  between chip-select rows and socket sets.

* High Bandwidth Memory (HBM)

  HBM is a new memory type with low power consumption and ultra-wide
  communication lanes. It uses vertically stacked memory chips (DRAM dies)
  interconnected by microscopic wires called "through-silicon vias," or
  TSVs.

  Several stacks of HBM chips connect to the CPU or GPU through an ultra-fast
  interconnect called the "interposer". Therefore, HBM's characteristics
  are nearly indistinguishable from on-chip integrated RAM.

Memory Controllers
------------------

@@ -176,3 +186,113 @@ nodes::
the L1 and L2 directories would be "edac_device_block's"

.. kernel-doc:: drivers/edac/edac_device.h


Heterogeneous system support
----------------------------

An AMD heterogeneous system is built by connecting the data fabrics of
both CPUs and GPUs via custom xGMI links. Thus, the data fabric on the
GPU nodes can be accessed the same way as the data fabric on CPU nodes.

The MI200 accelerators are data center GPUs. They have 2 data fabrics,
and each GPU data fabric contains four Unified Memory Controllers (UMC).
Each UMC contains eight channels, and each UMC channel controls one 128-bit
HBM2e (2GB) channel (equivalent to 8 x 2GB ranks). This creates a total
DRAM data bus width of 4096 bits per data fabric.

While a UMC interfaces with a 16GB HBM stack (8-high x 2GB DRAM), each UMC
channel interfaces with 2GB of DRAM (represented as a rank).
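
To make the arithmetic above concrete, the following standalone C snippet
(purely illustrative, not part of the driver; the constants simply restate
the figures quoted above) derives the per-UMC and per-data-fabric totals::

  /* Illustrative only: derive MI200 per-data-fabric totals from the
   * figures quoted above (4 UMCs, 8 channels per UMC, 128-bit / 2GB
   * per channel).
   */
  #include <stdio.h>

  int main(void)
  {
          const int umcs_per_df      = 4;    /* UMCs per GPU data fabric     */
          const int channels_per_umc = 8;    /* HBM2e channels per UMC       */
          const int bits_per_channel = 128;  /* data width of each channel   */
          const int gb_per_channel   = 2;    /* 2GB of DRAM behind a channel */

          printf("bus width per DF : %d bits\n",
                 umcs_per_df * channels_per_umc * bits_per_channel);  /* 4096 */
          printf("capacity per UMC : %d GB\n",
                 channels_per_umc * gb_per_channel);                  /* 16 */
          printf("capacity per DF  : %d GB\n",
                 umcs_per_df * channels_per_umc * gb_per_channel);    /* 64 */
          return 0;
  }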

Memory controllers on AMD GPU nodes can be represented in EDAC as follows:

  GPU DF / GPU Node -> EDAC MC
  GPU UMC           -> EDAC CSROW
  GPU UMC channel   -> EDAC CHANNEL

For example: a heterogeneous system with one AMD CPU connected to
four MI200 (Aldebaran) GPUs using xGMI.

Some more heterogeneous hardware details (a short illustrative sketch of
this mapping follows the list):

- The CPU UMC (Unified Memory Controller) is mostly the same as the GPU UMC.
  Both have chip selects (csrows) and channels. However, the layouts differ
  for performance, physical design, or other reasons.
- CPU UMCs use 1 channel; in this case, UMC = EDAC channel. This follows the
  marketing speak: a CPU has X memory channels, etc.
- CPU UMCs use up to 4 chip selects, so UMC chip select = EDAC CSROW.
- GPU UMCs use 1 chip select, so UMC = EDAC CSROW.
- GPU UMCs use 8 channels, so UMC channel = EDAC channel.
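
The following standalone C sketch (hypothetical, not the amd64_edac
implementation) restates the CPU vs. GPU UMC layouts above in terms of
EDAC csrows and channels::

  /* Hypothetical model of the CPU vs. GPU UMC layouts described above;
   * it only mirrors the csrow/channel mapping for illustration.
   */
  #include <stdio.h>

  struct umc_layout {
          const char *type;
          int csrows;    /* chip selects per UMC -> EDAC csrows   */
          int channels;  /* channels per UMC     -> EDAC channels */
  };

  int main(void)
  {
          const struct umc_layout layouts[] = {
                  { "CPU UMC", 4, 1 },  /* up to 4 chip selects, 1 channel */
                  { "GPU UMC", 1, 8 },  /* 1 chip select, 8 channels       */
          };

          for (unsigned int i = 0; i < sizeof(layouts) / sizeof(layouts[0]); i++)
                  printf("%s: %d EDAC csrow(s), %d EDAC channel(s)\n",
                         layouts[i].type, layouts[i].csrows, layouts[i].channels);
          return 0;
  }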

The EDAC subsystem provides a mechanism to handle AMD heterogeneous
systems by calling system-specific ops for both CPUs and GPUs.
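
As a rough, hypothetical illustration of such system-specific ops (a
standalone model, not the actual amd64_edac ops structure), dispatch by
node type could look like::

  /* Hypothetical model of per-node-type ops dispatch; not actual
   * amd64_edac code.
   */
  #include <stdio.h>

  struct node_ops {
          const char *name;
          void (*init_csrows)(void);  /* layout-specific csrow setup    */
          void (*decode_error)(void); /* layout-specific error decoding */
  };

  static void cpu_init_csrows(void)  { printf("CPU: up to 4 csrows, 1 channel\n"); }
  static void cpu_decode_error(void) { printf("CPU: decode via chip select\n"); }
  static void gpu_init_csrows(void)  { printf("GPU: 1 csrow, 8 channels\n"); }
  static void gpu_decode_error(void) { printf("GPU: decode via UMC channel\n"); }

  int main(void)
  {
          const struct node_ops ops[] = {
                  { "CPU node", cpu_init_csrows, cpu_decode_error },
                  { "GPU node", gpu_init_csrows, gpu_decode_error },
          };

          for (unsigned int i = 0; i < 2; i++) {
                  printf("%s:\n", ops[i].name);
                  ops[i].init_csrows();
                  ops[i].decode_error();
          }
          return 0;
  }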

AMD GPU nodes are enumerated in sequential order based on the PCI
hierarchy, and the first GPU node is assumed to have a Node ID value
following those of the CPU nodes after the latter are fully populated::

  $ ls /sys/devices/system/edac/mc/
     mc0 - CPU MC node 0
     mc1 |
     mc2 |- GPU card[0] => node 0(mc1), node 1(mc2)
     mc3 |
     mc4 |- GPU card[1] => node 0(mc3), node 1(mc4)
     mc5 |
     mc6 |- GPU card[2] => node 0(mc5), node 1(mc6)
     mc7 |
     mc8 |- GPU card[3] => node 0(mc7), node 1(mc8)
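
The mapping in the listing above is regular: with one CPU node and two
nodes per MI200 card, the EDAC index alone identifies the card and the
on-card node. A small, hypothetical C helper (not driver code; the counts
are assumptions taken from this example topology) reproduces it::

  /* Map an EDAC mc index to a GPU card and on-card node, assuming
   * 1 CPU node and 2 nodes per MI200 card (as in the listing above).
   */
  #include <stdio.h>

  int main(void)
  {
          const int cpu_nodes = 1;       /* mc0 is the CPU node            */
          const int nodes_per_card = 2;  /* each MI200 exposes 2 nodes/mcs */

          for (int mc = cpu_nodes; mc <= 8; mc++) {
                  int gpu_node = mc - cpu_nodes;

                  printf("mc%d -> GPU card[%d], node %d\n",
                         mc, gpu_node / nodes_per_card, gpu_node % nodes_per_card);
          }
          return 0;
  }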

For example, a heterogeneous system with one AMD CPU connected to
four MI200 (Aldebaran) GPUs using xGMI can be represented via the
following sysfs entries::

  /sys/devices/system/edac/mc/..

  CPU                      # CPU node
  ├── mc 0

  GPU Nodes are enumerated sequentially after CPU nodes have been populated
  GPU card 1               # Each MI200 GPU has 2 nodes/mcs
  ├── mc 1                 # GPU node 0 == mc1, each MC node has 4 UMCs/CSROWs
  │   ├── csrow 0          # UMC 0
  │   │   ├── channel 0    # Each UMC has 8 channels
  │   │   ├── channel 1    # Size of each channel is 2 GB, so each UMC has 16 GB
  │   │   ├── channel 2
  │   │   ├── channel 3
  │   │   ├── channel 4
  │   │   ├── channel 5
  │   │   ├── channel 6
  │   │   ├── channel 7
  │   ├── csrow 1          # UMC 1
  │   │   ├── channel 0
  │   │   ├── ..
  │   │   ├── channel 7
  │   ├── .. ..
  │   ├── csrow 3          # UMC 3
  │   │   ├── channel 0
  │   │   ├── ..
  │   │   ├── channel 7
  │   ├── rank 0
  │   ├── .. ..
  │   ├── rank 31          # Total of 32 ranks/dimms from 4 UMCs

  ├── mc 2                 # GPU node 1 == mc2
  │   ├── ..               # Each GPU node has a total of 64 GB

  GPU card 2
  ├── mc 3
  │   ├── ..
  ├── mc 4
  │   ├── ..

  GPU card 3
  ├── mc 5
  │   ├── ..
  ├── mc 6
  │   ├── ..

  GPU card 4
  ├── mc 7
  │   ├── ..
  ├── mc 8
  │   ├── ..
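
After boot, the sizes reported by EDAC can be cross-checked against this
topology from userspace. A minimal, hypothetical example (assuming the
standard EDAC "size_mb" attribute under each mcX directory; not part of
the kernel) sums the per-controller sizes::

  /* Walk /sys/devices/system/edac/mc and print each memory controller's
   * reported size, plus the total. Illustrative userspace code only.
   */
  #include <dirent.h>
  #include <stdio.h>
  #include <string.h>

  int main(void)
  {
          const char *base = "/sys/devices/system/edac/mc";
          struct dirent *de;
          unsigned long mb, total_mb = 0;
          DIR *dir = opendir(base);

          if (!dir) {
                  perror(base);
                  return 1;
          }

          while ((de = readdir(dir))) {
                  char path[512];
                  FILE *f;

                  /* Only look at mc0, mc1, ... entries. */
                  if (strncmp(de->d_name, "mc", 2) || !de->d_name[2])
                          continue;

                  snprintf(path, sizeof(path), "%s/%s/size_mb", base, de->d_name);
                  f = fopen(path, "r");
                  if (!f)
                          continue;
                  if (fscanf(f, "%lu", &mb) == 1) {
                          printf("%s: %lu MB\n", de->d_name, mb);
                          total_mb += mb;
                  }
                  fclose(f);
          }
          closedir(dir);
          printf("total: %lu MB\n", total_mb);
          return 0;
  }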
