Commit cc78639

[AMDGPU][NFC] AMDGPUUsage.rst: document corefile format (#104419)
This patch adds a description of the core file format used for AMDGPU. A reference implementation for creating and loading AMDGPU core dumps is available in [ROCgdb-6.2](https://github.com/ROCm/ROCgdb/tree/rocm-6.2.x/gdb).
1 parent 899a3df commit cc78639

1 file changed

+113
-0
lines changed

llvm/docs/AMDGPUUsage.rst

Lines changed: 113 additions & 0 deletions
@@ -2378,6 +2378,9 @@ are deprecated and should not be used.
23782378
======== ============================== ======================================
23792379
"AMDGPU" ``NT_AMDGPU_METADATA`` Metadata in Message Pack [MsgPack]_
23802380
binary format.
2381+
"AMDGPU" ``NT_AMDGPU_KFD_CORE_STATE`` Snapshot of runtime, agent and queues
2382+
state for use in core dumps. See
2383+
:ref:`amdgpu_corefile_note`.
23812384
======== ============================== ======================================
23822385

23832386
..
@@ -2390,6 +2393,7 @@ are deprecated and should not be used.
23902393
============================== =====
23912394
*reserved* 0-31
23922395
``NT_AMDGPU_METADATA`` 32
2396+
``NT_AMDGPU_KFD_CORE_STATE`` 33
23932397
============================== =====
23942398

23952399
``NT_AMDGPU_METADATA``
@@ -15024,6 +15028,115 @@ instructions are handled as follows:
1502415028
trap handler installed.
1502515029
=============== =============== ===========================================
1502615030

15031+
Core file format
15032+
================
15033+
15034+
This section describes the format of core files supporting AMDGPU. Core dumps
15035+
for an AMDGPU program come in two flavors: split or unified core files.
15036+
15037+
The split layout consists of one host core file containing the information to
15038+
rebuild the image of the host process and one AMDGPU core file that contains
15039+
the information for the AMDGPU agents used in the process. The AMDGPU core
15040+
file consists of:
15041+
15042+
* A note describing the state of the AMDGPU agents, AMDGPU queues, and AMDGPU
15043+
runtime for the process (see :ref:`amdgpu_corefile_note`).
15044+
* A list of load segments containing an image of the AMDGPU agents' memory (see
15045+
:ref:`amdgpu_corefile_memory`).
15046+
15047+
The unified core file is the union of all the information contained in
15048+
the two files of the split layout (all notes and load segments). It contains
15049+
all the information required to reconstruct the image of the process across all
15050+
the agents.
15051+
15052+
Core file header
15053+
----------------
15054+
15055+
An AMDGPU core file is an ``ELF64`` core file. The content of the header
15056+
differs between the unified core file layout and the split AMDGPU core file layout.
15057+
15058+
Split files
15059+
~~~~~~~~~~~
15060+
15061+
In the split files layout, the AMDGPU core file is an ``ELF64`` file with the
15062+
header configured as described in :ref:`amdgpu-corefile-headers-table`:
15063+
15064+
.. table:: AMDGPU corefile headers
15065+
:name: amdgpu-corefile-headers-table
15066+
15067+
========================== ===================================
15068+
Field Value
15069+
========================== ===================================
15070+
``e_ident[EI_CLASS]`` ``ELFCLASS64`` (``0x2``)
15071+
``e_ident[EI_DATA]`` ``ELFDATA2LSB`` (``0x1``)
15072+
``e_ident[EI_OSABI]`` ``ELFOSABI_AMDGPU_HSA`` (``0x40``)
15073+
``e_type`` ``ET_CORE`` (``0x4``)
15074+
``e_ident[EI_ABIVERSION]`` ``ELFABIVERSION_AMDGPU_HSA_V5``
15075+
``e_machine`` ``EM_AMDGPU`` (``0xe0``)
15076+
========================== ===================================
15077+
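The header fields in the table above can be checked with a few lines of code. The following is a minimal sketch, not the ROCgdb implementation: the function name ``looks_like_amdgpu_core`` is hypothetical, the ``e_ident`` indices are the standard ELF ones, and the ABI version field is deliberately not checked.

```python
import struct

# Numeric values taken from the AMDGPU corefile headers table.
ELFCLASS64 = 0x2
ELFDATA2LSB = 0x1
ELFOSABI_AMDGPU_HSA = 0x40
ET_CORE = 0x4
EM_AMDGPU = 0xE0

# Standard ELF e_ident indices.
EI_CLASS, EI_DATA, EI_OSABI = 4, 5, 7


def looks_like_amdgpu_core(header: bytes) -> bool:
    """Return True if the ELF header matches the split-layout AMDGPU fields."""
    if len(header) < 20 or header[:4] != b"\x7fELF":
        return False
    if header[EI_CLASS] != ELFCLASS64 or header[EI_DATA] != ELFDATA2LSB:
        return False
    if header[EI_OSABI] != ELFOSABI_AMDGPU_HSA:
        return False
    # e_type and e_machine are the two 16-bit fields that follow e_ident.
    e_type, e_machine = struct.unpack_from("<HH", header, 16)
    return e_type == ET_CORE and e_machine == EM_AMDGPU
```

A unified core file would fail this check by design, since its header describes the host architecture.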
15078+
Unified file
15079+
~~~~~~~~~~~~
15080+
15081+
In the unified core file layout, the ``ELF64`` headers are set to describe
15082+
the host architecture and process.
15083+
15084+
.. _amdgpu_corefile_note:
15085+
15086+
Core file notes
15087+
---------------
15088+
15089+
An AMDGPU core file must contain one snapshot note in a ``PT_NOTE`` segment.
15090+
When using a split core file layout, this note is in the AMDGPU file.
15091+
15092+
The note record vendor field is "``AMDGPU``" and the record type is
15092+
"``NT_AMDGPU_KFD_CORE_STATE``" (see :ref:`amdgpu-note-records-v3-onwards`).
15094+
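As an illustration of how a reader might locate this note, here is a minimal sketch that walks the raw bytes of a ``PT_NOTE`` segment. It assumes little-endian note records with 4-byte padding of the name and descriptor, which is the usual ELF note encoding; the function name ``find_kfd_core_state`` is hypothetical.

```python
import struct

NT_AMDGPU_KFD_CORE_STATE = 33  # value from the note record type table


def _pad4(n: int) -> int:
    """Round n up to the next multiple of 4."""
    return (n + 3) & ~3


def find_kfd_core_state(notes: bytes):
    """Return the descriptor bytes of the AMDGPU snapshot note, or None."""
    off = 0
    while off + 12 <= len(notes):
        # Each note record starts with three 32-bit words.
        namesz, descsz, ntype = struct.unpack_from("<III", notes, off)
        name_off = off + 12
        desc_off = name_off + _pad4(namesz)
        name = notes[name_off:name_off + namesz].rstrip(b"\0")
        if name == b"AMDGPU" and ntype == NT_AMDGPU_KFD_CORE_STATE:
            return notes[desc_off:desc_off + descsz]
        off = desc_off + _pad4(descsz)
    return None
```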
15095+
The content of the note is defined in table
15096+
:ref:`amdgpu-core-snapshot-note-layout-table-v1`:
15097+
15098+
.. table:: AMDGPU snapshot note format V1
15099+
:name: amdgpu-core-snapshot-note-layout-table-v1
15100+
15101+
================================ ======================================= ======================= ============== ===========================
15102+
Field Type Size (bytes) Byte alignment Comment
15103+
================================ ======================================= ======================= ============== ===========================
15104+
``version_major`` ``uint32`` 4 4 ``KFD_IOCTL_MAJOR_VERSION``
15105+
``version_minor`` ``uint32`` 4 4 ``KFD_IOCTL_MINOR_VERSION``
15106+
``runtime_info_size`` ``uint64`` 8 8 Must be a multiple of 8
15107+
``n_agents`` ``uint32`` 4 8
15108+
``agent_info_entry_size`` ``uint32`` 4 4 Must be a multiple of 8
15109+
``n_queues`` ``uint32`` 4 8
15110+
``queue_info_entry_size`` ``uint32`` 4 4 Must be a multiple of 8
15111+
``runtime_info`` ``kfd_runtime_info`` ``runtime_info_size`` 8
15112+
``agents_info`` ``kfd_dbg_device_info_entry[n_agents]`` ``n_agents *
15113+
agent_info_entry_size`` 8
15114+
``queues_info`` ``kfd_queue_snapshot_entry[n_queues]`` ``n_queues *
15115+
queue_info_entry_size`` 8
15116+
================================ ======================================= ======================= ============== ===========================
15117+
15118+
The definition of all the ``kfd_*`` types comes from the
15119+
``include/uapi/linux/kfd_ioctl.h`` header file from the KFD repository. It is
15120+
usually installed in ``/usr/include/linux/kfd_ioctl.h``. The version of the
15121+
``kfd_ioctl.h`` file used must define values for
15122+
``KFD_IOCTL_MAJOR_VERSION`` and ``KFD_IOCTL_MINOR_VERSION`` matching
15123+
the values of ``version_major`` and ``version_minor`` from the
15124+
note.
15125+
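The V1 layout can be decoded without knowing the internals of the ``kfd_*`` structures, because the note carries the size of each entry. The sketch below is one possible reading of the table, not a reference implementation: it assumes little-endian encoding, treats the ``kfd_*`` payloads as opaque byte strings, and uses the hypothetical name ``parse_snapshot_note``.

```python
import struct

# version_major, version_minor, runtime_info_size, n_agents,
# agent_info_entry_size, n_queues, queue_info_entry_size.
# The fields are naturally aligned, so the fixed part is 32 bytes.
HEADER = struct.Struct("<IIQIIII")


def parse_snapshot_note(desc: bytes) -> dict:
    """Split a V1 snapshot note into its header fields and raw entries."""
    (major, minor, runtime_size, n_agents, agent_size,
     n_queues, queue_size) = HEADER.unpack_from(desc, 0)
    for size in (runtime_size, agent_size, queue_size):
        if size % 8:
            raise ValueError("sizes must be multiples of 8")
    off = HEADER.size
    total = off + runtime_size + n_agents * agent_size + n_queues * queue_size
    if total > len(desc):
        raise ValueError("note is truncated")
    runtime_info = desc[off:off + runtime_size]
    off += runtime_size
    agents = [desc[off + i * agent_size:off + (i + 1) * agent_size]
              for i in range(n_agents)]
    off += n_agents * agent_size
    queues = [desc[off + i * queue_size:off + (i + 1) * queue_size]
              for i in range(n_queues)]
    return {"version": (major, minor), "runtime_info": runtime_info,
            "agents": agents, "queues": queues}
```

Each raw entry can then be interpreted with the ``kfd_ioctl.h`` definitions whose version matches the ``version`` pair returned here.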
15126+
.. _amdgpu_corefile_memory:
15127+
15128+
Memory segments
15129+
---------------
15130+
15131+
An AMDGPU core file must contain an image of the AMDGPU agents' memory in load
15132+
segments (of type ``PT_LOAD``). Those segments must correspond to the memory
15133+
regions where the content of the agent memory is mapped into the host process
15134+
by the ROCr runtime (note that those memory mappings are usually not readable
15135+
by the process itself).
15136+
15137+
When using the split core file layout, those segments must be included in the
15138+
AMDGPU core file.
15139+
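To show how the agent memory image could be located, the following sketch lists the ``PT_LOAD`` program headers of a little-endian ``ELF64`` file held in memory. It uses the standard ``Elf64_Ehdr`` and ``Elf64_Phdr`` field offsets; the name ``load_segments`` is hypothetical, and the sketch does not handle the ``PN_XNUM`` extension used when a file has more than 65534 program headers.

```python
import struct

PT_LOAD = 1
# Elf64_Phdr: p_type, p_flags, p_offset, p_vaddr, p_paddr,
# p_filesz, p_memsz, p_align (56 bytes).
PHDR = struct.Struct("<IIQQQQQQ")


def load_segments(elf: bytes):
    """Return (p_vaddr, p_offset, p_filesz, p_memsz) per PT_LOAD segment."""
    # e_phoff is at byte 32; e_phentsize and e_phnum are at bytes 54 and 56.
    (e_phoff,) = struct.unpack_from("<Q", elf, 32)
    e_phentsize, e_phnum = struct.unpack_from("<HH", elf, 54)
    segments = []
    for i in range(e_phnum):
        (p_type, _flags, p_offset, p_vaddr, _paddr,
         p_filesz, p_memsz, _align) = PHDR.unpack_from(
            elf, e_phoff + i * e_phentsize)
        if p_type == PT_LOAD:
            segments.append((p_vaddr, p_offset, p_filesz, p_memsz))
    return segments
```

A consumer would read ``p_filesz`` bytes at ``p_offset`` to obtain the memory contents mapped at ``p_vaddr``.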
1502715140
Source Languages
1502815141
================
1502915142
