Commit d681461

[AMDGPU] Add doc updates for kernarg preloading (#67516)
1 parent b2f50b4 commit d681461

1 file changed

+55
-11
lines changed

llvm/docs/AMDGPUUsage.rst

Lines changed: 55 additions & 11 deletions
@@ -360,7 +360,7 @@ Every processor supports every OS ABI (see :ref:`amdgpu-os`) with the following
360360
``gfx90a`` ``amdgcn`` dGPU - sramecc - Absolute - *rocm-amdhsa* *TBA*
361361
- tgsplit flat
362362
- xnack scratch .. TODO::
363-
- Packed
363+
- kernarg preload - Packed
364364
work-item Add product
365365
IDs names.
366366

@@ -381,21 +381,21 @@ Every processor supports every OS ABI (see :ref:`amdgpu-os`) with the following
381381
``gfx940`` ``amdgcn`` dGPU - sramecc - Architected *TBA*
382382
- tgsplit flat
383383
- xnack scratch .. TODO::
384-
- Packed
384+
- kernarg preload - Packed
385385
work-item Add product
386386
IDs names.
387387

388388
``gfx941`` ``amdgcn`` dGPU - sramecc - Architected *TBA*
389389
- tgsplit flat
390390
- xnack scratch .. TODO::
391-
- Packed
391+
- kernarg preload - Packed
392392
work-item Add product
393393
IDs names.
394394

395395
``gfx942`` ``amdgcn`` dGPU - sramecc - Architected *TBA*
396396
- tgsplit flat
397397
- xnack scratch .. TODO::
398-
- Packed
398+
- kernarg preload - Packed
399399
work-item Add product
400400
IDs names.
401401

@@ -4375,12 +4375,24 @@ The fields used by CP for code objects before V3 also match those specified in
43754375
dynamically sized stack.
43764376
This is only set in code
43774377
object v5 and later.
4378-
463:460 1 bit Reserved, must be 0.
4379-
464 1 bit RESERVED_464 Deprecated, must be 0.
4380-
467:465 3 bits Reserved, must be 0.
4381-
468 1 bit RESERVED_468 Deprecated, must be 0.
4382-
469:471 3 bits Reserved, must be 0.
4383-
511:472 5 bytes Reserved, must be 0.
4378+
463:460 4 bits Reserved, must be 0.
4379+
470:464 7 bits KERNARG_PRELOAD_SPEC_LENGTH GFX6-GFX9
4380+
- Reserved, must be 0.
4381+
GFX90A, GFX940
4382+
- The number of dwords from
4383+
the kernarg segment to preload
4384+
into User SGPRs before kernel
4385+
execution. (see
4386+
:ref:`amdgpu-amdhsa-kernarg-preload`).
4387+
479:471 9 bits KERNARG_PRELOAD_SPEC_OFFSET GFX6-GFX9
4388+
- Reserved, must be 0.
4389+
GFX90A, GFX940
4390+
- An offset in dwords into the
4391+
kernarg segment to begin
4392+
preloading data into User
4393+
SGPRs. (see
4394+
:ref:`amdgpu-amdhsa-kernarg-preload`).
4395+
511:480 4 bytes Reserved, must be 0.
43844396
512 **Total size 64 bytes.**
43854397
======= ====================================================================
43864398

@@ -5002,7 +5014,7 @@ for enabled registers are dense starting at SGPR0: the first enabled register is
50025014
SGPR0, the next enabled register is SGPR1 etc.; disabled registers do not have
50035015
an SGPR number.
50045016

5005-
The initial SGPRs comprise up to 16 User SRGPs that are set by CP and apply to
5017+
The initial SGPRs comprise up to 16 User SGPRs that are set by CP and apply to
50065018
all wavefronts of the grid. It is possible to specify more than 16 User SGPRs
50075019
using the ``enable_sgpr_*`` bit fields, in which case only the first 16 are
50085020
actually initialized. These are then immediately followed by the System SGPRs
@@ -5045,6 +5057,9 @@ SGPR register initial state is defined in
50455057
then Flat Scratch Init 2 See
50465058
(enable_sgpr_flat_scratch :ref:`amdgpu-amdhsa-kernel-prolog-flat-scratch`.
50475059
_init)
5060+
then Preloaded Kernargs N/A See
5061+
(kernarg_preload_spec :ref:`amdgpu-amdhsa-kernarg-preload`.
5062+
_length)
50485063
then Private Segment Size 1 The 32-bit byte size of a
50495064
(enable_sgpr_private single work-item's memory
50505065
_segment_size) allocation. This is the
@@ -5177,6 +5192,31 @@ following properties:
51775192
* MTYPE set to support memory coherence that matches the runtime (such as CC for
51785193
APU and NC for dGPU).
51795194

5195+
.. _amdgpu-amdhsa-kernarg-preload:
5196+
5197+
Preloaded Kernel Arguments
5198+
++++++++++++++++++++++++++
5199+
5200+
On hardware that supports this feature, kernel arguments can be preloaded into
5201+
User SGPRs, up to the maximum number of User SGPRs available. The allocation of
5202+
Preload SGPRs occurs directly after the last enabled non-kernarg preload User
5203+
SGPR. (See :ref:`amdgpu-amdhsa-initial-kernel-execution-state`)
5204+
5205+
The data preloaded is copied from the kernarg segment; the amount of data is
5206+
determined by the value specified in the kernarg_preload_spec_length field of
5207+
the kernel descriptor. This data is then loaded into consecutive User SGPRs. The
5208+
number of SGPRs receiving preloaded kernarg data corresponds with the value
5209+
given by kernarg_preload_spec_length. The preloading starts at the dword offset
5210+
within the kernarg segment, which is specified by the
5211+
kernarg_preload_spec_offset field.
5212+
5213+
If the kernarg_preload_spec_length is non-zero, the CP firmware will append an
5214+
additional 256 bytes to the kernel_code_entry_byte_offset. This addition
5215+
facilitates the incorporation of a prologue to the kernel entry to handle cases
5216+
where code designed for kernarg preloading is executed on hardware equipped with
5217+
incompatible firmware. If hardware has compatible firmware, the 256 bytes at the
5218+
start of the kernel entry will be skipped.
5219+
51805220
.. _amdgpu-amdhsa-kernel-prolog:
51815221

51825222
Kernel Prolog
@@ -15352,6 +15392,10 @@ terminated by an ``.end_amdhsa_kernel`` directive.
1535215392
:ref:`amdgpu-amdhsa-compute_pgm_rsrc2-gfx6-gfx11-table`.
1535315393
``.amdhsa_exception_int_div_zero`` 0 GFX6-GFX11 Controls ENABLE_EXCEPTION_INT_DIVIDE_BY_ZERO in
1535415394
:ref:`amdgpu-amdhsa-compute_pgm_rsrc2-gfx6-gfx11-table`.
15395+
``.amdhsa_user_sgpr_kernarg_preload_length`` 0 GFX90A, Controls KERNARG_PRELOAD_SPEC_LENGTH in
15396+
GFX940 :ref:`amdgpu-amdhsa-kernel-descriptor-v3-table`.
15397+
``.amdhsa_user_sgpr_kernarg_preload_offset`` 0 GFX90A, Controls KERNARG_PRELOAD_SPEC_OFFSET in
15398+
GFX940 :ref:`amdgpu-amdhsa-kernel-descriptor-v3-table`.
1535515399
======================================================== =================== ============ ===================
1535615400

1535715401
.amdgpu_metadata
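
The bit layout added to the kernel descriptor table in this change (KERNARG_PRELOAD_SPEC_LENGTH at bits 470:464, KERNARG_PRELOAD_SPEC_OFFSET at bits 479:471) and the 256-byte entry adjustment can be illustrated with a minimal sketch. This is not LLVM or ROCm code. The helper names and the ``firmware_supports_preload`` flag are hypothetical, and the sketch assumes the 64-byte descriptor is little-endian, so that descriptor bit N is bit N % 8 of byte N // 8.

```python
import struct

KERNEL_DESCRIPTOR_SIZE = 64     # bytes; the table gives a total size of 512 bits
PRELOAD_SPEC_BYTE = 464 // 8    # bits 479:464 land in bytes 58..59 (assumed little-endian)
LENGTH_WIDTH = 7                # KERNARG_PRELOAD_SPEC_LENGTH, bits 470:464
OFFSET_WIDTH = 9                # KERNARG_PRELOAD_SPEC_OFFSET, bits 479:471
PRELOAD_PROLOGUE_BYTES = 256    # prologue size described in the new section


def pack_preload_spec(length_dwords, offset_dwords):
    """Pack both fields into the 16-bit value covering bits 479:464."""
    if not 0 <= length_dwords < (1 << LENGTH_WIDTH):
        raise ValueError("length does not fit in 7 bits")
    if not 0 <= offset_dwords < (1 << OFFSET_WIDTH):
        raise ValueError("offset does not fit in 9 bits")
    return length_dwords | (offset_dwords << LENGTH_WIDTH)


def unpack_preload_spec(halfword):
    """Return (length_dwords, offset_dwords) from the packed 16-bit value."""
    length = halfword & ((1 << LENGTH_WIDTH) - 1)
    offset = (halfword >> LENGTH_WIDTH) & ((1 << OFFSET_WIDTH) - 1)
    return length, offset


def set_preload_spec(descriptor, length_dwords, offset_dwords):
    """Write the packed fields into a 64-byte descriptor buffer in place."""
    value = pack_preload_spec(length_dwords, offset_dwords)
    struct.pack_into("<H", descriptor, PRELOAD_SPEC_BYTE, value)


def get_preload_spec(descriptor):
    """Read (length_dwords, offset_dwords) back from a descriptor buffer."""
    (value,) = struct.unpack_from("<H", descriptor, PRELOAD_SPEC_BYTE)
    return unpack_preload_spec(value)


def effective_entry_offset(entry_byte_offset, length_dwords,
                           firmware_supports_preload):
    """Model the entry adjustment: compatible firmware skips the 256-byte
    prologue when the preload length is non-zero (hypothetical helper)."""
    if length_dwords != 0 and firmware_supports_preload:
        return entry_byte_offset + PRELOAD_PROLOGUE_BYTES
    return entry_byte_offset
```

For example, calling ``set_preload_spec(kd, 4, 2)`` on a zeroed 64-byte ``bytearray`` requests four dwords starting at dword offset 2 of the kernarg segment, and ``get_preload_spec(kd)`` reads the same pair back.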
