Commit ffddff3

[SYCL] Global offset docs
1 parent a916ce6 commit ffddff3

2 files changed

+34
-6
lines changed

llvm/docs/AMDGPUUsage.rst

Lines changed: 27 additions & 0 deletions
@@ -14985,6 +14985,33 @@ track the usage for each kernel. However, in some cases careful organization of
1498514985
the kernels and functions in the source file means there is minimal additional
1498614986
effort required to accurately calculate GPR usage.
1498714987

14988+
SYCL Kernel Metadata
14989+
====================
14990+
14991+
This section describes the additional metadata that is inserted for SYCL
14992+
kernels. As SYCL is a single-source programming model, functions can either
14993+
execute on a host or a device (i.e. GPU). Device kernels are akin to kernel
14994+
entry points in a GPU program. To mark an LLVM IR function as a device kernel
14995+
function, we make use of special LLVM metadata. The AMDGCN back-end will look
14996+
for a named metadata node called ``amdgcn.annotations``. This named metadata
14997+
must contain a list of metadata that describe the kernel IR. For our purposes,
14998+
we need to declare a metadata node that assigns the ``"kernel"`` attribute to the
14999+
LLVM IR function that should be emitted as a SYCL kernel function. These
15000+
metadata nodes take the form:
15001+
15002+
.. code-block:: text
15003+
15004+
!{<function ref>, !"kernel", i32 1}
15005+
15006+
Consider the metadata generated by the global-offset pass, showing a void kernel
15007+
function ``example_kernel_with_offset`` taking one argument, a pointer to 3 i32
15008+
integers:
15009+
15010+
.. code-block:: llvm
15011+
15012+
!amdgcn.annotations = !{!0}
15013+
!0 = !{void ([3 x i32]*)* @_ZTS14example_kernel_with_offset, !"kernel", i32 1}
15014+
1498815015
Additional Documentation
1498915016
========================
1499015017

sycl/doc/design/CompilerAndRuntimeDesign.md

Lines changed: 7 additions & 6 deletions
@@ -659,11 +659,12 @@ PI interface.
659659
The CUDA API does not natively support the global offset parameter
660660
expected by SYCL.
661661

662-
In order to emulate this and make generated kernel compliant, an
663-
intrinsic `llvm.nvvm.implicit.offset` (clang builtin
664-
`__builtin_ptx_implicit_offset`) was introduced materializing the use
665-
of this implicit parameter for the NVPTX backend. The intrinsic returns
666-
a pointer to `i32` referring to a 3 elements array.
662+
In order to emulate this and make the generated kernel compliant, an intrinsic
663+
`llvm.nvvm.implicit.offset` (clang builtin `__builtin_ptx_implicit_offset`) was
664+
introduced materializing the use of this implicit parameter for the NVPTX
665+
backend. AMDGCN uses the same approach with `llvm.amdgcn.implicit.offset` and
666+
`__builtin_amdgcn_implicit_offset`. The intrinsic returns a pointer to `i32`
667+
referring to a 3-element array.
667668

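A minimal host-side sketch of how the three offset values are meant to be consumed, written in plain C++ rather than device code. `implicit_offset()` is a hypothetical stand-in for the intrinsic and `global_id()` is an illustrative helper; neither name comes from the SYCL runtime or from LLVM.

```cpp
#include <array>
#include <cstdint>

// Hypothetical stand-in for the implicit offset intrinsic: it returns a
// pointer to the first of three 32-bit offsets, one per dimension.
static std::array<std::uint32_t, 3> g_offset = {0, 0, 0};

const std::uint32_t *implicit_offset() { return g_offset.data(); }

// Illustrative helper (assumption): the id seen by SYCL code in dimension
// `dim` is the id the backend computes without an offset, shifted by the
// matching element of the 3-element offset array.
std::uint32_t global_id(std::uint32_t id_without_offset, int dim) {
  return id_without_offset + implicit_offset()[dim];
}
```

With all three offsets set to zero the helper returns the backend id unchanged, which is why a kernel that never reaches the intrinsic needs no extra parameter.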
668669
Each non-kernel function reaching the implicit offset intrinsic in the
669670
call graph is augmented with an extra implicit parameter of type
@@ -682,7 +683,7 @@ on the following logic:
682683

683684
- If the 2 versions exist, the original kernel is called if global
684685
offset is 0 otherwise it will call the cloned one and pass the
685-
offset by value;
686+
offset by value (for the CUDA backend) or by reference (for the AMDGCN backend);
686687
- If only 1 function exists, it is assumed that the kernel makes no use
687688
of this parameter and therefore ignores it.
688689

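The selection rule in the list above can be sketched as follows. This is a simplified model under stated assumptions, not the plugin's actual code: `KernelVariants`, `Launch` and `select_kernel` are hypothetical names introduced for illustration.

```cpp
#include <array>
#include <cstdint>

using Offset = std::array<std::uint32_t, 3>;

// Hypothetical summary of what was found in the device image.
struct KernelVariants {
  bool has_offset_clone; // true if a clone taking the implicit offset exists
};

enum class Launch { Original, CloneWithOffset };

// Chooses which entry point to launch for a given global offset.
Launch select_kernel(const KernelVariants &v, const Offset &offset) {
  const bool is_zero = offset[0] == 0 && offset[1] == 0 && offset[2] == 0;
  if (v.has_offset_clone) {
    // Both versions exist: use the clone only for a non-zero offset.
    return is_zero ? Launch::Original : Launch::CloneWithOffset;
  }
  // Only one function exists: it is assumed not to use the offset.
  return Launch::Original;
}
```

Keeping the original kernel for the zero-offset case means launches that never request an offset run the version without the extra parameter.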