Skip to content

Commit 3e03f30

Browse files
authored
[ESIMD] Stateful to stateless mem access conversion design. (#6187)
Signed-off-by: Konstantin S Bobrovsky <[email protected]>
1 parent ade1870 commit 3e03f30

File tree

2 files changed

+168
-0
lines changed

2 files changed

+168
-0
lines changed
Lines changed: 167 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,167 @@
1+
# ESIMD "stateless" accessors support design
2+
3+
This document describes design of automatic conversion of "stateful" memory
4+
accesses via SYCL accessors to "stateless" accesses within ESIMD kernels by the
5+
SYCL compiler.
6+
7+
## Overview of Intel GPU memory access types
8+
Intel GPU hardware has two main modes of accessing memory - *stateless* and
9+
*stateful*, with specific memory access data port messages corresponding to each
10+
of the modes (`send` instructions with dataport fixed function target).
11+
- In a *stateless* access base memory location(s) is represented with a single
12+
virtual memory address, which can be a USM pointer.
13+
- In *stateful* - with a <*surface index*, *32-bit offset*> pair of values, where
14+
the *surface index* is an index into a "binding table" which contains surface
15+
descriptors available to the kernel. Surface is a contigous memory area
16+
accessible through its descriptor by stateful accesses. Each descriptor contains
17+
various information about the surface - for example, its size and format.
18+
19+
Pointers used in statless accesses are usually coming from USM or C++ memory
20+
allocation routines and are passed directly by the runtime as kernel arguments.
21+
22+
The stateful access style has a number of drawbacks which makes it undesirable
23+
to use in HPC application. The biggest one is 4Gb limitation on the surface
24+
size. Another one is problems with creating data structures with nested
25+
pointer fields or double indirection on host and use them on the device.
26+
27+
## Accessor and USM pointer kernel argument passing details
28+
ESIMD compiler when compiling a kernel records information about each memory
29+
argument and stores it together with the kernel's SPIRV. Basically, for each
30+
kernel argument, there is information whether it is a memory argument, and, if
31+
yes, whether it is surface-based or pointer-based.
32+
33+
When JITting the kernel, the scalar GPU compiler back-end can convert memory
34+
arguments and memory accesses between the two modes depending on optimization
35+
or other settings, and record final type of memory argument with the generated
36+
kernel executable. GPU runtime uses that information to wrap/not wrap incoming
37+
memory pointer with a surface before passing it onto the harware into the
38+
actual kernel argument.
39+
40+
The vector back-end can't do this in many cases, as memory accesses in SPIRV
41+
are represented by hardware-specific intrinsics rather then standard generic
42+
memory access SPIRV instructions. This design basically enables the vector BE
43+
to redirect code generation for stateful memory access APIs to stateless or
44+
stateful intrinsics, and also generate correct annotations. Since it uses the
45+
same runtime, which relies on parameter annotation when making the wrap/no-wrap
46+
decision, the runtime part does not need much changes.
47+
48+
## Problem definition
49+
50+
Currently, ESIMD compiler always maps buffer/accessor-based memory accesses to
51+
stateful accesses, thus imposing the 4Gb datum size limitation on user programs
52+
with accessors.
53+
54+
## Proposed solution
55+
56+
### Short/mid-term
57+
58+
#### API header changes
59+
The general idea is to introduce C++ preprocessor macro
60+
`ESIMD_FORCE_STATELESS_MEM_ACCESS` which will control code generation for the
61+
stateful memory access APIs - such as:
62+
63+
```cpp
64+
template <typename Tx, int N, typename AccessorTy,
65+
typename Flags = vector_aligned_tag,
66+
typename = std::enable_if_t<is_simd_flag_type_v<Flags>>,
67+
class T = detail::__raw_t<Tx>>
68+
__ESIMD_API simd<Tx, N> block_load(AccessorTy acc, uint32_t offset,
69+
Flags = {});
70+
71+
template <typename T, int N, typename AccessorTy>
72+
__ESIMD_API std::enable_if_t<(sizeof(T) <= 4) &&
73+
(N == 1 || N == 8 || N == 16 || N == 32) &&
74+
!std::is_pointer<AccessorTy>::value,
75+
simd<T, N>>
76+
gather(AccessorTy acc, simd<uint32_t, N> offsets, uint32_t glob_offset = 0,
77+
simd_mask<N> mask = 1);
78+
```
79+
80+
Implementation of the APIs would follow this pattern:
81+
82+
```cpp
83+
// this API should verify that accessor is to global address space, this is needed both for
84+
// the case with conversion to stateless and the case w/o.
85+
T stateful_memory_api(accessor acc, uint32 offset, args...) {
86+
#ifdef ESIMD_FORCE_STATELESS_MEM_ACCESS
87+
accessor_elelemt_type *ptr = acc.get_pointer() + offset;
88+
return stateless_memory_api(ptr, args...);
89+
#else
90+
<original implementation>
91+
#endif
92+
}
93+
```
94+
95+
The new macro is supposed to be set by users directly or via some logic based on
96+
other macros set by users.
97+
98+
#### Compiler changes
99+
100+
The API part of the implementation is as simple as above, the compiler
101+
one is slightly more complicated. Compiler needs to make sure that in presence
102+
of `ESIMD_FORCE_STATELESS_MEM_ACCESS` macro, the actual memory parameter
103+
annotation described above is correct and tells that memory is a pointer, not a
104+
surface index. Parameter annotations are generated by the front-end - these are
105+
`kernel_arg_accessor_ptr` and `kernel_arg_type` metadata nodes, which are
106+
then translated to `buffer_t` (for surface) or `svmptr_t` (for pointer)
107+
metadata annotations consumed by the back-end.
108+
109+
##### Variant 1
110+
This is the recommended variant. A new driver option is added -
111+
`-fsycl-esimd-force-stateless-mem-access`. Under this option:
112+
- SYCL C++ device compiler FE defines the `ESIMD_FORCE_STATELESS_MEM_ACCESS`
113+
macro
114+
- `sycl-post-link` tool is run with a new option
115+
`-esimd-force-stateless-mem-access`. Under this option, the tool
116+
configures the LowerESIMD.cpp pass to ignore the `kernel_arg_accessor_ptr`
117+
and always generate `svmptr_t` annotation for memory arguments.
118+
119+
##### Variant 2
120+
Clang C++ FE is changed to generate desired `kernel_arg_accessor_ptr`
121+
metadata depending on `ESIMD_FORCE_STATELESS_MEM_ACCESS` macro setting. If
122+
set, it will mark all memory arguments as pointers in
123+
`kernel_arg_accessor_ptr` and `kernel_arg_type` MD nodes.
124+
125+
126+
##### Variant 3 (no go)
127+
Definition of `SYCL_ESIMD_KERNEL` is changed depending on presence of
128+
`ESIMD_FORCE_STATELESS_MEM_ACCESS`:
129+
130+
```cpp
131+
#ifdef ESIMD_FORCE_STATELESS_MEM_ACCESS
132+
#define SYCL_ESIMD_KERNEL __attribute__((sycl_explicit_simd)) __attribute__((sycl_explicit_simd_force_stateless))
133+
#else
134+
#define SYCL_ESIMD_KERNEL __attribute__((sycl_explicit_simd))
135+
#endif
136+
```
137+
Then LowerESIMD lowers parameter annotation depending on
138+
`sycl_explicit_simd_force_stateless` attribute presence.
139+
The drawback is that is allowed to use `[[intel::sycl_explicit_simd]]` w/o
140+
`SYCL_ESIMD_KERNEL`
141+
142+
#### ESIMD Verifier changes
143+
144+
All the compiler variants require that accessor::get_pointer() can be used in
145+
the device code. `ESIMDVerifier.cpp` needs to additionally allow the following
146+
regexps:
147+
```
148+
"^cl::sycl::accessor<.+>::getPointerAdjusted",
149+
"^cl::sycl::accessor<.+>::getQualifiedPtr",
150+
"^cl::sycl::accessor<.+>::get_pointer",
151+
"^cl::sycl::multi_ptr<.+>::.+"
152+
```
153+
But only if it is run in "force-stateless" mode.
154+
155+
### Long-term
156+
157+
Long term solution would be replacing the
158+
```
159+
#ifdef ESIMD_FORCE_STATELESS_MEM_ACCESS
160+
```
161+
with
162+
```
163+
if_device_has(platform_requires_stateless_access)
164+
```
165+
and removing all the changes in other components.
166+
Plus VC BE need to be taught to generate correct pointer parameter annotation
167+
not relying on the middle-end providing it.

sycl/doc/index.rst

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -42,6 +42,7 @@ Design Documents for the oneAPI DPC++ Compiler
4242
design/ITTAnnotations
4343
design/DeviceGlobal
4444
design/CompileTimeProperties
45+
design/ESIMDStatelesAccessors
4546
New OpenCL Extensions <https://github.com/intel/llvm/tree/sycl/sycl/doc/design/opencl-extensions>
4647
New SPIR-V Extensions <https://github.com/intel/llvm/tree/sycl/sycl/doc/design/spirv-extensions>
4748

0 commit comments

Comments
 (0)