-
Notifications
You must be signed in to change notification settings - Fork 789
[ESIMD] Stateful to stateless mem access conversion design. #6187
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,167 @@ | ||
# ESIMD "stateless" accessors support design | ||
|
||
This document describes design of automatic conversion of "stateful" memory | ||
accesses via SYCL accessors to "stateless" accesses within ESIMD kernels by the | ||
SYCL compiler. | ||
|
||
## Overview of Intel GPU memory access types | ||
Intel GPU hardware has two main modes of accessing memory - *stateless* and | ||
*stateful*, with specific memory access data port messages corresponding to each | ||
of the modes (`send` instructions with dataport fixed function target). | ||
- In a *stateless* access base memory location(s) is represented with a single | ||
virtual memory address, which can be a USM pointer. | ||
- In *stateful* - with a <*surface index*, *32-bit offset*> pair of values, where | ||
the *surface index* is an index into a "binding table" which contains surface | ||
descriptors available to the kernel. Surface is a contigous memory area | ||
accessible through its descriptor by stateful accesses. Each descriptor contains | ||
various information about the surface - for example, its size and format. | ||
|
||
Pointers used in statless accesses are usually coming from USM or C++ memory | ||
allocation routines and are passed directly by the runtime as kernel arguments. | ||
|
||
The stateful access style has a number of drawbacks which makes it undesirable | ||
to use in HPC application. The biggest one is 4Gb limitation on the surface | ||
size. Another one is problems with creating data structures with nested | ||
pointer fields or double indirection on host and use them on the device. | ||
|
||
## Accessor and USM pointer kernel argument passing details | ||
ESIMD compiler when compiling a kernel records information about each memory | ||
argument and stores it together with the kernel's SPIRV. Basically, for each | ||
kernel argument, there is information whether it is a memory argument, and, if | ||
yes, whether it is surface-based or pointer-based. | ||
|
||
When JITting the kernel, the scalar GPU compiler back-end can convert memory | ||
arguments and memory accesses between the two modes depending on optimization | ||
or other settings, and record final type of memory argument with the generated | ||
kernel executable. GPU runtime uses that information to wrap/not wrap incoming | ||
memory pointer with a surface before passing it onto the harware into the | ||
actual kernel argument. | ||
|
||
The vector back-end can't do this in many cases, as memory accesses in SPIRV | ||
are represented by hardware-specific intrinsics rather then standard generic | ||
memory access SPIRV instructions. This design basically enables the vector BE | ||
to redirect code generation for stateful memory access APIs to stateless or | ||
stateful intrinsics, and also generate correct annotations. Since it uses the | ||
same runtime, which relies on parameter annotation when making the wrap/no-wrap | ||
decision, the runtime part does not need much changes. | ||
|
||
## Problem definition | ||
|
||
Currently, ESIMD compiler always maps buffer/accessor-based memory accesses to | ||
stateful accesses, thus imposing the 4Gb datum size limitation on user programs | ||
with accessors. | ||
|
||
## Proposed solution | ||
|
||
### Short/mid-term | ||
|
||
#### API header changes | ||
The general idea is to introduce C++ preprocessor macro | ||
`ESIMD_FORCE_STATELESS_MEM_ACCESS` which will control code generation for the | ||
stateful memory access APIs - such as: | ||
|
||
```cpp | ||
template <typename Tx, int N, typename AccessorTy, | ||
typename Flags = vector_aligned_tag, | ||
typename = std::enable_if_t<is_simd_flag_type_v<Flags>>, | ||
class T = detail::__raw_t<Tx>> | ||
__ESIMD_API simd<Tx, N> block_load(AccessorTy acc, uint32_t offset, | ||
Flags = {}); | ||
|
||
template <typename T, int N, typename AccessorTy> | ||
__ESIMD_API std::enable_if_t<(sizeof(T) <= 4) && | ||
(N == 1 || N == 8 || N == 16 || N == 32) && | ||
!std::is_pointer<AccessorTy>::value, | ||
simd<T, N>> | ||
gather(AccessorTy acc, simd<uint32_t, N> offsets, uint32_t glob_offset = 0, | ||
simd_mask<N> mask = 1); | ||
``` | ||
|
||
Implementation of the APIs would follow this pattern: | ||
|
||
```cpp | ||
// this API should verify that accessor is to global address space, this is needed both for | ||
// the case with conversion to stateless and the case w/o. | ||
T stateful_memory_api(accessor acc, uint32 offset, args...) { | ||
#ifdef ESIMD_FORCE_STATELESS_MEM_ACCESS | ||
accessor_elelemt_type *ptr = acc.get_pointer() + offset; | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It'd be good to add sanity check to ensure only accessor with global address space ptr will be accepted here. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. good point! |
||
return stateless_memory_api(ptr, args...); | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. If this is mapped to legacy stateless memory API, it will also work on old platform. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. right. providing cache hint control for generic APIs like gather, block_load needs to be designed. |
||
#else | ||
<original implementation> | ||
#endif | ||
} | ||
``` | ||
|
||
The new macro is supposed to be set by users directly or via some logic based on | ||
other macros set by users. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @kbobrovs I'm assuming that There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @petercad - yes. |
||
|
||
#### Compiler changes | ||
|
||
The API part of the implementation is as simple as above, the compiler | ||
one is slightly more complicated. Compiler needs to make sure that in presence | ||
of `ESIMD_FORCE_STATELESS_MEM_ACCESS` macro, the actual memory parameter | ||
annotation described above is correct and tells that memory is a pointer, not a | ||
surface index. Parameter annotations are generated by the front-end - these are | ||
`kernel_arg_accessor_ptr` and `kernel_arg_type` metadata nodes, which are | ||
then translated to `buffer_t` (for surface) or `svmptr_t` (for pointer) | ||
metadata annotations consumed by the back-end. | ||
|
||
##### Variant 1 | ||
This is the recommended variant. A new driver option is added - | ||
`-fsycl-esimd-force-stateless-mem-access`. Under this option: | ||
- SYCL C++ device compiler FE defines the `ESIMD_FORCE_STATELESS_MEM_ACCESS` | ||
macro | ||
- `sycl-post-link` tool is run with a new option | ||
`-esimd-force-stateless-mem-access`. Under this option, the tool | ||
configures the LowerESIMD.cpp pass to ignore the `kernel_arg_accessor_ptr` | ||
and always generate `svmptr_t` annotation for memory arguments. | ||
|
||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Is it possible to mix kernels with stateless accessors and kernels with surface accessors in this variant? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. In user level code - yes, but compiler will generate stateless for all accessors. |
||
##### Variant 2 | ||
Clang C++ FE is changed to generate desired `kernel_arg_accessor_ptr` | ||
metadata depending on `ESIMD_FORCE_STATELESS_MEM_ACCESS` macro setting. If | ||
set, it will mark all memory arguments as pointers in | ||
`kernel_arg_accessor_ptr` and `kernel_arg_type` MD nodes. | ||
|
||
|
||
##### Variant 3 (no go) | ||
Definition of `SYCL_ESIMD_KERNEL` is changed depending on presence of | ||
`ESIMD_FORCE_STATELESS_MEM_ACCESS`: | ||
|
||
```cpp | ||
#ifdef ESIMD_FORCE_STATELESS_MEM_ACCESS | ||
#define SYCL_ESIMD_KERNEL __attribute__((sycl_explicit_simd)) __attribute__((sycl_explicit_simd_force_stateless)) | ||
#else | ||
#define SYCL_ESIMD_KERNEL __attribute__((sycl_explicit_simd)) | ||
#endif | ||
``` | ||
Then LowerESIMD lowers parameter annotation depending on | ||
`sycl_explicit_simd_force_stateless` attribute presence. | ||
The drawback is that is allowed to use `[[intel::sycl_explicit_simd]]` w/o | ||
`SYCL_ESIMD_KERNEL` | ||
|
||
#### ESIMD Verifier changes | ||
|
||
All the compiler variants require that accessor::get_pointer() can be used in | ||
the device code. `ESIMDVerifier.cpp` needs to additionally allow the following | ||
regexps: | ||
``` | ||
"^cl::sycl::accessor<.+>::getPointerAdjusted", | ||
"^cl::sycl::accessor<.+>::getQualifiedPtr", | ||
"^cl::sycl::accessor<.+>::get_pointer", | ||
"^cl::sycl::multi_ptr<.+>::.+" | ||
``` | ||
But only if it is run in "force-stateless" mode. | ||
|
||
### Long-term | ||
|
||
Long term solution would be replacing the | ||
``` | ||
#ifdef ESIMD_FORCE_STATELESS_MEM_ACCESS | ||
``` | ||
with | ||
``` | ||
if_device_has(platform_requires_stateless_access) | ||
``` | ||
and removing all the changes in other components. | ||
Plus VC BE need to be taught to generate correct pointer parameter annotation | ||
not relying on the middle-end providing it. |
Uh oh!
There was an error while loading. Please reload this page.