Skip to content

[ESIMD] Stateful to stateless mem access conversion design. #6187

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Aug 2, 2022
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
167 changes: 167 additions & 0 deletions sycl/doc/design/ESIMDStatelesAccessors.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,167 @@
# ESIMD "stateless" accessors support design

This document describes design of automatic conversion of "stateful" memory
accesses via SYCL accessors to "stateless" accesses within ESIMD kernels by the
SYCL compiler.

## Overview of Intel GPU memory access types
Intel GPU hardware has two main modes of accessing memory - *stateless* and
*stateful*, with specific memory access data port messages corresponding to each
of the modes (`send` instructions with dataport fixed function target).
- In a *stateless* access base memory location(s) is represented with a single
virtual memory address, which can be a USM pointer.
- In *stateful* - with a <*surface index*, *32-bit offset*> pair of values, where
the *surface index* is an index into a "binding table" which contains surface
descriptors available to the kernel. Surface is a contigous memory area
accessible through its descriptor by stateful accesses. Each descriptor contains
various information about the surface - for example, its size and format.

Pointers used in statless accesses are usually coming from USM or C++ memory
allocation routines and are passed directly by the runtime as kernel arguments.

The stateful access style has a number of drawbacks which makes it undesirable
to use in HPC application. The biggest one is 4Gb limitation on the surface
size. Another one is problems with creating data structures with nested
pointer fields or double indirection on host and use them on the device.

## Accessor and USM pointer kernel argument passing details
ESIMD compiler when compiling a kernel records information about each memory
argument and stores it together with the kernel's SPIRV. Basically, for each
kernel argument, there is information whether it is a memory argument, and, if
yes, whether it is surface-based or pointer-based.

When JITting the kernel, the scalar GPU compiler back-end can convert memory
arguments and memory accesses between the two modes depending on optimization
or other settings, and record final type of memory argument with the generated
kernel executable. GPU runtime uses that information to wrap/not wrap incoming
memory pointer with a surface before passing it onto the harware into the
actual kernel argument.

The vector back-end can't do this in many cases, as memory accesses in SPIRV
are represented by hardware-specific intrinsics rather then standard generic
memory access SPIRV instructions. This design basically enables the vector BE
to redirect code generation for stateful memory access APIs to stateless or
stateful intrinsics, and also generate correct annotations. Since it uses the
same runtime, which relies on parameter annotation when making the wrap/no-wrap
decision, the runtime part does not need much changes.

## Problem definition

Currently, ESIMD compiler always maps buffer/accessor-based memory accesses to
stateful accesses, thus imposing the 4Gb datum size limitation on user programs
with accessors.

## Proposed solution

### Short/mid-term

#### API header changes
The general idea is to introduce C++ preprocessor macro
`ESIMD_FORCE_STATELESS_MEM_ACCESS` which will control code generation for the
stateful memory access APIs - such as:

```cpp
template <typename Tx, int N, typename AccessorTy,
typename Flags = vector_aligned_tag,
typename = std::enable_if_t<is_simd_flag_type_v<Flags>>,
class T = detail::__raw_t<Tx>>
__ESIMD_API simd<Tx, N> block_load(AccessorTy acc, uint32_t offset,
Flags = {});

template <typename T, int N, typename AccessorTy>
__ESIMD_API std::enable_if_t<(sizeof(T) <= 4) &&
(N == 1 || N == 8 || N == 16 || N == 32) &&
!std::is_pointer<AccessorTy>::value,
simd<T, N>>
gather(AccessorTy acc, simd<uint32_t, N> offsets, uint32_t glob_offset = 0,
simd_mask<N> mask = 1);
```

Implementation of the APIs would follow this pattern:

```cpp
// this API should verify that accessor is to global address space, this is needed both for
// the case with conversion to stateless and the case w/o.
T stateful_memory_api(accessor acc, uint32 offset, args...) {
#ifdef ESIMD_FORCE_STATELESS_MEM_ACCESS
accessor_elelemt_type *ptr = acc.get_pointer() + offset;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It'd be good to add sanity check to ensure only accessor with global address space ptr will be accepted here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good point!

return stateless_memory_api(ptr, args...);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this is mapped to legacy stateless memory API, it will also work on old platform.
However, one limitation is that there is no support for cache hint control in legacy API.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

right. providing cache hint control for generic APIs like gather, block_load needs to be designed.

#else
<original implementation>
#endif
}
```

The new macro is supposed to be set by users directly or via some logic based on
other macros set by users.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kbobrovs I'm assuming that ESIMD_FORCE_STATELESS_MEM_ACCESS must be defined in order to use accessor::get_pointer() in ESIMD kernels, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@petercad - yes. -fsycl-esimd-force-stateless-mem-access should ensure this automatically


#### Compiler changes

The API part of the implementation is as simple as above, the compiler
one is slightly more complicated. Compiler needs to make sure that in presence
of `ESIMD_FORCE_STATELESS_MEM_ACCESS` macro, the actual memory parameter
annotation described above is correct and tells that memory is a pointer, not a
surface index. Parameter annotations are generated by the front-end - these are
`kernel_arg_accessor_ptr` and `kernel_arg_type` metadata nodes, which are
then translated to `buffer_t` (for surface) or `svmptr_t` (for pointer)
metadata annotations consumed by the back-end.

##### Variant 1
This is the recommended variant. A new driver option is added -
`-fsycl-esimd-force-stateless-mem-access`. Under this option:
- SYCL C++ device compiler FE defines the `ESIMD_FORCE_STATELESS_MEM_ACCESS`
macro
- `sycl-post-link` tool is run with a new option
`-esimd-force-stateless-mem-access`. Under this option, the tool
configures the LowerESIMD.cpp pass to ignore the `kernel_arg_accessor_ptr`
and always generate `svmptr_t` annotation for memory arguments.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to mix kernels with stateless accessors and kernels with surface accessors in this variant?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In user level code - yes, but compiler will generate stateless for all accessors.

##### Variant 2
Clang C++ FE is changed to generate desired `kernel_arg_accessor_ptr`
metadata depending on `ESIMD_FORCE_STATELESS_MEM_ACCESS` macro setting. If
set, it will mark all memory arguments as pointers in
`kernel_arg_accessor_ptr` and `kernel_arg_type` MD nodes.


##### Variant 3 (no go)
Definition of `SYCL_ESIMD_KERNEL` is changed depending on presence of
`ESIMD_FORCE_STATELESS_MEM_ACCESS`:

```cpp
#ifdef ESIMD_FORCE_STATELESS_MEM_ACCESS
#define SYCL_ESIMD_KERNEL __attribute__((sycl_explicit_simd)) __attribute__((sycl_explicit_simd_force_stateless))
#else
#define SYCL_ESIMD_KERNEL __attribute__((sycl_explicit_simd))
#endif
```
Then LowerESIMD lowers parameter annotation depending on
`sycl_explicit_simd_force_stateless` attribute presence.
The drawback is that is allowed to use `[[intel::sycl_explicit_simd]]` w/o
`SYCL_ESIMD_KERNEL`

#### ESIMD Verifier changes

All the compiler variants require that accessor::get_pointer() can be used in
the device code. `ESIMDVerifier.cpp` needs to additionally allow the following
regexps:
```
"^cl::sycl::accessor<.+>::getPointerAdjusted",
"^cl::sycl::accessor<.+>::getQualifiedPtr",
"^cl::sycl::accessor<.+>::get_pointer",
"^cl::sycl::multi_ptr<.+>::.+"
```
But only if it is run in "force-stateless" mode.

### Long-term

Long term solution would be replacing the
```
#ifdef ESIMD_FORCE_STATELESS_MEM_ACCESS
```
with
```
if_device_has(platform_requires_stateless_access)
```
and removing all the changes in other components.
Plus VC BE need to be taught to generate correct pointer parameter annotation
not relying on the middle-end providing it.
1 change: 1 addition & 0 deletions sycl/doc/index.rst
Original file line number Diff line number Diff line change
Expand Up @@ -42,6 +42,7 @@ Design Documents for the oneAPI DPC++ Compiler
design/ITTAnnotations
design/DeviceGlobal
design/CompileTimeProperties
design/ESIMDStatelesAccessors
New OpenCL Extensions <https://github.com/intel/llvm/tree/sycl/sycl/doc/design/opencl-extensions>
New SPIR-V Extensions <https://github.com/intel/llvm/tree/sycl/sycl/doc/design/spirv-extensions>

Expand Down