Skip to content

Commit 59dd387

Browse files
kbobrovskbobrovsbader
authored
[SYCL][Doc] Add ESIMD design and implementation notes (#7824)
Signed-off-by: Konstantin S Bobrovsky <[email protected]> --------- Signed-off-by: Konstantin S Bobrovsky <[email protected]> Co-authored-by: kbobrovs <[email protected]> Co-authored-by: Alexey Bader <[email protected]>
1 parent 7eccec3 commit 59dd387

File tree

3 files changed

+276
-0
lines changed

3 files changed

+276
-0
lines changed

sycl/doc/design/ESIMDDesignNotes.md

Lines changed: 275 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,275 @@
1+
# Explicit SIMD design notes
2+
3+
This documents is a collection of notes describing design and/or implementation
4+
of various parts of the ESIMD programming model support within the DPC++.
5+
6+
## Overview of ESIMD support in DPC++ components
7+
8+
ESIMD support is spread across a number of components in the oneAPI software
9+
stack, spanning compile time, link time and runtime. The picture below shows
10+
simplified view of the DPC++ compiler and runtime diagram and where ESIMD
11+
(sub-)components fit in it.
12+
13+
![Esimd Components](images/esimd_components.png)
14+
15+
### User program
16+
17+
User program can contain both SYCL and ESIMD kernels, either in the same or
18+
different translation units. DPC++ ESIMD support will automatically split the
19+
device code into SYCL and ESIMD parts to redirect them to different back-ends.
20+
To facilitate this splitting, compiler will automatically identify markup and
21+
clone parts of the ESIMD callgraph starting from kernels and functions
22+
explicitly marked with the `intel::sycl_explicit_simd` attribute.
23+
24+
### Device headers
25+
26+
There are two distinct parts of ESIMD support in the SYCL device headers. The
27+
first one is ESIMD-related "tweaks" within the usual SYCL headers, and the
28+
second is the ESIMD APIs themselves.
29+
30+
#### ESIMD tweaks in SYCL device headers
31+
32+
The most important one is device-side definition of the sycl::accessor class
33+
definition. It has different layout for SYCL and ESIMD (but the same size due
34+
to padding), as well as different initialization function used to assemble
35+
accessor object on the device side from incoming kernel arguments. For SYCL,
36+
the function name is `sycl::accessor::__init`, for ESIMD it is
37+
`sycl::accessor::__init_esimd`. The difference is caused by the ESIMD limitation
38+
\- it does not support offset, memory and access range for an accessor object.
39+
40+
There are also couple ESIMD-specific proxy classes used by ESIMD API
41+
implementation to access internals of ESIMD objects:
42+
43+
- `sycl::ext::intel::esimd::detail::AccessorPrivateProxy`
44+
this one is used by ESIMD memory APIs to access internals of `sycl::accessor`
45+
objects
46+
- `sycl::ext::intel::esimd::detail::WrapperElementTypeProxy`
47+
used to access internals of SYCL types such as `sycl::detail::half_impl::half`
48+
49+
#### ESIMD API headers
50+
51+
These headers define ESIMD APIs to be used by ESIMD user kernels. For example,
52+
the basic vector data type `sycl::ext::intel::esimd::simd`,
53+
`sycl::ext::intel::esimd::gather` memory APIs
54+
55+
Source locations:
56+
57+
- `sycl/ext/intel/esimd`
58+
- `sycl/ext/intel/experimental/esimd`
59+
60+
### Clang driver
61+
62+
TODO: describe driver modifications.
63+
64+
Source locations:
65+
66+
- `clang/lib/Driver/ToolChains/Clang.cpp`
67+
- `clang/include/clang/Driver/Options.td`
68+
69+
### Clang front-end
70+
71+
#### Semantic analyzer, integration header generator
72+
73+
- Generation of ESIMD kernel signature and prolog code (which re-assembles
74+
objects such as accessors from incoming arguments) is tweaked for ESIMD
75+
kernels:
76+
- accessors generate fewer kernel arguments
77+
- accessor object is assembled in the prolog using `__init_esimd` function
78+
instead of `__init`. See function name definition and its usage
79+
[here](https://github.com/intel/llvm/blob/eb33bbcfbeab7af1a7f58fb4dc6b53bc47f73dba/clang/lib/Sema/SemaSYCL.cpp#L62).
80+
- Defines `intel::sycl_explicit_simd` function and variable attribute to markup
81+
kernels top-level ESIMD external functions and "private globals".
82+
- Performs various ESIMD-specific semantic analysis and diagnostics. All kinds
83+
of ESIMD error messages can be found in
84+
`clang/include/clang/Basic/DiagnosticSemaKinds.td`.
85+
- Integration header generator inserts additional
86+
`static constexpr bool isESIMD()` function into kernel discriptor type, which
87+
is used by the host compiler to obtain compile-time details for the kernel via
88+
inclusion of the integration header generated by the generator. For ESIMD
89+
kernels `isESIMD()` returns true, and this is used in SYCL RT headers to
90+
distinguish ESIMD kernels from SYCL kernels, e.g. to customize kernel argument
91+
setting for accessors.
92+
- defines `sycl_esimd_vectorize` attribute propagated to the VC BE, aimed at
93+
making it possible to call SIMT function from ESIMD code while vectorizing the
94+
former. Used internally to implement certain math functions in SYCL spec
95+
conformant manner (proper precision) via vectorizing scalar conformant
96+
implementations - see example
97+
[usage](https://github.com/intel/llvm/blob/sycl/sycl/include/sycl/ext/intel/esimd/detail/math_intrin.hpp#L199)
98+
99+
Source locations:
100+
101+
- `clang/lib/Sema/SemaSYCL.cpp`
102+
- `clang/lib/Sema/SemaDecl.cpp`
103+
- `clang/lib/Sema/SemaExpr.cpp`
104+
- `clang/include/clang/Basic/Attr.td`
105+
- `clang/include/clang/Basic/DiagnosticSemaKinds.td`
106+
107+
#### Code (LLVMIR) generator
108+
109+
ESIMD-specific code generator tweaks are mostly translations of internal FE
110+
representation of variaous ESIMD attributes into LLVM IR attributes or metadata.
111+
112+
##### Kernel signature generation
113+
114+
For ESIMD kernels, a number of additional attributes are generated for the
115+
kernel function itself as well as certain argument.
116+
117+
- Kernels are annotated with `sycl_explicit_simd` and
118+
`intel_reqd_sub_group_size` attributes. The latter must always be `1` for a
119+
ESIMD kernel or function.
120+
- An argument which conveys accessor's pointer is assigned a
121+
`kernel_arg_accessor_ptr` attribute
122+
123+
##### Global variable code generation
124+
125+
ESIMD supports "private globals" - global variables which have one copy per
126+
thread of execution (similar to C++ thread_local), normally allocated of Gen
127+
register file. To make a global variable a "private global",
128+
`__attribute__((opencl_private)) __attribute__((sycl_explicit_simd))`
129+
attributes are used. Globals of this can be forced to a specific register using
130+
the `__attribute__((register_num(n)))` attribute. The clang code generator
131+
translates these to `genx_volatile` and `genx_byte_offset` LLVM IR attributes.
132+
133+
##### Function attributes translations
134+
135+
- `sycl_esimd_vectorize` -> `CMGenxSIMT`
136+
137+
Source locations:
138+
139+
- `clang/lib/CodeGen/CGSYCLRuntime.cpp`
140+
- `clang/lib/CodeGen/CodeGenFunction.cpp`
141+
- `clang/lib/CodeGen/CodeGenModule.cpp`
142+
143+
### Clang middle-end
144+
145+
#### ESIMD API restriction verifier
146+
147+
This component is an LLVM IR pass over a compiled translation unit. It checks
148+
for presence of certain SYCL APIs which are disallowed within ESIMD code. For
149+
exaple, SYCL reductions are not allowed in ESIMD. The verifier does this by
150+
demangling all the call targets within ESIMD code and matching them with
151+
internal sub-string filters. Invoked from `clang/lib/CodeGen/BackendUtil.cpp`.
152+
153+
Source locations:
154+
155+
- `llvm/lib/SYCLLowerIR/ESIMD/ESIMDVerifier.cpp`
156+
157+
### sycl-post-link transformations
158+
159+
As a part of the input device code module transformation pipeline, the
160+
sycl-post-link tool splits the input module (or modules resulting from splitting
161+
by other characteristics, such as aspects) into two - SYCL and ESIMD ones.
162+
Shared functions invoked both from SYCL and ESIMD are cloned during the process.
163+
This is necessary because SYCL and ESIMD parts must undergo different set of
164+
transformations before generating resulting SPIR-V. ESIMD modules resulting from
165+
splitting are marked with specific device binary property `isEsimdImage` (see
166+
[source](https://github.com/intel/llvm/blob/9dc14a292f479880b5ab2e95f99a8414b31d1165/llvm/tools/sycl-post-link/sycl-post-link.cpp#L453)
167+
.)
168+
169+
`sycl-post-link` is the post-link process driver, it invokes necessary
170+
transformations as well as optimizations on fully linked device code. As a part
171+
of the process it splits SYCL and ESIMD parts of the code into separate LLVM IR
172+
modules and invokes different set or transformations on them. If a program has
173+
an `invoke_simd` call in it, then sycl-post-link will link SYCL and ESIMD parts
174+
back, cloning overlaping parts as needed.
175+
176+
Source locations:
177+
178+
- `llvm/tools/sycl-post-link/sycl-post-link.cpp`
179+
180+
#### ESIMD Lowerer
181+
182+
ESIMD part of device code undergoes a set of ESIMD-specific transformations.
183+
First, intrinsic lowering and metadata generation phase happens. It is
184+
implemented in the `SYCLLowerESIMDPass` LLVM IR Module pass. Its primary
185+
purposes are:
186+
187+
- translate `__esimd_*` intrinsic calls into corresponding `genx.*` intrinsics
188+
known to the VC BE
189+
- in some cases, there is no direct equivalent (for example,
190+
`__esimd_pack_mask`), in which case the lowerer generates LLVM IR with
191+
desired semantics
192+
- translate some of the `__spirv.*` intrinsics to something acceptable by VC
193+
BE
194+
195+
Source locations:
196+
197+
- `LowerESIMD.cpp`
198+
- `ESIMDOptimizeVecArgCallConv.cpp`
199+
- `LowerESIMDVecArg.cpp`
200+
- `LowerESIMDVLoadVStore.cpp`
201+
202+
#### Genx SPIR-V writer adaptor
203+
204+
`(part of vc-intrinsics repo)`
205+
206+
### SYCL Runtime
207+
208+
SYCL runtime (RT) has a few places where ESIMD is handled specially:
209+
210+
- When setting kernel invocation arguments corresponding to an accessor, RT will
211+
skip setting offset, memory and access ranges arguments (normally set for
212+
usual SYCL kernels), because ESIMD does not support these. In other words, an
213+
accessors used within kernel (and captured in kernel lambda) is translated to
214+
4 SPIR-V kernel arguments for a normal SYCL kernel, and just to 1 argument for
215+
a ESIMD kernel.
216+
[Link](https://github.com/intel/llvm/blob/d7a7de79f8a6498bae52331f4789adcac76b8e8c/sycl/source/handler.cpp#L373).
217+
- When creating JIT compilation options, SYCL runtime checks if the device
218+
binary image to be JIT-compiled has "isESIMDImage" property, in which case it
219+
adds `-vc-codegen` JIT options, which makes Intel GPU runtime use the vector
220+
backend (aka 'VC BE') to JIT-compile the device binary (SPIR-V).
221+
[Link](https://github.com/intel/llvm/blob/d7a7de79f8a6498bae52331f4789adcac76b8e8c/sycl/source/detail/program_manager/program_manager.cpp#L412).
222+
- Code related to ESIMD emulator support and ESIMD kernel invocation via ESIMD
223+
emulator. Plugin
224+
[sources](https://github.com/intel/llvm/blob/sycl/sycl/plugins/esimd_emulator/pi_esimd_emulator.cpp),
225+
ESIMD kernel invocation via the emulator
226+
[link](https://github.com/intel/llvm/blob/d7a7de79f8a6498bae52331f4789adcac76b8e8c/sycl/source/detail/scheduler/commands.cpp#L2369).
227+
228+
## TODOs
229+
230+
This section lists current major ESIMD gaps/TODOs.
231+
232+
1. Move all APIs out of the experimental namespace. One of the major APIs there
233+
is LSC memory accesses. The main roadblock for making it stable API is
234+
absense of specification for cache hints, which should be shared between SYCL
235+
and ESIMD.
236+
1. Architecture specific APIs should be explicitly marked as such in the user
237+
documentation with references to the
238+
[list of architectures](https://github.com/intel/llvm/blob/sycl/sycl/doc/extensions/experimental/sycl_ext_oneapi_device_architecture.asciidoc)
239+
known to oneAPI.
240+
1. Properly markup architecture-specific APIs, such as `dpas`, with required
241+
aspects, according to the "optional device features"
242+
[design](https://github.com/intel/llvm/blob/sycl/sycl/doc/design/OptionalDeviceFeatures.md).
243+
This might require splitting implementations into per-architecture variants.
244+
`if_device_has`
245+
[feature](https://github.com/intel/llvm/blob/sycl/sycl/doc/extensions/proposed/sycl_ext_oneapi_device_if.asciidoc)
246+
may help avoid duplication of common parts and dispatch to
247+
architecture-dependent code at fine-grained level from within a function.
248+
1. As VC BE moves away from `genx.*` intrinsics replacing them with `__spirv_*`
249+
ones defined in various extensions, ESIMD should catch up.
250+
1. Unification of common `simd_view`/`simd` interfaces in fact leads to
251+
significant complication of implementation rather than its intended
252+
simplification via avoiding code duplication, might make sense to have
253+
separate implementations.
254+
255+
## Directions
256+
257+
This section lists possible directions for ESIMD improvements.
258+
259+
1. Support `std::simd`. This is the standard C++ way for explicit SIMD
260+
programming. Can help run (subsest of ESIMD) on CPU efficiently in the
261+
future.
262+
1. Clear (via namespace?) separation of ESIMD APIs into portable and
263+
architecture-specific parts.
264+
1. Standardizing `simd_view` or equivalent. This is effectively a reference
265+
to a *subset* of `esimd::simd` vector object's elements. The subset is
266+
defined in a regular way via starting offset, stride and number of elements
267+
in the subset. This proved to be very useful and loved by users. Missing in
268+
`std::simd`.
269+
1. Design something like `invoke_spmd` (similar to `invoke_simd` extension) to
270+
be able to invoke SPMD functions from ESIMD code while vectorizing the calls
271+
in the back-end. This would replace `sycl_esimd_vectorize` and make this
272+
concept usable by all users, not only internal ESIMD implementation.
273+
1. Create a specification for ESIMD kernel ABI and stand-alone kernel
274+
declaration rules to make ESIMD kernels callable by arbitrary host offload
275+
runtimes, such as Level Zero.
69.5 KB
Loading

sycl/doc/index.rst

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -43,6 +43,7 @@ Design Documents for the oneAPI DPC++ Compiler
4343
design/DeviceGlobal
4444
design/CompileTimeProperties
4545
design/ESIMDStatelesAccessors
46+
design/ESIMDDesignNotes
4647
design/DeviceIf
4748
design/NonRelocatableDeviceCode
4849
New OpenCL Extensions <https://github.com/intel/llvm/tree/sycl/sycl/doc/design/opencl-extensions>

0 commit comments

Comments
 (0)