[SYCL] Update compiler design doc to reflect changed action graphs.
- Correct the high-level application build diagram.
- Describe the new file-table-tform tool usage.
- Describe clang action graphs used in various SYCL compilation
  scenarios in more detail.

Signed-off-by: Konstantin S Bobrovsky <[email protected]>
Co-authored-by: Pavel Chupin <[email protected]>
Co-authored-by: mdtoguchi <[email protected]>
Co-authored-by: Artem Gindinson <[email protected]>

The DPC++ compiler can logically be split into the host compiler and a number of
device compilers, one per supported target. The Clang driver orchestrates the
compilation process: it invokes the device compiler once per requested target,
then invokes the host compiler to compile the host part of a SYCL source. In
the simplest case, when compilation and linkage are done in one compiler driver
invocation, the device object files (which are really LLVM IR files) are linked
with the `llvm-link` tool once compilation is finished. The resulting LLVM IR
module is then translated into a SPIR-V module using the `llvm-spirv` tool and
wrapped in a host object file using the `clang-offload-wrapper` tool. Once all
the host object files and the wrapped object with device code are ready, the
driver invokes the usual platform linker, and the final executable, called a
"fat binary", is produced. This is a host executable or library with embedded
linked images for each target specified on the command line.

There are many variations of the compilation process depending on whether the
user chose to do one or more of the following:
- perform compilation separately from linkage
- compile the device SPIR-V module ahead-of-time for one or more targets
- perform device code splitting so that device code is distributed across
  multiple modules rather than enclosed in a single one
- link static device libraries

The sections below provide more details on some of these scenarios.
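To make the simplest single-invocation flow concrete, here is a minimal Python sketch. It models the driver's plan as an ordered list of tool invocations. It is not how the Clang driver is implemented.

The sketch rests on these assumptions:
- `Step`, `plan_simple_build`, the file names and the `spir64` target label are hypothetical.
- No command-line flags are modeled.
- A single `clang-offload-wrapper` invocation embeds the SPIR-V modules for all targets.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    tool: str                 # tool the driver would invoke
    inputs: tuple[str, ...]   # files consumed by this step
    output: str               # file produced by this step


def plan_simple_build(sources: list[str], targets: list[str]) -> list[Step]:
    """Hypothetical model of the one-invocation compile-and-link flow.

    File names are illustrative; command-line flags are deliberately not modeled.
    """
    steps: list[Step] = []
    device_ir = {t: [] for t in targets}   # per-target LLVM IR "objects"
    host_objs = []
    for src in sources:
        stem = src.rsplit(".", 1)[0]
        for t in targets:                  # device compiler: once per target
            ir = f"{stem}-{t}.bc"
            steps.append(Step("clang (device)", (src,), ir))
            device_ir[t].append(ir)
        obj = f"{stem}.o"                  # then the host compiler
        steps.append(Step("clang (host)", (src,), obj))
        host_objs.append(obj)
    spirv_modules = []
    for t in targets:
        linked = f"linked-{t}.bc"          # llvm-link merges the device IR
        steps.append(Step("llvm-link", tuple(device_ir[t]), linked))
        spv = f"linked-{t}.spv"            # llvm-spirv translates it to SPIR-V
        steps.append(Step("llvm-spirv", (linked,), spv))
        spirv_modules.append(spv)
    # Assumption: one wrapper object embeds the images for all targets.
    steps.append(Step("clang-offload-wrapper", tuple(spirv_modules), "wrapper.o"))
    # The usual platform linker produces the "fat binary".
    steps.append(Step("platform linker", tuple(host_objs) + ("wrapper.o",), "app"))
    return steps


if __name__ == "__main__":
    for step in plan_simple_build(["a.cpp", "b.cpp"], ["spir64"]):
        print(f"{step.tool:22} {', '.join(step.inputs)} -> {step.output}")
```

For two sources and one target, the plan has three parts:
- a device/host compile pair per source;
- one `llvm-link` step and one `llvm-spirv` step;
- the wrapper and linker steps.

The variations listed above (separate compilation, AOT, device code splitting, static device libraries) would each reshape this list.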

SYCL sources can also be compiled as regular C++ code; in this mode there is no
"device part" of the code: everything is executed on the host.

The device compiler is further split into the following major components:

- **Front-end** - parses input source, "outlines" the device part of the code,
applies additional restrictions on the device code (e.g. no exceptions or
virtual calls), generates LLVM IR for the device code only and an "integration
header" which provides information like kernel name, parameters order and data
@@ -38,8 +55,17 @@ back-end. Today middle-end transformations include just a couple of passes:
transformation with only one limitation: the back-end compiler should be able to
handle transformed LLVM IR.
- Optionally: LLVM IR → SPIR-V translator.
- **Back-end** - produces native "device" code. It is shown as the
"Target-specific LLVM compiler" box on Diagram 1. It is invoked either at
compile time (in the ahead-of-time compilation scenario) or at runtime
(in the just-in-time compilation scenario).

*Design note: in the current design we use the SYCL device front-end compiler to
produce the integration header for two reasons. First, it must be possible to
use any host compiler to produce SYCL heterogeneous applications. Second, even
if the same clang compiler is used for the host compilation, information
provided in the integration header is used (included) by the SYCL runtime
implementation, so the header must be available before the host compilation
starts.*

### SYCL support in Clang front-end
@@ -150,7 +176,69 @@ defines:
- target triple and a native tool chain for each target (including "virtual"
targets like SPIR-V).
- SYCL offload action based on generic offload action.

The SYCL compilation pipeline has a peculiarity compared to other compilation
scenarios: some of the actions in the pipeline may output multiple "clusters"
of files, which are consumed later by other actions. For example, each device
binary may be accompanied by a symbol table and a specialization constant map.
This is additional information used by the SYCL runtime library, and it must be
stored in the device binary descriptor by the offload wrapper tool. With the
device code splitting feature enabled, there can be multiple such sets
(clusters) of files, one per separate device binary.

The current design of the clang driver does not allow modeling this. In
particular, it does not support:

1. Multiple inputs/outputs in the action graph.
1. Logical grouping of multiple inputs/outputs. For example, an input or output
   can consist of multiple pairs of files, where each pair represents
   information for a single device code module: [a file with device code, a
   file with exported symbols].

To support this, SYCL introduces the `file-table-tform` tool. This tool
transforms file tables according to commands passed as input arguments. Each
row in the table represents a file cluster, and each column represents a type of
data associated with a cluster. The tool can replace and extract columns. For
example, the `sycl-post-link` tool can output two file clusters and the
following file table referencing all the files in the clusters:

```
[Code|Symbols|Properties]
a_0.bc|a_0.sym|a_0.props
a_1.bc|a_1.sym|a_1.props
```
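The row/column structure above maps naturally onto a list of per-cluster records. The following Python sketch parses a table in the format shown in the example into one dictionary per cluster. It is an illustration under assumptions, not the `file-table-tform` implementation or a specification of its file format:
- `parse_file_table` is a hypothetical name.
- Escaping rules and any format details beyond the example above are not modeled.

```python
def parse_file_table(text: str) -> list[dict[str, str]]:
    """Parse a file table into one dict per cluster (row), keyed by column name.

    Assumed format, mirroring the example above: a "[Col1|Col2|...]" header
    line followed by one "|"-separated row per file cluster.
    """
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if not lines:
        raise ValueError("empty file table")
    header = lines[0]
    if not (header.startswith("[") and header.endswith("]")):
        raise ValueError("expected a [Col|Col|...] header line")
    columns = header[1:-1].split("|")
    clusters = []
    for row in lines[1:]:
        cells = row.split("|")
        if len(cells) != len(columns):
            raise ValueError(f"row {row!r}: expected {len(columns)} cells")
        # One cluster = all files describing a single device code module.
        clusters.append(dict(zip(columns, cells)))
    return clusters


TABLE = """\
[Code|Symbols|Properties]
a_0.bc|a_0.sym|a_0.props
a_1.bc|a_1.sym|a_1.props
"""

if __name__ == "__main__":
    for i, cluster in enumerate(parse_file_table(TABLE)):
        print(i, cluster["Code"], cluster["Symbols"], cluster["Properties"])
```

Treating a row as one record keeps the files of a cluster together. That grouping is what the clang action graph cannot express directly.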

When participating in the action graph, this tool takes a file table
(`TY_Tempfiletable` clang input type) and/or a file list (`TY_Tempfilelist`) as
input, performs the requested transformations and outputs a file table or list.
From the clang design standpoint there is still a single input and output, even
though in reality there are multiple.

For example, depending on compilation options, files from the "Code" column
above may need to undergo AOT compilation after the device code splitting step,
performed as a part of the code transformation sequence done by the
`sycl-post-link` tool. The driver will then:
- Use `file-table-tform` to extract the code files and produce a file list:
  ```
  a_0.bc
  a_1.bc
  ```
- Pass this file list to the `llvm-for-each` tool along with the AOT
  compilation command to invoke it on every file in the list. This will result
  in another file list:
  ```
  a_0.bin
  a_1.bin
  ```
- Invoke `file-table-tform` again to replace the "Code" column (the `.bc` files)
  with the new list (the `.bin` files), producing a new file table:
  ```
  [Code|Symbols|Properties]
  a_0.bin|a_0.sym|a_0.props
  a_1.bin|a_1.sym|a_1.props
  ```
- Finally, pass this file table to the `clang-offload-wrapper` tool to
  construct a wrapper object which embeds all those files.

Note that the graph does not change when more rows (clusters) or columns
(e.g. a "manifest" file) are added to the table.
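The extract, for-each and replace steps above can be sketched in Python as pure functions over an in-memory table. This is a hypothetical model, not the command-line interface of `file-table-tform` or `llvm-for-each`:
- All function names are made up for illustration.
- `fake_aot_compile` only computes an output file name instead of running a real AOT compiler.

```python
from typing import Callable

Table = tuple[list[str], list[list[str]]]  # (column names, rows)


def parse_table(text: str) -> Table:
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    columns = lines[0].strip("[]").split("|")
    rows = [ln.split("|") for ln in lines[1:]]
    return columns, rows


def format_table(table: Table) -> str:
    columns, rows = table
    return "\n".join(["[" + "|".join(columns) + "]"] + ["|".join(r) for r in rows])


def extract_column(table: Table, name: str) -> list[str]:
    """Analogue of the 'extract' step: one column becomes a file list."""
    columns, rows = table
    idx = columns.index(name)
    return [row[idx] for row in rows]


def replace_column(table: Table, name: str, new_files: list[str]) -> Table:
    """Analogue of the 'replace' step: swap a column for a file list's entries."""
    columns, rows = table
    if len(new_files) != len(rows):
        raise ValueError("file list length must match the number of rows")
    idx = columns.index(name)
    # Build new rows; the input table is left unmodified.
    new_rows = [row[:idx] + [f] + row[idx + 1:] for row, f in zip(rows, new_files)]
    return columns, new_rows


def for_each(files: list[str], command: Callable[[str], str]) -> list[str]:
    """Stand-in for llvm-for-each: run `command` on every file in the list."""
    return [command(f) for f in files]


def fake_aot_compile(bc_file: str) -> str:
    """Hypothetical AOT step: only derives the output name, compiles nothing."""
    return bc_file.removesuffix(".bc") + ".bin"


if __name__ == "__main__":
    table = parse_table("[Code|Symbols|Properties]\n"
                        "a_0.bc|a_0.sym|a_0.props\n"
                        "a_1.bc|a_1.sym|a_1.props\n")
    code_list = extract_column(table, "Code")          # a_0.bc, a_1.bc
    bin_list = for_each(code_list, fake_aot_compile)   # a_0.bin, a_1.bin
    print(format_table(replace_column(table, "Code", bin_list)))
```

`extract_column` and `replace_column` address columns by name and map over all rows. The same three calls therefore work unchanged for a table with more clusters or an extra column. This mirrors the note above that the graph itself does not change.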

#### Enable SYCL offload
@@ -188,14 +276,8 @@ a set of target architectures for which to compile device code. By default the