Commit 3f4b778

[SYCL][Docs] Add design document for Device Config File (#9371)
This commit adds a design document for the implementation of a Device Configuration File required by [Device Aspect Traits](https://github.com/intel/llvm/blob/sycl/sycl/doc/design/DeviceAspectTraitDesign.md) and [sycl-aspect-filter](https://github.com/intel/llvm/blob/sycl/sycl/doc/design/OptionalDeviceFeatures.md#aspect-filter-tool). --------- Signed-off-by: Maronas, Marcos <[email protected]>
1 parent d8f6a6a commit 3f4b778

2 files changed

+348
-0
lines changed


sycl/doc/design/DeviceConfigFile.md

Lines changed: 347 additions & 0 deletions
@@ -0,0 +1,347 @@
# Implementation Design for Device Configuration File
This design document describes the implementation of the DPC++ Device
Configuration File.

In summary, there are several scenarios where we need information about a
target at compile time, and providing it is the main purpose of this Device
Configuration File. Examples are `any_device_has`/`all_devices_have`, which
rely on macros defined according to the optional features supported by a
target, and conditional AOT compilation based on the optional features used in
kernels and supported by targets.

## Requirements
We need a default Device Configuration File embedded in the compiler describing
the well-known targets at the time of building the compiler. This embedded
knowledge must be extensible, since our AOT toolchain allows compiling for
targets not known at the time of building the compiler, so long as the
appropriate toolchain (AOT compiler and driver) supports such targets. In
other words, we need to provide a way for users to add entries for new targets
or update existing targets at application compile time.

An entry of the Device Configuration File should include:
- Name of the target. Target names should be spelled exactly as expected in
`-fsycl-targets`, since they are going to be used to validate the supported
targets.
- List of supported aspects.
- List of supported sub-group sizes.
- [Optional] `aot-toolchain` name/identifier describing the toolchain used to compile
for this target. This information is optional because we plan to implement an
auto-detection mechanism that is able to infer the `aot-toolchain` from the
target name for well-known targets.
- [Optional] `aot-toolchain-%option_name` information to be passed to the
`aot-toolchain` command. For some targets, the auto-detection mechanism might
be able to infer values for this. One example of this information would be
`ocloc-device %device_id`.

The information provided in the Device Configuration File is required by
different tools and compiler modules:
- Compiler driver:
  - `any_device_has/all_devices_have` requires the compiler driver to read the
    config file and define the corresponding macros.
    [[DeviceAspectTraitDesign](https://github.com/intel/llvm/blob/sycl/sycl/doc/design/DeviceAspectTraitDesign.md)]
  - The compiler driver requires `aot-toolchain` and `ocloc-device` to trigger the
    compilation for the required targets.
    [https://github.com/intel/llvm/pull/6775/files]
- `sycl-aspect-filter`:
  https://github.com/intel/llvm/blob/sycl/sycl/doc/design/OptionalDeviceFeatures.md#aspect-filter-tool

Finally, overhead should be minimal. In particular, users should not pay for what
they do not use. This motivates our decision to embed the default Device
Configuration File rather than releasing it as a separate file.

## High-Level Design
The default Device Configuration File is a `.td` file located in the compiler
source code. `.td` is the file extension for [LLVM
TableGen](https://llvm.org/docs/TableGen/). This default file will include all
the devices known by the developers at the time of the release. During the
build process, using a custom TableGen backend, we generate a `.inc` C++ file
containing a `std::map` with one key/value element for each entry in the `.td`
file. Because the data lives in a map, we can later update or add elements if
the user provides new targets at application compile time. Finally, the tools
and compiler modules that need information about the targets can simply query
the map to get it.

Further information about TableGen can be found in [TableGenFundamentals](https://releases.llvm.org/1.9/docs/TableGenFundamentals.html).

### New `TableGen` backend
Note: This [guide](https://llvm.org/docs/TableGen/BackGuide.html) details how
to implement new TableGen backends. Also, the [Search
Indexes](https://llvm.org/docs/TableGen/BackEnds.html#search-indexes) backend
already does something very similar to what we seek. It generates a table that
provides a lookup function, but the table cannot be extended with new entries.
We can use the _Search Indexes_ backend as inspiration for ours.

Our backend should generate a map where the key is the target name and the value
is an object of a custom class/struct including all the information required.

First, we need to provide a file describing the `DynamicTable` class. An
example of this is `SearchableTable.td`, which describes the `GenericEnum` and
`GenericTable` classes for the `gen-searchable-tables` backend. The file
`llvm/include/llvm/TableGen/DynamicTable.td` should look like the one below:
```
//===- DynamicTable.td ----------------------------------*- tablegen -*-===//
//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//
//===----------------------------------------------------------------------===//
//
// This file defines the key top-level classes needed to produce a reasonably
// generic dynamic table that can be updated at runtime. DynamicTable objects
// can be defined using the class in this file:
// 1. (Dynamic) Tables. By instantiating the DynamicTable
// class once, a table with the name of the instantiating def is generated and
// guarded by the GET_name_IMPL preprocessor guard.
//
//===----------------------------------------------------------------------===//
// Define a record derived from this class to generate a dynamic table. This
// table resembles a hash table of key-value pairs, and can be updated at runtime.
//
// The name of the record is used as the name of the global primary array of
// entries of the table in C++.
class DynamicTable {
  // Name of a class. The table will have one entry for each record that
  // derives from that class.
  string FilterClass;

  // Name of the C++ struct/class type that holds table entries. The
  // declaration of this type is not generated automatically.
  string CppTypeName = FilterClass;

  // List of the names of fields of collected records that contain the data for
  // table entries, in the order that is used for initialization in C++.
  //
  // TableGen needs to know the type of the fields so that it can format
  // the initializers correctly.
  //
  // For each field of the table named xxx, TableGen will look for a field
  // named TypeOf_xxx and use that as a more detailed description of the
  // type of the field.

  //   class MyTableEntry {
  //     MyEnum V;
  //     ...
  //   }
  //
  //   def MyTable : DynamicTable {
  //     let FilterClass = "MyTableEntry";
  //     let Fields = ["V", ...];
  //     string TypeOf_V = "list<int>";
  //   }
  list<string> Fields;
}
```

This file should be included (either directly or indirectly) in any other
`.td` file that uses the `DynamicTable` class.

The default device configuration `.td` file should look like the one below:
```
include "llvm/TableGen/DynamicTable.td"

// Aspect and all the aspects definitions could be outlined
// to another .td file that could be included into this file
class Aspect<string name> {
  string Name = name;
}

def AspectCpu : Aspect<"cpu">;
def AspectGpu : Aspect<"gpu">;
def AspectAccelerator : Aspect<"accelerator">;
def AspectCustom : Aspect<"custom">;
def AspectFp16 : Aspect<"fp16">;
def AspectFp64 : Aspect<"fp64">;
def AspectImage : Aspect<"image">;
def AspectOnline_compiler : Aspect<"online_compiler">;
def AspectOnline_linker : Aspect<"online_linker">;
def AspectQueue_profiling : Aspect<"queue_profiling">;
def AspectUsm_device_allocations : Aspect<"usm_device_allocations">;
def AspectUsm_host_allocations : Aspect<"usm_host_allocations">;
def AspectUsm_shared_allocations : Aspect<"usm_shared_allocations">;
def AspectUsm_system_allocations : Aspect<"usm_system_allocations">;
def AspectExt_intel_pci_address : Aspect<"ext_intel_pci_address">;
def AspectExt_intel_gpu_eu_count : Aspect<"ext_intel_gpu_eu_count">;
def AspectExt_intel_gpu_eu_simd_width : Aspect<"ext_intel_gpu_eu_simd_width">;
def AspectExt_intel_gpu_slices : Aspect<"ext_intel_gpu_slices">;
def AspectExt_intel_gpu_subslices_per_slice : Aspect<"ext_intel_gpu_subslices_per_slice">;
def AspectExt_intel_gpu_eu_count_per_subslice : Aspect<"ext_intel_gpu_eu_count_per_subslice">;
def AspectExt_intel_max_mem_bandwidth : Aspect<"ext_intel_max_mem_bandwidth">;
def AspectExt_intel_mem_channel : Aspect<"ext_intel_mem_channel">;
def AspectUsm_atomic_host_allocations : Aspect<"usm_atomic_host_allocations">;
def AspectUsm_atomic_shared_allocations : Aspect<"usm_atomic_shared_allocations">;
def AspectAtomic64 : Aspect<"atomic64">;
def AspectExt_intel_device_info_uuid : Aspect<"ext_intel_device_info_uuid">;
def AspectExt_oneapi_srgb : Aspect<"ext_oneapi_srgb">;
def AspectExt_oneapi_native_assert : Aspect<"ext_oneapi_native_assert">;
def AspectHost_debuggable : Aspect<"host_debuggable">;
def AspectExt_intel_gpu_hw_threads_per_eu : Aspect<"ext_intel_gpu_hw_threads_per_eu">;
def AspectExt_oneapi_cuda_async_barrier : Aspect<"ext_oneapi_cuda_async_barrier">;
def AspectExt_oneapi_bfloat16_math_functions : Aspect<"ext_oneapi_bfloat16_math_functions">;
def AspectExt_intel_free_memory : Aspect<"ext_intel_free_memory">;
def AspectExt_intel_device_id : Aspect<"ext_intel_device_id">;
def AspectExt_intel_memory_clock_rate : Aspect<"ext_intel_memory_clock_rate">;
def AspectExt_intel_memory_bus_width : Aspect<"ext_intel_memory_bus_width">;
def AspectEmulated : Aspect<"emulated">;

def TargetTable : DynamicTable {
  let FilterClass = "TargetInfo";
  let Fields = ["TargetName", "aspects", "maySupportOtherAspects",
                "subGroupSizes", "aotToolchain", "aotToolchainOptions"];
  string TypeOf_aspects = "list<Aspect>";
  string TypeOf_subGroupSizes = "list<int>";
}

class TargetInfo <string tgtName, list<Aspect> aspectList, bit otherAspects,
                  list<int> listSubGroupSizes, string toolchain, string options>
{
  // Key of the entry; listed as "TargetName" in TargetTable's Fields.
  string TargetName = tgtName;
  list<Aspect> aspects = aspectList;
  bits<1> maySupportOtherAspects = otherAspects;
  list<int> subGroupSizes = listSubGroupSizes;
  string aotToolchain = toolchain;
  string aotToolchainOptions = options;
}

def : TargetInfo<"TargetA", [AspectCpu, AspectAtomic64],
                 0, [8, 16], "ocloc", "-device tgtA">;
def : TargetInfo<"TargetB", [AspectGpu, AspectFp16],
                 0, [8, 16], "ocloc", "-device tgtB">;
def : TargetInfo<"TargetC", [AspectEmulated, AspectImage],
                 0, [8, 32], "ocloc", "-device tgtC, -option2 val">;
```
Note: the existing backends we tested do not allow list-typed fields in a class
such as `TargetInfo`. This is a limitation of those backends rather than of
TableGen itself. Thus, we should be able to lift it in our own backend, as
shown by the initial prototype implemented to drive the design.

The generated `.inc` file should look like the example below:
```c++
std::map<std::string, TargetInfo> TargetTable = {
    {"TargetA",
     {{"cpu", "atomic64"}, 0, {8, 16}, "ocloc", "-device tgtA"}},
    {"TargetB",
     {{"gpu", "fp16"}, 0, {8, 16}, "ocloc", "-device tgtB"}},
    {"TargetC",
     {{"emulated", "image"}, 0, {8, 32}, "ocloc", "-device tgtC, -option2 val"}}};
```

We also need a header file that includes the `.inc` file generated by the
TableGen backend. Other backends do not generate the definition of the entry
type (`struct TargetInfo` in our case), and we follow the same approach: it
simplifies the backend implementation, and it makes it easier for developers to
check the data structure and understand how to work with it. The idea is simply
to define the struct in this header file, which should look like the code below:
```c++
#include <map>
#include <string>
#include <vector>

namespace DeviceConfigFile {
// Member order must match the order of the initializers in the generated
// `.inc` file, since the entries are aggregate-initialized.
struct TargetInfo {
  std::vector<std::string> aspects;
  bool maySupportOtherAspects;
  std::vector<unsigned> subGroupSizes;
  std::string aotToolchain;
  std::string aotToolchainOptions;
};

#include "device_config_file.inc"
using TargetTable_t = std::map<std::string, TargetInfo>;
} // namespace DeviceConfigFile
```

Other modules can query the map to get the information, as in the example
below:
```c++
auto it = DeviceConfigFile::TargetTable.find("TargetA");
if (it == DeviceConfigFile::TargetTable.end()) {
  /* Target not found */
  ...
} else {
  // `find` returns an iterator; the entry itself is the mapped value.
  const DeviceConfigFile::TargetInfo &info = it->second;
  auto aspects = info.aspects;
  auto maySupportOtherAspects = info.maySupportOtherAspects;
  auto subGroupSizes = info.subGroupSizes;
  ...
}
```

## Tools and Modules Interacting with Device Config File
This is a list of the tools and compiler modules that need to use the file:
- The *compiler driver* needs the file to determine the set of legal values for
`-fsycl-targets`.
- The *compiler driver* needs the file to define macros for `any_device_has/all_devices_have`.
- *Clang* needs the file to emit diagnostics related to `-fsycl-fixed-targets`.
- `sycl-post-link` needs the file to filter kernels in device images when doing AOT
compilation.
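
As an illustration of how the driver could derive the values behind `any_device_has/all_devices_have` from the map, here is a minimal sketch. The helpers `targetHasAspect`, `anyDeviceHas` and `allDevicesHave` are hypothetical names, not part of this design. The sketch also assumes that only the listed aspects count and deliberately ignores `maySupportOtherAspects`; the actual rules are those of the Device Aspect Traits design.

```cpp
#include <algorithm>
#include <map>
#include <string>
#include <vector>

// Mirrors the TargetInfo struct described in this document.
struct TargetInfo {
  std::vector<std::string> aspects;
  bool maySupportOtherAspects;
  std::vector<unsigned> subGroupSizes;
  std::string aotToolchain;
  std::string aotToolchainOptions;
};
using TargetTable_t = std::map<std::string, TargetInfo>;

// Hypothetical helper: true if `target` is in the table and lists `aspect`.
// Assumption: maySupportOtherAspects is ignored in this sketch.
bool targetHasAspect(const TargetTable_t &table, const std::string &target,
                     const std::string &aspect) {
  auto it = table.find(target);
  if (it == table.end())
    return false;
  const std::vector<std::string> &aspects = it->second.aspects;
  return std::find(aspects.begin(), aspects.end(), aspect) != aspects.end();
}

// Hypothetical helper: true if at least one selected target lists the aspect.
bool anyDeviceHas(const TargetTable_t &table,
                  const std::vector<std::string> &targets,
                  const std::string &aspect) {
  return std::any_of(targets.begin(), targets.end(),
                     [&](const std::string &t) {
                       return targetHasAspect(table, t, aspect);
                     });
}

// Hypothetical helper: true if every selected target lists the aspect.
bool allDevicesHave(const TargetTable_t &table,
                    const std::vector<std::string> &targets,
                    const std::string &aspect) {
  return std::all_of(targets.begin(), targets.end(),
                     [&](const std::string &t) {
                       return targetHasAspect(table, t, aspect);
                     });
}
```

With the `TargetA` and `TargetB` entries from the example table, `anyDeviceHas` would report `fp16` as available while `allDevicesHave` would not, since only `TargetB` lists it.
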

The following sections describe the changes required in different parts of the
project in more detail.

### Changes to Build Infrastructure
We need the information about the targets in the multiple tools and compiler
modules listed in [Requirements](#requirements). Thus, we need to make sure
that the `.inc` file is generated from the `.td` file in time
for all the consumers. The command we need to run for TableGen is `llvm-tblgen
-gen-dynamic-tables -I /llvm-root/llvm/include/ input.td -o output.inc`.
Additionally, we need to set dependencies adequately so that this command is
run before any of the consumers need it.

### Changes to the DPC++ Frontend
To allow users to add new targets, we provide a new flag:
`-fsycl-device-config-file=/path/to/file.yaml`. Users can pass a `.yaml` file
describing the targets to be added or updated. An example of what such a `.yaml`
file should look like is shown below.
```
intel_gpu_skl:
  aspects: [aspect_name1, aspect_name2]
  may_support_other_aspects: true/false
  sub-group-sizes: [1, 2, 4, 8]
  aot-toolchain: ocloc
  aot-toolchain-options: -device skl
```
The frontend module should parse the user-provided `.yaml` file and update the
map with the new information about targets. LLVM provides the
[YAML/IO](https://llvm.org/docs/YamlIO.html) library to easily parse `.yaml`
files. The driver should propagate this option to all the tools that require
the Device Configuration File (e.g. `sycl-post-link`) so that each of the
tools can modify the map according to the user extensions described in the
`.yaml` file.

As mentioned in [Requirements](#requirements), there is an auto-detection
mechanism for `aot-toolchain` and `aot-toolchain-options` that is able to
infer these from the target name. In the `.yaml` example shown above, the target
name is `intel_gpu_skl`. From that name, we can infer that `aot-toolchain` is
`ocloc` because the name starts with `intel_gpu_`. We can also infer that
`aot-toolchain-options` should be set to `-device skl` just by keeping what is
left after the prefix `intel_gpu_`.
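
The inference rule described above can be sketched as follows. This is only an illustration: `AotSettings` and `inferAotSettings` are hypothetical names, and rejecting a name with an empty suffix is an assumption of this sketch rather than something the design specifies.

```cpp
#include <string>

// Hypothetical container for the two auto-detected values.
struct AotSettings {
  std::string toolchain;
  std::string options;
};

// Hypothetical helper: infers the AOT toolchain and its options from a target
// name. Returns false when the name does not follow the `intel_gpu_<suffix>`
// pattern, in which case `out` is left untouched.
bool inferAotSettings(const std::string &targetName, AotSettings &out) {
  const std::string prefix = "intel_gpu_";
  // Assumption: a name that is only the prefix (empty suffix) is not detected.
  if (targetName.size() <= prefix.size() ||
      targetName.compare(0, prefix.size(), prefix) != 0)
    return false;
  out.toolchain = "ocloc";
  out.options = "-device " + targetName.substr(prefix.size());
  return true;
}
```

For `intel_gpu_skl`, this yields `ocloc` as the toolchain and `-device skl` as the options, matching the example in the text.
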

#### Potential Issues/Limitations
- Multiple targets with the same name: The compiler emits a warning so that the
user is aware that multiple targets share the same name. It still processes each
new entry and updates the map, so the latest information found for a name is the
one that is kept.
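
A minimal sketch of this "warn, then last entry wins" behavior is shown below. The helper `applyUserEntries` is a hypothetical name, and the sketch assumes that the warning is only emitted when a name is repeated among the user-provided entries, so that overriding an entry of the default table stays silent. The design text does not settle that detail.

```cpp
#include <iostream>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Mirrors the TargetInfo struct described in this document.
struct TargetInfo {
  std::vector<std::string> aspects;
  bool maySupportOtherAspects;
  std::vector<unsigned> subGroupSizes;
  std::string aotToolchain;
  std::string aotToolchainOptions;
};
using TargetTable_t = std::map<std::string, TargetInfo>;
using UserEntry = std::pair<std::string, TargetInfo>;

// Hypothetical helper: applies user-provided entries in order and returns the
// number of duplicate-name warnings emitted.
unsigned applyUserEntries(TargetTable_t &table,
                          const std::vector<UserEntry> &userEntries) {
  std::set<std::string> seen;
  unsigned warnings = 0;
  for (const UserEntry &entry : userEntries) {
    // Assumption: only repeats within the user entries trigger the warning.
    if (!seen.insert(entry.first).second) {
      std::cerr << "warning: target '" << entry.first
                << "' is defined more than once; using the latest entry\n";
      ++warnings;
    }
    table[entry.first] = entry.second; // Add a new target or overwrite it.
  }
  return warnings;
}
```
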

The auto-detection mechanism is a best effort to relieve users from specifying
`aot-toolchain` and `aot-toolchain-options` for well-known devices. However,
it has its own limitations and potential issues:
- Rules for target names: As of now, auto-detection is only available for Intel GPU
targets. All targets whose name starts with `intel_gpu_` will automatically set
`aot-toolchain=ocloc` and `aot-toolchain-options=-device suffix`, where `suffix`
is the part of the name left after the `intel_gpu_` prefix.
- User specifies `aot-toolchain` and `aot-toolchain-options` for a target name
that can be auto-detected: user-specified information takes precedence over
auto-detected information.
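
The precedence rule can be illustrated with a small sketch. `AotSettings` and `resolveAotSettings` are hypothetical names. The sketch assumes that an empty string means "not specified by the user" and that precedence is applied field by field; neither detail is fixed by this design.

```cpp
#include <string>

// Hypothetical container for the toolchain name and its options.
struct AotSettings {
  std::string toolchain;
  std::string options;
};

// Hypothetical helper: a user-specified value always wins; an auto-detected
// value is used only for fields the user left empty.
AotSettings resolveAotSettings(const AotSettings &user,
                               const AotSettings &detected) {
  AotSettings result;
  result.toolchain = user.toolchain.empty() ? detected.toolchain
                                            : user.toolchain;
  result.options = user.options.empty() ? detected.options : user.options;
  return result;
}
```
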

## Testing
There is a danger that the device configuration file will get out of sync with the
actual device capabilities. To detect that, we should include two tests:
- A test that compares the list of aspects known to SYCL RT (defined in `aspects.def`)
with the list of aspects defined in the `.td` file describing the default configuration.
This will be useful to detect new aspects added to SYCL RT that have not been added to
the `.td` file.
- A test that compares the aspects listed in the `.td` file with the aspects reported
via `device::has` for each device listed in the `.td` file. Both lists should match.
This test could copy the mechanism of the test for `any_device_has` that goes over each
item in `aspects.def` and tries to instantiate `any_device_has` with that enumerator.
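
The core of the first test is a set comparison, which could be sketched as below. `missingAspects` is a hypothetical helper; how the two aspect lists are actually extracted from `aspects.def` and from the `.td` file is outside the scope of this sketch.

```cpp
#include <algorithm>
#include <iterator>
#include <set>
#include <string>
#include <vector>

// Hypothetical helper: returns the aspects present in `runtimeAspects` but
// absent from `configAspects`, in sorted order.
std::vector<std::string>
missingAspects(const std::set<std::string> &runtimeAspects,
               const std::set<std::string> &configAspects) {
  std::vector<std::string> missing;
  std::set_difference(runtimeAspects.begin(), runtimeAspects.end(),
                      configAspects.begin(), configAspects.end(),
                      std::back_inserter(missing));
  return missing;
}
```

A test built on this helper would fail whenever the returned list is not empty, pointing at the aspects that still need to be added to the `.td` file.
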

Neither of the tests guarantees *per se* that nothing went out of sync: for
such a guarantee, we would need to run the second test on all the targets
described in the `.td` file. However, they at least provide a mechanism to
detect potential desyncs.

sycl/doc/index.rst

Lines changed: 1 addition & 0 deletions
@@ -49,6 +49,7 @@ Design Documents for the oneAPI DPC++ Compiler
 design/KernelFusionJIT
 design/NonRelocatableDeviceCode
 design/DeviceAspectTraitDesign
+design/DeviceConfigFile
 design/PropagateCompilerFlagsToRuntime
 New OpenCL Extensions <https://github.com/intel/llvm/tree/sycl/sycl/doc/design/opencl-extensions>
 New SPIR-V Extensions <https://github.com/intel/llvm/tree/sycl/sycl/doc/design/spirv-extensions>
