| 1 | +# Implementation Design for Device Configuration File |
| 2 | +This design document describes the implementation of the DPC++ Device |
| 3 | +Configuration File. |
| 4 | + |
| 5 | +In summary, there are several scenarios where we need to know information about a |
| 6 | +target at compile-time, which is the main purpose of this Device Configuration |
| 7 | +File. Examples are `any_device_has/all_devices_have`, which rely on macros defined |
| 8 | +according to the optional features supported by a target; or conditional AOT |
| 9 | +compilation based on optional features used in kernels and supported by targets. |
| 10 | + |
| 11 | +## Requirements |
| 12 | +We need a default Device Configuration File embedded in the compiler describing |
| 13 | +the well-known targets at the time of building the compiler. This embedded |
| 14 | +knowledge must be extendable, since our AOT toolchain allows compiling for |
| 15 | +targets not known at the time of building the compiler so long as the |
| 16 | +appropriate toolchain (AOT compiler and driver) supports such targets. In |
| 17 | +other words, we need to provide a way for users to add entries for new targets or |
| 18 | +update existing targets at application compile time. |
| 19 | + |
| 20 | +An entry of the Device Configuration File should include: |
| 21 | +- Name of the target. Target names should be spelled exactly as expected in |
| 22 | +`-fsycl-targets`, since these are going to be used to implement validation of |
| 23 | +supported targets. |
| 24 | +- List of supported aspects. |
| 25 | +- List of supported sub-group sizes. |
| 26 | +- [Optional] `aot-toolchain` name/identifier describing the toolchain used to compile |
| 27 | +for this target. This information is optional because we plan to implement an |
| 28 | +auto-detection mechanism that is able to infer the `aot-toolchain` from the |
| 29 | +target name for well known targets. |
| 30 | +- [Optional] `aot-toolchain-%option_name` information to be passed to the |
| 31 | +`aot-toolchain` command. This information is optional. For some targets, the |
| 32 | +auto-detection mechanism might be able to infer values for this. One example of this |
| 33 | +information would be `ocloc-device %device_id`. |
| 34 | + |
| 35 | +The information provided in the Device Configuration File is required from |
| 36 | +different tools and compiler modules: |
| 37 | +- Compiler driver: |
| 38 | + - `any_device_has/all_devices_have` requires compiler driver to read the |
| 39 | + config file and define corresponding macros. |
| 40 | + [[DeviceAspectTraitDesign](https://github.com/intel/llvm/blob/sycl/sycl/doc/design/DeviceAspectTraitDesign.md)] |
| 41 | + - Compiler driver requires `aot-toolchain` and `ocloc-device` to trigger the |
| 42 | + compilation for the required targets. |
| 43 | + [https://github.com/intel/llvm/pull/6775/files] |
| 44 | +- `sycl-aspect-filter`: |
| 45 | +https://github.com/intel/llvm/blob/sycl/sycl/doc/design/OptionalDeviceFeatures.md#aspect-filter-tool |
| 46 | + |
| 47 | +Finally, overhead should be minimal. Particularly, users should not pay for what |
| 48 | +they do not use. This motivates our decision to embed the default Device |
| 49 | +Configuration File rather than releasing it as a separate file. |
| 50 | + |
| 51 | +## High-Level Design |
| 52 | +The default Device Configuration File is a `.td` file located in the compiler |
| 53 | +source code. `.td` is the file extension for [LLVM |
| 54 | +TableGen](https://llvm.org/docs/TableGen/). This default file will include all |
| 55 | +the devices known by the developers at the time of the release. During the |
| 56 | +build process, using a custom TableGen backend, we generate a `.inc` C++ file |
| 57 | +containing a `std::map` with one key/value element for each entry in the `.td` |
| 58 | +file. Using a map we can later update or add new elements if the user provides |
| 59 | +new targets at application compile time. Finally, the tools and compiler |
| 60 | +modules that need information about the targets can simply query the map to get |
| 61 | +it. |
| 62 | + |
| 63 | +Further information about TableGen can be found in [TableGenFundamentals](https://releases.llvm.org/1.9/docs/TableGenFundamentals.html). |
| 64 | + |
| 65 | +### New `TableGen` backend |
| 66 | +Note: This [guide](https://llvm.org/docs/TableGen/BackGuide.html) details how |
| 67 | +to implement new TableGen backends. Also, the [Search |
| 68 | +Indexes](https://llvm.org/docs/TableGen/BackEnds.html#search-indexes) backend |
| 69 | +already does something very similar to what we seek. It generates a table that |
| 70 | +provides a lookup function, but it cannot be extended with new entries. We can |
| 71 | +use the _Search Indexes_ backend as inspiration for ours. |
| 72 | + |
| 73 | +Our backend should generate a map where the key is the target name and the value |
| 74 | +is an object of a custom class/struct including all the information required. |
| 75 | + |
| 76 | +First, we need to provide a file describing the `DynamicTable` class. A |
| 77 | +comparable file is `SearchableTable.td`, which describes the `GenericEnum` and |
| 78 | +`GenericTable` classes for the `gen-searchable-tables` backend. The file |
| 79 | +`llvm/include/llvm/TableGen/DynamicTable.td` should look like the one below: |
| 80 | +``` |
| 81 | +//===- DynamicTable.td ----------------------------------*- tablegen -*-===// |
| 82 | +// |
| 83 | +// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. |
| 84 | +// See https://llvm.org/LICENSE.txt for license information. |
| 85 | +// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception |
| 86 | +// |
| 87 | +//===----------------------------------------------------------------------===// |
| 88 | +// |
| 89 | +// This file defines the key top-level classes needed to produce a reasonably |
| 90 | +// generic dynamic table that can be updated at runtime. DynamicTable objects |
| 91 | +// can be defined using the class in this file: |
| 92 | +// 1. (Dynamic) Tables. By instantiating the DynamicTable |
| 93 | +// class once, a table with the name of the instantiating def is generated and |
| 94 | +// guarded by the GET_name_IMPL preprocessor guard. |
| 95 | +// |
| 96 | +//===----------------------------------------------------------------------===// |
| 97 | +// Define a record derived from this class to generate a dynamic table. This |
| 98 | +// table resembles a hashtable with a key-value pair, and can be updated at runtime. |
| 99 | +// |
| 100 | +// The name of the record is used as the name of the global primary array of |
| 101 | +// entries of the table in C++. |
| 102 | +class DynamicTable { |
| 103 | + // Name of a class. The table will have one entry for each record that |
| 104 | + // derives from that class. |
| 105 | + string FilterClass; |
| 106 | +
|
| 107 | + // Name of the C++ struct/class type that holds table entries. The |
| 108 | + // declaration of this type is not generated automatically. |
| 109 | + string CppTypeName = FilterClass; |
| 110 | +
|
| 111 | + // List of the names of fields of collected records that contain the data for |
| 112 | + // table entries, in the order that is used for initialization in C++. |
| 113 | + // |
| 114 | + // TableGen needs to know the type of the fields so that it can format |
| 115 | + // the initializers correctly. |
| 116 | + // |
| 117 | + // For each field of the table named xxx, TableGen will look for a field |
| 118 | + // named TypeOf_xxx and use that as a more detailed description of the |
| 119 | + // type of the field. |
| 120 | +
|
| 121 | + // class MyTableEntry { |
| 122 | + // MyEnum V; |
| 123 | + // ... |
| 124 | + // } |
| 125 | + // |
| 126 | + // def MyTable : DynamicTable { |
| 127 | + // let FilterClass = "MyTableEntry"; |
| 128 | + // let Fields = ["V", ...]; |
| 129 | + // string TypeOf_V = "list<int>"; |
| 130 | + // } |
| 131 | + list<string> Fields; |
| 132 | +} |
| 133 | +``` |
| 134 | + |
| 135 | +This file should be included --either directly or indirectly-- in any other |
| 136 | +`.td` file that uses `DynamicTable` class. |
| 137 | + |
| 138 | +The default device configuration `.td` file should look like the one below: |
| 139 | +``` |
| 140 | +include "llvm/TableGen/DynamicTable.td" |
| 141 | +
|
| 142 | +// Aspect and all the aspects definitions could be outlined |
| 143 | +// to another .td file that could be included into this file |
| 144 | +class Aspect<string name> { |
| 145 | + string Name = name; |
| 146 | +} |
| 147 | +
|
| 148 | +def AspectCpu : Aspect<"cpu">; |
| 149 | +def AspectGpu : Aspect<"gpu">; |
| 150 | +def AspectAccelerator : Aspect<"accelerator">; |
| 151 | +def AspectCustom : Aspect<"custom">; |
| 152 | +def AspectFp16 : Aspect<"fp16">; |
| 153 | +def AspectFp64 : Aspect<"fp64">; |
| 154 | +def AspectImage : Aspect<"image">; |
| 155 | +def AspectOnline_compiler : Aspect<"online_compiler">; |
| 156 | +def AspectOnline_linker : Aspect<"online_linker">; |
| 157 | +def AspectQueue_profiling : Aspect<"queue_profiling">; |
| 158 | +def AspectUsm_device_allocations : Aspect<"usm_device_allocations">; |
| 159 | +def AspectUsm_host_allocations : Aspect<"usm_host_allocations">; |
| 160 | +def AspectUsm_shared_allocations : Aspect<"usm_shared_allocations">; |
| 161 | +def AspectUsm_system_allocations : Aspect<"usm_system_allocations">; |
| 162 | +def AspectExt_intel_pci_address : Aspect<"ext_intel_pci_address">; |
| 163 | +def AspectExt_intel_gpu_eu_count : Aspect<"ext_intel_gpu_eu_count">; |
| 164 | +def AspectExt_intel_gpu_eu_simd_width : Aspect<"ext_intel_gpu_eu_simd_width">; |
| 165 | +def AspectExt_intel_gpu_slices : Aspect<"ext_intel_gpu_slices">; |
| 166 | +def AspectExt_intel_gpu_subslices_per_slice : Aspect<"ext_intel_gpu_subslices_per_slice">; |
| 167 | +def AspectExt_intel_gpu_eu_count_per_subslice : Aspect<"ext_intel_gpu_eu_count_per_subslice">; |
| 168 | +def AspectExt_intel_max_mem_bandwidth : Aspect<"ext_intel_max_mem_bandwidth">; |
| 169 | +def AspectExt_intel_mem_channel : Aspect<"ext_intel_mem_channel">; |
| 170 | +def AspectUsm_atomic_host_allocations : Aspect<"usm_atomic_host_allocations">; |
| 171 | +def AspectUsm_atomic_shared_allocations : Aspect<"usm_atomic_shared_allocations">; |
| 172 | +def AspectAtomic64 : Aspect<"atomic64">; |
| 173 | +def AspectExt_intel_device_info_uuid : Aspect<"ext_intel_device_info_uuid">; |
| 174 | +def AspectExt_oneapi_srgb : Aspect<"ext_oneapi_srgb">; |
| 175 | +def AspectExt_oneapi_native_assert : Aspect<"ext_oneapi_native_assert">; |
| 176 | +def AspectHost_debuggable : Aspect<"host_debuggable">; |
| 177 | +def AspectExt_intel_gpu_hw_threads_per_eu : Aspect<"ext_intel_gpu_hw_threads_per_eu">; |
| 178 | +def AspectExt_oneapi_cuda_async_barrier : Aspect<"ext_oneapi_cuda_async_barrier">; |
| 179 | +def AspectExt_oneapi_bfloat16_math_functions : Aspect<"ext_oneapi_bfloat16_math_functions">; |
| 180 | +def AspectExt_intel_free_memory : Aspect<"ext_intel_free_memory">; |
| 181 | +def AspectExt_intel_device_id : Aspect<"ext_intel_device_id">; |
| 182 | +def AspectExt_intel_memory_clock_rate : Aspect<"ext_intel_memory_clock_rate">; |
| 183 | +def AspectExt_intel_memory_bus_width : Aspect<"ext_intel_memory_bus_width">; |
| 184 | +def AspectEmulated : Aspect<"emulated">; |
| 185 | + |
| 186 | +def TargetTable : DynamicTable { |
| 187 | + let FilterClass = "TargetInfo"; |
| 188 | + let Fields = ["TargetName", "aspects", "maySupportOtherAspects", |
| 189 | + "subGroupSizes", "aotToolchain", "aotToolchainOptions"]; |
| 190 | + string TypeOf_aspects = "list<Aspect>"; |
| 191 | + string TypeOf_subGroupSizes = "list<int>"; |
| 192 | +} |
| 193 | +
|
| 194 | +class TargetInfo <string tgtName, list<Aspect> aspectList, bit otherAspects, |
| 195 | + list<int> listSubGroupSizes, string toolchain, string options> { |
| 196 | + string TargetName = tgtName; |
| 197 | + list<Aspect> aspects = aspectList; |
| 198 | + bits<1> maySupportOtherAspects = otherAspects; |
| 199 | + list<int> subGroupSizes = listSubGroupSizes; |
| 200 | + string aotToolchain = toolchain; |
| 201 | + string aotToolchainOptions = options; |
| 202 | +} |
| 203 | +
|
| 204 | +def : TargetInfo<"TargetA", [AspectCpu, AspectAtomic64], |
| 205 | + 0, [8, 16], "ocloc", "-device tgtA">; |
| 206 | +def : TargetInfo<"TargetB", [AspectGpu, AspectFp16], |
| 207 | + 0, [8, 16], "ocloc", "-device tgtB">; |
| 208 | +def : TargetInfo<"TargetC", [AspectEmulated, AspectImage], |
| 209 | + 0, [8, 32], "ocloc", "-device tgtC, -option2 val">; |
| 210 | +``` |
| 211 | +Note: the existing backends we tested don't allow list-typed fields in the `TargetInfo` |
| 212 | +class. This is a limitation of those backends rather than of TableGen itself. Thus, we |
| 213 | +should be able to lift this limitation in our own backend, as shown in the initial |
| 214 | +prototype implemented to drive the design. |
| 215 | + |
| 216 | +The generated `.inc` file should look like the example below: |
| 217 | +```c++ |
| 218 | +std::map<std::string, TargetInfo> TargetTable = { |
| 219 | + {"TargetA", |
| 220 | + {{"cpu", "atomic64"}, 0, {8, 16}, "ocloc", "-device tgtA"}}, |
| 221 | + {"TargetB", |
| 222 | + {{"gpu", "fp16"}, 0, {8, 16}, "ocloc", "-device tgtB"}}, |
| 223 | + {"TargetC", |
| 224 | + {{"emulated", "image"}, 0, {8, 32}, "ocloc", "-device tgtC, -option2 val"}}}; |
| 225 | +``` |
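
As a rough illustration of the output shape, the sketch below prints an initializer in the format shown above from in-memory records. It is plain C++ rather than a real TableGen backend: `TargetRecord`, `joinList`, and `emitTargetTable` are hypothetical names, and an actual backend would read the records through the TableGen record API instead. String escaping is deliberately left out.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical in-memory view of one TargetInfo record from the .td file.
struct TargetRecord {
  std::string name;
  std::vector<std::string> aspects;
  bool maySupportOtherAspects;
  std::vector<int> subGroupSizes;
  std::string aotToolchain;
  std::string aotToolchainOptions;
};

// Wraps a string in double quotes (no escaping; assumption for brevity).
static std::string quote(const std::string &s) { return "\"" + s + "\""; }

// Formats a list as a C++ braced initializer, e.g. {8, 16}.
template <typename T, typename Fn>
static std::string joinList(const std::vector<T> &items, Fn format) {
  std::string out = "{";
  for (std::size_t i = 0; i < items.size(); ++i) {
    if (i != 0)
      out += ", ";
    out += format(items[i]);
  }
  return out + "}";
}

// Emits the text of a std::map initializer with one element per record.
std::string emitTargetTable(const std::string &tableName,
                            const std::vector<TargetRecord> &records) {
  assert(!tableName.empty() && "the table needs a C++ identifier");
  std::ostringstream os;
  os << "std::map<std::string, TargetInfo> " << tableName << " = {\n";
  for (std::size_t i = 0; i < records.size(); ++i) {
    const TargetRecord &r = records[i];
    os << "    {" << quote(r.name) << ",\n     {"
       << joinList(r.aspects, quote) << ", "
       << (r.maySupportOtherAspects ? 1 : 0) << ", "
       << joinList(r.subGroupSizes, [](int v) { return std::to_string(v); })
       << ", " << quote(r.aotToolchain) << ", "
       << quote(r.aotToolchainOptions) << "}}"
       << (i + 1 == records.size() ? "" : ",") << "\n";
  }
  os << "};\n";
  return os.str();
}
```

The field order in the emitted initializer has to match the member order of `struct TargetInfo`, since the generated code relies on aggregate initialization.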
| 226 | +
|
| 227 | +We also need a header file that includes the `.inc` file generated by the |
| 228 | +TableGen backend. Other backends don't generate the definition of their entry |
| 229 | +struct, and we follow the same approach for `struct TargetInfo`: it simplifies the |
| 230 | +backend implementation, and it is easier for developers to check the data structure |
| 231 | +to understand how to work with it. The idea is simply to define the struct |
| 232 | +in this header file. This header file should look like the code below: |
| 233 | +```c++ |
| 234 | +namespace DeviceConfigFile { |
| 235 | +struct TargetInfo { |
| 236 | + std::vector<std::string> aspects; |
| 237 | + bool maySupportOtherAspects; |
| 238 | + std::vector<unsigned> subGroupSizes; |
| 239 | + std::string aotToolchain; |
| 240 | + std::string aotToolchainOptions; |
| 241 | +}; |
| 242 | +
|
| 243 | +#include "device_config_file.inc" |
| 244 | +using TargetTable_t = std::map<std::string, TargetInfo>; |
| 245 | +}; // namespace DeviceConfigFile |
| 246 | +``` |
| 247 | + |
| 248 | +Other modules can query the map to get the information like in the example |
| 249 | +below: |
| 250 | +```c++ |
| 251 | +auto it = DeviceConfigFile::TargetTable.find("TargetA"); |
| 252 | +if (it == DeviceConfigFile::TargetTable.end()) { |
| 253 | + /* Target not found */ |
| 254 | + ... |
| 255 | +} else { |
| 256 | + auto aspects = it->second.aspects; |
| 257 | + auto maySupportOtherAspects = it->second.maySupportOtherAspects; |
| 258 | + auto subGroupSizes = it->second.subGroupSizes; |
| 259 | + ... |
| 260 | +} |
| 261 | +``` |
| 262 | + |
| 263 | +## Tools and Modules Interacting with Device Config File |
| 264 | +This is a list of the tools and compiler modules that require using the file: |
| 265 | +- The *compiler driver* needs the file to determine the set of legal values for |
| 266 | +`-fsycl-targets`. |
| 267 | +- The *compiler driver* needs the file to define macros for `any_device_has/all_devices_have`. |
| 268 | +- *Clang* needs the file to emit diagnostics related to `-fsycl-fixed-targets`. |
| 269 | +- `sycl-post-link` needs the file to filter kernels in device images when doing AOT |
| 270 | +compilation. |
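
To make the `any_device_has/all_devices_have` use case more concrete, here is a minimal sketch of how a consumer might summarize the aspects of the targets selected on the command line. `AspectSummary` and `summarizeAspects` are hypothetical names, the struct is repeated only to keep the example self-contained, and how `maySupportOtherAspects` should influence the final macros is left open here.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Mirrors the TargetInfo struct from the header described in this document.
struct TargetInfo {
  std::vector<std::string> aspects;
  bool maySupportOtherAspects;
  std::vector<unsigned> subGroupSizes;
  std::string aotToolchain;
  std::string aotToolchainOptions;
};
using TargetTable_t = std::map<std::string, TargetInfo>;

// Hypothetical result type: aspects common to all targets, aspects found in
// at least one target, and whether any target may support further aspects.
struct AspectSummary {
  std::set<std::string> allDevicesHave;
  std::set<std::string> anyDeviceHas;
  bool anyMaySupportOtherAspects = false;
};

// Hypothetical helper. Assumes every name in `selected` was already
// validated against the table (e.g. while parsing -fsycl-targets).
AspectSummary summarizeAspects(const TargetTable_t &table,
                               const std::vector<std::string> &selected) {
  AspectSummary summary;
  bool first = true;
  for (const std::string &name : selected) {
    auto it = table.find(name);
    assert(it != table.end() && "target must be validated beforehand");
    const TargetInfo &info = it->second;
    std::set<std::string> current(info.aspects.begin(), info.aspects.end());

    // Union: an aspect counts if at least one target lists it.
    summary.anyDeviceHas.insert(current.begin(), current.end());

    // Intersection: an aspect counts only if every target lists it.
    if (first) {
      summary.allDevicesHave = current;
      first = false;
    } else {
      std::set<std::string> common;
      std::set_intersection(summary.allDevicesHave.begin(),
                            summary.allDevicesHave.end(), current.begin(),
                            current.end(),
                            std::inserter(common, common.begin()));
      summary.allDevicesHave = std::move(common);
    }
    summary.anyMaySupportOtherAspects =
        summary.anyMaySupportOtherAspects || info.maySupportOtherAspects;
  }
  return summary;
}
```

The driver could then define one macro per aspect in each set; the exact macro names are specified in the DeviceAspectTraitDesign document and are not repeated here.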
| 271 | + |
| 272 | +The following sections describe the changes required in different parts of the |
| 273 | +project in more detail. |
| 274 | + |
| 275 | +### Changes to Build Infrastructure |
| 276 | +We need the information about the targets in multiple tools and compiler |
| 277 | +modules listed in [Requirements](#Requirements). Thus, we need to make sure |
| 278 | +that the generation of the `.inc` file out of the `.td` file is done in time |
| 279 | +for all the consumers. The command we need to run for TableGen is `llvm-tblgen |
| 280 | +-gen-dynamic-tables -I /llvm-root/llvm/include/ input.td -o output.inc`. |
| 281 | +Additionally, we need to set dependencies adequately so that this command is |
| 282 | +run before any of the consumers need it. |
| 283 | + |
| 284 | +### Changes to the DPC++ Frontend |
| 285 | +To allow users to add new targets we provide a new flag: |
| 286 | +`-fsycl-device-config-file=/path/to/file.yaml`. Users can pass a `.yaml` file |
| 287 | +describing the targets to be added/updated. An example of such a `.yaml` file |
| 288 | +is shown below. |
| 289 | +``` |
| 290 | +intel_gpu_skl: |
| 291 | + aspects: [aspect_name1, aspect_name2] |
| 292 | + may_support_other_aspects: true/false |
| 293 | + sub-group-sizes: [1, 2, 4, 8] |
| 294 | + aot-toolchain: ocloc |
| 295 | + aot-toolchain-options: -device skl |
| 296 | +``` |
| 297 | +The frontend module should parse the user-provided `.yaml` file and update the |
| 298 | +map with the new information about targets. LLVM provides the |
| 299 | +[YAML/IO](https://llvm.org/docs/YamlIO.html) library to easily parse `.yaml` |
| 300 | +files. The driver should propagate this option to all the tools that require |
| 301 | +the Device Configuration File (e.g. `sycl-post-link`) so that each of the |
| 302 | +tools can modify the map according to the user extensions described in the |
| 303 | +`.yaml` file. |
| 304 | + |
| 305 | +As mentioned in [Requirements](#Requirements), there is an auto-detection |
| 306 | +mechanism for `aot-toolchain` and `aot-toolchain-options` that is able to |
| 307 | +infer these from the target name. In the `.yaml` example shown above the target |
| 308 | +name is `intel_gpu_skl`. From that name, we can infer that `aot-toolchain` is |
| 309 | +`ocloc` because the name starts with `intel_gpu_`. Also, we can infer that it needs |
| 310 | +`aot-toolchain-options` set to `-device skl` just by keeping what is left after the |
| 311 | +prefix `intel_gpu_`. |
| 312 | + |
| 313 | +#### Potential Issues/Limitations |
| 314 | +- Multiple targets with the same name: the compiler emits a warning so that the |
| 315 | +user is aware that multiple targets share the same name. It then processes each |
| 316 | +entry in order and updates the map, so the latest information found is the one |
| 317 | +that is kept. |
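
The update step can be sketched as follows. This is only an illustration under stated assumptions: `UserEntry` and `mergeUserTargets` are hypothetical names, the entries are assumed to be already parsed from the user's `.yaml` file in file order, and the warning is raised only for names repeated within the user file (overriding an embedded default is treated as a regular update).

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Mirrors the TargetInfo struct from the header described in this document.
struct TargetInfo {
  std::vector<std::string> aspects;
  bool maySupportOtherAspects;
  std::vector<unsigned> subGroupSizes;
  std::string aotToolchain;
  std::string aotToolchainOptions;
};
using TargetTable_t = std::map<std::string, TargetInfo>;

// Hypothetical: one parsed entry of the user-provided file (name, info).
using UserEntry = std::pair<std::string, TargetInfo>;

// Applies the user entries to the table in file order and returns the
// warnings to report. Later entries overwrite earlier ones.
std::vector<std::string>
mergeUserTargets(TargetTable_t &table,
                 const std::vector<UserEntry> &userEntries) {
  std::vector<std::string> warnings;
  std::set<std::string> seen; // names already found in the user file
  for (const UserEntry &entry : userEntries) {
    assert(!entry.first.empty() && "target names must not be empty");
    if (!seen.insert(entry.first).second)
      warnings.push_back("target '" + entry.first +
                         "' is defined more than once; using the latest "
                         "definition");
    table[entry.first] = entry.second; // add a new target or update it
  }
  return warnings;
}
```

Returning the warnings instead of printing them keeps the helper independent of the diagnostics mechanism of each tool that consumes the file.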
| 318 | + |
| 319 | +The auto-detection mechanism is a best effort to relieve users from specifying |
| 320 | +`aot-toolchain` and `aot-toolchain-options` for well-known devices. However, |
| 321 | +it has its own limitations and potential issues: |
| 322 | +- Rules for target names: As of now, auto-detection is only available for Intel GPU |
| 323 | +targets. All targets starting with `intel_gpu_` will automatically set |
| 324 | +`aot-toolchain=ocloc` and `aot-toolchain-options=-device suffix`, where `suffix` is |
| 325 | +the part of the name left after the `intel_gpu_` prefix. |
| 326 | +- User specifies `aot-toolchain` and `aot-toolchain-options` for a target name |
| 327 | +that can be auto-detected: user-specified information has precedence over auto-detected |
| 328 | +information. |
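
The two rules above can be sketched in a few lines. `AotSettings`, `inferAotSettings`, and `resolveAotSettings` are hypothetical names, and two details are assumptions of this sketch rather than part of the design: an empty string stands for "not specified by the user", and a name with nothing after the prefix is not auto-detected.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical pair of values produced by auto-detection.
struct AotSettings {
  std::string toolchain;
  std::string options;
};

// Infers the AOT settings from the target name. Only names of the form
// intel_gpu_<suffix> are recognized, as described above.
std::optional<AotSettings> inferAotSettings(const std::string &targetName) {
  assert(!targetName.empty() && "target name must not be empty");
  static const std::string prefix = "intel_gpu_";
  // Assumption: a name with an empty suffix is not auto-detected.
  if (targetName.size() <= prefix.size() ||
      targetName.compare(0, prefix.size(), prefix) != 0)
    return std::nullopt;
  return AotSettings{"ocloc", "-device " + targetName.substr(prefix.size())};
}

// Combines user-specified and auto-detected values. User-specified values
// take precedence; an empty string means the user did not specify one.
AotSettings resolveAotSettings(const std::string &targetName,
                               const std::string &userToolchain,
                               const std::string &userOptions) {
  AotSettings result{userToolchain, userOptions};
  if (std::optional<AotSettings> inferred = inferAotSettings(targetName)) {
    if (result.toolchain.empty())
      result.toolchain = inferred->toolchain;
    if (result.options.empty())
      result.options = inferred->options;
  }
  return result;
}
```

For `intel_gpu_skl` with no user-provided values, this yields `ocloc` and `-device skl`, matching the example in the text.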
| 329 | + |
| 330 | +## Testing |
| 331 | +There is a danger that the device configuration file will get out-of-sync with the |
| 332 | +actual device capabilities. In order to prevent that, we need testing to validate |
| 333 | +that the device config file does not go out-of-sync. There are two tests that we |
| 334 | +should include: |
| 335 | +- A test that compares the list of aspects known to SYCL RT (defined in `aspects.def`) |
| 336 | +with the list of aspects defined in the `.td` file describing the default configuration. |
| 337 | +This will be useful to detect new aspects added to SYCL RT that have not been added in |
| 338 | +the `.td` file. |
| 339 | +- A test that compares the aspects listed in the `.td` file with the aspects reported |
| 340 | +via `device::has` for each device listed in the `.td` file. Both lists should match. |
| 341 | +This test could copy the mechanism of the test for `any_device_has` that goes over each |
| 342 | +item in `aspects.def` and tries to instantiate `any_device_has` with that enumerator. |
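
The core of the first test is a set difference between the two aspect lists. The sketch below assumes both lists have already been extracted as sets of names (from `aspects.def` and from the `.td` file respectively); `findMissingAspects` and `checkConfigCoversRuntimeAspects` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <set>
#include <string>
#include <vector>

// Returns the aspects known to the runtime that the default configuration
// does not define, in sorted order.
std::vector<std::string>
findMissingAspects(const std::set<std::string> &runtimeAspects,
                   const std::set<std::string> &configAspects) {
  std::vector<std::string> missing;
  std::set_difference(runtimeAspects.begin(), runtimeAspects.end(),
                      configAspects.begin(), configAspects.end(),
                      std::back_inserter(missing));
  return missing;
}

// Test-style check: fails if the configuration lags behind the runtime.
void checkConfigCoversRuntimeAspects(
    const std::set<std::string> &runtimeAspects,
    const std::set<std::string> &configAspects) {
  assert(findMissingAspects(runtimeAspects, configAspects).empty() &&
         "aspects.def contains aspects missing from the .td file");
}
```

Reporting the missing names, rather than only a pass/fail result, makes it easier to see which `Aspect` definitions need to be added to the `.td` file.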
| 343 | + |
| 344 | +Neither of the tests guarantees *per se* that nothing went out-of-sync: such a |
| 345 | +guarantee would require running the second test on all the targets described in the |
| 346 | +`.td` file. However, the tests at least provide a mechanism to detect potential desyncs. |
| 347 | + |