|
| 1 | +# Mapping host variables to compiler-generated info |
| 2 | + |
| 3 | +[SYCL 2020][sycl-2020-spec] specification and some extensions such as |
| 4 | +[SYCL_INTEL_device_global][device-global-ext-spec] imply that implementation is |
| 5 | +capable to somehow map addresses of a host objects to their counterparts in |
| 6 | +device programs. |
| 7 | + |
| 8 | +For example, in order to implement specialization constants on top of SPIR-V, we |
| 9 | +need to be able to map addresses of `specialization_id` variables into numeric |
| 10 | +IDs of corresponding specialization constants at SPIR-V level. |
| 11 | + |
| 12 | +Another example is device global [implementation][device-global-design], where |
| 13 | +in order to communicate a value of `device_global` variable between host and |
| 14 | +device we need to map its host address to a symbolic name/identifier and some |
| 15 | +other info, which is used at PI layer and below. |
| 16 | + |
| 17 | +This design document describes a generic way how to map address of any SYCL |
| 18 | +object defined in a namespace scope to its unique symbolic ID. Please note that |
| 19 | +this document doesn't try to map the address to something other than a unique |
| 20 | +symbolic ID: other required information is usually generated by the device |
| 21 | +compiler and communicated to the runtime by device image properties. Unique |
| 22 | +symbolic ID which can be obtained from mapping mechanism described in this |
| 23 | +design document could be used as a key in those properties to propagate |
| 24 | +additional information using existing mechanisms. |
| 25 | + |
| 26 | +So, overall the picture looks like: |
| 27 | +- device compiler generates property set/s which provide mapping |
| 28 | + "unique symbolic ID" -> "various information required by DPC++ RT" |
| 29 | +- device or host compiler generates mapping |
| 30 | + "address of a host variable" -> "unique symbolic ID" (as described below by |
| 31 | + this document) |
| 32 | +- DPC++ RT uses these to mappings to obtain required information and somehow |
| 33 | + uses it |
| 34 | + |
| 35 | +This design document describes two approaches of how the |
| 36 | +"address of a host variable" -> "unique symbolic ID" mapping can be generated: |
| 37 | +the first one with integration footer and another one with modification of the |
| 38 | +host compiler. |
| 39 | + |
| 40 | +Both approaches have their pros and cons and they are expected to be implemented |
| 41 | +and exists in the implementation at the same time, but only one of them will be |
| 42 | +used at a time depending on whether 3rd-party host compiler is used or not. |
| 43 | + |
| 44 | +Integration footer can be used with 3rd-party host compilers, but it requires |
| 45 | +appending to a translation unit provided by user, which could affect debug |
| 46 | +information: since there are no compilers that support appending a file at the |
| 47 | +end (similar to `-include`), appending is done by generating a temporary input |
| 48 | +file using concatenation of the original input and integration footer. |
| 49 | + |
| 50 | +Such replacement of the main translation unit causes the following issues: |
| 51 | +- debug information about the source file might be incorrect, leading to |
| 52 | + problems with gdb `l` command and code coverage tools |
| 53 | +- checksum of host and device source files becomes different which causes device |
| 54 | + code debugging to be completely broken in some environments (such as MS Visual |
| 55 | + Studio, for example) |
| 56 | + |
| 57 | +Customizing host compiler allows to avoid issues with debuggers and code |
| 58 | +coverage tools, but that is not an option if user wants to compile host part |
| 59 | +of an app with a 3rd-party host compiler. |
| 60 | + |
| 61 | +Further sections describe the implementation design of both approaches in more |
| 62 | +details, note that there are few components which should be modified regardless |
| 63 | +of which approach is in use. |
| 64 | + |
| 65 | +## Common front-end part |
| 66 | + |
| 67 | +DPC++ FE should support the following attribute: |
| 68 | +`[[__sycl_detail__::uniquely_identifiable_object(kind)]]`. This attribute accepts |
| 69 | +a string literal and should be applied to types (like `device_global` or |
| 70 | +`specialization_id`). |
| 71 | + |
| 72 | +Presence of the attribute instructs the compiler to perform the following |
| 73 | +things: |
| 74 | +- emit `sycl-unique-id` LLVM IR attribute on each definition of a variable of |
| 75 | + type marked with `[[__sycl_detail__::uniquely_identifiable_object(kind)]]` |
| 76 | + attribute. `sycl-unique-id` LLVM IR attribute should be accompanied by a |
| 77 | + unique string identifier of a variable it is attached to. The rules for |
| 78 | + creating this string are the same as for `__builtin_sycl_unique_stable_id` and |
| 79 | + the same algorithm can be used when generating the string for the attribute |
| 80 | +- emit `sycl-uid-type` LLVM IR attribute alongside `sycl-unique-id`, which |
| 81 | + contains the `kind` string passed to |
| 82 | + `[[__sycl_detail__::uniquely_identifiable_object(kind)]]` attribute |
| 83 | + |
| 84 | +**TODO**: we have `[[__sycl_detail__::device_global]]` attribute documented in |
| 85 | +[device global design doc][device-global-design], which instructs front-end to |
| 86 | +emit some additional semantic checking. Shall we leave it in place or that |
| 87 | +request for semantic checking should also be documented by |
| 88 | +`[[__sycl_detail__::uniquely_identifiable_object(kind)]]` attribute when `kind` |
| 89 | +is set to a certain value? |
| 90 | + |
| 91 | +**TODO**: alternatively, we could completely re-use existing |
| 92 | +`[[__sycl_detail__::device_global]]` attribute and introduce another one for |
| 93 | +specialization constants, i.e. it is a question of whether or not we want to |
| 94 | +generalize unique IDs generation in form of a generic attribute or not. |
| 95 | + |
| 96 | +When DPC++ compiler is used as both host and device compiler, then the attribute |
| 97 | +should be respected by both host and device compiler passes and LLVM IR |
| 98 | +attributes should appear in LLVM IR for both host and device code. When DPC++ |
| 99 | +compiler is only used as a device compiler, then we don't expect the attribute |
| 100 | +to be handled on host, apparently. |
| 101 | + |
| 102 | +Another thing we need from DPC++ FE compiler is to define a special macro, which |
| 103 | +will allow to distinguish it from other compilers. That is needed to apply the |
| 104 | +aforementioned attribute conditionally to avoid spamming users with warnings |
| 105 | +about unknown attributes. |
| 106 | + |
| 107 | +The suggested macro name is `__INTEL_SYCL_HOST_COMPILER__`. It should be defined |
| 108 | +when the compiler is invoked in SYCL host mode (`-fsycl-is-host` `-cc1` flag). |
| 109 | + |
| 110 | +## Common headers part |
| 111 | + |
| 112 | +Header files should be modified by adding the new attributes to types |
| 113 | +declarations, objects of which we will need in our mapping.Again, |
| 114 | +`device_global` and `specialization_id` are examples here: |
| 115 | + |
| 116 | +``` |
| 117 | +template <typename T> |
| 118 | +class |
| 119 | +#if defined(__SYCL_DEVICE_ONLY__) || defined(__INTEL_SYCL_HOST_COMPILER__) |
| 120 | + [[__sycl_detail__::uniquely_identifiable_object("specialization_id")]] |
| 121 | +#endif |
| 122 | +specialization_id { |
| 123 | +// ... |
| 124 | +}; |
| 125 | +``` |
| 126 | + |
| 127 | +## Common runtime part |
| 128 | + |
| 129 | +The runtime should implement the following function, which will be called from |
| 130 | +a code generated by the compiler (see the next section): |
| 131 | + |
| 132 | +``` |
| 133 | +void __register_uniquely_identifiable_object( |
| 134 | + void *Address, const char* UniqueID, const char *Kind); |
| 135 | +``` |
| 136 | + |
| 137 | +The function accepts the following arguments: |
| 138 | +- `Address` is an address of a variable, which exists in an application on host |
| 139 | +- `UniqueID` is a unique symbolic ID, which corresponds to that variable |
| 140 | +- `Kind` is a string which corresponds to `kind` argument passed to |
| 141 | + `[[__sycl_detail__::uniquely_identifiable_object(kind)]]` attribute attached |
| 142 | + to the type of the variable identified by `Address`. It can be used to |
| 143 | + distinguish different entities like `specialization_id` and `device_global`: |
| 144 | + for example they could be stored in different maps to speed up certain |
| 145 | + operations with them. |
| 146 | + |
| 147 | +The compiler guarantees that the function will be called zero or more times |
| 148 | +(depending on the amount of uniquely identifiable objects found in a program) |
| 149 | +_before_ application's `main()` function, i.e. in a global constructor. |
| 150 | + |
| 151 | +That poses some restrictions on those uniquely identifiable object, i.e. that |
| 152 | +they can't be used from another global object due to risk of accessing a |
| 153 | +non-initialized object, but that is an UB anyway because the order of global |
| 154 | +objects initialization is not defined in C++ when those objects are defined in |
| 155 | +different translation unit. |
| 156 | + |
| 157 | +## Compiler driver part |
| 158 | + |
| 159 | +The compiler driver is the component which is responsible for selecting the |
| 160 | +approach we are taking and the decision is made based on whether or not |
| 161 | +3rd-party host compiler is in use. |
| 162 | + |
| 163 | +If `-fsycl-host-compiler` option is present, the compiler driver chooses the |
| 164 | +integration footer approach: |
| 165 | +- it supplies device compilation step with `-fsycl-int-footer` option to |
| 166 | + instruct device compiler to emit integration footer |
| 167 | +- it appends the integration footer to user-provided translation unit before |
| 168 | + passing it to a host compiler |
| 169 | + |
| 170 | +Otherwise, if `-fsycl-host-compiler` is not present, then the compiler driver |
| 171 | +chooses another approach by simply doing nothing related to integration footer: |
| 172 | +- `-fsycl-int-footer` is **not** passed to device compiler |
| 173 | +- user-provided translation unit is passes as-is to host compiler |
| 174 | + |
| 175 | +## Integration footer approach |
| 176 | + |
| 177 | +When this approach is used, not only extra file (integration footer) is |
| 178 | +generated, but integration header is also modified: FE compiler generates a |
| 179 | +definition of a namespace scope variable of type |
| 180 | +`__sycl_device_global_registration` whose sole purpose it to run its constructor |
| 181 | +before the application's `main()` function: |
| 182 | + |
| 183 | +``` |
| 184 | +namespace sycl::detail { |
| 185 | +namespace { |
| 186 | +
|
| 187 | +class __sycl_device_global_registration { |
| 188 | + public: |
| 189 | + __sycl_device_global_registration() noexcept; |
| 190 | +}; |
| 191 | +__sycl_device_global_registration __sycl_device_global_registrar; |
| 192 | +
|
| 193 | +} // namespace (unnamed) |
| 194 | +} // namespace sycl::detail |
| 195 | +``` |
| 196 | + |
| 197 | +The integration footer generated by the compiler contains the definition of the |
| 198 | +constructor, which calls a function in the DPC++ runtime, which registers |
| 199 | +needed mappings: |
| 200 | + |
| 201 | +``` |
| 202 | +namespace sycl::detail { |
| 203 | +namespace { |
| 204 | +
|
| 205 | +__sycl_device_global_registration::__sycl_device_global_registration() noexcept { |
| 206 | + __register_uniquely_identifiable_object( |
| 207 | + &::Foo, |
| 208 | + /* same string returned from __builtin_sycl_unique_stable_id(::Foo) */, |
| 209 | + "specialization_id"); |
| 210 | + __register_uniquely_identifiable_object( |
| 211 | + &::inner::Bar, |
| 212 | + /* same string returned from __builtin_sycl_unique_stable_id(::inner::Bar) */, |
| 213 | + "device_global"); |
| 214 | +} |
| 215 | +
|
| 216 | +} // namespace (unnamed) |
| 217 | +} // namespace sycl::detail |
| 218 | +``` |
| 219 | + |
| 220 | +## Custom host compiler approach |
| 221 | + |
| 222 | +With this approach, we simply schedule a one more pass in the optimization |
| 223 | +pipeline, which should be executed regardless of the optimization level, because |
| 224 | +it is required for proper functioning of some features. |
| 225 | + |
| 226 | +The pass does similar thing to integration footer: it emits a global constructor |
| 227 | +which in turn calls `__register_uniquely_identifiable_object` to provide the |
| 228 | +runtime with required mapping information. |
| 229 | + |
| 230 | +Unlike with integration footer approach no separate file is being generated, |
| 231 | +which preserves all source files mapping and checksums to be in place and |
| 232 | +correct. |
| 233 | + |
| 234 | +Generated constructor function should have internal linkage to avoid possible |
| 235 | +names clashes and multiple definition errors later at link stage. |
| 236 | + |
| 237 | +Generated constructor contains a call to |
| 238 | +`__register_uniquely_identifiable_object` for each global variable which has |
| 239 | +`sycl-unique-id` and `sycl-uid-kind` attributes, passing values of those |
| 240 | +attributes into the corresponding arguments of the function. |
0 commit comments