Skip to content

Commit b4a90d3

Browse files
committed
Initial version of the doc
1 parent 769851c commit b4a90d3

File tree

1 file changed

+240
-0
lines changed

1 file changed

+240
-0
lines changed
Lines changed: 240 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,240 @@
1+
# Mapping host variables to compiler-generated info
2+
3+
[SYCL 2020][sycl-2020-spec] specification and some extensions such as
4+
[SYCL_INTEL_device_global][device-global-ext-spec] imply that implementation is
5+
capable to somehow map addresses of a host objects to their counterparts in
6+
device programs.
7+
8+
For example, in order to implement specialization constants on top of SPIR-V, we
9+
need to be able to map addresses of `specialization_id` variables into numeric
10+
IDs of corresponding specialization constants at SPIR-V level.
11+
12+
Another example is device global [implementation][device-global-design], where
13+
in order to communicate a value of `device_global` variable between host and
14+
device we need to map its host address to a symbolic name/identifier and some
15+
other info, which is used at PI layer and below.
16+
17+
This design document describes a generic way how to map address of any SYCL
18+
object defined in a namespace scope to its unique symbolic ID. Please note that
19+
this document doesn't try to map the address to something other than a unique
20+
symbolic ID: other required information is usually generated by the device
21+
compiler and communicated to the runtime by device image properties. Unique
22+
symbolic ID which can be obtained from mapping mechanism described in this
23+
design document could be used as a key in those properties to propagate
24+
additional information using existing mechanisms.
25+
26+
So, overall the picture looks like:
27+
- device compiler generates property set/s which provide mapping
28+
"unique symbolic ID" -> "various information required by DPC++ RT"
29+
- device or host compiler generates mapping
30+
"address of a host variable" -> "unique symbolic ID" (as described below by
31+
this document)
32+
- DPC++ RT uses these to mappings to obtain required information and somehow
33+
uses it
34+
35+
This design document describes two approaches of how the
36+
"address of a host variable" -> "unique symbolic ID" mapping can be generated:
37+
the first one with integration footer and another one with modification of the
38+
host compiler.
39+
40+
Both approaches have their pros and cons and they are expected to be implemented
41+
and exists in the implementation at the same time, but only one of them will be
42+
used at a time depending on whether 3rd-party host compiler is used or not.
43+
44+
Integration footer can be used with 3rd-party host compilers, but it requires
45+
appending to a translation unit provided by user, which could affect debug
46+
information: since there are no compilers that support appending a file at the
47+
end (similar to `-include`), appending is done by generating a temporary input
48+
file using concatenation of the original input and integration footer.
49+
50+
Such replacement of the main translation unit causes the following issues:
51+
- debug information about the source file might be incorrect, leading to
52+
problems with gdb `l` command and code coverage tools
53+
- checksum of host and device source files becomes different which causes device
54+
code debugging to be completely broken in some environments (such as MS Visual
55+
Studio, for example)
56+
57+
Customizing host compiler allows to avoid issues with debuggers and code
58+
coverage tools, but that is not an option if user wants to compile host part
59+
of an app with a 3rd-party host compiler.
60+
61+
Further sections describe the implementation design of both approaches in more
62+
details, note that there are few components which should be modified regardless
63+
of which approach is in use.
64+
65+
## Common front-end part
66+
67+
DPC++ FE should support the following attribute:
68+
`[[__sycl_detail__::uniquely_identifiable_object(kind)]]`. This attribute accepts
69+
a string literal and should be applied to types (like `device_global` or
70+
`specialization_id`).
71+
72+
Presence of the attribute instructs the compiler to perform the following
73+
things:
74+
- emit `sycl-unique-id` LLVM IR attribute on each definition of a variable of
75+
type marked with `[[__sycl_detail__::uniquely_identifiable_object(kind)]]`
76+
attribute. `sycl-unique-id` LLVM IR attribute should be accompanied by a
77+
unique string identifier of a variable it is attached to. The rules for
78+
creating this string are the same as for `__builtin_sycl_unique_stable_id` and
79+
the same algorithm can be used when generating the string for the attribute
80+
- emit `sycl-uid-type` LLVM IR attribute alongside `sycl-unique-id`, which
81+
contains the `kind` string passed to
82+
`[[__sycl_detail__::uniquely_identifiable_object(kind)]]` attribute
83+
84+
**TODO**: we have `[[__sycl_detail__::device_global]]` attribute documented in
85+
[device global design doc][device-global-design], which instructs front-end to
86+
emit some additional semantic checking. Shall we leave it in place or that
87+
request for semantic checking should also be documented by
88+
`[[__sycl_detail__::uniquely_identifiable_object(kind)]]` attribute when `kind`
89+
is set to a certain value?
90+
91+
**TODO**: alternatively, we could completely re-use existing
92+
`[[__sycl_detail__::device_global]]` attribute and introduce another one for
93+
specialization constants, i.e. it is a question of whether or not we want to
94+
generalize unique IDs generation in form of a generic attribute or not.
95+
96+
When DPC++ compiler is used as both host and device compiler, then the attribute
97+
should be respected by both host and device compiler passes and LLVM IR
98+
attributes should appear in LLVM IR for both host and device code. When DPC++
99+
compiler is only used as a device compiler, then we don't expect the attribute
100+
to be handled on host, apparently.
101+
102+
Another thing we need from DPC++ FE compiler is to define a special macro, which
103+
will allow to distinguish it from other compilers. That is needed to apply the
104+
aforementioned attribute conditionally to avoid spamming users with warnings
105+
about unknown attributes.
106+
107+
The suggested macro name is `__INTEL_SYCL_HOST_COMPILER__`. It should be defined
108+
when the compiler is invoked in SYCL host mode (`-fsycl-is-host` `-cc1` flag).
109+
110+
## Common headers part
111+
112+
Header files should be modified by adding the new attributes to types
113+
declarations, objects of which we will need in our mapping.Again,
114+
`device_global` and `specialization_id` are examples here:
115+
116+
```
117+
template <typename T>
118+
class
119+
#if defined(__SYCL_DEVICE_ONLY__) || defined(__INTEL_SYCL_HOST_COMPILER__)
120+
[[__sycl_detail__::uniquely_identifiable_object("specialization_id")]]
121+
#endif
122+
specialization_id {
123+
// ...
124+
};
125+
```
126+
127+
## Common runtime part
128+
129+
The runtime should implement the following function, which will be called from
130+
a code generated by the compiler (see the next section):
131+
132+
```
133+
void __register_uniquely_identifiable_object(
134+
void *Address, const char* UniqueID, const char *Kind);
135+
```
136+
137+
The function accepts the following arguments:
138+
- `Address` is an address of a variable, which exists in an application on host
139+
- `UniqueID` is a unique symbolic ID, which corresponds to that variable
140+
- `Kind` is a string which corresponds to `kind` argument passed to
141+
`[[__sycl_detail__::uniquely_identifiable_object(kind)]]` attribute attached
142+
to the type of the variable identified by `Address`. It can be used to
143+
distinguish different entities like `specialization_id` and `device_global`:
144+
for example they could be stored in different maps to speed up certain
145+
operations with them.
146+
147+
The compiler guarantees that the function will be called zero or more times
148+
(depending on the amount of uniquely identifiable objects found in a program)
149+
_before_ application's `main()` function, i.e. in a global constructor.
150+
151+
That poses some restrictions on those uniquely identifiable object, i.e. that
152+
they can't be used from another global object due to risk of accessing a
153+
non-initialized object, but that is an UB anyway because the order of global
154+
objects initialization is not defined in C++ when those objects are defined in
155+
different translation unit.
156+
157+
## Compiler driver part
158+
159+
The compiler driver is the component which is responsible for selecting the
160+
approach we are taking and the decision is made based on whether or not
161+
3rd-party host compiler is in use.
162+
163+
If `-fsycl-host-compiler` option is present, the compiler driver chooses the
164+
integration footer approach:
165+
- it supplies device compilation step with `-fsycl-int-footer` option to
166+
instruct device compiler to emit integration footer
167+
- it appends the integration footer to user-provided translation unit before
168+
passing it to a host compiler
169+
170+
Otherwise, if `-fsycl-host-compiler` is not present, then the compiler driver
171+
chooses another approach by simply doing nothing related to integration footer:
172+
- `-fsycl-int-footer` is **not** passed to device compiler
173+
- user-provided translation unit is passes as-is to host compiler
174+
175+
## Integration footer approach
176+
177+
When this approach is used, not only extra file (integration footer) is
178+
generated, but integration header is also modified: FE compiler generates a
179+
definition of a namespace scope variable of type
180+
`__sycl_device_global_registration` whose sole purpose it to run its constructor
181+
before the application's `main()` function:
182+
183+
```
184+
namespace sycl::detail {
185+
namespace {
186+
187+
class __sycl_device_global_registration {
188+
public:
189+
__sycl_device_global_registration() noexcept;
190+
};
191+
__sycl_device_global_registration __sycl_device_global_registrar;
192+
193+
} // namespace (unnamed)
194+
} // namespace sycl::detail
195+
```
196+
197+
The integration footer generated by the compiler contains the definition of the
198+
constructor, which calls a function in the DPC++ runtime, which registers
199+
needed mappings:
200+
201+
```
202+
namespace sycl::detail {
203+
namespace {
204+
205+
__sycl_device_global_registration::__sycl_device_global_registration() noexcept {
206+
__register_uniquely_identifiable_object(
207+
&::Foo,
208+
/* same string returned from __builtin_sycl_unique_stable_id(::Foo) */,
209+
"specialization_id");
210+
__register_uniquely_identifiable_object(
211+
&::inner::Bar,
212+
/* same string returned from __builtin_sycl_unique_stable_id(::inner::Bar) */,
213+
"device_global");
214+
}
215+
216+
} // namespace (unnamed)
217+
} // namespace sycl::detail
218+
```
219+
220+
## Custom host compiler approach
221+
222+
With this approach, we simply schedule a one more pass in the optimization
223+
pipeline, which should be executed regardless of the optimization level, because
224+
it is required for proper functioning of some features.
225+
226+
The pass does similar thing to integration footer: it emits a global constructor
227+
which in turn calls `__register_uniquely_identifiable_object` to provide the
228+
runtime with required mapping information.
229+
230+
Unlike with integration footer approach no separate file is being generated,
231+
which preserves all source files mapping and checksums to be in place and
232+
correct.
233+
234+
Generated constructor function should have internal linkage to avoid possible
235+
names clashes and multiple definition errors later at link stage.
236+
237+
Generated constructor contains a call to
238+
`__register_uniquely_identifiable_object` for each global variable which has
239+
`sycl-unique-id` and `sycl-uid-kind` attributes, passing values of those
240+
attributes into the corresponding arguments of the function.

0 commit comments

Comments
 (0)