  This document describes the design for the DPC++ implementation of the
  [sycl\_ext\_oneapi\_device\_if][1] and
- [sycl\_ext\_intel\_device\_architecture][2] extensions.
+ [sycl\_ext\_oneapi\_device\_architecture][2] extensions.

  [1]: <../extensions/proposed/sycl_ext_oneapi_device_if.asciidoc>
- [2]: <../extensions/proposed/sycl_ext_intel_device_architecture.asciidoc>
+ [2]: <../extensions/proposed/sycl_ext_oneapi_device_architecture.asciidoc>


  ## Phased implementation

  The implementation is divided into two phases. In the first phase, we support
- only [sycl\_ext\_intel\_device\_architecture][2] and it is supported only in
+ only [sycl\_ext\_oneapi\_device\_architecture][2] and it is supported only in
  AOT mode. The second phase adds support for both extensions in both AOT and
  JIT modes.
@@ -73,6 +73,46 @@ recognizes:
  * `intel_gpu_11_2_0` (alias for `intel_gpu_ehl`)
  * `intel_gpu_12_0_0` (alias for `intel_gpu_tgllp`)
  * `intel_gpu_12_10_0` (alias for `intel_gpu_dg1`)
+ * `nvidia_gpu_sm20`
+ * `nvidia_gpu_sm30`
+ * `nvidia_gpu_sm32`
+ * `nvidia_gpu_sm35`
+ * `nvidia_gpu_sm37`
+ * `nvidia_gpu_sm50`
+ * `nvidia_gpu_sm52`
+ * `nvidia_gpu_sm53`
+ * `nvidia_gpu_sm60`
+ * `nvidia_gpu_sm61`
+ * `nvidia_gpu_sm62`
+ * `nvidia_gpu_sm70`
+ * `nvidia_gpu_sm72`
+ * `nvidia_gpu_sm75`
+ * `nvidia_gpu_sm80`
+ * `nvidia_gpu_sm86`
+ * `nvidia_gpu_sm87`
+ * `nvidia_gpu_sm89`
+ * `nvidia_gpu_sm90`
+ * `amd_gpu_gfx700`
+ * `amd_gpu_gfx701`
+ * `amd_gpu_gfx702`
+ * `amd_gpu_gfx801`
+ * `amd_gpu_gfx802`
+ * `amd_gpu_gfx803`
+ * `amd_gpu_gfx805`
+ * `amd_gpu_gfx810`
+ * `amd_gpu_gfx900`
+ * `amd_gpu_gfx902`
+ * `amd_gpu_gfx904`
+ * `amd_gpu_gfx906`
+ * `amd_gpu_gfx908`
+ * `amd_gpu_gfx90a`
+ * `amd_gpu_gfx1010`
+ * `amd_gpu_gfx1011`
+ * `amd_gpu_gfx1012`
+ * `amd_gpu_gfx1013`
+ * `amd_gpu_gfx1030`
+ * `amd_gpu_gfx1031`
+ * `amd_gpu_gfx1032`

  The above listed device names may not be mixed with the existing target name
  `spir64_gen` on the same command line. In addition, the user must not pass the
@@ -120,6 +160,46 @@ one of the following corresponding C++ macro names:
  * `__SYCL_TARGET_INTEL_GPU_ACM_G11__`
  * `__SYCL_TARGET_INTEL_GPU_ACM_G12__`
  * `__SYCL_TARGET_INTEL_GPU_PVC__`
+ * `__SYCL_TARGET_NVIDIA_GPU_SM20__`
+ * `__SYCL_TARGET_NVIDIA_GPU_SM30__`
+ * `__SYCL_TARGET_NVIDIA_GPU_SM32__`
+ * `__SYCL_TARGET_NVIDIA_GPU_SM35__`
+ * `__SYCL_TARGET_NVIDIA_GPU_SM37__`
+ * `__SYCL_TARGET_NVIDIA_GPU_SM50__`
+ * `__SYCL_TARGET_NVIDIA_GPU_SM52__`
+ * `__SYCL_TARGET_NVIDIA_GPU_SM53__`
+ * `__SYCL_TARGET_NVIDIA_GPU_SM60__`
+ * `__SYCL_TARGET_NVIDIA_GPU_SM61__`
+ * `__SYCL_TARGET_NVIDIA_GPU_SM62__`
+ * `__SYCL_TARGET_NVIDIA_GPU_SM70__`
+ * `__SYCL_TARGET_NVIDIA_GPU_SM72__`
+ * `__SYCL_TARGET_NVIDIA_GPU_SM75__`
+ * `__SYCL_TARGET_NVIDIA_GPU_SM80__`
+ * `__SYCL_TARGET_NVIDIA_GPU_SM86__`
+ * `__SYCL_TARGET_NVIDIA_GPU_SM87__`
+ * `__SYCL_TARGET_NVIDIA_GPU_SM89__`
+ * `__SYCL_TARGET_NVIDIA_GPU_SM90__`
+ * `__SYCL_TARGET_AMD_GPU_GFX700__`
+ * `__SYCL_TARGET_AMD_GPU_GFX701__`
+ * `__SYCL_TARGET_AMD_GPU_GFX702__`
+ * `__SYCL_TARGET_AMD_GPU_GFX801__`
+ * `__SYCL_TARGET_AMD_GPU_GFX802__`
+ * `__SYCL_TARGET_AMD_GPU_GFX803__`
+ * `__SYCL_TARGET_AMD_GPU_GFX805__`
+ * `__SYCL_TARGET_AMD_GPU_GFX810__`
+ * `__SYCL_TARGET_AMD_GPU_GFX900__`
+ * `__SYCL_TARGET_AMD_GPU_GFX902__`
+ * `__SYCL_TARGET_AMD_GPU_GFX904__`
+ * `__SYCL_TARGET_AMD_GPU_GFX906__`
+ * `__SYCL_TARGET_AMD_GPU_GFX908__`
+ * `__SYCL_TARGET_AMD_GPU_GFX90A__`
+ * `__SYCL_TARGET_AMD_GPU_GFX1010__`
+ * `__SYCL_TARGET_AMD_GPU_GFX1011__`
+ * `__SYCL_TARGET_AMD_GPU_GFX1012__`
+ * `__SYCL_TARGET_AMD_GPU_GFX1013__`
+ * `__SYCL_TARGET_AMD_GPU_GFX1030__`
+ * `__SYCL_TARGET_AMD_GPU_GFX1031__`
+ * `__SYCL_TARGET_AMD_GPU_GFX1032__`

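The new NVIDIA and AMD macro names follow the same visible pattern as the device names they correspond to: the device name is uppercased and wrapped in `__SYCL_TARGET_` and `__`. The sketch below only illustrates that naming correspondence. The helper `target_macro_name` is hypothetical and is not part of the compiler driver, and it does not resolve alias names such as `intel_gpu_12_10_0`.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical helper (not driver code): spell the predefined macro that
// corresponds to a device name by uppercasing it and adding the
// "__SYCL_TARGET_" prefix and "__" suffix seen in the list above.
inline std::string target_macro_name(const std::string &device_name) {
  std::string macro = "__SYCL_TARGET_";
  for (unsigned char c : device_name)
    macro += static_cast<char>(std::toupper(c));
  macro += "__";
  return macro;
}

// Self-check against two name pairs that appear in the lists above.
inline void check_examples() {
  assert(target_macro_name("nvidia_gpu_sm80") ==
         "__SYCL_TARGET_NVIDIA_GPU_SM80__");
  assert(target_macro_name("amd_gpu_gfx90a") ==
         "__SYCL_TARGET_AMD_GPU_GFX90A__");
}
```
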
  If the user invokes the compiler driver with `-fsycl-targets=spir64_x86_64`,
  the compiler driver must predefine the following C++ macro name:
@@ -131,14 +211,14 @@ documented to users, and user code should not make use of them.

  ### Changes to the device headers

- The device headers implement the [sycl\_ext\_intel\_device\_architecture][2]
+ The device headers implement the [sycl\_ext\_oneapi\_device\_architecture][2]
  extension using these predefined macros and leverage `if constexpr` to discard
  statements in the "if" or "else" body when the device does not match one of the
  listed architectures. The following code snippet illustrates the technique:

  ```
  namespace sycl {
- namespace ext::intel::experimental {
+ namespace ext::oneapi::experimental {

  enum class architecture {
    x86_64,
@@ -148,7 +228,7 @@ enum class architecture {
    // ...
  };

- } // namespace ext::intel::experimental
+ } // namespace ext::oneapi::experimental

  namespace detail {

@@ -191,14 +271,14 @@ static constexpr bool is_aot_for_architecture[] = {

  // Read the value of "is_allowable_aot_mode" via a template to defer triggering
  // static_assert() until template instantiation time.
- template<ext::intel::experimental::architecture... Archs>
+ template<ext::oneapi::experimental::architecture... Archs>
  constexpr static bool allowable_aot_mode() {
    return is_allowable_aot_mode;
  }

  // Tells if the current device has one of the architectures in the parameter
  // pack.
- template<ext::intel::experimental::architecture... Archs>
+ template<ext::oneapi::experimental::architecture... Archs>
  constexpr static bool device_architecture_is() {
    return (is_aot_for_architecture[static_cast<int>(Archs)] || ...);
  }
@@ -212,7 +292,7 @@ constexpr static bool device_architecture_is() {
  template<bool MakeCall>
  class if_architecture_helper {
  public:
-   template<ext::intel::experimental::architecture ...Archs, typename T,
+   template<ext::oneapi::experimental::architecture ...Archs, typename T,
             typename ...Args>
    constexpr auto else_if_architecture_is(T fnTrue, Args ...args) {
      if constexpr (MakeCall && device_architecture_is<Archs...>()) {
@@ -233,7 +313,7 @@ class if_architecture_helper {

  } // namespace detail

- namespace ext::intel::experimental {
+ namespace ext::oneapi::experimental {

  template<architecture ...Archs, typename T, typename ...Args>
  constexpr static auto if_architecture_is(T fnTrue, Args ...args) {
@@ -249,16 +329,16 @@ constexpr static auto if_architecture_is(T fnTrue, Args ...args) {
    }
  }

- } // namespace ext::intel::experimental
+ } // namespace ext::oneapi::experimental
  } // namespace sycl
  ```
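
As a rough, self-contained illustration of the technique in the snippet above, the following sketch wires three of the listed target macros into a `constexpr` table and selects a branch with `if constexpr`. It is not the DPC++ header code: the `demo` namespace, the three-entry enumeration, `select_for_architecture`, and the convention that undefined macros default to 0 are all assumptions made for the example. The sketch also forces `__SYCL_TARGET_NVIDIA_GPU_SM80__` to 1 to simulate a compile for `nvidia_gpu_sm80`, and it requires C++17.

```cpp
// Illustrative sketch only; everything in namespace demo is hypothetical.

// Simulate a compile for nvidia_gpu_sm80: in a real build the compiler
// driver, not the source file, would predefine this macro.
#ifndef __SYCL_TARGET_NVIDIA_GPU_SM80__
#define __SYCL_TARGET_NVIDIA_GPU_SM80__ 1
#endif
// Assumption of this sketch: macros the driver did not define default to 0.
#ifndef __SYCL_TARGET_INTEL_GPU_PVC__
#define __SYCL_TARGET_INTEL_GPU_PVC__ 0
#endif
#ifndef __SYCL_TARGET_AMD_GPU_GFX90A__
#define __SYCL_TARGET_AMD_GPU_GFX90A__ 0
#endif

namespace demo {

enum class architecture { intel_gpu_pvc, nvidia_gpu_sm80, amd_gpu_gfx90a };

// One flag per enumerator, in declaration order.
static constexpr bool is_aot_for_architecture[] = {
    (__SYCL_TARGET_INTEL_GPU_PVC__ == 1),
    (__SYCL_TARGET_NVIDIA_GPU_SM80__ == 1),
    (__SYCL_TARGET_AMD_GPU_GFX90A__ == 1),
};

// True if the translation unit is compiled for any architecture in the pack.
template <architecture... Archs>
constexpr bool device_architecture_is() {
  return (is_aot_for_architecture[static_cast<int>(Archs)] || ...);
}

// Calls fnTrue when the target matches and fnFalse otherwise. The branch
// that is not taken is discarded at compile time by "if constexpr".
template <architecture... Archs, typename T, typename F>
constexpr int select_for_architecture(T fnTrue, F fnFalse) {
  if constexpr (device_architecture_is<Archs...>()) {
    return fnTrue();
  } else {
    return fnFalse();
  }
}

} // namespace demo
```

Because the table is indexed by the enumerator value, the order of the table entries has to match the order of the enumerators; a mismatch would silently test the wrong macro.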

  ### Analysis of error checking for unsupported AOT modes

  The header file code presented above triggers a `static_assert` if the
  `if_architecture_is` function is used in a translation unit that is compiled
- for an unsupported target. The only supported targets are `spir64_x86_64` and
- the new `intel_gpu_*` GPU device names.
+ for an unsupported target. The supported targets are `spir64_x86_64` and
+ the new `intel_gpu_*`, `nvidia_gpu_*`, and `amd_gpu_*` GPU device names.

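The check described above can be sketched in isolation as follows. This is a simplified illustration, not the header code: the `demo_check` namespace, the two-entry enumeration, and `require_supported_target_then_call` are hypothetical, the macros are assumed to default to 0 when the driver does not define them, and `__SYCL_TARGET_NVIDIA_GPU_SM80__` is forced to 1 to simulate a supported target. The point it shows is that the `static_assert` condition is read through a template, so it is evaluated only when the function template is instantiated.

```cpp
// Illustrative sketch only; everything in namespace demo_check is hypothetical.

// Simulate a supported target (nvidia_gpu_sm80). The driver would normally
// predefine this macro.
#ifndef __SYCL_TARGET_NVIDIA_GPU_SM80__
#define __SYCL_TARGET_NVIDIA_GPU_SM80__ 1
#endif
// Assumption of this sketch: macros the driver did not define default to 0.
#ifndef __SYCL_TARGET_AMD_GPU_GFX90A__
#define __SYCL_TARGET_AMD_GPU_GFX90A__ 0
#endif

namespace demo_check {

enum class architecture { nvidia_gpu_sm80, amd_gpu_gfx90a };

// True when this translation unit is compiled for any recognized target.
static constexpr bool is_allowable_aot_mode =
    (__SYCL_TARGET_NVIDIA_GPU_SM80__ == 1) ||
    (__SYCL_TARGET_AMD_GPU_GFX90A__ == 1);

// Reading the flag through a template makes the value dependent on Archs,
// so a static_assert that uses it is checked at instantiation time.
template <architecture... Archs>
constexpr bool allowable_aot_mode() {
  return is_allowable_aot_mode;
}

// Simplified stand-in for the entry point: it only performs the target
// check and then calls fn. It does not do any architecture dispatch.
template <architecture... Archs, typename T>
constexpr auto require_supported_target_then_call(T fn) {
  static_assert(allowable_aot_mode<Archs...>(),
                "this translation unit is compiled for an unsupported target");
  return fn();
}

} // namespace demo_check
```
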
  The error checking relies on the fact that the device compiler is invoked
  separately for each target listed in `-fsycl-targets`. If any target is