Commit c6091df

[SYCL][Doc] Update if_architecture_is extension to include NVIDIA and AMD architectures (#7246)
Update if_architecture_is extension to include NVIDIA and AMD architectures:

- For NVIDIA, adds an aspect for each SM version.
- For AMD, adds an aspect for each architecture supported by ROCm.
- Copies the updated version of experimental/sycl_ext_intel_device_architecture.asciidoc to proposed/sycl_ext_oneapi_device_architecture.asciidoc.
1 parent 96bfb05 commit c6091df

2 files changed, +731 -13 lines changed

sycl/doc/design/DeviceIf.md

Lines changed: 93 additions & 13 deletions
@@ -2,16 +2,16 @@

 This document describes the design for the DPC++ implementation of the
 [sycl\_ext\_oneapi\_device\_if][1] and
-[sycl\_ext\_intel\_device\_architecture][2] extensions.
+[sycl\_ext\_oneapi\_device\_architecture][2] extensions.

 [1]: <../extensions/proposed/sycl_ext_oneapi_device_if.asciidoc>
-[2]: <../extensions/proposed/sycl_ext_intel_device_architecture.asciidoc>
+[2]: <../extensions/proposed/sycl_ext_oneapi_device_architecture.asciidoc>


 ## Phased implementation

 The implementation is divided into two phases. In the first phase, we support
-only [sycl\_ext\_intel\_device\_architecture][2] and it is supported only in
+only [sycl\_ext\_oneapi\_device\_architecture][2] and it is supported only in
 AOT mode. The second phase adds support for both extensions in both AOT and
 JIT modes.

@@ -73,6 +73,46 @@ recognizes:
 * `intel_gpu_11_2_0` (alias for `intel_gpu_ehl`)
 * `intel_gpu_12_0_0` (alias for `intel_gpu_tgllp`)
 * `intel_gpu_12_10_0` (alias for `intel_gpu_dg1`)
+* `nvidia_gpu_sm20`
+* `nvidia_gpu_sm30`
+* `nvidia_gpu_sm32`
+* `nvidia_gpu_sm35`
+* `nvidia_gpu_sm37`
+* `nvidia_gpu_sm50`
+* `nvidia_gpu_sm52`
+* `nvidia_gpu_sm53`
+* `nvidia_gpu_sm60`
+* `nvidia_gpu_sm61`
+* `nvidia_gpu_sm62`
+* `nvidia_gpu_sm70`
+* `nvidia_gpu_sm72`
+* `nvidia_gpu_sm75`
+* `nvidia_gpu_sm80`
+* `nvidia_gpu_sm86`
+* `nvidia_gpu_sm87`
+* `nvidia_gpu_sm89`
+* `nvidia_gpu_sm90`
+* `amd_gpu_gfx700`
+* `amd_gpu_gfx701`
+* `amd_gpu_gfx702`
+* `amd_gpu_gfx801`
+* `amd_gpu_gfx802`
+* `amd_gpu_gfx803`
+* `amd_gpu_gfx805`
+* `amd_gpu_gfx810`
+* `amd_gpu_gfx900`
+* `amd_gpu_gfx902`
+* `amd_gpu_gfx904`
+* `amd_gpu_gfx906`
+* `amd_gpu_gfx908`
+* `amd_gpu_gfx90a`
+* `amd_gpu_gfx1010`
+* `amd_gpu_gfx1011`
+* `amd_gpu_gfx1012`
+* `amd_gpu_gfx1013`
+* `amd_gpu_gfx1030`
+* `amd_gpu_gfx1031`
+* `amd_gpu_gfx1032`

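The device names added above use vendor prefixes (`intel_gpu_`, `nvidia_gpu_`, `amd_gpu_`) followed by an architecture suffix such as `sm80` or `gfx90a`. As an illustration only, the following C++17 sketch splits a comma-separated `-fsycl-targets` value and groups each entry by those prefixes. The names `split_targets`, `classify_target`, `gpu_vendor`, and `parsed_target` are hypothetical and are not taken from the DPC++ driver; alias names such as `intel_gpu_12_0_0` are classified by prefix only (not resolved to canonical names), and the sketch does not check that a suffix is one of the listed architectures.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical types for illustration; not the DPC++ driver's implementation.
enum class gpu_vendor { intel, nvidia, amd, unknown };

struct parsed_target {
  gpu_vendor vendor;
  std::string arch;  // Suffix after the vendor prefix, e.g. "sm80" or "gfx90a".
};

// Split a comma-separated -fsycl-targets value, skipping empty entries.
std::vector<std::string> split_targets(std::string_view value) {
  std::vector<std::string> targets;
  std::size_t start = 0;
  while (start <= value.size()) {
    std::size_t comma = value.find(',', start);
    if (comma == std::string_view::npos) comma = value.size();
    if (comma > start) targets.emplace_back(value.substr(start, comma - start));
    start = comma + 1;
  }
  return targets;
}

// Group a target name by the vendor prefixes used in the device-name list.
// Names without a known prefix (e.g. "spir64_x86_64") map to unknown.
parsed_target classify_target(std::string_view name) {
  struct prefix_entry {
    std::string_view prefix;
    gpu_vendor vendor;
  };
  constexpr prefix_entry prefixes[] = {
      {"intel_gpu_", gpu_vendor::intel},
      {"nvidia_gpu_", gpu_vendor::nvidia},
      {"amd_gpu_", gpu_vendor::amd},
  };
  for (const auto& p : prefixes) {
    if (name.substr(0, p.prefix.size()) == p.prefix)
      return {p.vendor, std::string(name.substr(p.prefix.size()))};
  }
  return {gpu_vendor::unknown, std::string(name)};
}
```

For example, `split_targets("nvidia_gpu_sm80,amd_gpu_gfx90a")` yields two entries that classify as NVIDIA with suffix `sm80` and AMD with suffix `gfx90a`, while a name like `spir64_x86_64` falls through to `gpu_vendor::unknown`.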
 The above listed device names may not be mixed with the existing target name
 `spir64_gen` on the same command line. In addition, the user must not pass the
@@ -120,6 +160,46 @@ one of the following corresponding C++ macro names:
 * `__SYCL_TARGET_INTEL_GPU_ACM_G11__`
 * `__SYCL_TARGET_INTEL_GPU_ACM_G12__`
 * `__SYCL_TARGET_INTEL_GPU_PVC__`
+* `__SYCL_TARGET_NVIDIA_GPU_SM20__`
+* `__SYCL_TARGET_NVIDIA_GPU_SM30__`
+* `__SYCL_TARGET_NVIDIA_GPU_SM32__`
+* `__SYCL_TARGET_NVIDIA_GPU_SM35__`
+* `__SYCL_TARGET_NVIDIA_GPU_SM37__`
+* `__SYCL_TARGET_NVIDIA_GPU_SM50__`
+* `__SYCL_TARGET_NVIDIA_GPU_SM52__`
+* `__SYCL_TARGET_NVIDIA_GPU_SM53__`
+* `__SYCL_TARGET_NVIDIA_GPU_SM60__`
+* `__SYCL_TARGET_NVIDIA_GPU_SM61__`
+* `__SYCL_TARGET_NVIDIA_GPU_SM62__`
+* `__SYCL_TARGET_NVIDIA_GPU_SM70__`
+* `__SYCL_TARGET_NVIDIA_GPU_SM72__`
+* `__SYCL_TARGET_NVIDIA_GPU_SM75__`
+* `__SYCL_TARGET_NVIDIA_GPU_SM80__`
+* `__SYCL_TARGET_NVIDIA_GPU_SM86__`
+* `__SYCL_TARGET_NVIDIA_GPU_SM87__`
+* `__SYCL_TARGET_NVIDIA_GPU_SM89__`
+* `__SYCL_TARGET_NVIDIA_GPU_SM90__`
+* `__SYCL_TARGET_AMD_GPU_GFX700__`
+* `__SYCL_TARGET_AMD_GPU_GFX701__`
+* `__SYCL_TARGET_AMD_GPU_GFX702__`
+* `__SYCL_TARGET_AMD_GPU_GFX801__`
+* `__SYCL_TARGET_AMD_GPU_GFX802__`
+* `__SYCL_TARGET_AMD_GPU_GFX803__`
+* `__SYCL_TARGET_AMD_GPU_GFX805__`
+* `__SYCL_TARGET_AMD_GPU_GFX810__`
+* `__SYCL_TARGET_AMD_GPU_GFX900__`
+* `__SYCL_TARGET_AMD_GPU_GFX902__`
+* `__SYCL_TARGET_AMD_GPU_GFX904__`
+* `__SYCL_TARGET_AMD_GPU_GFX906__`
+* `__SYCL_TARGET_AMD_GPU_GFX908__`
+* `__SYCL_TARGET_AMD_GPU_GFX90A__`
+* `__SYCL_TARGET_AMD_GPU_GFX1010__`
+* `__SYCL_TARGET_AMD_GPU_GFX1011__`
+* `__SYCL_TARGET_AMD_GPU_GFX1012__`
+* `__SYCL_TARGET_AMD_GPU_GFX1013__`
+* `__SYCL_TARGET_AMD_GPU_GFX1030__`
+* `__SYCL_TARGET_AMD_GPU_GFX1031__`
+* `__SYCL_TARGET_AMD_GPU_GFX1032__`

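Read side by side with the device-name list, the NVIDIA and AMD macros above follow a simple pattern: the device name is upper-cased and wrapped in `__SYCL_TARGET_` and `__` (for example, `nvidia_gpu_sm80` becomes `__SYCL_TARGET_NVIDIA_GPU_SM80__` and `amd_gpu_gfx90a` becomes `__SYCL_TARGET_AMD_GPU_GFX90A__`). A minimal sketch of that mapping follows, assuming the input is already a canonical device name; `sycl_target_macro` is a hypothetical helper, not part of the compiler driver.

```cpp
#include <cassert>
#include <cctype>
#include <string>
#include <string_view>

// Hypothetical helper: build the predefined macro name for a canonical
// device name by upper-casing it and adding the "__SYCL_TARGET_" prefix
// and the "__" suffix. Alias names are NOT resolved here.
std::string sycl_target_macro(std::string_view device_name) {
  std::string macro = "__SYCL_TARGET_";
  for (char c : device_name) {
    // Cast through unsigned char, as std::toupper requires.
    macro += static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
  }
  macro += "__";
  return macro;
}
```

Applied to each `nvidia_gpu_*` and `amd_gpu_*` name listed earlier, it produces the matching `__SYCL_TARGET_*__` entry in this list. Aliases such as `intel_gpu_12_0_0` would need to be resolved to their canonical names first.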
 If the user invokes the compiler driver with `-fsycl-targets=spir64_x86_64`,
 the compiler driver must predefine the following C++ macro name:
@@ -131,14 +211,14 @@ documented to users, and user code should not make use of them.

 ### Changes to the device headers

-The device headers implement the [sycl\_ext\_intel\_device\_architecture][2]
+The device headers implement the [sycl\_ext\_oneapi\_device\_architecture][2]
 extension using these predefined macros and leverage `if constexpr` to discard
 statements in the "if" or "else" body when the device does not match one of the
 listed architectures. The following code snippet illustrates the technique:

 ```
 namespace sycl {
-namespace ext::intel::experimental {
+namespace ext::oneapi::experimental {

 enum class architecture {
   x86_64,
@@ -148,7 +228,7 @@ enum class architecture {
   // ...
 };

-} // namespace ext::intel::experimental
+} // namespace ext::oneapi::experimental

 namespace detail {
@@ -191,14 +271,14 @@ static constexpr bool is_aot_for_architecture[] = {

 // Read the value of "is_allowable_aot_mode" via a template to defer triggering
 // static_assert() until template instantiation time.
-template<ext::intel::experimental::architecture... Archs>
+template<ext::oneapi::experimental::architecture... Archs>
 constexpr static bool allowable_aot_mode() {
   return is_allowable_aot_mode;
 }

 // Tells if the current device has one of the architectures in the parameter
 // pack.
-template<ext::intel::experimental::architecture... Archs>
+template<ext::oneapi::experimental::architecture... Archs>
 constexpr static bool device_architecture_is() {
   return (is_aot_for_architecture[static_cast<int>(Archs)] || ...);
 }
@@ -212,7 +292,7 @@ constexpr static bool device_architecture_is() {
 template<bool MakeCall>
 class if_architecture_helper {
 public:
-  template<ext::intel::experimental::architecture ...Archs, typename T,
+  template<ext::oneapi::experimental::architecture ...Archs, typename T,
            typename ...Args>
   constexpr auto else_if_architecture_is(T fnTrue, Args ...args) {
     if constexpr (MakeCall && device_architecture_is<Archs...>()) {
@@ -233,7 +313,7 @@ class if_architecture_helper {

 } // namespace detail

-namespace ext::intel::experimental {
+namespace ext::oneapi::experimental {

 template<architecture ...Archs, typename T, typename ...Args>
 constexpr static auto if_architecture_is(T fnTrue, Args ...args) {
@@ -249,16 +329,16 @@ constexpr static auto if_architecture_is(T fnTrue, Args ...args) {
   }
 }

-} // namespace ext::intel::experimental
+} // namespace ext::oneapi::experimental
 } // namespace sycl
 ```

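Because the header snippet elides parts of the implementation (`// ...`) and depends on DPC++-specific macros, it cannot be compiled on its own. The following standalone C++17 sketch models the same technique so it can be tried with any standard compiler: a table of flags derived from target macros, a fold expression over the requested architectures, and `if constexpr` to discard the calls for architectures that do not match. It is a simplification, not the DPC++ header: the `sketch` namespace, the three-value `architecture` enum, the `SKETCH_TARGET_*` macros (hypothetical stand-ins for the driver-predefined `__SYCL_TARGET_*__` macros), and the `otherwise` member are assumptions made for illustration, and the forwarding of extra `Args...` is dropped.

```cpp
#include <cassert>

// Hypothetical stand-in for the macro the driver would predefine when
// compiling for nvidia_gpu_sm80 (name chosen for this sketch only).
#define SKETCH_TARGET_NVIDIA_GPU_SM80 1

namespace sketch {

enum class architecture { x86_64, nvidia_gpu_sm80, amd_gpu_gfx90a };

// One flag per enumerator, in declaration order: true if this translation
// unit is compiled ahead-of-time for that architecture.
static constexpr bool is_aot_for_architecture[] = {
#ifdef SKETCH_TARGET_X86_64
    true,
#else
    false,
#endif
#ifdef SKETCH_TARGET_NVIDIA_GPU_SM80
    true,
#else
    false,
#endif
#ifdef SKETCH_TARGET_AMD_GPU_GFX90A
    true,
#else
    false,
#endif
};

// True if the current target is any of the architectures in the pack.
template <architecture... Archs>
constexpr bool device_architecture_is() {
  return (is_aot_for_architecture[static_cast<int>(Archs)] || ...);
}

// MakeCall becomes false once an earlier branch in the chain was taken.
template <bool MakeCall>
class if_architecture_helper {
 public:
  template <architecture... Archs, typename T>
  auto else_if_architecture_is(T fn) {
    if constexpr (MakeCall && device_architecture_is<Archs...>()) {
      fn();
      return if_architecture_helper<false>{};
    } else {
      return if_architecture_helper<MakeCall>{};
    }
  }

  template <typename T>
  void otherwise(T fn) {
    if constexpr (MakeCall) fn();
  }
};

template <architecture... Archs, typename T>
auto if_architecture_is(T fn) {
  if constexpr (device_architecture_is<Archs...>()) {
    fn();
    return if_architecture_helper<false>{};
  } else {
    return if_architecture_helper<true>{};
  }
}

}  // namespace sketch
```

With `SKETCH_TARGET_NVIDIA_GPU_SM80` defined, the chain `if_architecture_is<architecture::x86_64>(f).else_if_architecture_is<architecture::nvidia_gpu_sm80>(g).otherwise(h)` calls only `g`; the calls to `f` and `h` land in discarded `if constexpr` branches and are never instantiated. Returning `if_architecture_helper<false>` after the first match is what stops later branches from running.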
 ### Analysis of error checking for unsupported AOT modes

 The header file code presented above triggers a `static_assert` if the
 `if_architecture_is` function is used in a translation unit that is compiled
-for an unsupported target. The only supported targets are `spir64_x86_64` and
-the new `intel_gpu_*` GPU device names.
+for an unsupported target. The supported targets are `spir64_x86_64` and the
+new `intel_gpu_*`, `nvidia_gpu_*`, and `amd_gpu_*` GPU device names.

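The header comments note that `is_allowable_aot_mode` is read via a template to defer the `static_assert` until instantiation. The sketch below isolates that idiom in standard C++17: a translation unit compiled for an unsupported target can still include the header, and only a call to `if_architecture_is` triggers the error. The `SKETCH_TARGET_*` macros are hypothetical stand-ins for the driver-predefined macros; none of them is defined here, which models an unsupported target, and the actual dispatch is omitted.

```cpp
#include <cassert>

namespace sketch {

enum class architecture { x86_64, nvidia_gpu_sm80, amd_gpu_gfx90a };

// True only if a supported target macro is defined (hypothetical names).
// None is defined in this sketch, which models an unsupported target.
#if defined(SKETCH_TARGET_X86_64) || defined(SKETCH_TARGET_NVIDIA_GPU_SM80) || \
    defined(SKETCH_TARGET_AMD_GPU_GFX90A)
static constexpr bool is_allowable_aot_mode = true;
#else
static constexpr bool is_allowable_aot_mode = false;
#endif

// Reading the flag through a template makes the static_assert condition
// below value-dependent, so it is only checked when if_architecture_is
// is instantiated.
template <architecture... Archs>
constexpr bool allowable_aot_mode() {
  return is_allowable_aot_mode;
}

template <architecture... Archs, typename T>
void if_architecture_is(T fn) {
  static_assert(allowable_aot_mode<Archs...>(),
                "if_architecture_is is not supported for this target");
  (void)fn;  // Dispatch on Archs is omitted in this sketch.
}

}  // namespace sketch
```

Adding a call such as `sketch::if_architecture_is<sketch::architecture::x86_64>([] {});` to this file would instantiate the template and fail the `static_assert`; defining `SKETCH_TARGET_X86_64` first would let it compile. Writing `static_assert(is_allowable_aot_mode)` directly in the template body would not depend on any template parameter, so compilers may reject the header as soon as it is parsed, even in translation units that never call `if_architecture_is`.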
 The error checking relies on the fact that the device compiler is invoked
 separately for each target listed in `-fsycl-targets`. If any target is
