Skip to content

Commit d4f4f7c

Browse files
authored
[ESIMD][DOC][NFC] Deprecate DPAS type s1 and u1; clarify DPAS HW reqs (#12412)
DPAS API is not emulated. It is user's responsibility to manage device detection queries to avoid running DPAS on GPUs without XMX. --------- Signed-off-by: Klochkov, Vyacheslav N <[email protected]>
1 parent 193b4bc commit d4f4f7c

File tree

2 files changed

+39
-17
lines changed

2 files changed

+39
-17
lines changed

sycl/doc/extensions/supported/sycl_ext_intel_esimd/sycl_ext_intel_esimd.md

Lines changed: 28 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -31,6 +31,7 @@
3131
- [__regcall Calling convention](#__regcall-calling-convention)
3232
- [Inline assembly](#inline-assembly)
3333
- [Device aspect](#device-aspect)
34+
- [Device queries and conditional dispatching of the code](#device-queries-and-conditional-dispatching-of-the-code)
3435
- [Implementation restrictions](#implementation-restrictions)
3536
- [Features not supported with the ESIMD extension](#features-not-supported-with-the-esimd-extension)
3637
- [Unsupported standard SYCL APIs](#unsupported-standard-sycl-apis)
@@ -76,11 +77,15 @@ compiler, collectives just return the value in the single work-item. Another
7677
consequence of the unit subgroup size is guaranteed independent forward
7778
progress between work-items on many Intel GPU architecture generations.
7879

79-
## Explicit SIMD extension APIs
80+
Explicit SIMD APIs must be executed on Intel graphics architecture devices.
81+
Attempting to run such code on other devices will result in an error.
82+
83+
Also, most of ESIMD APIs require the corresponding HW support in the target GPU.
84+
It is user's responsibility to manage corresponding compile- and/or runtime-checks to avoid
85+
undefined behavior caused by using ESIMD APIs on GPUs that don't support it.
86+
See [this section](#device-queries-and-conditional-dispatching-of-the-code) for device query/dispatch machanisms usable to avoid undefined behavior.
8087

81-
Explicit SIMD APIs can be used only in code to be executed on Intel graphics
82-
architecture devices and the host device for now. Attempt to run such code on
83-
other devices will result in error.
88+
## Explicit SIMD extension APIs
8489

8590
All the ESIMD APIs are defined in the `sycl::ext::intel::esimd`
8691
namespace.
@@ -653,6 +658,7 @@ See more details in the API documentation
653658
### Dot Product Accumulate Systolic - `DPAS` API
654659
655660
DPAS is the matrix multiply-add-and-accumulate operation performed on limited size matrices/tiles.
661+
This API requires XMX (Xe Matrix eXtension) to be supported by the target GPU.
656662
657663
The input and output matrix/tile dimensions are parametrizable to certain extent and depend on the element types of operands and the target device.
658664
The operands and returns of DPAS API may require vertical or horizontal packing or unpacking. Please see [more details](#input-and-output-matrices-representation-as-simd-vectors) below.
@@ -667,8 +673,8 @@ As a member XMX (Xe Matrix eXtension) family of GPU operations it is included in
667673
/// it is deducible from the element types of input matrices.
668674
enum class dpas_argument_type {
669675
Invalid = 0,
670-
u1 = 1, // unsigned 1 bit
671-
s1 = 2, // signed 1 bit
676+
u1 = 1, // unsigned 1 bit: this type is reserved - not suppoted
677+
s1 = 2, // signed 1 bit: this type is reserved - not suppoted
672678
u2 = 3, // unsigned 2 bits
673679
s2 = 4, // signed 2 bits
674680
u4 = 5, // unsigned 4 bits
@@ -1119,6 +1125,22 @@ int main(void) {
11191125
more examples can be found in the
11201126
[ESIMD test suite](https://github.com/intel/llvm/tree/sycl/sycl/test-e2e/ESIMD) on github.
11211127

1128+
## Device queries and conditional dispatching of the code
1129+
1130+
ESIMD API provides access to low level GPU hardware API. At ESIMD program
1131+
compilation time it is not known what target device is going to be used to run the program.
1132+
The ESIMD programming model relies on the user to manage the corresponding compile- and/or
1133+
runtime-checks to prevent ESIMD API from running on a GPU that does not support the API.
1134+
1135+
One of the most trivial ways to manage such checks is to have them on the HOST. This variant
1136+
includes a) calling device detect query to understand what device is being used
1137+
b) depending on the device run one or another version of the kernel.
1138+
For example, [this test](https://github.com/intel/llvm/blob/sycl/sycl/test-e2e/ESIMD/dpas/dpas_int.cpp#L8) is designed to be run on DG2 and PVC even though those two devices have
1139+
different `execution sizes`. This is done via usage of [device query on the HOST](https://github.com/intel/llvm/blob/sycl/sycl/test-e2e/ESIMD/dpas/dpas_common.hpp#L430) and a subsequent
1140+
call of the corresponding supported variant of the [DG2 kernel](https://github.com/intel/llvm/blob/sycl/sycl/test-e2e/ESIMD/dpas/dpas_common.hpp#L446) or [PVC kernel](https://github.com/intel/llvm/blob/sycl/sycl/test-e2e/ESIMD/dpas/dpas_common.hpp#L438).
1141+
1142+
There may also be JIT-compile time checks via usage of [specialization constants](https://registry.khronos.org/SYCL/specs/sycl-2020/html/sycl-2020.html#_specialization_constants) or [if_architecture_is](../../experimental/sycl_ext_oneapi_device_architecture.asciidoc) extension.
1143+
11221144
## Implementation restrictions
11231145

11241146
This section contains a list of the main restrictions that apply when using the ESIMD

sycl/include/sycl/ext/intel/esimd/xmx/common.hpp

Lines changed: 11 additions & 11 deletions
Original file line numberDiff line numberDiff line change
@@ -21,17 +21,17 @@ namespace ext::intel::esimd::xmx {
2121
/// it is deducible from the element types of input matrices.
2222
enum class dpas_argument_type {
2323
Invalid = 0,
24-
u1 = 1, // unsigned 1 bit
25-
s1 = 2, // signed 1 bit
26-
u2 = 3, // unsigned 2 bits
27-
s2 = 4, // signed 2 bits
28-
u4 = 5, // unsigned 4 bits
29-
s4 = 6, // signed 4 bits
30-
u8 = 7, // unsigned 8 bits
31-
s8 = 8, // signed 8 bits
32-
bf16 = 9, // bfloat 16
33-
fp16 = 10, // half float
34-
tf32 = 12, // tensorfloat 32
24+
u1 __SYCL_DEPRECATED("u1 is reserved/unsupported") = 1, // unsigned 1 bit
25+
s1 __SYCL_DEPRECATED("s1 is reserved/unsupported") = 2, // signed 1 bit
26+
u2 = 3, // unsigned 2 bits
27+
s2 = 4, // signed 2 bits
28+
u4 = 5, // unsigned 4 bits
29+
s4 = 6, // signed 4 bits
30+
u8 = 7, // unsigned 8 bits
31+
s8 = 8, // signed 8 bits
32+
bf16 = 9, // bfloat 16
33+
fp16 = 10, // half float
34+
tf32 = 12, // tensorfloat 32
3535
};
3636

3737
} // namespace ext::intel::esimd::xmx

0 commit comments

Comments
 (0)