Skip to content

Commit 8f7aa58

Browse files
committed
[ESIMD][DOC][NFC] Deprecate DPAS type s1 and u1; clarify DPAS HW reqs
DPAS API is not emulated. It is user's responsibility to manage device detection queries to avoid running DPAS on GPUs without XMX. Signed-off-by: Klochkov, Vyacheslav N <[email protected]>
1 parent 24255a5 commit 8f7aa58

File tree

2 files changed

+31
-8
lines changed

2 files changed

+31
-8
lines changed

sycl/doc/extensions/supported/sycl_ext_intel_esimd/sycl_ext_intel_esimd.md

Lines changed: 29 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -31,6 +31,7 @@
3131
- [__regcall Calling convention](#__regcall-calling-convention)
3232
- [Inline assembly](#inline-assembly)
3333
- [Device aspect](#device-aspect)
34+
- [Device queries and conditional dispatching of the code](#device-queries-and-conditional-dispatching-of-the-code)
3435
- [Implementation restrictions](#implementation-restrictions)
3536
- [Features not supported with the ESIMD extension](#features-not-supported-with-the-esimd-extension)
3637
- [Unsupported standard SYCL APIs](#unsupported-standard-sycl-apis)
@@ -76,11 +77,16 @@ compiler, collectives just return the value in the single work-item. Another
7677
consequence of the unit subgroup size is guaranteed independent forward
7778
progress between work-items on many Intel GPU architecture generations.
7879

79-
## Explicit SIMD extension APIs
80+
Explicit SIMD APIs can be used only in the code to be executed on Intel graphics
81+
architecture devices. Attempt to run such code on other devices will result in
82+
error.
83+
84+
Also, most of ESIMD APIs require the corresponding HW support in the target GPU.
85+
It is user's responsibility to manage corresponding compile- and/or runtime-checks to avoid
86+
undefined behavior caused by using ESIMD APIs on GPUs that don't support it.
87+
See [this section](#device-queries-and-conditional-dispatching-of-the-code) for device query/dispatch machanisms usable to avoid undefined behavior.
8088

81-
Explicit SIMD APIs can be used only in code to be executed on Intel graphics
82-
architecture devices and the host device for now. Attempt to run such code on
83-
other devices will result in error.
89+
## Explicit SIMD extension APIs
8490

8591
All the ESIMD APIs are defined in the `sycl::ext::intel::esimd`
8692
namespace.
@@ -653,6 +659,7 @@ See more details in the API documentation
653659
### Dot Product Accumulate Systolic - `DPAS` API
654660
655661
DPAS is the matrix multiply-add-and-accumulate operation performed on limited size matrices/tiles.
662+
This API requires XMX (Xe Matrix eXtension) to be supported by the target GPU.
656663
657664
The input and output matrix/tile dimensions are parametrizable to certain extent and depend on the element types of operands and the target device.
658665
The operands and returns of DPAS API may require vertical or horizontal packing or unpacking. Please see [more details](#input-and-output-matrices-representation-as-simd-vectors) below.
@@ -667,8 +674,8 @@ As a member XMX (Xe Matrix eXtension) family of GPU operations it is included in
667674
/// it is deducible from the element types of input matrices.
668675
enum class dpas_argument_type {
669676
Invalid = 0,
670-
u1 = 1, // unsigned 1 bit
671-
s1 = 2, // signed 1 bit
677+
u1 = 1, // unsigned 1 bit: this type is reserved - not suppoted
678+
s1 = 2, // signed 1 bit: this type is reserved - not suppoted
672679
u2 = 3, // unsigned 2 bits
673680
s2 = 4, // signed 2 bits
674681
u4 = 5, // unsigned 4 bits
@@ -1119,6 +1126,22 @@ int main(void) {
11191126
more examples can be found in the
11201127
[ESIMD test suite](https://github.com/intel/llvm/tree/sycl/sycl/test-e2e/ESIMD) on github.
11211128

1129+
## Device queries and conditional dispatching of the code
1130+
1131+
ESIMD API provides the access to low level GPU hardware API. At the ESIMD program
1132+
compilation time it is not known what target device is going to be used to run the program.
1133+
ESIMD programming model relies on user to manage the corresponding compile- and/or
1134+
runtime-checks that prevents ESIMD API from running on GPU that does not support such API.
1135+
1136+
One of the most trivial way to manage such checks is to have them on HOST. This variant
1137+
includes a) calling device detect query to understand what device is being used
1138+
b) depending on the device run one or another version of the kernel.
1139+
For example, [this test](https://github.com/intel/llvm/blob/sycl/sycl/test-e2e/ESIMD/dpas/dpas_int.cpp#L8) is designed to be run on DG2 and PVC even though those two devices have
1140+
different `execution size`. This is done via usage of [device query on HOST](https://github.com/intel/llvm/blob/sycl/sycl/test-e2e/ESIMD/dpas/dpas_common.hpp#L430) and subsequent
1141+
call of the corresponding supported variant of the [DG2 kernel](https://github.com/intel/llvm/blob/sycl/sycl/test-e2e/ESIMD/dpas/dpas_common.hpp#L446) or [PVC kernel](https://github.com/intel/llvm/blob/sycl/sycl/test-e2e/ESIMD/dpas/dpas_common.hpp#L438).
1142+
1143+
There may also be JIT-compile time checks via usage of [specialization constants](https://registry.khronos.org/SYCL/specs/sycl-2020/html/sycl-2020.html#_specialization_constants) or [if_architecture_is](../../experimental/sycl_ext_oneapi_device_architecture.asciidoc) extension.
1144+
11221145
## Implementation restrictions
11231146

11241147
This section contains a list of the main restrictions that apply when using the ESIMD

sycl/include/sycl/ext/intel/esimd/xmx/common.hpp

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -21,8 +21,8 @@ namespace ext::intel::esimd::xmx {
2121
/// it is deducible from the element types of input matrices.
2222
enum class dpas_argument_type {
2323
Invalid = 0,
24-
u1 = 1, // unsigned 1 bit
25-
s1 = 2, // signed 1 bit
24+
u1 __SYCL_DEPRECATED("u1 is reserved/unsupported") = 1, // unsigned 1 bit
25+
s1 __SYCL_DEPRECATED("s1 is reserved/unsupported") = 2, // signed 1 bit
2626
u2 = 3, // unsigned 2 bits
2727
s2 = 4, // signed 2 bits
2828
u4 = 5, // unsigned 4 bits

0 commit comments

Comments
 (0)