You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
[ESIMD][DOC][NFC] Deprecate DPAS type s1 and u1; clarify DPAS HW reqs (#12412)
DPAS API is not emulated. It is user's responsibility to manage device
detection queries to avoid running DPAS on GPUs without XMX.
---------
Signed-off-by: Klochkov, Vyacheslav N <[email protected]>
-[Features not supported with the ESIMD extension](#features-not-supported-with-the-esimd-extension)
36
37
-[Unsupported standard SYCL APIs](#unsupported-standard-sycl-apis)
@@ -76,11 +77,15 @@ compiler, collectives just return the value in the single work-item. Another
76
77
consequence of the unit subgroup size is guaranteed independent forward
77
78
progress between work-items on many Intel GPU architecture generations.
78
79
79
-
## Explicit SIMD extension APIs
80
+
Explicit SIMD APIs must be executed on Intel graphics architecture devices.
81
+
Attempting to run such code on other devices will result in an error.
82
+
83
+
Also, most of ESIMD APIs require the corresponding HW support in the target GPU.
84
+
It is user's responsibility to manage corresponding compile- and/or runtime-checks to avoid
85
+
undefined behavior caused by using ESIMD APIs on GPUs that don't support it.
86
+
See [this section](#device-queries-and-conditional-dispatching-of-the-code) for device query/dispatch machanisms usable to avoid undefined behavior.
80
87
81
-
Explicit SIMD APIs can be used only in code to be executed on Intel graphics
82
-
architecture devices and the host device for now. Attempt to run such code on
83
-
other devices will result in error.
88
+
## Explicit SIMD extension APIs
84
89
85
90
All the ESIMD APIs are defined in the `sycl::ext::intel::esimd`
86
91
namespace.
@@ -653,6 +658,7 @@ See more details in the API documentation
653
658
### Dot Product Accumulate Systolic - `DPAS` API
654
659
655
660
DPAS is the matrix multiply-add-and-accumulate operation performed on limited size matrices/tiles.
661
+
This API requires XMX (Xe Matrix eXtension) to be supported by the target GPU.
656
662
657
663
The input and output matrix/tile dimensions are parametrizable to certain extent and depend on the element types of operands and the target device.
658
664
The operands and returns of DPAS API may require vertical or horizontal packing or unpacking. Please see [more details](#input-and-output-matrices-representation-as-simd-vectors) below.
@@ -667,8 +673,8 @@ As a member XMX (Xe Matrix eXtension) family of GPU operations it is included in
667
673
/// it is deducible from the element types of input matrices.
668
674
enum class dpas_argument_type {
669
675
Invalid = 0,
670
-
u1 = 1, // unsigned 1 bit
671
-
s1 = 2, // signed 1 bit
676
+
u1 = 1, // unsigned 1 bit: this type is reserved - not suppoted
677
+
s1 = 2, // signed 1 bit: this type is reserved - not suppoted
672
678
u2 = 3, // unsigned 2 bits
673
679
s2 = 4, // signed 2 bits
674
680
u4 = 5, // unsigned 4 bits
@@ -1119,6 +1125,22 @@ int main(void) {
1119
1125
more examples can be found in the
1120
1126
[ESIMD test suite](https://github.com/intel/llvm/tree/sycl/sycl/test-e2e/ESIMD) on github.
1121
1127
1128
+
## Device queries and conditional dispatching of the code
1129
+
1130
+
ESIMD API provides access to low level GPU hardware API. At ESIMD program
1131
+
compilation time it is not known what target device is going to be used to run the program.
1132
+
The ESIMD programming model relies on the user to manage the corresponding compile- and/or
1133
+
runtime-checks to prevent ESIMD API from running on a GPU that does not support the API.
1134
+
1135
+
One of the most trivial ways to manage such checks is to have them on the HOST. This variant
1136
+
includes a) calling device detect query to understand what device is being used
1137
+
b) depending on the device run one or another version of the kernel.
1138
+
For example, [this test](https://github.com/intel/llvm/blob/sycl/sycl/test-e2e/ESIMD/dpas/dpas_int.cpp#L8) is designed to be run on DG2 and PVC even though those two devices have
1139
+
different `execution sizes`. This is done via usage of [device query on the HOST](https://github.com/intel/llvm/blob/sycl/sycl/test-e2e/ESIMD/dpas/dpas_common.hpp#L430) and a subsequent
1140
+
call of the corresponding supported variant of the [DG2 kernel](https://github.com/intel/llvm/blob/sycl/sycl/test-e2e/ESIMD/dpas/dpas_common.hpp#L446) or [PVC kernel](https://github.com/intel/llvm/blob/sycl/sycl/test-e2e/ESIMD/dpas/dpas_common.hpp#L438).
1141
+
1142
+
There may also be JIT-compile time checks via usage of [specialization constants](https://registry.khronos.org/SYCL/specs/sycl-2020/html/sycl-2020.html#_specialization_constants) or [if_architecture_is](../../experimental/sycl_ext_oneapi_device_architecture.asciidoc) extension.
1143
+
1122
1144
## Implementation restrictions
1123
1145
1124
1146
This section contains a list of the main restrictions that apply when using the ESIMD
0 commit comments