You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
[ESIMD][DOC][NFC] Deprecate DPAS type s1 and u1; clarify DPAS HW reqs
DPAS API is not emulated. It is user's responsibility to manage device
detection queries to avoid running DPAS on GPUs without XMX.
Signed-off-by: Klochkov, Vyacheslav N <[email protected]>
-[Features not supported with the ESIMD extension](#features-not-supported-with-the-esimd-extension)
36
37
-[Unsupported standard SYCL APIs](#unsupported-standard-sycl-apis)
@@ -76,11 +77,16 @@ compiler, collectives just return the value in the single work-item. Another
76
77
consequence of the unit subgroup size is guaranteed independent forward
77
78
progress between work-items on many Intel GPU architecture generations.
78
79
79
-
## Explicit SIMD extension APIs
80
+
Explicit SIMD APIs can be used only in the code to be executed on Intel graphics
81
+
architecture devices. Attempt to run such code on other devices will result in
82
+
error.
83
+
84
+
Also, most of ESIMD APIs require the corresponding HW support in the target GPU.
85
+
It is user's responsibility to manage corresponding compile- and/or runtime-checks to avoid
86
+
undefined behavior caused by using ESIMD APIs on GPUs that don't support it.
87
+
See [this section](#device-queries-and-conditional-dispatching-of-the-code) for device query/dispatch machanisms usable to avoid undefined behavior.
80
88
81
-
Explicit SIMD APIs can be used only in code to be executed on Intel graphics
82
-
architecture devices and the host device for now. Attempt to run such code on
83
-
other devices will result in error.
89
+
## Explicit SIMD extension APIs
84
90
85
91
All the ESIMD APIs are defined in the `sycl::ext::intel::esimd`
86
92
namespace.
@@ -653,6 +659,7 @@ See more details in the API documentation
653
659
### Dot Product Accumulate Systolic - `DPAS` API
654
660
655
661
DPAS is the matrix multiply-add-and-accumulate operation performed on limited size matrices/tiles.
662
+
This API requires XMX (Xe Matrix eXtension) to be supported by the target GPU.
656
663
657
664
The input and output matrix/tile dimensions are parametrizable to certain extent and depend on the element types of operands and the target device.
658
665
The operands and returns of DPAS API may require vertical or horizontal packing or unpacking. Please see [more details](#input-and-output-matrices-representation-as-simd-vectors) below.
@@ -667,8 +674,8 @@ As a member XMX (Xe Matrix eXtension) family of GPU operations it is included in
667
674
/// it is deducible from the element types of input matrices.
668
675
enum class dpas_argument_type {
669
676
Invalid = 0,
670
-
u1 = 1, // unsigned 1 bit
671
-
s1 = 2, // signed 1 bit
677
+
u1 = 1, // unsigned 1 bit: this type is reserved - not suppoted
678
+
s1 = 2, // signed 1 bit: this type is reserved - not suppoted
672
679
u2 = 3, // unsigned 2 bits
673
680
s2 = 4, // signed 2 bits
674
681
u4 = 5, // unsigned 4 bits
@@ -1119,6 +1126,22 @@ int main(void) {
1119
1126
more examples can be found in the
1120
1127
[ESIMD test suite](https://github.com/intel/llvm/tree/sycl/sycl/test-e2e/ESIMD) on github.
1121
1128
1129
+
## Device queries and conditional dispatching of the code
1130
+
1131
+
ESIMD API provides the access to low level GPU hardware API. At the ESIMD program
1132
+
compilation time it is not known what target device is going to be used to run the program.
1133
+
ESIMD programming model relies on user to manage the corresponding compile- and/or
1134
+
runtime-checks that prevents ESIMD API from running on GPU that does not support such API.
1135
+
1136
+
One of the most trivial way to manage such checks is to have them on HOST. This variant
1137
+
includes a) calling device detect query to understand what device is being used
1138
+
b) depending on the device run one or another version of the kernel.
1139
+
For example, [this test](https://github.com/intel/llvm/blob/sycl/sycl/test-e2e/ESIMD/dpas/dpas_int.cpp#L8) is designed to be run on DG2 and PVC even though those two devices have
1140
+
different `execution size`. This is done via usage of [device query on HOST](https://github.com/intel/llvm/blob/sycl/sycl/test-e2e/ESIMD/dpas/dpas_common.hpp#L430) and subsequent
1141
+
call of the corresponding supported variant of the [DG2 kernel](https://github.com/intel/llvm/blob/sycl/sycl/test-e2e/ESIMD/dpas/dpas_common.hpp#L446) or [PVC kernel](https://github.com/intel/llvm/blob/sycl/sycl/test-e2e/ESIMD/dpas/dpas_common.hpp#L438).
1142
+
1143
+
There may also be JIT-compile time checks via usage of [specialization constants](https://registry.khronos.org/SYCL/specs/sycl-2020/html/sycl-2020.html#_specialization_constants) or [if_architecture_is](../../experimental/sycl_ext_oneapi_device_architecture.asciidoc) extension.
1144
+
1122
1145
## Implementation restrictions
1123
1146
1124
1147
This section contains a list of the main restrictions that apply when using the ESIMD
0 commit comments