Skip to content

[ESIMD][DOC][NFC] Deprecate DPAS type s1 and u1; clarify DPAS HW reqs #12412

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Jan 18, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -31,6 +31,7 @@
- [__regcall Calling convention](#__regcall-calling-convention)
- [Inline assembly](#inline-assembly)
- [Device aspect](#device-aspect)
- [Device queries and conditional dispatching of the code](#device-queries-and-conditional-dispatching-of-the-code)
- [Implementation restrictions](#implementation-restrictions)
- [Features not supported with the ESIMD extension](#features-not-supported-with-the-esimd-extension)
- [Unsupported standard SYCL APIs](#unsupported-standard-sycl-apis)
Expand Down Expand Up @@ -76,11 +77,15 @@ compiler, collectives just return the value in the single work-item. Another
consequence of the unit subgroup size is guaranteed independent forward
progress between work-items on many Intel GPU architecture generations.

## Explicit SIMD extension APIs
Explicit SIMD APIs must be executed on Intel graphics architecture devices.
Attempting to run such code on other devices will result in an error.

Also, most of ESIMD APIs require the corresponding HW support in the target GPU.
It is user's responsibility to manage corresponding compile- and/or runtime-checks to avoid
undefined behavior caused by using ESIMD APIs on GPUs that don't support it.
See [this section](#device-queries-and-conditional-dispatching-of-the-code) for device query/dispatch machanisms usable to avoid undefined behavior.

Explicit SIMD APIs can be used only in code to be executed on Intel graphics
architecture devices and the host device for now. Attempt to run such code on
other devices will result in error.
## Explicit SIMD extension APIs

All the ESIMD APIs are defined in the `sycl::ext::intel::esimd`
namespace.
Expand Down Expand Up @@ -653,6 +658,7 @@ See more details in the API documentation
### Dot Product Accumulate Systolic - `DPAS` API

DPAS is the matrix multiply-add-and-accumulate operation performed on limited size matrices/tiles.
This API requires XMX (Xe Matrix eXtension) to be supported by the target GPU.

The input and output matrix/tile dimensions are parametrizable to certain extent and depend on the element types of operands and the target device.
The operands and returns of DPAS API may require vertical or horizontal packing or unpacking. Please see [more details](#input-and-output-matrices-representation-as-simd-vectors) below.
Expand All @@ -667,8 +673,8 @@ As a member XMX (Xe Matrix eXtension) family of GPU operations it is included in
/// it is deducible from the element types of input matrices.
enum class dpas_argument_type {
Invalid = 0,
u1 = 1, // unsigned 1 bit
s1 = 2, // signed 1 bit
u1 = 1, // unsigned 1 bit: this type is reserved - not suppoted
s1 = 2, // signed 1 bit: this type is reserved - not suppoted
u2 = 3, // unsigned 2 bits
s2 = 4, // signed 2 bits
u4 = 5, // unsigned 4 bits
Expand Down Expand Up @@ -1119,6 +1125,22 @@ int main(void) {
more examples can be found in the
[ESIMD test suite](https://github.com/intel/llvm/tree/sycl/sycl/test-e2e/ESIMD) on github.

## Device queries and conditional dispatching of the code

ESIMD API provides access to low level GPU hardware API. At ESIMD program
compilation time it is not known what target device is going to be used to run the program.
The ESIMD programming model relies on the user to manage the corresponding compile- and/or
runtime-checks to prevent ESIMD API from running on a GPU that does not support the API.

One of the most trivial ways to manage such checks is to have them on the HOST. This variant
includes a) calling device detect query to understand what device is being used
b) depending on the device run one or another version of the kernel.
For example, [this test](https://github.com/intel/llvm/blob/sycl/sycl/test-e2e/ESIMD/dpas/dpas_int.cpp#L8) is designed to be run on DG2 and PVC even though those two devices have
different `execution sizes`. This is done via usage of [device query on the HOST](https://github.com/intel/llvm/blob/sycl/sycl/test-e2e/ESIMD/dpas/dpas_common.hpp#L430) and a subsequent
call of the corresponding supported variant of the [DG2 kernel](https://github.com/intel/llvm/blob/sycl/sycl/test-e2e/ESIMD/dpas/dpas_common.hpp#L446) or [PVC kernel](https://github.com/intel/llvm/blob/sycl/sycl/test-e2e/ESIMD/dpas/dpas_common.hpp#L438).

There may also be JIT-compile time checks via usage of [specialization constants](https://registry.khronos.org/SYCL/specs/sycl-2020/html/sycl-2020.html#_specialization_constants) or [if_architecture_is](../../experimental/sycl_ext_oneapi_device_architecture.asciidoc) extension.

## Implementation restrictions

This section contains a list of the main restrictions that apply when using the ESIMD
Expand Down
22 changes: 11 additions & 11 deletions sycl/include/sycl/ext/intel/esimd/xmx/common.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -21,17 +21,17 @@ namespace ext::intel::esimd::xmx {
/// it is deducible from the element types of input matrices.
enum class dpas_argument_type {
Invalid = 0,
u1 = 1, // unsigned 1 bit
s1 = 2, // signed 1 bit
u2 = 3, // unsigned 2 bits
s2 = 4, // signed 2 bits
u4 = 5, // unsigned 4 bits
s4 = 6, // signed 4 bits
u8 = 7, // unsigned 8 bits
s8 = 8, // signed 8 bits
bf16 = 9, // bfloat 16
fp16 = 10, // half float
tf32 = 12, // tensorfloat 32
u1 __SYCL_DEPRECATED("u1 is reserved/unsupported") = 1, // unsigned 1 bit
s1 __SYCL_DEPRECATED("s1 is reserved/unsupported") = 2, // signed 1 bit
u2 = 3, // unsigned 2 bits
s2 = 4, // signed 2 bits
u4 = 5, // unsigned 4 bits
s4 = 6, // signed 4 bits
u8 = 7, // unsigned 8 bits
s8 = 8, // signed 8 bits
bf16 = 9, // bfloat 16
fp16 = 10, // half float
tf32 = 12, // tensorfloat 32
};

} // namespace ext::intel::esimd::xmx
Expand Down