[SYCL][HIP] Support of AMD matrix core instructions #11485

mmoadeli · 2023-10-10T09:52:40Z

* Support single-block AMD matrix core instructions for the `__gfx90a__` architecture.
* Supports the `__builtin_amdgcn_mfma_i32_32x32x8i8`, `__builtin_amdgcn_mfma_i32_16x16x16i8`, `__builtin_amdgcn_mfma_f64_16x16x4f64`, `__builtin_amdgcn_mfma_f32_32x32x8bf16_1k`, `__builtin_amdgcn_mfma_f32_16x16x16bf16_1k`, `__builtin_amdgcn_mfma_f32_32x32x8f16` and `__builtin_amdgcn_mfma_f32_16x16x16f16` instructions.
* Add HIP matrix core support to the joint_matrix documentation.
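For readers unfamiliar with the builtin names, each one encodes a tile shape as MxNxK (for example, `16x16x4f64` is M=16, N=16, K=4 with `double` inputs), and the operation is a multiply-accumulate D = A * B + C. The following is a minimal host-side sketch of that arithmetic, not the device code from this PR. The function name `reference_mad` and the row-major layout are assumptions made for illustration only.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Host-side reference for D = A * B + C on one M x N x K tile.
// Row-major storage is an assumption of this sketch, not the hardware layout.
// `reference_mad` is a hypothetical helper, not part of the SYCL or HIP APIs.
template <typename Tin, typename Tacc, std::size_t M, std::size_t N,
          std::size_t K>
std::array<Tacc, M * N> reference_mad(const std::array<Tin, M * K> &a,
                                      const std::array<Tin, K * N> &b,
                                      const std::array<Tacc, M * N> &c) {
  std::array<Tacc, M * N> d = c; // start from the accumulator tile
  for (std::size_t i = 0; i < M; ++i)
    for (std::size_t j = 0; j < N; ++j)
      for (std::size_t k = 0; k < K; ++k)
        // Widen the inputs to the accumulator type before multiplying.
        d[i * N + j] += static_cast<Tacc>(a[i * K + k]) *
                        static_cast<Tacc>(b[k * N + j]);
  return d;
}
```

A call such as `reference_mad<std::int8_t, std::int32_t, 16, 16, 16>(a, b, c)` mirrors the shape of `__builtin_amdgcn_mfma_i32_16x16x16i8`, and could serve as a host check when comparing device results in a test.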

Should be merged after

[SYCL][Matrix] syntax changes as preparation before moving joint matrix from experimental namespace #11215

As part of the effort to move joint matrix from the experimental namespace to supported, the API is being reviewed in intel#7964. This results in the following syntax changes:

1. Add `Td` to `joint_matrix_mad`, since `Tc` can differ from `Td` on the GPU; `D` is now passed as an argument to `mad`.
2. Rename `packed` to `ext_intel_packed`.
3. Move the element-wise ops (`get_wi_data`, `wi_element`, `get_coord`) to the `detail` namespace.
4. Add `const` to `joint_matrix` in `store` and `mad`.
5. Add a `joint_matrix_copy`/assignment function.
6. Add `apply` with coordinates (changes existing tests).
7. Change the `get_coord` vector type from `int32_t` to `size_t`.
8. Explicitly delete both `operator=` and the copy constructor.
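Several of these changes (a separate `D` with its own element type, `const` inputs, deleted copy operations, and an explicit copy function) can be illustrated with a small host-only mock. This is a sketch under stated assumptions: `Tile`, `tile_mad` and `tile_copy` are hypothetical stand-ins, not the `joint_matrix` API, and the real functions also take a group argument that is omitted here.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <type_traits>

// Hypothetical host-only stand-in for a matrix tile (row-major).
template <typename T, std::size_t Rows, std::size_t Cols> struct Tile {
  std::array<T, Rows * Cols> data{};
  Tile() = default;
  // Copy construction and assignment are deleted, as in change 8.
  Tile(const Tile &) = delete;
  Tile &operator=(const Tile &) = delete;
};

// Explicit copy function replacing the deleted operations (change 5).
template <typename T, std::size_t R, std::size_t C>
void tile_copy(const Tile<T, R, C> &src, Tile<T, R, C> &dst) {
  dst.data = src.data;
}

// D is a separate argument whose element type Td may differ from Tc
// (change 1); A, B and C are taken by const reference (change 4).
template <typename Td, typename Ta, typename Tb, typename Tc, std::size_t M,
          std::size_t N, std::size_t K>
void tile_mad(Tile<Td, M, N> &d, const Tile<Ta, M, K> &a,
              const Tile<Tb, K, N> &b, const Tile<Tc, M, N> &c) {
  for (std::size_t i = 0; i < M; ++i)
    for (std::size_t j = 0; j < N; ++j) {
      Td acc = static_cast<Td>(c.data[i * N + j]);
      for (std::size_t k = 0; k < K; ++k)
        acc += static_cast<Td>(a.data[i * K + k]) *
               static_cast<Td>(b.data[k * N + j]);
      d.data[i * N + j] = acc;
    }
}

static_assert(!std::is_copy_constructible<Tile<float, 2, 2>>::value,
              "tiles must be copied through tile_copy");
```

Passing `D` separately lets the caller pick a result type that differs from the accumulator input, for example an `int32_t` `C` tile feeding a `float` `D` tile in this mock.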

sycl/include/sycl/ext/oneapi/matrix/matrix-hip.hpp

sycl/include/sycl/ext/oneapi/matrix/matrix-unified.hpp

sycl/test-e2e/Matrix/joint_matrix_hip_gfx90a.cpp

Use same code for `copy`, `fill` and `apply`. Remove `-DSYCL_EXT_ONEAPI_MATRIX_VERSION=4`

sycl/include/sycl/ext/oneapi/matrix/matrix-unified.hpp

sycl/test-e2e/Matrix/joint_matrix_hip_half_gfx90a.cpp

…on arguments.

sycl/include/sycl/ext/oneapi/matrix/matrix-unified.hpp

dkhaldi

LGTM

gmlueck

spec changes OK.

* Support one block AMD matrix core instructions for `__gfx90a__` architecture.
* Supports `__builtin_amdgcn_mfma_i32_32x32x8i8`, `__builtin_amdgcn_mfma_i32_16x16x16i8`, `__builtin_amdgcn_mfma_f64_16x16x4f64`, `__builtin_amdgcn_mfma_f32_32x32x8bf16_1k`, `__builtin_amdgcn_mfma_f32_16x16x16bf16_1k`, `__builtin_amdgcn_mfma_f32_32x32x8f16` and `__builtin_amdgcn_mfma_f32_16x16x16f16` instructions.
* Add HIP matrix core support into joint_matrix documentation.

Should be merged after
- #11215

---------

Co-authored-by: Bing1 Yu <[email protected]>
Co-authored-by: mmoadeli <[email protected]>

yubingex007-a11y added 22 commits September 19, 2023 11:46

clang-format

5fbb285

fix typo: dest->dst

bf6cd56

fix testcase

b399041

fix mad bug

dae1ec6

fix cuda const joint_matrix_cuda

4ec8360

fix const issue of jm_store_cuda

a461cbb

fix const

5ff715b

lint

8ad7da9

address dounia's comments and roll back all the testcase changes

26ea49d

test changes: mov D in mad

a09a778

testcase changes: ext_intel_layout

821fa89

testcase changes: wi_data=>jm_apply

a3921b5

lint

ef1bc67

Merge remote-tracking branch 'intel_llvm/sycl' into jm_syntax

f395199

Merge remote-tracking branch 'intel_llvm/sycl' into jm_syntax

c71fee6

handle cuda testcase compfail

8f2f197

address dounia's comments

1411376

lint

95df3b1

rm sycl/test/matrix/query-use.cpp

fb1afdc

fix x jm_mad in joint_matrix_bf16_fill_k_cache_impl.hpp

11df531

Merge remote-tracking branch 'intel_llvm/sycl' into jm_syntax

a29e8f3

mmoadeli requested review from dkhaldi, YuriPlyakhin, yubingex007-a11y and a team as code owners October 10, 2023 09:52

mmoadeli requested review from aelovikov-intel and JackAKirk October 10, 2023 09:52
