Skip to content

[SYCL][HIP] Support of AMD matrix core instructions #11485

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 56 commits into from
Oct 30, 2023
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
56 commits
Select commit Hold shift + click to select a range
b68aead
[Matrix] syntax changes as prepraration before moving joint matrix from
yubingex007-a11y Sep 19, 2023
5fbb285
clang-format
yubingex007-a11y Sep 19, 2023
bf6cd56
fix typo: dest->dst
yubingex007-a11y Sep 19, 2023
b399041
fix testcase
yubingex007-a11y Sep 19, 2023
dae1ec6
fix mad bug
yubingex007-a11y Sep 19, 2023
4ec8360
fix cuda const joint_matrix_cuda
yubingex007-a11y Sep 19, 2023
a461cbb
fix const issue of jm_store_cuda
yubingex007-a11y Sep 19, 2023
5ff715b
fix const
yubingex007-a11y Sep 19, 2023
8ad7da9
lint
yubingex007-a11y Sep 19, 2023
26ea49d
address dounia's comments and roll back all the testcase changes
yubingex007-a11y Sep 21, 2023
a09a778
test changes: mov D in mad
yubingex007-a11y Sep 21, 2023
821fa89
testcase changes: ext_intel_layout
yubingex007-a11y Sep 21, 2023
a3921b5
testcase changes: wi_data=>jm_apply
yubingex007-a11y Sep 21, 2023
ef1bc67
lint
yubingex007-a11y Sep 21, 2023
f395199
Merge remote-tracking branch 'intel_llvm/sycl' into jm_syntax
yubingex007-a11y Sep 21, 2023
c71fee6
Merge remote-tracking branch 'intel_llvm/sycl' into jm_syntax
yubingex007-a11y Sep 22, 2023
8f2f197
handle cuda testcase compfail
yubingex007-a11y Sep 22, 2023
1411376
address dounia's comments
yubingex007-a11y Sep 22, 2023
95df3b1
lint
yubingex007-a11y Sep 22, 2023
fb1afdc
rm sycl/test/matrix/query-use.cpp
yubingex007-a11y Sep 22, 2023
11df531
fix x jm_mad in joint_matrix_bf16_fill_k_cache_impl.hpp
yubingex007-a11y Sep 25, 2023
a29e8f3
Merge remote-tracking branch 'intel_llvm/sycl' into jm_syntax
yubingex007-a11y Oct 9, 2023
a821107
address comments
yubingex007-a11y Oct 11, 2023
3f1b575
Merge remote-tracking branch 'intel_llvm/sycl' into jm_syntax
yubingex007-a11y Oct 11, 2023
1d091de
rm element_wise_irreg_sum_rows_impl.hpp
yubingex007-a11y Oct 11, 2023
1e20968
small fix
yubingex007-a11y Oct 11, 2023
1fe7fcd
small fix
yubingex007-a11y Oct 11, 2023
b5c0911
* Support one block AMD matrix core instructions
Oct 1, 2023
f6b2cb3
* Update matrix core support into joint_matrix documentation.
Oct 10, 2023
919884b
- Fix adding `gpu-amd-gfx90a` as available feature.
Oct 10, 2023
02bec23
Fix a missing variable name change.
Oct 11, 2023
3c460af
Add decoration type for call to get_multi_ptr function.
Oct 11, 2023
42e0c62
Update use cases of `mad` to have variables holding result of `mad` a…
Oct 12, 2023
be7a90f
Implement joint_matrix_copy for HIP(gfx90a) backend.
Oct 12, 2023
3d1237a
std::ignore unused input parameters.
Oct 12, 2023
ad7b8cd
std::ignore unused `sg` parameters in`joint_matrix_store`.
Oct 12, 2023
1d46e33
Merge branch 'sycl' into amd-joint-matrix
Oct 12, 2023
4231a1c
Fix AMD `joint_matrix_copy` function.
Oct 12, 2023
1a59580
- Fix joint_matrix_hip_copy.
Oct 13, 2023
fc31965
Remove curly braces for initialization of joint_matrix.
Oct 13, 2023
8bba0fb
- Use sycl::marray as container for jont_matrix data.
Oct 13, 2023
a152eda
Modify `joint_matrix_apply` test.
Oct 13, 2023
b44a76c
Update allow difference after using matrix random input.
Oct 13, 2023
66131d7
Fix `hip_matrix_copy`.
Oct 13, 2023
f719779
- Improve hip mfma tests to support matrices of multiple of K size.
Oct 22, 2023
7aa4ce3
Fix call to `copy` and `fill` for hip joint matrix.
Oct 23, 2023
3d13484
Fix reference mma calculation.
Oct 23, 2023
af5cc07
Rename `cuda_impl` and `hip_impl` member of `joint_matrix` to `matrix…
Oct 23, 2023
91cbab7
Merge branch 'sycl' into amd-joint-matrix
Oct 23, 2023
c6a3cee
Replace `cuda_impl` with `matrix_impl`
Oct 23, 2023
2f6885f
Rename `data` in `joint_matrix_hip` with `wi_marray`.
Oct 24, 2023
defd874
Improve `joint_matrix_copy` by avoiding the loop.
Oct 24, 2023
89c52d7
Add a missing `comma` to the test.
Oct 24, 2023
abfa2ab
Remove `-DSYCL_EXT_ONEAPI_MATRIX_VERSION=4` from AMD matrix compilati…
Oct 24, 2023
048ac39
Guard including `matrix-hip.hpp`
Oct 24, 2023
05d2e9d
Minor macro readability improvement.
Oct 24, 2023
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -50,7 +50,7 @@ specification.*
This extension is currently implemented in {dpcpp} only for devices
that contain a matrix hardware, specifically Intel(R) Advanced Matrix
Extensions (Intel(R) AMX), Intel(R) Xe Matrix Extensions (Intel(R)
XMX) and Nvidia(R) Tensor Cores.
XMX), Nvidia(R) Tensor Cores and AMD Matrix Cores(R).

The `joint_matrix` type and the `joint_matrix_mad` function are
optional kernel features as defined in section 5.7 of the core SYCL
Expand All @@ -67,8 +67,8 @@ implementation throws a synchronous exception with the

== Overview
Joint matrix is a SYCL extension for matrix hardware programming. It
unifies targets like Intel AMX in CPUs, Intel XMX in Intel GPUs and
Nvidia Tensor Cores. This provides a portable and performant API for
unifies targets like Intel AMX in CPUs, Intel XMX in Intel GPUs,
Nvidia Tensor Cores and AMD Matrix Cores(R). This provides a portable and performant API for
users who want to build their own neural networks applications,
perform custom optimizations, or experiment with new operations in a
timely and performing manner.
Expand Down Expand Up @@ -921,7 +921,8 @@ the type of the A matrix must be the same as the type of the B
matrix.

IMPORTANT: When compiling for the `ext_oneapi_cuda` backend the target
arch backend flag, `-Xsycl-target-backend --cuda-gpu-arch=sm_xx`, must
arch backend flag, `-fsycl-targets=nvidia_gpu_sm_xx`
(or equivalents, e.g. `-Xsycl-target-backend --cuda-gpu-arch=sm_xx`), must
be used, where `sm_xx` must be a Compute Capability that is equal to
or greater than the appropriate Minimum Compute Capability. When an
executable has been compiled for `sm_xx`, if the executable is run on
Expand Down Expand Up @@ -971,6 +972,34 @@ multiple of 4 when `T` is `float`; where `T` is the type of the
`joint_matrix` elements. When `T` is not `half` or `float` there are
no restrictions to `stride`.

==== AMD Matrix Cores Supported Combinations
The complete set of matrix data types and dimensions that are supported by
the `ext_oneapi_hip` backend are represented in the following
table. In this architecture's implementation, A and B matrices must have the same type.
Similarly, C and D matrices must share the same type.

IMPORTANT: The supported instructions may be run on GFX90A (MI200, MI210, MI250 and MI250X GPUs)
architecture. When compiling for the `ext_oneapi_hip` backend the
target arch backend flag, `-fsycl-targets=amd_gpu_gfx90a`, must
be used. An attempt to run the compiled code on an unsupported architecture will throw an error.


[frame="none",options="header"]
|======================
| A and B type | C and D type | M | N | K
.2+| `matrix_type::fp16` .2+| `matrix_type::fp32`
|32 |32 |8
|16 |16 |16
.2+| `matrix_type::sint8` .2+| `matrix_type::sint32`
|32 |32 |8
|16 |16 |16
.2+|`matrix_type::bf16` .2+|`matrix_type::fp32`
|32 |32 |8
|16 |16 |16
.1+|`matrix_type::fp64` .1+| `matrix_type::fp64`
|16 |16 |4
|======================

=== Revision History

[frame="none",options="header"]
Expand All @@ -990,4 +1019,5 @@ the Intel-specifics to a separate extension document
type, runtime query, and supported combinations appendix for Intel AMX
and Intel XMX
|7 |2023-04-11 |Jack Kirk |Add Nvidia Tensor Cores supported combinations
|8 |2023-10-05 |Mahmoud Moadeli |Add AMD Matrix Core supported combinations
|======================
8 changes: 5 additions & 3 deletions sycl/include/sycl/detail/defines.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -39,9 +39,11 @@
#define __SYCL_TYPE(x)
#endif

// joint matrix should only be included by default for SPIR or NVPTX backends
#if defined __SPIR__ || defined __NVPTX__ || !defined __SYCL_DEVICE_ONLY__
// joint matrix should only be included by default for SPIR, NVPTX or HIP(GFX90A
// only) backends
#if defined __SPIR__ || defined __NVPTX__ || !defined __SYCL_DEVICE_ONLY__ || \
defined __gfx90a__
#ifndef SYCL_EXT_ONEAPI_MATRIX_VERSION
#define SYCL_EXT_ONEAPI_MATRIX_VERSION 4
#endif // SYCL_EXT_ONEAPI_MATRIX_VERSION
#endif // __SPIR__ || __NVPTX__ || !__SYCL_DEVICE_ONLY
#endif // __SPIR__ || __NVPTX__ || !__SYCL_DEVICE_ONLY || __gfx90a__
Loading