Commit b7858de

[SYCL][Doc][Joint Matrix] Add more devices support and more types and shape combinations (#15547)
This PR:
1. adds more types to the PVC combinations,
2. adds more shapes to the PVC combinations,
3. updates the matrix combinations appendix to include 3 new devices: Battlemage and Lunar Lake, which share the same matrix combinations as PVC, and Arrow Lake H, which shares the same matrix combinations as DG2.

Note that these new Intel GPU devices with matrix support have been added to https://github.com/intel/llvm/blob/sycl/sycl/doc/extensions/experimental/sycl_ext_oneapi_device_architecture.asciidoc#new-enumeration-of-architectures
1 parent de3a1db commit b7858de
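
The device grouping described in the commit message can be sketched in plain C++ as a lookup from architecture to "combination family". This is only an illustration under stated assumptions: the `arch` enum is a hypothetical stand-in for the SYCL architecture enumerators named in this commit, and `combo_family`, `family_of`, and `int8_n` are made-up helper names, not part of the extension. The N values are those the updated table lists for the 8-bit integer rows (M <= 8, K = 32).

```c++
// Hypothetical stand-in for the architecture enumerators named in this
// commit; this sketch does not include or depend on any SYCL header.
enum class arch {
  intel_gpu_pvc,
  intel_gpu_bmg_g21,  // Battlemage
  intel_gpu_lnl_m,    // Lunar Lake
  intel_gpu_dg2_g10,
  intel_gpu_dg2_g11,
  intel_gpu_dg2_g12,
  intel_gpu_arl_h     // Arrow Lake H
};

// Hypothetical grouping: which set of table rows applies to a device.
enum class combo_family { pvc_like, dg2_like };

constexpr combo_family family_of(arch a) {
  switch (a) {
  case arch::intel_gpu_pvc:
  case arch::intel_gpu_bmg_g21:
  case arch::intel_gpu_lnl_m:
    return combo_family::pvc_like;
  case arch::intel_gpu_dg2_g10:
  case arch::intel_gpu_dg2_g11:
  case arch::intel_gpu_dg2_g12:
  case arch::intel_gpu_arl_h:
    return combo_family::dg2_like;
  }
  return combo_family::dg2_like;  // not reached for the enumerators above
}

// N dimension for the 8-bit integer rows of the table (M <= 8, K = 32):
// 16 on the PVC-like devices, 8 on the DG2-like devices.
constexpr int int8_n(arch a) {
  return family_of(a) == combo_family::pvc_like ? 16 : 8;
}

static_assert(family_of(arch::intel_gpu_bmg_g21) == combo_family::pvc_like);
static_assert(int8_n(arch::intel_gpu_arl_h) == 8);
```

Real code would instead query the device and the supported combinations through the SYCL runtime; the sketch only mirrors how the table groups devices.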

2 files changed

+83
-32
lines changed

sycl/doc/extensions/experimental/sycl_ext_matrix/sycl_ext_intel_matrix.asciidoc

Lines changed: 1 addition & 1 deletion
@@ -113,7 +113,7 @@ an application can use to indicate that the matrix data is loaded or
 stored in VNNI "packed" format.
 
 ```c++
-namespace sycl::ext::oneapi::experimental::matrix::layout {
+namespace sycl::ext::oneapi::experimental::matrix {
 
 enum class layout {
 ext_intel_packed

sycl/doc/extensions/experimental/sycl_ext_matrix/sycl_ext_oneapi_matrix.asciidoc

Lines changed: 82 additions & 31 deletions
@@ -1017,52 +1017,103 @@ is shown in a single column in the table below.
 
 ==== Intel XMX Supported Combinations
 This is currently available in devices with the architecture
-`architecture::intel_gpu_pvc`, `architecture::intel_gpu_dg2_g10`,
-`architecture::intel_gpu_dg2_g11`, and
-`architecture::intel_gpu_dg2_g12`.
-In these architectures'
-implementation, the type of the C matrix must be the same as the type
-of the D matrix. Therefore, that common type is shown in a single
-column in the table below.
+`architecture::intel_gpu_pvc`, `architecture::intel_gpu_bmg_g21`,
+`architecture::intel_gpu_lnl_m`, `architecture::intel_gpu_dg2_g10`,
+`architecture::intel_gpu_dg2_g11`, `architecture::intel_gpu_dg2_g12`,
+and `architecture::intel_gpu_arl_h`.
 
 [frame="none",options="header"]
 |======================
-| A type | B type | C and D type | M | N | K | device
+| A type | B type | C type | D type | M | N | K | device
 .2+| `matrix_type::uint8` .2+| `matrix_type::uint8` .2+|
-`matrix_type::sint32` .2+| +<=+ 8 | 16 .2+| 32
-|`architecture::intel_gpu_pvc`
+`matrix_type::sint32` .2+| `matrix_type::sint32` .2+| +<=+ 8 | 16 .2+| 32
+|`architecture::intel_gpu_pvc`, `architecture::intel_gpu_bmg_g21`,
+`architecture::intel_gpu_lnl_m`
 |8|`architecture::intel_gpu_dg2_g10,
-architecture::intel_gpu_dg2_g11, architecture::intel_gpu_dg2_g12`
+architecture::intel_gpu_dg2_g11, architecture::intel_gpu_dg2_g12`,
+`architecture::intel_gpu_arl_h`
 .2+| `matrix_type::uint8` .2+| `matrix_type::sint8` .2+|
-`matrix_type::sint32` .2+| +<=+ 8 | 16 .2+| 32 |
-`architecture::intel_gpu_pvc`
+`matrix_type::sint32` .2+|`matrix_type::sint32` .2+| +<=+ 8 | 16 .2+| 32 |
+`architecture::intel_gpu_pvc`, `architecture::intel_gpu_bmg_g21`,
+`architecture::intel_gpu_lnl_m`
 |8|`architecture::intel_gpu_dg2_g10,
-architecture::intel_gpu_dg2_g11, architecture::intel_gpu_dg2_g12`
+architecture::intel_gpu_dg2_g11, architecture::intel_gpu_dg2_g12`,
+`architecture::intel_gpu_arl_h`
 .2+| `matrix_type::sint8` .2+| `matrix_type::uint8` .2+|
-`matrix_type::sint32` .2+| +<=+ 8 | 16 .2+| 32 |
-`architecture::intel_gpu_pvc`
+`matrix_type::sint32` .2+|`matrix_type::sint32` .2+| +<=+ 8 | 16 .2+| 32 |
+`architecture::intel_gpu_pvc`, `architecture::intel_gpu_bmg_g21`,
+`architecture::intel_gpu_lnl_m`
 |8|`architecture::intel_gpu_dg2_g10,
-architecture::intel_gpu_dg2_g11, architecture::intel_gpu_dg2_g12`
+architecture::intel_gpu_dg2_g11, architecture::intel_gpu_dg2_g12`,
+`architecture::intel_gpu_arl_h`
 .2+| `matrix_type::sint8` .2+| `matrix_type::sint8` .2+|
-`matrix_type::sint32` .2+| +<=+ 8 | 16 .2+| 32 |
-`architecture::intel_gpu_pvc`
+`matrix_type::sint32` .2+| `matrix_type::sint32` .2+| +<=+ 8 | 16 .2+| 32 |
+`architecture::intel_gpu_pvc`, `architecture::intel_gpu_bmg_g21`,
+`architecture::intel_gpu_lnl_m`
 |8|`architecture::intel_gpu_dg2_g10,
-architecture::intel_gpu_dg2_g11, architecture::intel_gpu_dg2_g12`
-.2+|`matrix_type::fp16` .2+| `matrix_type::fp16` .2+|
-`matrix_type::fp32` .2+| +<=+ 8 | 16 .2+| 16 |
-`architecture::intel_gpu_pvc`
-|8| `architecture::intel_gpu_dg2_g10,
-architecture::intel_gpu_dg2_g11, architecture::intel_gpu_dg2_g12`
-.6+| `matrix_type::bf16` .6+| `matrix_type::bf16` .6+|
-`matrix_type::fp32` | 16 | 16 | 16 .4+|`architecture::intel_gpu_pvc`
-| 1 | 64 | 16 | 32 | 64 | 16
+architecture::intel_gpu_dg2_g11, architecture::intel_gpu_dg2_g12`,
+`architecture::intel_gpu_arl_h`
+.8+|`matrix_type::fp16` .8+| `matrix_type::fp16` .8+|
+`matrix_type::fp32` .8+|`matrix_type::fp32` .1+| 16 .1+| 16 | 16
+.6+|`architecture::intel_gpu_pvc`, `architecture::intel_gpu_bmg_g21`,
+`architecture::intel_gpu_lnl_m`
+.2+| 1 .2+| 64 | 16 |32
+.2+| 32 .2+| 64 | 16 |32
+.2+| +<=+ 8 | 16 .2+| 16
+|8 .2+| `architecture::intel_gpu_dg2_g10,
+architecture::intel_gpu_dg2_g11, architecture::intel_gpu_dg2_g12`,
+`architecture::intel_gpu_arl_h`
+.1+| 32 .1+| 32 .1+| 16
+.6+|`matrix_type::fp16` .6+| `matrix_type::fp16` .6+|
+`matrix_type::fp16` .6+|`matrix_type::fp32` .1+| +<=+ 8 | 16 .1+| 16
+.6+| `architecture::intel_gpu_pvc`, `architecture::intel_gpu_bmg_g21`,
+`architecture::intel_gpu_lnl_m`
+| 16 | 16 | 16 .2+| 1 .2+| 64 | 16 | 32
+.2+| 32 .2+| 64 | 16 | 32
+.6+|`matrix_type::fp16` .6+| `matrix_type::fp16` .6+|
+`matrix_type::fp32` .6+|`matrix_type::fp16` .1+| +<=+ 8 | 16 .1+| 16
+.6+|`architecture::intel_gpu_pvc`, `architecture::intel_gpu_bmg_g21`,
+`architecture::intel_gpu_lnl_m`
+| 16 | 16 | 16 .2+| 1 .2+| 64 | 16 | 32
+.2+| 32 .2+| 64 |16 | 32
+.6+|`matrix_type::fp16` .6+| `matrix_type::fp16` .6+|
+`matrix_type::fp16` .6+|`matrix_type::fp16` .1+| +<=+ 8 | 16 .1+| 16
+.6+|`architecture::intel_gpu_pvc`, `architecture::intel_gpu_bmg_g21`,
+`architecture::intel_gpu_lnl_m`
+| 16 | 16 | 16 .2+| 1 .2+| 64 | 16 |32 .2+| 32 .2+| 64 | 16 | 32
+.8+| `matrix_type::bf16` .8+| `matrix_type::bf16` .8+|
+`matrix_type::fp32` .8+| `matrix_type::fp32` | 16 | 16 | 16
+.6+|`architecture::intel_gpu_pvc`, `architecture::intel_gpu_bmg_g21`,
+`architecture::intel_gpu_lnl_m`
+.2+| 1 .2+| 64 | 16 | 32
+.2+| 32 .2+| 64 | 16 |32
 .2+| +<=+ 8 | 16 .2+| 16
 |8 .2+| `architecture::intel_gpu_dg2_g10,
-architecture::intel_gpu_dg2_g11, architecture::intel_gpu_dg2_g12`
+architecture::intel_gpu_dg2_g11, architecture::intel_gpu_dg2_g12`,
+`architecture::intel_gpu_arl_h`
 .1+| 32 .1+| 32 .1+| 16
+.6+|`matrix_type::bf16` .6+| `matrix_type::bf16` .6+|
+`matrix_type::bf16` .6+|`matrix_type::fp32` .1+| +<=+ 8 | 16 .1+| 16 .6+|
+`architecture::intel_gpu_pvc`, `architecture::intel_gpu_bmg_g21`,
+`architecture::intel_gpu_lnl_m`
+| 16 | 16 | 16 .2+| 1 .2+| 64 | 16 | 32
+.2+| 32 .2+| 64 |16 | 32
+.6+|`matrix_type::bf16` .6+| `matrix_type::bf16` .6+|
+`matrix_type::fp32` .6+|`matrix_type::bf16` .1+| +<=+ 8 | 16 .1+| 16 .6+|
+`architecture::intel_gpu_pvc`, `architecture::intel_gpu_bmg_g21`,
+`architecture::intel_gpu_lnl_m`
+| 16 | 16 | 16 .2+| 1 .2+| 64 | 16 | 32
+.2+| 32 .2+| 64 |16 | 32
+.6+|`matrix_type::bf16` .6+| `matrix_type::bf16` .6+|
+`matrix_type::bf16` .6+|`matrix_type::bf16` .1+| +<=+ 8 | 16 .1+| 16 .6+|
+`architecture::intel_gpu_pvc`, `architecture::intel_gpu_bmg_g21`,
+`architecture::intel_gpu_lnl_m`
+| 16 | 16 | 16 .2+| 1 .2+| 64 | 16 | 32
+.2+| 32 .2+| 64 |16 | 32
 | `matrix_type::tf32` | `matrix_type::tf32` |
-`matrix_type::fp32` | +<=+ 8 | 16 | 8 |
-`architecture::intel_gpu_pvc`
+`matrix_type::fp32` .2+| `matrix_type::fp32` | +<=+ 8 | 16 | 8 |
+`architecture::intel_gpu_pvc`, `architecture::intel_gpu_bmg_g21`,
+`architecture::intel_gpu_lnl_m`
 |======================
 
 ==== Nvidia Tensor Cores Supported Combinations
