@@ -1017,52 +1017,103 @@ is shown in a single column in the table below.
==== Intel XMX Supported Combinations
This is currently available in devices with the architecture
- `architecture::intel_gpu_pvc`, `architecture::intel_gpu_dg2_g10`,
- `architecture::intel_gpu_dg2_g11`, and
- `architecture::intel_gpu_dg2_g12`.
- In these architectures'
- implementation, the type of the C matrix must be the same as the type
- of the D matrix. Therefore, that common type is shown in a single
- column in the table below.
+ `architecture::intel_gpu_pvc`, `architecture::intel_gpu_bmg_g21`,
+ `architecture::intel_gpu_lnl_m`, `architecture::intel_gpu_dg2_g10`,
+ `architecture::intel_gpu_dg2_g11`, `architecture::intel_gpu_dg2_g12`,
+ and `architecture::intel_gpu_arl_h`.

[frame="none",options="header"]
|======================
- | A type | B type | C and D type | M | N | K | device
+ | A type | B type | C type | D type | M | N | K | device
.2+| `matrix_type::uint8` .2+| `matrix_type::uint8` .2+|
- `matrix_type::sint32` .2+| +<=+ 8 | 16 .2+| 32
- |`architecture::intel_gpu_pvc`
+ `matrix_type::sint32` .2+| `matrix_type::sint32` .2+| +<=+ 8 | 16 .2+| 32
+ |`architecture::intel_gpu_pvc`, `architecture::intel_gpu_bmg_g21`,
+ `architecture::intel_gpu_lnl_m`
|8|`architecture::intel_gpu_dg2_g10,
- architecture::intel_gpu_dg2_g11, architecture::intel_gpu_dg2_g12`
+ architecture::intel_gpu_dg2_g11, architecture::intel_gpu_dg2_g12`,
+ `architecture::intel_gpu_arl_h`
.2+| `matrix_type::uint8` .2+| `matrix_type::sint8` .2+|
- `matrix_type::sint32` .2+| +<=+ 8 | 16 .2+| 32 |
- `architecture::intel_gpu_pvc`
+ `matrix_type::sint32` .2+|`matrix_type::sint32` .2+| +<=+ 8 | 16 .2+| 32 |
+ `architecture::intel_gpu_pvc`, `architecture::intel_gpu_bmg_g21`,
+ `architecture::intel_gpu_lnl_m`
|8|`architecture::intel_gpu_dg2_g10,
- architecture::intel_gpu_dg2_g11, architecture::intel_gpu_dg2_g12`
+ architecture::intel_gpu_dg2_g11, architecture::intel_gpu_dg2_g12`,
+ `architecture::intel_gpu_arl_h`
.2+| `matrix_type::sint8` .2+| `matrix_type::uint8` .2+|
- `matrix_type::sint32` .2+| +<=+ 8 | 16 .2+| 32 |
- `architecture::intel_gpu_pvc`
+ `matrix_type::sint32` .2+|`matrix_type::sint32` .2+| +<=+ 8 | 16 .2+| 32 |
+ `architecture::intel_gpu_pvc`, `architecture::intel_gpu_bmg_g21`,
+ `architecture::intel_gpu_lnl_m`
|8|`architecture::intel_gpu_dg2_g10,
- architecture::intel_gpu_dg2_g11, architecture::intel_gpu_dg2_g12`
+ architecture::intel_gpu_dg2_g11, architecture::intel_gpu_dg2_g12`,
+ `architecture::intel_gpu_arl_h`
.2+| `matrix_type::sint8` .2+| `matrix_type::sint8` .2+|
- `matrix_type::sint32` .2+| +<=+ 8 | 16 .2+| 32 |
- `architecture::intel_gpu_pvc`
+ `matrix_type::sint32` .2+| `matrix_type::sint32` .2+| +<=+ 8 | 16 .2+| 32 |
+ `architecture::intel_gpu_pvc`, `architecture::intel_gpu_bmg_g21`,
+ `architecture::intel_gpu_lnl_m`
|8|`architecture::intel_gpu_dg2_g10,
- architecture::intel_gpu_dg2_g11, architecture::intel_gpu_dg2_g12`
- .2+|`matrix_type::fp16` .2+| `matrix_type::fp16` .2+|
- `matrix_type::fp32` .2+| +<=+ 8 | 16 .2+| 16 |
- `architecture::intel_gpu_pvc`
- |8| `architecture::intel_gpu_dg2_g10,
- architecture::intel_gpu_dg2_g11, architecture::intel_gpu_dg2_g12`
- .6+| `matrix_type::bf16` .6+| `matrix_type::bf16` .6+|
- `matrix_type::fp32` | 16 | 16 | 16 .4+|`architecture::intel_gpu_pvc`
- | 1 | 64 | 16 | 32 | 64 | 16
+ architecture::intel_gpu_dg2_g11, architecture::intel_gpu_dg2_g12`,
+ `architecture::intel_gpu_arl_h`
+ .8+|`matrix_type::fp16` .8+| `matrix_type::fp16` .8+|
+ `matrix_type::fp32` .8+|`matrix_type::fp32` .1+| 16 .1+| 16 | 16
+ .6+|`architecture::intel_gpu_pvc`, `architecture::intel_gpu_bmg_g21`,
+ `architecture::intel_gpu_lnl_m`
+ .2+| 1 .2+| 64 | 16 |32
+ .2+| 32 .2+| 64 | 16 |32
+ .2+| +<=+ 8 | 16 .2+| 16
+ |8 .2+| `architecture::intel_gpu_dg2_g10,
+ architecture::intel_gpu_dg2_g11, architecture::intel_gpu_dg2_g12`,
+ `architecture::intel_gpu_arl_h`
+ .1+| 32 .1+| 32 .1+| 16
+ .6+|`matrix_type::fp16` .6+| `matrix_type::fp16` .6+|
+ `matrix_type::fp16` .6+|`matrix_type::fp32` .1+| +<=+ 8 | 16 .1+| 16
+ .6+| `architecture::intel_gpu_pvc`, `architecture::intel_gpu_bmg_g21`,
+ `architecture::intel_gpu_lnl_m`
+ | 16 | 16 | 16 .2+| 1 .2+| 64 | 16 | 32
+ .2+| 32 .2+| 64 | 16 | 32
+ .6+|`matrix_type::fp16` .6+| `matrix_type::fp16` .6+|
+ `matrix_type::fp32` .6+|`matrix_type::fp16` .1+| +<=+ 8 | 16 .1+| 16
+ .6+|`architecture::intel_gpu_pvc`, `architecture::intel_gpu_bmg_g21`,
+ `architecture::intel_gpu_lnl_m`
+ | 16 | 16 | 16 .2+| 1 .2+| 64 | 16 | 32
+ .2+| 32 .2+| 64 |16 | 32
+ .6+|`matrix_type::fp16` .6+| `matrix_type::fp16` .6+|
+ `matrix_type::fp16` .6+|`matrix_type::fp16` .1+| +<=+ 8 | 16 .1+| 16
+ .6+|`architecture::intel_gpu_pvc`, `architecture::intel_gpu_bmg_g21`,
+ `architecture::intel_gpu_lnl_m`
+ | 16 | 16 | 16 .2+| 1 .2+| 64 | 16 |32 .2+| 32 .2+| 64 | 16 | 32
+ .8+| `matrix_type::bf16` .8+| `matrix_type::bf16` .8+|
+ `matrix_type::fp32` .8+| `matrix_type::fp32` | 16 | 16 | 16
+ .6+|`architecture::intel_gpu_pvc`, `architecture::intel_gpu_bmg_g21`,
+ `architecture::intel_gpu_lnl_m`
+ .2+| 1 .2+| 64 | 16 | 32
+ .2+| 32 .2+| 64 | 16 |32
.2+| +<=+ 8 | 16 .2+| 16
|8 .2+| `architecture::intel_gpu_dg2_g10,
- architecture::intel_gpu_dg2_g11, architecture::intel_gpu_dg2_g12`
+ architecture::intel_gpu_dg2_g11, architecture::intel_gpu_dg2_g12`,
+ `architecture::intel_gpu_arl_h`
.1+| 32 .1+| 32 .1+| 16
+ .6+|`matrix_type::bf16` .6+| `matrix_type::bf16` .6+|
+ `matrix_type::bf16` .6+|`matrix_type::fp32` .1+| +<=+ 8 | 16 .1+| 16 .6+|
+ `architecture::intel_gpu_pvc`, `architecture::intel_gpu_bmg_g21`,
+ `architecture::intel_gpu_lnl_m`
+ | 16 | 16 | 16 .2+| 1 .2+| 64 | 16 | 32
+ .2+| 32 .2+| 64 |16 | 32
+ .6+|`matrix_type::bf16` .6+| `matrix_type::bf16` .6+|
+ `matrix_type::fp32` .6+|`matrix_type::bf16` .1+| +<=+ 8 | 16 .1+| 16 .6+|
+ `architecture::intel_gpu_pvc`, `architecture::intel_gpu_bmg_g21`,
+ `architecture::intel_gpu_lnl_m`
+ | 16 | 16 | 16 .2+| 1 .2+| 64 | 16 | 32
+ .2+| 32 .2+| 64 |16 | 32
+ .6+|`matrix_type::bf16` .6+| `matrix_type::bf16` .6+|
+ `matrix_type::bf16` .6+|`matrix_type::bf16` .1+| +<=+ 8 | 16 .1+| 16 .6+|
+ `architecture::intel_gpu_pvc`, `architecture::intel_gpu_bmg_g21`,
+ `architecture::intel_gpu_lnl_m`
+ | 16 | 16 | 16 .2+| 1 .2+| 64 | 16 | 32
+ .2+| 32 .2+| 64 |16 | 32
| `matrix_type::tf32` | `matrix_type::tf32` |
- `matrix_type::fp32` | +<=+ 8 | 16 | 8 |
- `architecture::intel_gpu_pvc`
+ `matrix_type::fp32` .2+| `matrix_type::fp32` | +<=+ 8 | 16 | 8 |
+ `architecture::intel_gpu_pvc`, `architecture::intel_gpu_bmg_g21`,
+ `architecture::intel_gpu_lnl_m`
|======================
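
As a reading aid for the updated table, the following is a minimal sketch that encodes only the 8-bit integer rows and the `tf32` row as a compile-time check. It is plain C++ and does not use the extension API: `mtype`, `arch_family`, `combo`, and `in_table_subset` are hypothetical names invented for this illustration, and the reading of `+<=+ 8` as "M from 1 to 8" is an assumption. The `fp16` and `bf16` rows are deliberately left out.

```cpp
#include <cstddef>

// Hypothetical stand-ins for illustration only; these are NOT the
// extension's matrix_type or architecture enumerations.
enum class mtype { uint8, sint8, sint32, tf32, fp32 };

enum class arch_family {
  pvc_bmg_lnl, // intel_gpu_pvc, intel_gpu_bmg_g21, intel_gpu_lnl_m
  dg2_arl      // intel_gpu_dg2_g10/g11/g12, intel_gpu_arl_h
};

// One candidate combination: element types of A, B, C, D and the M, N, K sizes.
struct combo {
  mtype a, b, c, d;
  std::size_t m, n, k;
};

constexpr bool is_int8(mtype t) {
  return t == mtype::uint8 || t == mtype::sint8;
}

// Returns true if the combination matches one of the 8-bit integer rows or
// the tf32 row of the table above. All other rows are not modelled here.
constexpr bool in_table_subset(const combo &x, arch_family f) {
  const bool m_ok = x.m >= 1 && x.m <= 8; // "<= 8" read as 1..8 (assumption)
  if (is_int8(x.a) && is_int8(x.b) &&
      x.c == mtype::sint32 && x.d == mtype::sint32) {
    // N is 16 for the pvc/bmg_g21/lnl_m rows and 8 for the dg2/arl_h rows.
    const std::size_t n = (f == arch_family::pvc_bmg_lnl) ? 16 : 8;
    return m_ok && x.n == n && x.k == 32;
  }
  if (x.a == mtype::tf32 && x.b == mtype::tf32 &&
      x.c == mtype::fp32 && x.d == mtype::fp32) {
    // The tf32 row lists only the pvc/bmg_g21/lnl_m devices.
    return f == arch_family::pvc_bmg_lnl && m_ok && x.n == 16 && x.k == 8;
  }
  return false;
}

// uint8 x sint8 with N = 16 appears only in the pvc/bmg_g21/lnl_m rows.
static_assert(in_table_subset({mtype::uint8, mtype::sint8, mtype::sint32,
                               mtype::sint32, 8, 16, 32},
                              arch_family::pvc_bmg_lnl),
              "listed in the table");
static_assert(!in_table_subset({mtype::uint8, mtype::sint8, mtype::sint32,
                                mtype::sint32, 8, 16, 32},
                               arch_family::dg2_arl),
              "the dg2/arl_h rows use N = 8");
```

Grouping the devices into two families mirrors how the device column pairs them in these rows; real code would presumably query the device rather than hard-code the table.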

==== Nvidia Tensor Cores Supported Combinations