[Matrix][SYCL] Add use argument for joint_matrix and add another feature macro for it #5835

Merged
merged 18 commits into intel:sycl on Sep 14, 2022

Conversation

yubingex007-a11y
Contributor


@yubingex007-a11y yubingex007-a11y marked this pull request as ready for review March 23, 2022 02:46
@yubingex007-a11y yubingex007-a11y requested a review from a team as a code owner March 23, 2022 02:46
@yubingex007-a11y
Contributor Author

The remaining Lit failure is: "File /__w/llvm/llvm/build/bin/../include/sycl/ext/oneapi/matrix/matrix-tensorcore.hpp Line 21: no member named 'dynamic_extent' in namespace 'sycl'"

It's strange, since I see "inline constexpr size_t dynamic_extent = SIZE_MAX;" in sycl/include/CL/sycl/sycl_span.hpp

@@ -38,7 +38,7 @@ namespace sycl {
// 2- provides JIT implementation (target agnostic) for the
// experimental matrix extension
#ifndef SYCL_EXT_ONEAPI_MATRIX
#define SYCL_EXT_ONEAPI_MATRIX 2
#define SYCL_EXT_ONEAPI_MATRIX 3
Contributor Author

change to 2

joint_matrix<Tb, defaultK, defaultN, matrix_layout::packed_b,
matrix_use::unnecessary, Group>;
template <typename Group>
using joint_matrix_c =
Contributor Author

give correct template param

Contributor
@dkhaldi dkhaldi left a comment

Besides my other comments, please add a test case that uses "use".

__spv::__spirv_JointMatrixINTEL<T2, K, N, LB, S> *B,
__spv::__spirv_JointMatrixINTEL<T3, M, N, LC, S> *C,
__spv::__spirv_JointMatrixINTEL<T1, M, K, LA, UA, S> *A,
__spv::__spirv_JointMatrixINTEL<T1, K, N, LB, UB, S> *B,
Contributor

this should be T2, right?

__spv::__spirv_JointMatrixINTEL<T2, K, N, LB, S> *B,
__spv::__spirv_JointMatrixINTEL<T3, M, N, LC, S> *C,
__spv::__spirv_JointMatrixINTEL<T1, M, K, LA, UA, S> *A,
__spv::__spirv_JointMatrixINTEL<T1, K, N, LB, UB, S> *B,
Contributor

T2

__spv::__spirv_JointMatrixINTEL<T2, K, N, LB, S> *B,
__spv::__spirv_JointMatrixINTEL<T3, M, N, LC, S> *C,
__spv::__spirv_JointMatrixINTEL<T1, M, K, LA, UA, S> *A,
__spv::__spirv_JointMatrixINTEL<T1, K, N, LB, UB, S> *B,
Contributor

T2

namespace oneapi {
namespace experimental::matrix {

enum class matrix_layout { row_major, col_major, packed_a, packed_b };
Contributor
@dkhaldi dkhaldi Mar 24, 2022

Do we need packed_a, packed_b here?
Can you change this to just packed?
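For reference, a minimal sketch of the suggested change (keeping the other enumerators from the snippet above):

// Sketch only: a single "packed" enumerator replaces packed_a / packed_b.
enum class matrix_layout { row_major, col_major, packed };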

enum class matrix_use { matrix_a, matrix_b, accumulator, unnecessary };

template <matrix_use Use> struct spv_matrix_use_traits {
static constexpr __spv::MatrixUse value = __spv::MatrixUse::MatrixA;
Contributor

If only the specialization on line 47 is the one really used, this should be empty, right?

enum class matrix_use { matrix_a, matrix_b, accumulator, unnecessary };

template <matrix_use Use> struct spv_matrix_use_traits {
static constexpr __spv::MatrixUse value = __spv::MatrixUse::MatrixA;
Contributor

same as above, this should be empty.

Contributor Author

You mean we should remove the default value? But we have a default value for matrix_layout's traits as well.
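For reference, a minimal sketch of the reviewer's suggestion: leave the primary template empty, with no default value, and rely on explicit specializations (the MatrixA specialization is taken from the snippet above; the remaining specializations are assumed to exist alongside it):

// Sketch only: empty primary template, no default value.
template <matrix_use Use> struct spv_matrix_use_traits {};

template <> struct spv_matrix_use_traits<matrix_use::matrix_a> {
  static constexpr __spv::MatrixUse value = __spv::MatrixUse::MatrixA;
};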

@@ -0,0 +1,435 @@
//===-------------- static-query.hpp - SYCL matrix ------------*- C++ -*---===//
Contributor

static-query-use.hpp

Contributor

did you correct this?

nd_range<2>({NDRangeM, NDRangeN}, {1, 1}),
[ accA, accB, accC, M, N, K ](nd_item<2> spmd_item)
[[intel::reqd_sub_group_size(1)]]
nd_range<2>({NDRangeM, NDRangeN * SG_SZ}, {1, 1 * SG_SZ}),
Contributor

The SG size attribute is missing in this test.
Please start from a test that explicitly uses the SG size attribute, and just delete the AMX test.
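For context, a minimal sketch of the attribute being asked for, as it appears on the kernel lambda elsewhere in these tests (SG_SZ is the sub-group size constant already used in the nd_range above; the capture list is copied from the snippet):

// Sketch only: require the kernel to run with the intended sub-group size.
auto kernel = [accA, accB, accC, M, N, K](nd_item<2> spmd_item)
    [[intel::reqd_sub_group_size(SG_SZ)]] {
  // kernel body
};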

@yubingex007-a11y
Contributor Author

@bader it seems the failure in Jenkins/Precommit is a common issue; could you verify it?

Contributor
@dkhaldi dkhaldi left a comment

LGTM

@yubingex007-a11y
Contributor Author

ping? @intel/llvm-reviewers-runtime

@steffenlarsen
Contributor

Generally I think it looks good, but why do most of the tests have XFAIL: *?

Comment on lines 17 to 20
namespace sycl {
namespace ext {
namespace oneapi {
namespace experimental::matrix {
Contributor

While you are here

Suggested change
namespace sycl {
namespace ext {
namespace oneapi {
namespace experimental::matrix {
namespace sycl::ext::oneapi::experimental::matrix {

@dkhaldi
Contributor

dkhaldi commented Apr 12, 2022

@JackAKirk, in this PR we add the "use" argument to transition from using a required layout (especially packed_b) to a "use" argument. This will hopefully make it possible to remove the duplicated code in the tensorcores implementation. I don't think we will merge the two into one file, but I would like the tensorcores implementation to reuse the same definitions and not redefine the types, functions, enums, etc.
Can you please review and let me know where there is still divergence?

@JackAKirk
Contributor

JackAKirk commented Apr 12, 2022

@JackAKirk, in this PR we add the "use" argument to transition from using a required layout (especially packed_b) to a "use" argument. This will hopefully make it possible to remove the duplicated code in the tensorcores implementation. I don't think we will merge the two into one file, but I would like the tensorcores implementation to reuse the same definitions and not redefine the types, functions, enums, etc. Can you please review and let me know where there is still divergence?

Sounds good, I'll take a look. Hopefully once we make the tensor cores impl support the WI interface as well we can think about supporting them in the same SYCL_EXT_ONEAPI_MATRIX extension version.

@@ -129,10 +136,12 @@ enum class MatrixLayout : uint32_t {
// information to SPIRV translator.
// The long term solution would be to introduce a matrix type in Clang and use
// it instead of this member.
template <typename T, std::size_t R, std::size_t C, MatrixLayout U,
template <typename T, std::size_t R, std::size_t C, MatrixUse U, MatrixLayout L,
Contributor

This matches the template arguments used for joint_matrix in matrix-tensorcore.hpp (the current CUDA impl), except that the ordering is a little different (MatrixUse is the second template param in matrix-tensorcore.hpp). Trivial point, but we should align on an ordering before merging the impls into a single SYCL_EXT_ONEAPI_MATRIX version.

Contributor

This is a good point. Currently, it is at this position because it is an optional argument for now (to keep backward compatibility with the previous API, which had no use argument).
Once this API with the "use" argument is stable enough that we can remove the non-use API, we can revise the order.
My personal preference is that it is probably a good idea to keep the arguments that might become "optional" in the future last. Today, "use" is required but layout is definitely optional.
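A minimal, declaration-only sketch of that ordering rationale (the Group parameter and the defaults shown here are assumptions, not taken from this diff): parameters likely to become optional sit at the end so they can take defaults.

// Sketch only: "use" stays required; layout can default once the non-use API
// is removed.
template <typename T, size_t NumRows, size_t NumCols, use Use,
          layout Layout = layout::unused, typename Group = sycl::sub_group>
struct joint_matrix;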

Contributor

I suggest not modifying the order of the template parameters of the internal SPIR-V joint matrix type representation. The reason is that after this patch: #6535, Clang started to generate opaque matrix types like this: spirv.JointMatrixINTEL._half_10_2_0_0, and hence we can and will remove the array W/A. So it becomes crucial to keep the template parameter order internally (it can still be changed in the user-visible API).

Contributor Author

So, we will keep the workaround in SPIRVRegularizeLLVMBase::adaptStructTypes for now. Since the W/A will be removed in the future, we'd better move Use to the end of the template parameter list, right?

Contributor

Yes
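For reference, a minimal, declaration-only sketch of that conclusion (the scope parameter's type and name are assumptions; only the ordering is the point):

// Sketch only: keep the pre-existing internal parameter order (T, Rows, Cols,
// Layout, Scope) and append the new MatrixUse parameter at the end, so the
// opaque type mangling discussed above stays stable.
template <typename T, std::size_t R, std::size_t C, MatrixLayout L,
          Scope::Flag S, MatrixUse U>
struct __spirv_JointMatrixINTEL;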

struct joint_matrix {
public:
__spv::__spirv_JointMatrixINTEL<
T, NumRows, NumCols, spv_matrix_use_traits<Use>::value,
Contributor
@JackAKirk JackAKirk Apr 13, 2022

This matches the template arguments used for joint_matrix in matrix-tensorcore.hpp (the current CUDA impl), except that the ordering is a little different (matrix_use is the second template param in matrix-tensorcore.hpp). Trivial point, but we should align on an ordering before merging the impls into a single SYCL_EXT_ONEAPI_MATRIX version.


// unnecessary was introduced for backward compatibility.
// Once the use implementation is stable, "unnecessary" value will be omitted
enum class matrix_use { matrix_a, matrix_b, accumulator, unnecessary };
Contributor
@JackAKirk JackAKirk Apr 13, 2022

in matrix-tensorcores.hpp we define "a"/"b" rather than "matrix_a"/"matrix_b":

enum class matrix_use { a, b, accumulator };

We should align on a single naming for the matrix_use members, since it is a user-facing class that the user will have to provide when instantiating the joint_matrix template.

Contributor

I prefer adding "matrix" there, but then why is there "matrix" on "a" and "b" and not on "accumulator"?
I think better naming could be "left", "right", and "accumulator".
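A purely illustrative side-by-side of the naming options under discussion (the namespaces are artificial, only so the alternatives can coexist in one snippet):

namespace option_this_pr {
enum class matrix_use { matrix_a, matrix_b, accumulator, unnecessary };
}
namespace option_tensorcores {
enum class matrix_use { a, b, accumulator };
}
namespace option_left_right {
enum class matrix_use { left, right, accumulator };
}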

access::address_space Space>
inline __SYCL_ALWAYS_INLINE void
joint_matrix_store(Group sg,
joint_matrix<T, NumRows, NumCols, Use, MatL, Group> &src,
Contributor
@JackAKirk JackAKirk Apr 13, 2022

This is a more interesting point:

Are you sure that matrix_use is necessary as a parameter for joint_matrix_store? In the CUDA backend it is not necessary, because you can only call joint_matrix_store for a matrix_use::accumulator matrix: there is no use case for storing a matrix back to pointer storage other than the matrix that is the output of an MMA, which is always a matrix_use::accumulator matrix. I'd be surprised if this were different in the Intel backend!?
If this template param is removed, this function interface will align with the definition in matrix-tensorcores.hpp.
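A declaration-only sketch of what dropping the use parameter could look like here (the multi_ptr/stride/runtime-layout parameters are assumed from the usual extension signature, not taken from this diff):

// Sketch only: joint_matrix_store defined solely for accumulator matrices,
// so the matrix_use template parameter disappears from this interface.
template <typename Group, typename T, size_t NumRows, size_t NumCols,
          matrix_layout MatL, access::address_space Space>
inline __SYCL_ALWAYS_INLINE void
joint_matrix_store(Group sg,
                   joint_matrix<T, NumRows, NumCols, matrix_use::accumulator,
                                MatL, Group> &src,
                   multi_ptr<T, Space> dst, size_t stride,
                   matrix_layout memL);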

joint_matrix<ext::oneapi::sub_group, unsigned short, TM, TK> sub_a(
sg);
sycl::ext::oneapi::sub_group sg = spmd_item.get_sub_group();
joint_matrix<bfloat16, TM, TK, use::a> sub_a(sg);
// For B, since current implementation does not support non-packed
Contributor

this comment does not apply anymore

dkhaldi
dkhaldi previously approved these changes Sep 7, 2022
Contributor
@dkhaldi dkhaldi left a comment

LGTM

return;
joint_matrix<bfloat16, TK, TN, use::b> sub_b(sg);
joint_matrix<float, TM, TN, use::accumulator> sub_c(sg);

// AMX: 8 register tiles : 1k byte size, SMmaxxSKmax =16x64
Contributor

Also, let's not make this AMX-specific --> remove this comment.

};

template <typename Group, typename T, size_t NumRows, size_t NumCols, use Use,
layout Layout = layout::unused, access::address_space Space>
Contributor

I don't think that this needs layout Layout = layout::unused here, right? This template parameter can safely be removed, since it is not used in Intel/CUDA/HIP for this interface.

return;
joint_matrix<bfloat16, TK, TN, use::b> sub_b(sg);
joint_matrix<float, TM, TN, use::accumulator> sub_c(sg);

// AMX: 8 register tiles : 1k byte size, SMmaxxSKmax =16x64
// strideX = X's cols, so strideC = N, strideA = K, strideB = N*4
joint_matrix_load(sg, sub_c,
Contributor

use joint_matrix_fill instead
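A one-line sketch of the suggested replacement (the fill value is a placeholder):

// Initialize the accumulator in registers instead of loading it from memory.
joint_matrix_fill(sg, sub_c, 1.0f);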

layout Layout = layout::unused, access::address_space Space>
inline __SYCL_ALWAYS_INLINE void
joint_matrix_load(Group sg,
joint_matrix<T, NumRows, NumCols, Use, Layout, Group> &res,
Contributor

You can change this to joint_matrix<T, NumRows, NumCols, Use, layout::unused, Group> &res,

Then get rid of the layout template parameter as mentioned above.
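A declaration-only sketch of that suggestion (the multi_ptr/stride/runtime-layout parameters are assumed from the usual extension interface, not shown in this diff):

// Sketch only: the layout template parameter is gone; the matrix type is fixed
// to layout::unused and the memory layout (memL) is a runtime argument.
template <typename Group, typename T, size_t NumRows, size_t NumCols, use Use,
          access::address_space Space>
inline __SYCL_ALWAYS_INLINE void
joint_matrix_load(Group sg,
                  joint_matrix<T, NumRows, NumCols, Use, layout::unused, Group> &res,
                  multi_ptr<T, Space> src, size_t stride, layout memL);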

Contributor Author
@yubingex007-a11y yubingex007-a11y Sep 8, 2022

But for the CUDA backend, the layout must be specified for matrices A and B; CUDA users won't set it to unused for A and B, right?
According to #5835 (comment)
#5835 (review)

Contributor
@JackAKirk JackAKirk Sep 9, 2022

That's right for joint_matrix_load; however, I think it is better if there is a separate overload of joint_matrix_load for that case in the CUDA backend, like I added here:

Then we also have a version of joint_matrix_load that is only for the accumulator matrix::use here.
This immediately prevents the user from using both the template layout parameter and the non-template layout parameter at the same time. It also allows the Intel backend implementation to be simplified in the suggested way.

But for joint_matrix_store, even in the CUDA backend there are no cases that require the layout as a template parameter (joint_matrix_store is only functional for matrix::use::accumulator anyway in the CUDA backend, and this matrix::use type always relies on the non-template runtime layout parameter). So in the Intel backend the layout template parameter is not needed either.

It is a little confusing, but I hope that makes sense?
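A declaration-only sketch of the two-overload approach described above (the parameter lists are assumptions; the actual CUDA-backend signatures are not shown in this thread):

// Sketch only. Overload 1: for use::a / use::b matrices the layout is carried
// as a compile-time parameter of the matrix type, so no runtime layout is taken.
template <typename Group, typename T, size_t NumRows, size_t NumCols, use Use,
          layout Layout, access::address_space Space>
void joint_matrix_load(Group sg,
                       joint_matrix<T, NumRows, NumCols, Use, Layout, Group> &res,
                       multi_ptr<T, Space> src, size_t stride);

// Sketch only. Overload 2: for accumulator matrices the matrix type carries
// layout::unused and the memory layout is a runtime argument.
template <typename Group, typename T, size_t NumRows, size_t NumCols,
          access::address_space Space>
void joint_matrix_load(Group sg,
                       joint_matrix<T, NumRows, NumCols, use::accumulator,
                                    layout::unused, Group> &res,
                       multi_ptr<T, Space> src, size_t stride, layout memL);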

}

template <typename Group, typename T, size_t NumRows, size_t NumCols, use Use,
layout MatL = layout::unused, access::address_space Space>
Contributor

Same situation here as with joint_matrix_load. You can remove layout MatL = layout::unused from the template parameters and then use layout::unused directly in joint_matrix<T, NumRows, NumCols, Use, layout::unused, Group>.

Contributor

Otherwise it makes it legal for a user to call joint_matrix_load/store with a layout different from layout::unused, which is wrong.

@yubingex007-a11y
Contributor Author

The clang-format issue has no relationship with this PR, since we don't change clang/lib/Driver/ToolChains/SYCL.h or sycl/test/basic_tests/stdcpp_compat.cpp

@steffenlarsen
Contributor

The clang-format issue has no relationship with this PR, since we don't change clang/lib/Driver/ToolChains/SYCL.h or sycl/test/basic_tests/stdcpp_compat.cpp

Most testing won't run until lint is happy. I suspect this is a previously fixed problem that is still present here because the branch is based on a faulty intel/llvm HEAD. Could you please try and merge the sycl branch into this? My hope is that it will fix it.

@yubingex007-a11y
Contributor Author

The clang-format issue has no relationship with this PR, since we don't change clang/lib/Driver/ToolChains/SYCL.h or sycl/test/basic_tests/stdcpp_compat.cpp

Most testing won't run until lint is happy. I suspect this is a previously fixed problem that is still present here because the branch is based on a faulty intel/llvm HEAD. Could you please try and merge the sycl branch into this? My hope is that it will fix it.

ok, let me try, thanks!

@yubingex007-a11y
Contributor Author

ping?

MrSidims
MrSidims previously approved these changes Sep 9, 2022
@dkhaldi
Contributor

dkhaldi commented Sep 9, 2022

I think the only change needed here is to remove L from the template arguments of load/store and replace it with "unused", but keep memL, which is the runtime argument of the function.

@JackAKirk
Contributor

I think the only change needed here is to remove L from the template arguments of load/store and replace it with "unused", but keep memL, which is the runtime argument of the function.

Looks good now. There is also still the issue of whether Rows/Cols will take a default value in joint_matrix, right? And then, as a result, whether we have to move the Layout template parameter back before the Rows/Cols template parameters in joint_matrix.

Also, in the CUDA backend we have an additional template parameter in joint_matrix_load that is necessary for precision cases such as tf32, but you also have this in https://github.com/intel/llvm/pull/5920/files, so I guess it can wait until https://github.com/intel/llvm/pull/5920/files is merged?

@dkhaldi
Contributor

dkhaldi commented Sep 9, 2022

Looks good now. There is also still the issue of whether Rows/Cols will take a default value in joint_matrix, right? And then, as a result, whether we have to move the Layout template parameter back before the Rows/Cols template parameters in joint_matrix.

Also, in the CUDA backend we have an additional template parameter in joint_matrix_load that is necessary for precision cases such as tf32, but you also have this in https://github.com/intel/llvm/pull/5920/files, so I guess it can wait until https://github.com/intel/llvm/pull/5920/files is merged?

Yes, let's address these two in the next PR. But I will address the first in the doc PR.

#endif // __SYCL_DEVICE_ONLY__
}

template <typename Group, typename T1, typename T2, typename T3, size_t M,
Contributor

In the CUDA backend the layouts need to be provided to joint_matrix_mad because they must be known at compile time by the builtins. In the CUDA backend these layouts are inferred from the joint_matrix layout template parameters; see:

std::size_t K, std::size_t N, layout LayoutA, layout LayoutB>

So if we want interfaces that match in both backends, we need to add them here too, in the same way. For the user it should appear no different due to CTAD.
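A declaration-only sketch of what carrying the A/B layouts on joint_matrix_mad could look like here (the exact parameter list and return type are assumptions built from the fragment quoted above, not the actual CUDA-backend code):

// Sketch only: LayoutA/LayoutB are deduced automatically from the A and B
// matrix arguments, so the user never has to spell them out.
template <typename Group, typename T1, typename T2, typename T3, std::size_t M,
          std::size_t K, std::size_t N, layout LayoutA, layout LayoutB>
joint_matrix<T3, M, N, use::accumulator, layout::unused, Group>
joint_matrix_mad(Group sg,
                 joint_matrix<T1, M, K, use::a, LayoutA, Group> &A,
                 joint_matrix<T2, K, N, use::b, LayoutB, Group> &B,
                 joint_matrix<T3, M, N, use::accumulator, layout::unused, Group> &C);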

Contributor Author

Seems strange: if we want to support both the Intel and CUDA backends, why don't we need to provide a joint_matrix_load without memL?

Contributor
@dkhaldi dkhaldi left a comment

LGTM

@yubingex007-a11y
Contributor Author

@steffenlarsen would you help us merge it?

@steffenlarsen steffenlarsen merged commit 76ec14b into intel:sycl Sep 14, 2022