-
Notifications
You must be signed in to change notification settings - Fork 789
Enable matrix_load, matrix_store, and matrix_mad #4076
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Change-Id: I97b939c5559317ad82bec460221bccb83975b004
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please, put files to ext/intel
directory instead of ONEAPI
.
@bader, Although the matrix feature is in experimental phase, it is a oneAPI feature, it is not an intel specific feature. I think we should keep it as it was already. |
Sorry, I meant to say |
What would be the namespace we should use in this case? |
The namespace should be According to my understanding Tagging @dm-vodopyanov and @romanovvlad to correct me if I miss something. |
Please, use |
@@ -38,7 +38,7 @@ | |||
__SYCL_INLINE_NAMESPACE(cl) { | |||
namespace sycl { | |||
namespace ext { | |||
namespace intel { | |||
namespace oneapi { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
mistakenly change it, sorry for inconvience.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
1. change namespace to ext::intel 2. move matrix-aot-amx.hpp to ext/intel directory
matrix_layout Layout = matrix_layout::row_major, | ||
access::address_space Space> | ||
inline __SYCL_ALWAYS_INLINE void joint_matrix_store( | ||
Group sg, joint_matrix<Group, T, NumRows, NumCols, Layout> &res, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Group sg, joint_matrix<Group, T, NumRows, NumCols, Layout> &res, | |
Group sg, joint_matrix<Group, T, NumRows, NumCols, Layout> &obj, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to change joint_matrix_load as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@AlexeySotkin can you please elaborate why you asked for the change res --> obj?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just improving the name. There is no result for the store operation. It returns void. This parameter is a matrix object which we want to store.
access::address_space Space> | ||
inline __SYCL_ALWAYS_INLINE void joint_matrix_load( | ||
Group sg, joint_matrix<Group, T, NumRows, NumCols, Layout> &res, | ||
multi_ptr<T, Space> src, size_t stride, matrix_layout L = Layout) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Function argument L
is not used. Probably we should use it as operand of __spirv_MatrixLoadINTEL
, but it must be a constant.
It seems like we have two layouts here: one for the memory we read from, and another for the matrix object, right? If so, I think we need another template parameter to specify layout of the memory.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These are the changes we agreed on in our meeting today:
- remove matrix_layout L = Layout
- pass L (the memory layout) to the load/store functions
- rename L --> memL
- rename Layout --> matL
- rename the header file spirv API from matrix to joint_matrix
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is not quite clear what is "pass L to the load/store functions". I think we should move the last function parameter to template parameters, right?
Like:
template <typename Group, typename T, size_t NumRows, size_t NumCols,
matrix_layout MemL = matrix_layout::row_major,
matrix_layout MatL = matrix_layout::row_major,
access::address_space Space>
inline __SYCL_ALWAYS_INLINE void joint_matrix_load(
Group sg, joint_matrix<Group, T, NumRows, NumCols, Layout> &res,
multi_ptr<T, Space> src, size_t stride) {
...
res.spvm = __spirv_MatrixLoadINTEL<T, NumRows, NumCols,
spv_matrix_layout_traits<MemL>::value,
spv_matrix_layout_traits<MatL>::value>(Ptr, stride);
...
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right now, the user does not need to specify any template argument when calling load/store. If we move the memory layout to the template, the user will have to specify each of the parameters (deduce all or nothing) when calling load/store.
The other reason is that I would like to not converge too much (unless there is a big reason) from Vulkan and WMMA APIs thus keeping the meory layout as argument to the functions
T *Ptr = src.get(); | ||
res.spvm = __spirv_MatrixLoadINTEL<T, NumRows, NumCols, | ||
spv_matrix_layout_traits<Layout>::value>( | ||
Ptr, stride, spv_matrix_layout_traits<Layout>::value); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
MemLayout
}; | ||
|
||
template <typename Group, typename T, size_t NumRows, size_t NumCols, | ||
matrix_layout Layout = matrix_layout::row_major, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
matL
@@ -1,4 +1,4 @@ | |||
// RUN: %clangxx -march=sapphirerapids -fsycl -O2 %s -o %t.out | |||
// RUN: %clangxx -DSYCL_EXT_ONEAPI_MATRIX=1 -march=sapphirerapids -fsycl -O2 %s -o %t.out | |||
#include <CL/sycl.hpp> | |||
#if (SYCL_EXT_ONEAPI_MATRIX == 1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// Only the leader perform AMX computation.
Why do you have "the leader" test in the code. this is wrong for two reasons:
1- SG size is one so we don't need to test that.
2- all work items in the subgroup must enter the joint matrix code. So we should NOT have diverged code in there
|
||
// Only the leader perform AMX computation. | ||
if (spmd_item.get_local_id(1) % TILE_SZ) | ||
return; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again, we should not have "the leader" test in the code. this is wrong because all work items in the subgroup must enter the joint matrix code. So we should not have diverge code in there.
The OpenCL backend will not vectorize the code so we should be able to generate ONE AMX code for the whole sub group
|
||
// Only the leader perform AMX computation. | ||
if (spmd_item.get_local_id(1) % TILE_SZ) | ||
return; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same comment as above
|
||
// Only the leader perform AMX computation. | ||
if (spmd_item.get_local_id(1) % TILE_SZ) | ||
return; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same comment as above
|
||
// Only the leader perform AMX computation. | ||
if (spmd_item.get_local_id(1) % TILE_SZ) | ||
return; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same comment as above
case matrix_layout::row_major: | ||
res.spvm = | ||
__spirv_JointMatrixLoadINTEL<T, NumRows, NumCols, | ||
spv_matrix_layout_traits<Layout>::value>( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you omitted the group argument here right?
res.spvm = | ||
__spirv_JointMatrixLoadINTEL<T, NumRows, NumCols, | ||
spv_matrix_layout_traits<Layout>::value>( | ||
Ptr, stride, __spv::MatrixLayout::ColumnMajor); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same as above
res.spvm = | ||
__spirv_JointMatrixLoadINTEL<T, NumRows, NumCols, | ||
spv_matrix_layout_traits<Layout>::value>( | ||
Ptr, stride, __spv::MatrixLayout::PackedA); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
spirv should have scope argument so sg should be passed in all load/store/mad functions
case matrix_layout::col_major: | ||
__spirv_JointMatrixStoreINTEL<T, NumRows, NumCols, | ||
spv_matrix_layout_traits<MatL>::value>( | ||
Ptr, obj.spvm, stride, __spv::MatrixLayout::ColumnMajor); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought we said obj should be change to "src" right?
LGTM |
@dkhaldi, you must approve using GitHub UI to unblock the merge. Currently it's blocked by your previous review where changes were requested. |
@intel/llvm-reviewers-runtime Hi, Could you do a code review please? |
1- we move the Group template argument on joint_matrix to the last position and make it optional. By default set Group to subgroup. So the user does not need to specify this each time 2- we change the "mad" function name from joint_matrix_mad to joint_matrix_mma.
LGTM, @bader can you help merge this PR? |
The PR is waiting on code owner review from @intel/llvm-reviewers-runtime. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
matrix-amx.hpp => matrix-aot-amx.hpp
Does this renaming follow API/ABI compatibility policy?
sycl/include/CL/sycl/ONEAPI/matrix/matrix-amx.hpp
said one should include matrix-amx.hpp
and the latter file becomes unavailable now.
Probably, the user won't be happy.
Other than that, the changes seem to be legit.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Change-Id: I97b939c5559317ad82bec460221bccb83975b004