-
Notifications
You must be signed in to change notification settings - Fork 787
[SYCL][Matrix] Add initial get_coord API. #7037
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
This patch adds initial API for retrieval of coordinates from a work item element.
7936fb0
to
59d0e7e
Compare
|
||
for (int i = 0; i < tCData.length(); ++i) { | ||
size_t row, col; | ||
std::tie(row, col) = tCData[i].get_coord(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you can also use size_t [ row, col] =
to avoid calling tie
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
for (int i = 0; i < tCData.length(); ++i) { | ||
size_t row, col; | ||
std::tie(row, col) = tCData[i].get_coord(); | ||
res_local_row_acc[row] += tCData[i]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you need to return res_local_row_acc and use it verify_function
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
matrix_multiply_ref((int32_t *)Aref, (int32_t *)Bref, (int32_t *)D, MATRIX_M, | ||
MATRIX_N, MATRIX_K / 2); | ||
|
||
bool res = true; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is what I call verify_function.
matrix_multiply_ref should also calculate sum of rows and return that instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
@@ -256,6 +257,18 @@ class wi_element { | |||
wi_element(joint_matrix<T, NumRows, NumCols, Use, Layout, Group> &Mat, | |||
std::size_t i) | |||
: M(Mat), idx(i) {} | |||
|
|||
// Functions |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove this comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
@@ -256,6 +257,18 @@ class wi_element { | |||
wi_element(joint_matrix<T, NumRows, NumCols, Use, Layout, Group> &Mat, | |||
std::size_t i) | |||
: M(Mat), idx(i) {} | |||
|
|||
// Functions | |||
std::tuple<size_t, size_t> get_coord() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do you need to add this function to the specialization of wi_element for bfloat16 type?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We also need a second test that tests all the matrices use: A, B and C. The test should not contain any mad or load operation.
Just have :
joint_matrix<bfloat16, TM, TK, use::a> sub_a(sg);
joint_matrix<bfloat16, TK, TN, use::b> sub_b(sg);
joint_matrix<float, TM, TN, use::accumulator> sub_c(sg);
the joint_matrix_fill for each of them
finally: get_coord function on each of them with the row or col sum calculation or just collecting the coordinates in a vector.
Like this, we will test this function for all three usages of the joint matrix type.
__spv::MatrixLayout L = __spv::MatrixLayout::RowMajor, | ||
__spv::Scope::Flag S = __spv::Scope::Flag::Subgroup> | ||
extern SYCL_EXTERNAL std::tuple<T, T> | ||
__spirv_JointMatrixWorkItemElemCoord(JOINT_MATRIX_INTEL(T, R, C, L, S, U) *, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is no such thing as std::tuple in SPIR-V. The instruction should return int2
and if we want to create a tuple for get_coord API, then we should read elements from this vector to create tuple.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the comment. Can you please tell a bit more about this int2
type? Is there any documentation/ code that I can take a look?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a 2 elements vector. int2 is a spelling from OpenCL, but guess the appropriate alias should be known for DPCPP, see: https://registry.khronos.org/SYCL/specs/sycl-2020/html/sycl-2020.html#_aliases
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
eh, why we use __ocl_vec_t<int32_t, 2> instead of sycl::vec<int32_t, 2> here? @MrSidims
…the vec to get the coordinates.
Added test case |
|
||
std::tuple<size_t, size_t> get_coord() { | ||
#ifdef __SYCL_DEVICE_ONLY__ | ||
__ocl_vec_t<int32_t, 2> co_ord = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove underscore from the name (co_ord)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't see it's applied
sycl::buffer<bfloat16, 2> bufB(B.get_data(), sycl::range<2>(K, N)); | ||
sycl::buffer<float, 2> bufC((float *)C.get_data(), sycl::range<2>(M, N)); | ||
|
||
sycl::buffer<int32_t, 1> res_local_row_bufA(res_local_rowA, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use usm instead of accessors
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
void matrix_coord_ref(int *A_mem, int *B_mem, int *C_mem, int M, int N, int K) { | ||
for (int m = 0; m < M; m++) | ||
for (int k = 0; k < K; k++) { | ||
short *va = (short *)(A_mem + m * K + k); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
change this to bfloat16 *A_mem
A_mem[m][k]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
N * 2, layout::packed_b); | ||
sub_c = joint_matrix_mad(sg, sub_a, sub_b, sub_c); | ||
} | ||
joint_matrix_store(sg, sub_c, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove the store
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
…at16, fix theCPU kernel
|
||
sycl::ext::oneapi::sub_group sg = spmd_item.get_sub_group(); | ||
joint_matrix<bfloat16, TM, TK, use::a> sub_a(sg); | ||
// For B, since current implementation does not support non-packed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this comment does not apply anymore, remove it
@yubingex007-a11y @MrSidims , can you please add your reviews as well? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dkhaldi when do we plan to change Matrix feature macro?
__spv::MatrixLayout L = __spv::MatrixLayout::RowMajor, | ||
__spv::Scope::Flag S = __spv::Scope::Flag::Subgroup> | ||
extern SYCL_EXTERNAL __ocl_vec_t<int32_t, 2> | ||
__spirv_JointMatrixWorkItemElemCoord(JOINT_MATRIX_INTEL(T, R, C, L, S, U) *, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please wait with the merge until the name for the instruction is picked. In the draft SPIR-V spec version it is JointMatrixGetElementCoordINTEL
__spv::MatrixUse U = __spv::MatrixUse::Unnecessary, | ||
__spv::MatrixLayout L = __spv::MatrixLayout::RowMajor, | ||
__spv::Scope::Flag S = __spv::Scope::Flag::Subgroup> | ||
extern SYCL_EXTERNAL __ocl_vec_t<int32_t, 2> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For some reasons can not add suggestion.
here and after int32_t -> uint32_t
@@ -256,6 +257,21 @@ class wi_element { | |||
wi_element(joint_matrix<T, NumRows, NumCols, Use, Layout, Group> &Mat, | |||
std::size_t i) | |||
: M(Mat), idx(i) {} | |||
|
|||
std::tuple<size_t, size_t> get_coord() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: size_t -> uint32_t is probably better
same nit applicable to the code below
|
||
std::tuple<size_t, size_t> get_coord() { | ||
#ifdef __SYCL_DEVICE_ONLY__ | ||
__ocl_vec_t<int32_t, 2> co_ord = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't see it's applied
change it how? |
for (int i = 0; i < tAData.length(); ++i) { | ||
auto [row, col] = tAData[i].get_coord(); | ||
resA[row] += tAData[i]; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// SG size = 64, sub_a[8][16], multiple WI share a row
WI 0 --> 2 elements of row 0 --> resA[0]
WI 1 --> 3 elements of row 0 --> resA[0]
resA should be private variable --> length size
partial_sum[row] = reduction among the WI (reduce_over_group),
every WI will have same value of partial_sum[row]
copy partial_sum into the global variable
Sorry, I though that we will move feature macro with this change, but it's not the case. |
…pipeline is supported.
This patch adds an initial API for the retrieval of coordinates from a work item element. A `get_coord()` method is added to the intel namespace to work on `wi_element` class. Also, a relevant SPIRV op is added, which the get_coord() gets lowered to. This is recreated PR from my forked repo. The discussions are in the original (closed) PR #7037
This patch adds initial API for retrieval of coordinates from a work item element.