Skip to content

[SYCL][Matrix] Add initial get_coord API. #7037

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 13 commits into from
Closed

Conversation

arnamoy10
Copy link
Contributor

This patch adds initial API for retrieval of coordinates from a work item element.

@arnamoy10 arnamoy10 requested a review from dkhaldi October 12, 2022 20:13
@arnamoy10 arnamoy10 requested a review from a team as a code owner October 12, 2022 20:13
This patch adds initial API for retrieval of coordinates from a work item element.
@arnamoy10 arnamoy10 force-pushed the getcoordinate_support branch from 7936fb0 to 59d0e7e Compare October 13, 2022 15:09

for (int i = 0; i < tCData.length(); ++i) {
size_t row, col;
std::tie(row, col) = tCData[i].get_coord();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can also use size_t [ row, col] =
to avoid calling tie

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

for (int i = 0; i < tCData.length(); ++i) {
size_t row, col;
std::tie(row, col) = tCData[i].get_coord();
res_local_row_acc[row] += tCData[i];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you need to return res_local_row_acc and use it verify_function

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

matrix_multiply_ref((int32_t *)Aref, (int32_t *)Bref, (int32_t *)D, MATRIX_M,
MATRIX_N, MATRIX_K / 2);

bool res = true;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is what I call verify_function.
matrix_multiply_ref should also calculate sum of rows and return that instead.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@@ -256,6 +257,18 @@ class wi_element {
wi_element(joint_matrix<T, NumRows, NumCols, Use, Layout, Group> &Mat,
std::size_t i)
: M(Mat), idx(i) {}

// Functions
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove this comment

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@@ -256,6 +257,18 @@ class wi_element {
wi_element(joint_matrix<T, NumRows, NumCols, Use, Layout, Group> &Mat,
std::size_t i)
: M(Mat), idx(i) {}

// Functions
std::tuple<size_t, size_t> get_coord() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do you need to add this function to the specialization of wi_element for bfloat16 type?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

arnamoy.bhattacharyya added 2 commits October 14, 2022 10:39
Copy link
Contributor

@dkhaldi dkhaldi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We also need a second test that tests all the matrices use: A, B and C. The test should not contain any mad or load operation.
Just have :
joint_matrix<bfloat16, TM, TK, use::a> sub_a(sg);
joint_matrix<bfloat16, TK, TN, use::b> sub_b(sg);
joint_matrix<float, TM, TN, use::accumulator> sub_c(sg);

the joint_matrix_fill for each of them
finally: get_coord function on each of them with the row or col sum calculation or just collecting the coordinates in a vector.

Like this, we will test this function for all three usages of the joint matrix type.

__spv::MatrixLayout L = __spv::MatrixLayout::RowMajor,
__spv::Scope::Flag S = __spv::Scope::Flag::Subgroup>
extern SYCL_EXTERNAL std::tuple<T, T>
__spirv_JointMatrixWorkItemElemCoord(JOINT_MATRIX_INTEL(T, R, C, L, S, U) *,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no such thing as std::tuple in SPIR-V. The instruction should return int2 and if we want to create a tuple for get_coord API, then we should read elements from this vector to create tuple.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the comment. Can you please tell a bit more about this int2 type? Is there any documentation/ code that I can take a look?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a 2 elements vector. int2 is a spelling from OpenCL, but guess the appropriate alias should be known for DPCPP, see: https://registry.khronos.org/SYCL/specs/sycl-2020/html/sycl-2020.html#_aliases

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Copy link
Contributor

@yubingex007-a11y yubingex007-a11y Oct 26, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

eh, why we use __ocl_vec_t<int32_t, 2> instead of sycl::vec<int32_t, 2> here? @MrSidims

arnamoy.bhattacharyya added 2 commits October 17, 2022 16:59
@arnamoy10
Copy link
Contributor Author

We also need a second test that tests all the matrices use: A, B and C. The test should not contain any mad or load operation. Just have : joint_matrix<bfloat16, TM, TK, use::a> sub_a(sg); joint_matrix<bfloat16, TK, TN, use::b> sub_b(sg); joint_matrix<float, TM, TN, use::accumulator> sub_c(sg);

the joint_matrix_fill for each of them finally: get_coord function on each of them with the row or col sum calculation or just collecting the coordinates in a vector.

Like this, we will test this function for all three usages of the joint matrix type.

Added test case


std::tuple<size_t, size_t> get_coord() {
#ifdef __SYCL_DEVICE_ONLY__
__ocl_vec_t<int32_t, 2> co_ord =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove underscore from the name (co_ord)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't see it's applied

sycl::buffer<bfloat16, 2> bufB(B.get_data(), sycl::range<2>(K, N));
sycl::buffer<float, 2> bufC((float *)C.get_data(), sycl::range<2>(M, N));

sycl::buffer<int32_t, 1> res_local_row_bufA(res_local_rowA,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use usm instead of accessors

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

void matrix_coord_ref(int *A_mem, int *B_mem, int *C_mem, int M, int N, int K) {
for (int m = 0; m < M; m++)
for (int k = 0; k < K; k++) {
short *va = (short *)(A_mem + m * K + k);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

change this to bfloat16 *A_mem
A_mem[m][k]

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

N * 2, layout::packed_b);
sub_c = joint_matrix_mad(sg, sub_a, sub_b, sub_c);
}
joint_matrix_store(sg, sub_c,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove the store

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done


sycl::ext::oneapi::sub_group sg = spmd_item.get_sub_group();
joint_matrix<bfloat16, TM, TK, use::a> sub_a(sg);
// For B, since current implementation does not support non-packed
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this comment does not apply anymore, remove it

@dkhaldi
Copy link
Contributor

dkhaldi commented Oct 25, 2022

@yubingex007-a11y @MrSidims , can you please add your reviews as well?

Copy link
Contributor

@MrSidims MrSidims left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dkhaldi when do we plan to change Matrix feature macro?

__spv::MatrixLayout L = __spv::MatrixLayout::RowMajor,
__spv::Scope::Flag S = __spv::Scope::Flag::Subgroup>
extern SYCL_EXTERNAL __ocl_vec_t<int32_t, 2>
__spirv_JointMatrixWorkItemElemCoord(JOINT_MATRIX_INTEL(T, R, C, L, S, U) *,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please wait with the merge until the name for the instruction is picked. In the draft SPIR-V spec version it is JointMatrixGetElementCoordINTEL

__spv::MatrixUse U = __spv::MatrixUse::Unnecessary,
__spv::MatrixLayout L = __spv::MatrixLayout::RowMajor,
__spv::Scope::Flag S = __spv::Scope::Flag::Subgroup>
extern SYCL_EXTERNAL __ocl_vec_t<int32_t, 2>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For some reasons can not add suggestion.
here and after int32_t -> uint32_t

@@ -256,6 +257,21 @@ class wi_element {
wi_element(joint_matrix<T, NumRows, NumCols, Use, Layout, Group> &Mat,
std::size_t i)
: M(Mat), idx(i) {}

std::tuple<size_t, size_t> get_coord() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: size_t -> uint32_t is probably better
same nit applicable to the code below


std::tuple<size_t, size_t> get_coord() {
#ifdef __SYCL_DEVICE_ONLY__
__ocl_vec_t<int32_t, 2> co_ord =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't see it's applied

@dkhaldi
Copy link
Contributor

dkhaldi commented Oct 25, 2022

@dkhaldi when do we plan to change Matrix feature macro?

change it how?
the matrix feature macro will not change. You mean the implementation version? making use default?

for (int i = 0; i < tAData.length(); ++i) {
auto [row, col] = tAData[i].get_coord();
resA[row] += tAData[i];
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

// SG size = 64, sub_a[8][16], multiple WI share a row
WI 0 --> 2 elements of row 0 --> resA[0]
WI 1 --> 3 elements of row 0 --> resA[0]
resA should be private variable --> length size
partial_sum[row] = reduction among the WI (reduce_over_group),
every WI will have same value of partial_sum[row]
copy partial_sum into the global variable

@MrSidims
Copy link
Contributor

@dkhaldi when do we plan to change Matrix feature macro?

change it how? the matrix feature macro will not change. You mean the implementation version? making use default?

Sorry, I though that we will move feature macro with this change, but it's not the case.

arnamoy.bhattacharyya added 2 commits November 4, 2022 11:02
@whitneywhtsang whitneywhtsang deleted the getcoordinate_support branch December 1, 2022 19:59
dm-vodopyanov pushed a commit that referenced this pull request Mar 28, 2023
This patch adds an initial API for the retrieval of coordinates from a
work item element. A `get_coord()` method is added to the intel
namespace to work on `wi_element` class. Also, a relevant SPIRV op is
added, which the get_coord() gets lowered to.

This is recreated PR from my forked repo. The discussions are in the
original (closed) PR #7037
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants