-
Notifications
You must be signed in to change notification settings - Fork 787
[SYCL][Joint Matrix Spec] Add new API for out of bounds fill/load/store #11172
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from 2 commits
36cbd73
f0c0610
d9608a2
8c197c6
d041a73
f7fdefe
0039eb0
908c321
4ae7c46
ee364e4
07e72ab
e9886ae
3354607
df2c2d4
95cdb68
166ec96
667ba34
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -198,6 +198,67 @@ joint_matrix_apply(sg, A, [=](T &val, size_t row, size_t col) { | |
sum_local_rows[row] += val; | ||
}); | ||
``` | ||
|
||
=== Load/Store/Fill With Out-of-Bounds Checks | ||
`joint_matrix_load`, `joint_matrix_store`, and `joint_matrix_fill` | ||
operations do not do bounds checking. When the global matrix size does | ||
not multiply `Rows` and `Cols` of the joint matrix, this extension | ||
YuriPlyakhin marked this conversation as resolved.
Show resolved
Hide resolved
|
||
adds a new form of API to load, store, and fill joint matrix while | ||
checking the bounds. For load and fill, the out-of-bounds elements are | ||
set to 0. For the store, they are dropped out. | ||
|
||
These functions are similar to the existing ones but they | ||
YuriPlyakhin marked this conversation as resolved.
Show resolved
Hide resolved
|
||
take extra arguments to determine the global bounds `Height` and `Width` of the | ||
two-dimensional region to load to or store from. | ||
|
||
```c++ | ||
namespace sycl::ext::intel::experimental::matrix { | ||
|
||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @fda0 does this change address the missing arguments to generate efficiently 2d load/store in IGC? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @fda0 Yes I changed it to add the two offset coordinates separately and excluded them from the base pointer calculation. |
||
template <typename Group, typename T, size_t Rows, size_t Cols, | ||
use Use, layout Layout, typename Tv> | ||
void joint_matrix_fill_checked(Group g, joint_matrix<Group, T, Use, Rows, | ||
Cols, Layout> &m, Tv v, size_t Height, size_t Width); | ||
|
||
// Only available when std::is_same_v<T1, std::remove_const_t<T2>> | ||
template <typename Group, typename T1, typename T2, | ||
size_t Rows, size_t Cols, | ||
access::address_space Space, access::decorated IsDecorated> | ||
void joint_matrix_load_checked(Group g, | ||
joint_matrix<Group, T1, use::accumulator, Rows, Cols, layout::dynamic> &res, | ||
multi_ptr<T2, Space, IsDecorated> src, size_t stride, | ||
layout Layout, size_t Height, size_t Width); | ||
|
||
// Only available when Layout != layout::dynamic | ||
// and when std::is_same_v<T1, std::remove_const_t<T2>> | ||
template <typename Group, typename T1, typename T2, | ||
size_t Rows, size_t Cols, | ||
use Use, layout Layout, | ||
access::address_space Space, access::decorated IsDecorated> | ||
void joint_matrix_load_checked(Group g, | ||
joint_matrix<Group, T1, Use, Rows, Cols, Layout> &res, | ||
multi_ptr<T2, Space, IsDecorated> src, size_t stride, | ||
size_t Height, size_t Width); | ||
|
||
template <typename Group, typename T, size_t Rows, size_t Cols, | ||
access::address_space Space, access::decorated IsDecorated> | ||
void joint_matrix_store_checked(Group g, | ||
const joint_matrix<Group, T, use::accumulator, Rows, Cols, layout::dynamic> &res, | ||
multi_ptr<T, Space, IsDecorated> dest, size_t stride, layout Layout, | ||
size_t Height, size_t Width); | ||
|
||
// Only available when Layout != layout::dynamic | ||
template <typename Group, typename T, size_t Rows, size_t Cols, | ||
use Use, layout Layout, access::address_space Space, | ||
access::decorated IsDecorated> | ||
void joint_matrix_store_checked(Group g, | ||
const joint_matrix<Group, T, Use, Rows, Cols, Layout> &res, | ||
multi_ptr<T, Space, IsDecorated> src, size_t stride, | ||
size_t Height, size_t Width); | ||
|
||
} // namespace sycl::ext::intel::experimental::matrix | ||
``` | ||
|
||
|
||
=== New Device Information Descriptor | ||
Besides the query we provide in | ||
link:sycl_ext_oneapi_matrix.asciidoc[sycl_ext_oneapi_matrix], | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't quite understand how this API can work. We might need to schedule a meeting where we can draw some pictures.
Wouldn't it make more sense to require the user to set
Width
andHeight
according to the remaining amount of memory in the "big matrix"? That way, the implementation simply checks to see if the template parameterRows
is greater thanHeight
and if the template parameterCols
is greater thanWidth
. If they are both less, the operations behave like normal loads and stores. If they are less, then the load sets the out-of-range elements to zero and the store ignores the out-of-range elements.From the description, though, it doesn't seem like the APIs work this way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The check is done compared to the big matrix size comparing the address to be loaded (ptr) with the big matrix size (Height). If ptr > Height, the element should not be loaded.
if the remainder is passed, how is the implementation going to compare with it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With your recent change that adds
coordX
andcoordY
parameters, I see how this could work for the load and store APIs. However, I still don't see how the fill API can work.This API structure seems confusing to me, though, and probably less efficient. I suspect users will get confused that the
src
pointer tojoint_matrix_load
is completely different from thesrc_base
pointer tojoint_matrix_load_checked
. It seems like it would be less confusing and more efficient to have the user pass values forWidth
andHeight
that are the remaining amount of memory in the big matrix. Maybe this would be easier to explain in a meeting where we can draw pictures?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, let's discuss fill and remainder approach in a meeting.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
May I unresolve this comment as there was no follow up and I probably has missed the meeting? :) I also don't understand, why we need coordinates for fill