-
Notifications
You must be signed in to change notification settings - Fork 787
[SYCL][Matrix spec] Add joint_matrix_prefetch and overloads of load/store with annotated_ptr #11473
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@gmlueck, gentle ping to review the spec of overloads of load/store on annotated pointer and prefetch. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems like we should have some documentation about which properties can be used with joint_matrix_load
and joint_matrix_store
. The same goes for joint_matrix_prefetch
. That function takes a property list, but there's no documentation about which properties you can pass.
I added some documentation as we will use existing properties for cache control and cache levels. Is what I added enough? |
sycl/doc/extensions/experimental/sycl_ext_matrix/sycl_ext_intel_matrix.asciidoc
Outdated
Show resolved
Hide resolved
sycl/doc/extensions/experimental/sycl_ext_matrix/sycl_ext_intel_matrix.asciidoc
Outdated
Show resolved
Hide resolved
sycl/doc/extensions/experimental/sycl_ext_matrix/sycl_ext_intel_matrix.asciidoc
Outdated
Show resolved
Hide resolved
sycl/doc/extensions/experimental/sycl_ext_matrix/sycl_ext_intel_matrix.asciidoc
Outdated
Show resolved
Hide resolved
…, update cache control syntax
The current spec might change as Row and Cols should be literals or const expr like in the load instructions otherwise the IGC 2d load instructions cannot be generated. |
@gmlueck, as we discussed on Monday, I put the prefetch size as template arguments since they must be constexpr for the implementation. |
typename PropertyListT> | ||
void joint_matrix_load(Group g, | ||
joint_matrix<Group, T1, use::accumulator, Rows, Cols, layout::dynamic> &res, | ||
ext::oneapi::experimental::annotated_ptr<T2, PropertyListT> src, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ext::oneapi::experimental::annotated_ptr<T2, PropertyListT> src, | |
annotated_ptr<T2, PropertyListT> src, |
No need for this namespace because the function is also defined in that namespace.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, this function is defined in ext::oneapi::experimental::matrix namespace
annotated_ptr is defined in ext::oneapi::experimental namespace
Do you still think I should remove it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. This function declaration has visibility to identifiers defined in any of the parent namespaces, which includes ext::oneapi::experimental
.
syclex::properties{syclintelex::read_assertion< | ||
syclintelex::cache_control<syclintelex::cache_mode::constant, | ||
syclex::cache_level::L2>>}}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
syclex::properties{syclintelex::read_assertion< | |
syclintelex::cache_control<syclintelex::cache_mode::constant, | |
syclex::cache_level::L2>>}}; | |
syclex::properties{syclintelex::read_hint< | |
syclintelex::cache_control<syclintelex::cache_mode::cached, | |
syclex::cache_level::L2>>}}; |
The comment below says "this load will be cached to L2". Assuming that's what you want the example to show, then you want read_hint
here, not read_assertion
.
template <typename Group, typename T1, typename T2, size_t Rows, size_t Cols, | ||
typename PropertyListT> | ||
void joint_matrix_store(Group g, | ||
const joint_matrix<Group, T1, use::accumulator, Rows, Cols, layout::dynamic> &res, | ||
ext::oneapi::experimental::annotated_ptr<T2, PropertyListT> dest, | ||
size_t stride, layout Layout); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
template <typename Group, typename T1, typename T2, size_t Rows, size_t Cols, | |
typename PropertyListT> | |
void joint_matrix_store(Group g, | |
const joint_matrix<Group, T1, use::accumulator, Rows, Cols, layout::dynamic> &res, | |
ext::oneapi::experimental::annotated_ptr<T2, PropertyListT> dest, | |
size_t stride, layout Layout); | |
template <typename Group, typename T, size_t Rows, size_t Cols, | |
typename PropertyListT> | |
void joint_matrix_store(Group g, | |
const joint_matrix<Group, T, use::accumulator, Rows, Cols, layout::dynamic> &res, | |
ext::oneapi::experimental::annotated_ptr<T, PropertyListT> dest, | |
size_t stride, layout Layout); |
The other overload of joint_matrix_store
has only one T
template parameter (not two), so I presume you want the same here.
Note, however, in another PR I suggested changing joint_matrix_store
to have two template parameters T1
and T2
. So, I do think it makes sense for both overloads to have these two template parameters. However, I think you should make that change in a separate PR, which changes both overloads.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are right. Let's add T1 and T2 in the other PR #12276
Whichever PR gets merged first, the other one will follow.
|
||
```c++ | ||
using syclex = sycl::ext::oneapi::experimental; | ||
using syclintelex = sycl::ext::intel::experimental; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Delete this line. syclintelex
isn't used in this example.
The level of cache targeted by `joint_matrix_prefetch` in the last | ||
argument is specified using the compile-time properties defined in the | ||
SYCL extension | ||
link:../../proposed/sycl_ext_oneapi_prefetch.asciidoc[sycl_ext_oneapi_prefetch.asciidoc#specifying-cache-levels] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
link:../../proposed/sycl_ext_oneapi_prefetch.asciidoc[sycl_ext_oneapi_prefetch.asciidoc#specifying-cache-levels] | |
link:../../proposed/sycl_ext_oneapi_prefetch.asciidoc[sycl_ext_oneapi_prefetch] |
I'd suggest pointing just at the extension as a whole, rather than at a specification section, as I have here. Note that the previous link wasn't right anyway, so it wasn't actually pointing at the section.
argument is specified using the compile-time properties defined in the | ||
SYCL extension | ||
link:../../proposed/sycl_ext_oneapi_prefetch.asciidoc[sycl_ext_oneapi_prefetch.asciidoc#specifying-cache-levels] | ||
as illustrated in the example below: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you should also describe the behavior when there are no properties. For example:
... as illustrated in the example below. When no cache levels are specified, the default behavior is to prefetch into the lowest level cache (i.e. L1).
[Example]
sycl/doc/extensions/experimental/sycl_ext_matrix/sycl_ext_oneapi_matrix.asciidoc
Outdated
Show resolved
Hide resolved
sycl/doc/extensions/experimental/sycl_ext_matrix/sycl_ext_intel_matrix.asciidoc
Outdated
Show resolved
Hide resolved
sycl/doc/extensions/experimental/sycl_ext_matrix/sycl_ext_oneapi_matrix.asciidoc
Outdated
Show resolved
Hide resolved
sycl/doc/extensions/experimental/sycl_ext_matrix/sycl_ext_oneapi_matrix.asciidoc
Outdated
Show resolved
Hide resolved
sycl/doc/extensions/experimental/sycl_ext_matrix/sycl_ext_intel_matrix.asciidoc
Show resolved
Hide resolved
@intel/llvm-gatekeepers, can you please merge? |
No description provided.