Skip to content

[SYCL][Matrix spec] Add joint_matrix_prefetch and overloads of load/store with annotated_ptr #11473

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 16 commits into from
Feb 21, 2024

Conversation

dkhaldi
Copy link
Contributor

@dkhaldi dkhaldi commented Oct 9, 2023

No description provided.

@dkhaldi dkhaldi marked this pull request as ready for review October 12, 2023 18:57
@dkhaldi dkhaldi requested a review from a team as a code owner October 12, 2023 18:57
@dkhaldi dkhaldi changed the title [SYCL][Matrix spec draft] Add joint_matrix_prefetch and overloads of load/store with annotated_ptr [SYCL][Matrix spec] Add joint_matrix_prefetch and overloads of load/store with annotated_ptr Oct 19, 2023
@dkhaldi
Copy link
Contributor Author

dkhaldi commented Oct 19, 2023

@gmlueck, gentle ping to review the spec of overloads of load/store on annotated pointer and prefetch.

Copy link
Contributor

@gmlueck gmlueck left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like we should have some documentation about which properties can be used with joint_matrix_load and joint_matrix_store. The same goes for joint_matrix_prefetch. That function takes a property list, but there's no documentation about which properties you can pass.

@dkhaldi
Copy link
Contributor Author

dkhaldi commented Oct 23, 2023

It seems like we should have some documentation about which properties can be used with joint_matrix_load and joint_matrix_store. The same goes for joint_matrix_prefetch. That function takes a property list, but there's no documentation about which properties you can pass.

I added some documentation as we will use existing properties for cache control and cache levels. Is what I added enough?

@dkhaldi
Copy link
Contributor Author

dkhaldi commented Dec 6, 2023

The current spec might change as Row and Cols should be literals or const expr like in the load instructions otherwise the IGC 2d load instructions cannot be generated.
@fda0, please confirm your findings here as well.

@dkhaldi
Copy link
Contributor Author

dkhaldi commented Dec 21, 2023

@gmlueck, as we discussed on Monday, I put the prefetch size as template arguments since they must be constexpr for the implementation.
Can you please review/approve?

dm-vodopyanov pushed a commit that referenced this pull request Dec 21, 2023
typename PropertyListT>
void joint_matrix_load(Group g,
joint_matrix<Group, T1, use::accumulator, Rows, Cols, layout::dynamic> &res,
ext::oneapi::experimental::annotated_ptr<T2, PropertyListT> src,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
ext::oneapi::experimental::annotated_ptr<T2, PropertyListT> src,
annotated_ptr<T2, PropertyListT> src,

No need for this namespace because the function is also defined in that namespace.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, this function is defined in ext::oneapi::experimental::matrix namespace
annotated_ptr is defined in ext::oneapi::experimental namespace
Do you still think I should remove it?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. This function declaration has visibility to identifiers defined in any of the parent namespaces, which includes ext::oneapi::experimental.

Comment on lines 283 to 285
syclex::properties{syclintelex::read_assertion<
syclintelex::cache_control<syclintelex::cache_mode::constant,
syclex::cache_level::L2>>}};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
syclex::properties{syclintelex::read_assertion<
syclintelex::cache_control<syclintelex::cache_mode::constant,
syclex::cache_level::L2>>}};
syclex::properties{syclintelex::read_hint<
syclintelex::cache_control<syclintelex::cache_mode::cached,
syclex::cache_level::L2>>}};

The comment below says "this load will be cached to L2". Assuming that's what you want the example to show, then you want read_hint here, not read_assertion.

Comment on lines 307 to 312
template <typename Group, typename T1, typename T2, size_t Rows, size_t Cols,
typename PropertyListT>
void joint_matrix_store(Group g,
const joint_matrix<Group, T1, use::accumulator, Rows, Cols, layout::dynamic> &res,
ext::oneapi::experimental::annotated_ptr<T2, PropertyListT> dest,
size_t stride, layout Layout);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
template <typename Group, typename T1, typename T2, size_t Rows, size_t Cols,
typename PropertyListT>
void joint_matrix_store(Group g,
const joint_matrix<Group, T1, use::accumulator, Rows, Cols, layout::dynamic> &res,
ext::oneapi::experimental::annotated_ptr<T2, PropertyListT> dest,
size_t stride, layout Layout);
template <typename Group, typename T, size_t Rows, size_t Cols,
typename PropertyListT>
void joint_matrix_store(Group g,
const joint_matrix<Group, T, use::accumulator, Rows, Cols, layout::dynamic> &res,
ext::oneapi::experimental::annotated_ptr<T, PropertyListT> dest,
size_t stride, layout Layout);

The other overload of joint_matrix_store has only one T template parameter (not two), so I presume you want the same here.

Note, however, in another PR I suggested changing joint_matrix_store to have two template parameters T1 and T2. So, I do think it makes sense for both overloads to have these two template parameters. However, I think you should make that change in a separate PR, which changes both overloads.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right. Let's add T1 and T2 in the other PR #12276
Whichever PR gets merged first, the other one will follow.


```c++
using syclex = sycl::ext::oneapi::experimental;
using syclintelex = sycl::ext::intel::experimental;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Delete this line. syclintelex isn't used in this example.

The level of cache targeted by `joint_matrix_prefetch` in the last
argument is specified using the compile-time properties defined in the
SYCL extension
link:../../proposed/sycl_ext_oneapi_prefetch.asciidoc[sycl_ext_oneapi_prefetch.asciidoc#specifying-cache-levels]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
link:../../proposed/sycl_ext_oneapi_prefetch.asciidoc[sycl_ext_oneapi_prefetch.asciidoc#specifying-cache-levels]
link:../../proposed/sycl_ext_oneapi_prefetch.asciidoc[sycl_ext_oneapi_prefetch]

I'd suggest pointing just at the extension as a whole, rather than at a specification section, as I have here. Note that the previous link wasn't right anyway, so it wasn't actually pointing at the section.

argument is specified using the compile-time properties defined in the
SYCL extension
link:../../proposed/sycl_ext_oneapi_prefetch.asciidoc[sycl_ext_oneapi_prefetch.asciidoc#specifying-cache-levels]
as illustrated in the example below:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you should also describe the behavior when there are no properties. For example:

... as illustrated in the example below. When no cache levels are specified, the default behavior is to prefetch into the lowest level cache (i.e. L1).

[Example]

@dkhaldi
Copy link
Contributor Author

dkhaldi commented Feb 20, 2024

@intel/llvm-gatekeepers, can you please merge?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants