Skip to content

[SYCL][ESIMD] Implement unified memory API - block_store(acc,...) #11830

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Nov 15, 2023

Conversation

sarnex
Copy link
Contributor

@sarnex sarnex commented Nov 8, 2023

This change implements the new compile time properties API for block_store with accessors.

I also updated the old E2E test to fix alignment issues, added compile-time tests and updated the naming of the old functions.

@sarnex sarnex force-pushed the blockstoreacc branch 2 times, most recently from b8fa875 to 1ac4095 Compare November 9, 2023 15:36
@sarnex sarnex marked this pull request as ready for review November 9, 2023 17:44
@sarnex sarnex requested a review from a team as a code owner November 9, 2023 17:44
Copy link
Contributor

@v-klochkov v-klochkov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi Nick, I finished the 1st code-review pass. Please see my comments.


auto surf_ind = __esimd_get_surface_index(
detail::AccessorPrivateProxy::getQualifiedPtrOrImageObj(acc));
__esimd_oword_st<Tx, N>(surf_ind, byte_offset >> 4, vals.data());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

block_store(acc) has some difference comparing to block_load(acc).
The load may be lowered to __esimd_oword_ld, or __esimd_oword_ld_unaligned, while store can only be stored using aligned store.

This code expects from user that byte_offset is 16-byte aligned as it was previously (it was not explicitly told in the old comment section).

This place is a bit tricky.
We can choose to say that if (!(b) && (c)) then the function assumes the default alignment is 16 if no alignment props passed. Also, if 'alignment' property 4 or 8 is passed, then this function generates code requiring DG2 or PVC device.

Copy link
Contributor Author

@sarnex sarnex Nov 13, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the in-depth review. I updated the function and doc to expect 16-byte alignment if the property is not passed and !b && C, and if alignment of < 16 is passed we require PVC/DG2, hopefully I did it correctly, please take a look


if constexpr (sizeof(T) >= 4)
Passed &=
testACC<T, 4, !CheckMask, CheckProperties>(Q, 2, 4, AlignElemProps);
Copy link
Contributor

@v-klochkov v-klochkov Nov 9, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the usage of 'AlignElemProps' is user's error here (because this code is targeted to Gen12 too). The test works because the actual alignment is 16: sizeof(T) >=4 and N=4 gives 16 or more bytes.

If user cannot guarantee the byte offsets to be 16-byte aligned, then the only option for us is to generate LSC call, but this part of the test is supposed to pass on Gen12.

Same for the next call.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had in mind to make these tests more strict by doing something like:
If user says the alignment is 4-bytes, then intentionally manage the offset to be 4-bytes aligned and not 16-bytes aligned.
So, for these tests it would mean allocating a bit bigger buffers and add start reading with extra offset=4 (+4 to some 16-byte aligned boundary).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed the Gen12 ones to just use 16-byte alignment to not complicate the test even more

This change implements the new compile time properties API for
block_store with accessors.

I also updated the old E2E test to fix alignment issues,
added compile-time tests and updated the naming of the old functions.

Signed-off-by: Sarnie, Nick <[email protected]>
Signed-off-by: Sarnie, Nick <[email protected]>
@sarnex
Copy link
Contributor Author

sarnex commented Nov 13, 2023

Sorry for force push, I had to rebase

@sarnex
Copy link
Contributor Author

sarnex commented Nov 13, 2023

CI failures not related

Copy link
Contributor

@v-klochkov v-klochkov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had only 2 comments after the 2nd review pass.

Signed-off-by: Sarnie, Nick <[email protected]>
@againull againull merged commit 960d898 into intel:sycl Nov 15, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants