Skip to content

[SYCL][ESIMD] Add 8/16-bit type support to lsc_block_load/store #6757

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 14 commits into from
Sep 22, 2022

Conversation

fineg74
Copy link
Contributor

@fineg74 fineg74 commented Sep 10, 2022

No description provided.

@fineg74
Copy link
Contributor Author

fineg74 commented Sep 10, 2022

Test PR: intel/llvm-test-suite#1249

} else if constexpr (_DS == lsc_data_size::u16) {
static_assert(NElts % 2 == 0,
"Number of elements is not supported by Transposed store");
detail::check_lsc_vector_size<NElts / 2>();
Copy link
Contributor

@v-klochkov v-klochkov Sep 16, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we choose to translate loads/stores of 8 and 16-bit elements to loads/stores of 32-bit elements, then that allows load/store of (MaxN * 2) 16-bit elements and (MaxN * 4) 8-bit elements.
Don't we want to use 64-bit loads/stores when it is possible? In those cases it would allow loading/storing 2x longer memory chunk of 8/16-bit elements, i.e. MaxN4 for 16-bit elements and MaxN8 for 8-bit elements.

Update: I just realized that the attempt to support bigger/longer vectors of 8/16-bit vectors using 64-bit loads/stores would require stricter requirement to address alignment.

@kbobrovs - FYI. Please confirm we choose to make users responsible to align addresses to 4-byte for 8-16-bit lsc_block_load/stores. and that we don't choose emulation via 64-bit loads/stores, that would require 8-bit alignment. What is your opinion on this?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, @v-klochkov. We need to understand what are requirements for the hardware instruction first, and add them to the doxygen comments of the function as a requirement. Similarly to https://github.com/intel/llvm/blob/sycl/sycl/include/sycl/ext/intel/esimd/memory.hpp#L272.

Note that for block_load we also have alignment template parameter, depending on which one or the other intrinsic is generated. Here I think we always generate single intrinsic and it does not allow to convey alignment info, right? Otherwise we might need to add alignment parameter here as well.

Copy link
Contributor

@v-klochkov v-klochkov Sep 19, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the address passed to intrinsic (and HW instruction) must be 4-byte aligned for load/stores of 32-bit elements, and 8-byte aligned for 64-bit element vectors loads/stores.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@fineg74 can you please add a comment explaining the requirement to alignment of the address passed to the intrinsics.

return __esimd_lsc_load_stateless<T, L1H, L3H, _AddressScale, _ImmOffset, _DS,
_VS, _Transposed, N>(pred.data(),
addrs.data());
constexpr int SmallIntFactor =
Copy link
Contributor

@kbobrovs kbobrovs Sep 21, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please extend documentation for the lsc_block_load/store and make sure checks verify the constraints:

/// Accesses contiguous block of memory of `NElts * S` bytes  starting from
/// given address, where S is a byte size of an "element" defined by the \c DS
/// template parameter. The maximum size of accessed block is 512 bytes for PVC
/// and 256 bytes for ACM (DG2).
/// When \с DS equals \с lsc_data_size::u64, the address must be 8-byte aligned,
/// otherwise - 4-bytes aligned. Allowed values for the data size are
/// \с lsc_data_size::u32 and \с lsc_data_size::u64. Allowed NElts values are
/// 1, 2, 3, 4, 8, 16, 32, 64.
/// Note that to access 512 bytes, DS must be \с lsc_data_size::u64 and \c NElts
/// must be 64.

Copy link
Contributor

@kbobrovs kbobrovs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for adding. Some improvements are still needed:

  • The text I suggested works only for block load/stores (_Transpose=1). Sorry if I was unclear. For gather/scatter constraints are different (e.g. u8u32 и u16u32 data sizes are also allowed)
  • Please do not duplicate text. I suggest to add it to one of the APIs and refer to it from others.
    "access" works for both load and store. E.g. you can add "See comments in the block_load API for description and parameter constraints" to block_store

Also, please add '3' to allowed NElts. (We should have a test on this too) - I missed that initially.

@fineg74
Copy link
Contributor Author

fineg74 commented Sep 21, 2022

Thanks for adding. Some improvements are still needed:

  • The text I suggested works only for block load/stores (_Transpose=1). Sorry if I was unclear. For gather/scatter constraints are different (e.g. u8u32 и u16u32 data sizes are also allowed)
  • Please do not duplicate text. I suggest to add it to one of the APIs and refer to it from others.
    "access" works for both load and store. E.g. you can add "See comments in the block_load API for description and parameter constraints" to block_store

Also, please add '3' to allowed NElts. (We should have a test on this too) - I missed that initially.

The only functions that were changed in this PR are lsc_block_load/store. I didn't touch gather/scatter and didn't update documentation for these functions.

@kbobrovs kbobrovs merged commit f9d8059 into intel:sycl Sep 22, 2022
@fineg74 fineg74 deleted the lsc_block branch September 27, 2022 03:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants