-
Notifications
You must be signed in to change notification settings - Fork 788
[SYCL][ESIMD] Add 8/16-bit type support to lsc_block_load/store #6757
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
# Conflicts: # sycl/include/sycl/ext/intel/experimental/esimd/memory.hpp
Test PR: intel/llvm-test-suite#1249 |
} else if constexpr (_DS == lsc_data_size::u16) { | ||
static_assert(NElts % 2 == 0, | ||
"Number of elements is not supported by Transposed store"); | ||
detail::check_lsc_vector_size<NElts / 2>(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we choose to translate loads/stores of 8 and 16-bit elements to loads/stores of 32-bit elements, then that allows load/store of (MaxN * 2) 16-bit elements and (MaxN * 4) 8-bit elements.
Don't we want to use 64-bit loads/stores when it is possible? In those cases it would allow loading/storing 2x longer memory chunk of 8/16-bit elements, i.e. MaxN4 for 16-bit elements and MaxN8 for 8-bit elements.
Update: I just realized that the attempt to support bigger/longer vectors of 8/16-bit vectors using 64-bit loads/stores would require stricter requirement to address alignment.
@kbobrovs - FYI. Please confirm we choose to make users responsible to align addresses to 4-byte for 8-16-bit lsc_block_load/stores. and that we don't choose emulation via 64-bit loads/stores, that would require 8-bit alignment. What is your opinion on this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point, @v-klochkov. We need to understand what are requirements for the hardware instruction first, and add them to the doxygen comments of the function as a requirement. Similarly to https://github.com/intel/llvm/blob/sycl/sycl/include/sycl/ext/intel/esimd/memory.hpp#L272.
Note that for block_load we also have alignment template parameter, depending on which one or the other intrinsic is generated. Here I think we always generate single intrinsic and it does not allow to convey alignment info, right? Otherwise we might need to add alignment parameter here as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the address passed to intrinsic (and HW instruction) must be 4-byte aligned for load/stores of 32-bit elements, and 8-byte aligned for 64-bit element vectors loads/stores.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@fineg74 can you please add a comment explaining the requirement to alignment of the address passed to the intrinsics.
return __esimd_lsc_load_stateless<T, L1H, L3H, _AddressScale, _ImmOffset, _DS, | ||
_VS, _Transposed, N>(pred.data(), | ||
addrs.data()); | ||
constexpr int SmallIntFactor = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please extend documentation for the lsc_block_load/store and make sure checks verify the constraints:
/// Accesses contiguous block of memory of `NElts * S` bytes starting from
/// given address, where S is a byte size of an "element" defined by the \c DS
/// template parameter. The maximum size of accessed block is 512 bytes for PVC
/// and 256 bytes for ACM (DG2).
/// When \с DS equals \с lsc_data_size::u64, the address must be 8-byte aligned,
/// otherwise - 4-bytes aligned. Allowed values for the data size are
/// \с lsc_data_size::u32 and \с lsc_data_size::u64. Allowed NElts values are
/// 1, 2, 3, 4, 8, 16, 32, 64.
/// Note that to access 512 bytes, DS must be \с lsc_data_size::u64 and \c NElts
/// must be 64.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for adding. Some improvements are still needed:
- The text I suggested works only for block load/stores (_Transpose=1). Sorry if I was unclear. For gather/scatter constraints are different (e.g.
u8u32
иu16u32
data sizes are also allowed) - Please do not duplicate text. I suggest to add it to one of the APIs and refer to it from others.
"access" works for both load and store. E.g. you can add "See comments in the block_load API for description and parameter constraints" to block_store
Also, please add '3' to allowed NElts
. (We should have a test on this too) - I missed that initially.
The only functions that were changed in this PR are lsc_block_load/store. I didn't touch gather/scatter and didn't update documentation for these functions. |
No description provided.