-
Notifications
You must be signed in to change notification settings - Fork 790
[SYCL][ESIMD] Add capability to specify 64 bit offsets to esimd functions #7411
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
This reverts commit 918f2d0.
Complementary test PR : intel/llvm-test-suite#1385 |
Test failures in |
scatter(Tx *p, simd<uint32_t, N> offsets, simd<Tx, N> vals, | ||
template <typename Tx, int N, class T = detail::__raw_t<Tx>, typename Toffset> | ||
__ESIMD_API std::enable_if_t<detail::isPowerOf2(N, 32) && | ||
(std::is_same_v<Toffset, uint32_t> || |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe this can break existing code. E.g. int offsets will now cause compilation error, but should have been working in the past.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The only code breaking problem I saw so far is when offset is a simd_view i.e. something like:
scatter(p, x.select<1,8>(0), vals)
and the reason is that simd_view converts to simd using operator() which can convert the simd_view to simd of any type. It wasn't the issue when functions received only a single type but becomes the issue when function can accept multiple types so the compiler can't choose which type to use.
Using multiple functions instead of template i.e. having scatter(Tx *p, simd<uint32_t, N> offsets, simd<Tx, N> vals)
and scatter(Tx *p, simd<uint64_t, N> offsets, simd<Tx, N> vals)
doesn't solve the problem as the compiler faces the same issue.
I am not sure that the code we have now is good one as it would accept even vector of floats without any issue (I have 2 tests where offsets and vals were flipped and did not cause any compilation issue
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Backward compatibility of non-experimental code must not be broken. This is a hard requirement. The only exception is approved list of APIs changed in major releases.
The only code breaking problem I saw so far is when offset is a simd_view i.e. something like:
scatter(p, x.select<1,8>(0), vals)
that is a problem too.
I tried short
as offset type for scatter - it works now. So it should continue to work. My suggestion is:
- create
template<class OffsetT> ... scatter_impl(..., simd<OffsetT, N> offsets,...)
in namespace detail. - create
...scatter(Tx *p, simd<uint64_t, N> offsets,...
- have
...scatter(Tx *p, simd<uint64_t, N> offsets,...
and...scatter(Tx *p, simd<uint32_t, N> offsets,...
delegate to scatter_impl.
I am not sure that the code we have now is good one as it would accept even vector of floats without any issue (I have 2 tests where offsets and vals were flipped and did not cause any compilation issue
Implicit conversion between arithmetic types in most contexts is basic C++ behavior, esimd::simd
tries to follow. The user logic error you mention is desirable to protect against, of course, but even the enable_if_t you added does not fully shield from this if element type is uint32/64.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed type checks for offset types, so it should be now compatible with existing code.
I believe the suggested approach would break the interface as there is ambiguity which function to use.
For example here is the error I got when I tried to use this approach with test where offsets were defined as vector of floats while it compiles perfectly with current version with removed offset type checks. I believe there will be similar issues when offset types other than uint32_t or uint64_t will be provided.
/home/gregory/src/dpc/sysl_workspace/work/tests/llvm-test-suite/SYCL/ESIMD/Stencil.cpp:179:17: error: call to 'scatter' is ambiguous
scatter<float, WIDTH>(outputMatrix, sum, elm16_off, p);
^~~~~~~~~~~~~~~~~~~~~
/home/gregory/src/dpc/sysl_workspace/work/llvm/build/bin/../include/sycl/ext/intel/esimd/memory.hpp:218:1: note: candidate function [with Tx = float, N = 16, T = float]
scatter(Tx *p, simd<uint32_t, N> offsets, simd<Tx, N> vals,
^
/home/gregory/src/dpc/sysl_workspace/work/llvm/build/bin/../include/sycl/ext/intel/esimd/memory.hpp:225:1: note: candidate function [with Tx = float, N = 16, T = float]
scatter(Tx *p, simd<uint64_t, N> offsets, simd<Tx, N> vals,
int N, typename Toffset> | ||
__ESIMD_API std::enable_if_t<(N == 8 || N == 16 || N == 32) && sizeof(T) == 4 && | ||
(std::is_same_v<Toffset, uint32_t> || | ||
std::is_same_v<Toffset, uint64_t>), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please make sure there is a test with 64-bit offset and N.
scatter_rgba(T *p, simd<uint32_t, N> offsets, | ||
int N, typename Toffset> | ||
__ESIMD_API std::enable_if_t<(N == 8 || N == 16 || N == 32) && sizeof(T) == 4 && | ||
(std::is_same_v<Toffset, uint32_t> || |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd suggest to use the same mechanism for template parameter checking in all APIs. Here std::enable_if_t
is used, but in lsc_gather
below - static_assert.
static_assert
seems better choice, as gives clearer idea of the problem to the user.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The issue is that non-lsc API uses std::enable_if_t
approach while lsc API uses static_assert
.
We probably need to decide on common approach between these 2 APIs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is what I suggest - use static_assert everywhere
__ESIMD_API simd<Tx, N> atomic_update(Tx *p, simd<unsigned, N> offset, | ||
simd<Tx, N> src0, simd_mask<N> mask) { | ||
template <atomic_op Op, typename Tx, int N, typename Toffset> | ||
__ESIMD_API std::enable_if_t<std::is_same_v<Toffset, uint32_t> || |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same here an in other places - template parameter checking approach should be consistent in all APIs.
This reverts commit 70f9020.
template <typename Tx, int N, class T = detail::__raw_t<Tx>> | ||
__ESIMD_API std::enable_if_t<detail::isPowerOf2(N, 32), simd<Tx, N>> | ||
gather(const Tx *p, simd<uint32_t, N> offsets, simd_mask<N> mask = 1) { | ||
template <typename Tx, int N, class T = detail::__raw_t<Tx>, typename Toffset> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here and in other places:
Parameters with default values (T
) should go after parameters w/o default values.
Actually, T calculation should be moved out of the parameter list and replaced with using T = detail::__raw_t<Tx>
, it is never supposed to be set by the user.
scatter_rgba(T *p, simd_view<Toffset, RegionTy> offsets, | ||
simd<T, N * get_num_channels_enabled(RGBAMask)> vals, | ||
simd_mask<N> mask = 1) { | ||
using Ty = typename simd_view<Toffset, RegionTy>::element_type; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add assert that the number of offsets matches the number of vals
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They are not expected to have the same number of elements. One element in offsets controls multiple elements in vals as specified by mask. It is enforced by template specialization
@@ -611,6 +647,17 @@ scatter_rgba(T *p, simd<uint32_t, N> offsets, | |||
addrs.data(), vals.data(), mask.data()); | |||
} | |||
|
|||
template <rgba_channel_mask RGBAMask = rgba_channel_mask::ABGR, typename T, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here and in other new overloads:
please add doxygen, and explain what is the difference with the other scatter_rgba
overload.
/// Loads ("gathers") elements from different memory locations and returns a | ||
/// vector of them. Each memory location is base address plus an offset - a | ||
/// value of the corresponding element in the input offset vector. Access to | ||
/// any element's memory location can be disabled via the input vector of | ||
/// predicates (mask). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here and in other places:
/// Loads ("gathers") elements from different memory locations and returns a | |
/// vector of them. Each memory location is base address plus an offset - a | |
/// value of the corresponding element in the input offset vector. Access to | |
/// any element's memory location can be disabled via the input vector of | |
/// predicates (mask). | |
/// A variation of \c gather API with \c offsets represented as a \c simd_view object. |
@@ -580,6 +613,39 @@ gather_rgba(const T *p, simd<Toffset, N> offsets, simd_mask<N> mask = 1) { | |||
addrs.data(), mask.data()); | |||
} | |||
|
|||
/// @anchor usm_gather_rgba |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here my comment above is especially relevant.
@@ -148,6 +152,19 @@ gather(const Tx *p, simd<uint32_t, N> offsets, simd_mask<N> mask = 1) { | |||
mask.data()); | |||
} | |||
|
|||
/// A variation of \c gather API with \c offsets represented as \c simd_view | |||
/// object | |||
/// |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My review request was actually to get rid of description duplication only. Please return parameter doxygen comments.
Looks like more fixes are needed (might be a test problem):
|
It is known issue and it is fixed in test PR. |
/verify with intel/llvm-test-suite#1385 |
waiting for comments on failed tests |
/verify with intel/llvm-test-suite#1385 |
/verify with intel/llvm-test-suite#1385 |
Please provide analysis of the failed tests (or state which ones are unrelated) |
Test failures in |
/verify with intel/llvm-test-suite#1385 |
Test failures in |
llvm-test-suite failures HostInteropTask/host-task-failure.cpp |
No description provided.