Skip to content

[SYCL][ESIMD] Add capability to specify 64 bit offsets to esimd functions #7411

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 11 commits into from
Dec 14, 2022

Conversation

fineg74
Copy link
Contributor

@fineg74 fineg74 commented Nov 16, 2022

No description provided.

@fineg74
Copy link
Contributor Author

fineg74 commented Nov 16, 2022

Complementary test PR : intel/llvm-test-suite#1385

@fineg74
Copy link
Contributor Author

fineg74 commented Nov 16, 2022

Test failures in
SYCL :: ESIMD/Stencil.cpp
SYCL :: ESIMD/stencil2.cpp
are expected and are fixed in test PR

scatter(Tx *p, simd<uint32_t, N> offsets, simd<Tx, N> vals,
template <typename Tx, int N, class T = detail::__raw_t<Tx>, typename Toffset>
__ESIMD_API std::enable_if_t<detail::isPowerOf2(N, 32) &&
(std::is_same_v<Toffset, uint32_t> ||
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this can break existing code. E.g. int offsets will now cause compilation error, but should have been working in the past.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only code breaking problem I saw so far is when offset is a simd_view i.e. something like:
scatter(p, x.select<1,8>(0), vals) and the reason is that simd_view converts to simd using operator() which can convert the simd_view to simd of any type. It wasn't the issue when functions received only a single type but becomes the issue when function can accept multiple types so the compiler can't choose which type to use.
Using multiple functions instead of template i.e. having scatter(Tx *p, simd<uint32_t, N> offsets, simd<Tx, N> vals) and scatter(Tx *p, simd<uint64_t, N> offsets, simd<Tx, N> vals) doesn't solve the problem as the compiler faces the same issue.
I am not sure that the code we have now is good one as it would accept even vector of floats without any issue (I have 2 tests where offsets and vals were flipped and did not cause any compilation issue

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Backward compatibility of non-experimental code must not be broken. This is a hard requirement. The only exception is approved list of APIs changed in major releases.

The only code breaking problem I saw so far is when offset is a simd_view i.e. something like:
scatter(p, x.select<1,8>(0), vals)

that is a problem too.
I tried short as offset type for scatter - it works now. So it should continue to work. My suggestion is:

  1. create template<class OffsetT> ... scatter_impl(..., simd<OffsetT, N> offsets,...) in namespace detail.
  2. create ...scatter(Tx *p, simd<uint64_t, N> offsets,...
  3. have ...scatter(Tx *p, simd<uint64_t, N> offsets,... and ...scatter(Tx *p, simd<uint32_t, N> offsets,... delegate to scatter_impl.

I am not sure that the code we have now is good one as it would accept even vector of floats without any issue (I have 2 tests where offsets and vals were flipped and did not cause any compilation issue

Implicit conversion between arithmetic types in most contexts is basic C++ behavior, esimd::simd tries to follow. The user logic error you mention is desirable to protect against, of course, but even the enable_if_t you added does not fully shield from this if element type is uint32/64.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed type checks for offset types, so it should be now compatible with existing code.
I believe the suggested approach would break the interface as there is ambiguity which function to use.
For example here is the error I got when I tried to use this approach with test where offsets were defined as vector of floats while it compiles perfectly with current version with removed offset type checks. I believe there will be similar issues when offset types other than uint32_t or uint64_t will be provided.

/home/gregory/src/dpc/sysl_workspace/work/tests/llvm-test-suite/SYCL/ESIMD/Stencil.cpp:179:17: error: call to 'scatter' is ambiguous
                scatter<float, WIDTH>(outputMatrix, sum, elm16_off, p);
                ^~~~~~~~~~~~~~~~~~~~~
/home/gregory/src/dpc/sysl_workspace/work/llvm/build/bin/../include/sycl/ext/intel/esimd/memory.hpp:218:1: note: candidate function [with Tx = float, N = 16, T = float]
scatter(Tx *p, simd<uint32_t, N> offsets, simd<Tx, N> vals,
^
/home/gregory/src/dpc/sysl_workspace/work/llvm/build/bin/../include/sycl/ext/intel/esimd/memory.hpp:225:1: note: candidate function [with Tx = float, N = 16, T = float]
scatter(Tx *p, simd<uint64_t, N> offsets, simd<Tx, N> vals,

int N, typename Toffset>
__ESIMD_API std::enable_if_t<(N == 8 || N == 16 || N == 32) && sizeof(T) == 4 &&
(std::is_same_v<Toffset, uint32_t> ||
std::is_same_v<Toffset, uint64_t>),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please make sure there is a test with 64-bit offset and N.

scatter_rgba(T *p, simd<uint32_t, N> offsets,
int N, typename Toffset>
__ESIMD_API std::enable_if_t<(N == 8 || N == 16 || N == 32) && sizeof(T) == 4 &&
(std::is_same_v<Toffset, uint32_t> ||
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd suggest to use the same mechanism for template parameter checking in all APIs. Here std::enable_if_t is used, but in lsc_gather below - static_assert.
static_assert seems better choice, as gives clearer idea of the problem to the user.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The issue is that non-lsc API uses std::enable_if_t approach while lsc API uses static_assert.
We probably need to decide on common approach between these 2 APIs

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is what I suggest - use static_assert everywhere

__ESIMD_API simd<Tx, N> atomic_update(Tx *p, simd<unsigned, N> offset,
simd<Tx, N> src0, simd_mask<N> mask) {
template <atomic_op Op, typename Tx, int N, typename Toffset>
__ESIMD_API std::enable_if_t<std::is_same_v<Toffset, uint32_t> ||
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here an in other places - template parameter checking approach should be consistent in all APIs.

template <typename Tx, int N, class T = detail::__raw_t<Tx>>
__ESIMD_API std::enable_if_t<detail::isPowerOf2(N, 32), simd<Tx, N>>
gather(const Tx *p, simd<uint32_t, N> offsets, simd_mask<N> mask = 1) {
template <typename Tx, int N, class T = detail::__raw_t<Tx>, typename Toffset>
Copy link
Contributor

@kbobrovs kbobrovs Nov 29, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here and in other places:
Parameters with default values (T) should go after parameters w/o default values.

Actually, T calculation should be moved out of the parameter list and replaced with using T = detail::__raw_t<Tx>, it is never supposed to be set by the user.

scatter_rgba(T *p, simd_view<Toffset, RegionTy> offsets,
simd<T, N * get_num_channels_enabled(RGBAMask)> vals,
simd_mask<N> mask = 1) {
using Ty = typename simd_view<Toffset, RegionTy>::element_type;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add assert that the number of offsets matches the number of vals

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They are not expected to have the same number of elements. One element in offsets controls multiple elements in vals as specified by mask. It is enforced by template specialization

@@ -611,6 +647,17 @@ scatter_rgba(T *p, simd<uint32_t, N> offsets,
addrs.data(), vals.data(), mask.data());
}

template <rgba_channel_mask RGBAMask = rgba_channel_mask::ABGR, typename T,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here and in other new overloads:
please add doxygen, and explain what is the difference with the other scatter_rgba overload.

Comment on lines 155 to 159
/// Loads ("gathers") elements from different memory locations and returns a
/// vector of them. Each memory location is base address plus an offset - a
/// value of the corresponding element in the input offset vector. Access to
/// any element's memory location can be disabled via the input vector of
/// predicates (mask).
Copy link
Contributor

@kbobrovs kbobrovs Nov 30, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here and in other places:

Suggested change
/// Loads ("gathers") elements from different memory locations and returns a
/// vector of them. Each memory location is base address plus an offset - a
/// value of the corresponding element in the input offset vector. Access to
/// any element's memory location can be disabled via the input vector of
/// predicates (mask).
/// A variation of \c gather API with \c offsets represented as a \c simd_view object.

@@ -580,6 +613,39 @@ gather_rgba(const T *p, simd<Toffset, N> offsets, simd_mask<N> mask = 1) {
addrs.data(), mask.data());
}

/// @anchor usm_gather_rgba
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here my comment above is especially relevant.

@@ -148,6 +152,19 @@ gather(const Tx *p, simd<uint32_t, N> offsets, simd_mask<N> mask = 1) {
mask.data());
}

/// A variation of \c gather API with \c offsets represented as \c simd_view
/// object
///
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My review request was actually to get rid of description duplication only. Please return parameter doxygen comments.

@kbobrovs
Copy link
Contributor

kbobrovs commented Dec 1, 2022

Looks like more fixes are needed (might be a test problem):

******************** TEST 'SYCL :: ESIMD/Stencil.cpp' FAILED ********************
...
/__w/llvm/llvm/toolchain/bin/../include/sycl/ext/intel/esimd/memory.hpp:195:3: error: static assertion failed due to requirement 'std::is_integral_v<float>': Unsupported offset type
  static_assert(std::is_integral_v<Toffset>, "Unsupported offset type");
  ^             ~~~~~~~~~~~~~~~~~~~~~~~~~~~
/__w/llvm/llvm/llvm_test_suite/SYCL/ESIMD/Stencil.cpp:179:17: note: in instantiation of function template specialization 'sycl::_V1::ext::intel::esimd::scatter<float, 16, float>' requested here
                scatter<float, WIDTH>(outputMatrix, sum, elm16_off, p);
                ^
1 error generated.

@fineg74
Copy link
Contributor Author

fineg74 commented Dec 1, 2022

Looks like more fixes are needed (might be a test problem):

******************** TEST 'SYCL :: ESIMD/Stencil.cpp' FAILED ********************
...
/__w/llvm/llvm/toolchain/bin/../include/sycl/ext/intel/esimd/memory.hpp:195:3: error: static assertion failed due to requirement 'std::is_integral_v<float>': Unsupported offset type
  static_assert(std::is_integral_v<Toffset>, "Unsupported offset type");
  ^             ~~~~~~~~~~~~~~~~~~~~~~~~~~~
/__w/llvm/llvm/llvm_test_suite/SYCL/ESIMD/Stencil.cpp:179:17: note: in instantiation of function template specialization 'sycl::_V1::ext::intel::esimd::scatter<float, 16, float>' requested here
                scatter<float, WIDTH>(outputMatrix, sum, elm16_off, p);
                ^
1 error generated.

It is known issue and it is fixed in test PR.

@fineg74
Copy link
Contributor Author

fineg74 commented Dec 1, 2022

/verify with intel/llvm-test-suite#1385

@kbobrovs
Copy link
Contributor

kbobrovs commented Dec 5, 2022

waiting for comments on failed tests

@fineg74
Copy link
Contributor Author

fineg74 commented Dec 5, 2022

/verify with intel/llvm-test-suite#1385

@fineg74
Copy link
Contributor Author

fineg74 commented Dec 7, 2022

/verify with intel/llvm-test-suite#1385

@kbobrovs
Copy link
Contributor

kbobrovs commented Dec 8, 2022

Please provide analysis of the failed tests (or state which ones are unrelated)

@fineg74
Copy link
Contributor Author

fineg74 commented Dec 9, 2022

Test failures in
SYCL :: ESIMD/Stencil.cpp
SYCL :: ESIMD/stencil2.cpp
are expected and are fixed in test PR

@fineg74
Copy link
Contributor Author

fineg74 commented Dec 13, 2022

/verify with intel/llvm-test-suite#1385

@fineg74
Copy link
Contributor Author

fineg74 commented Dec 14, 2022

Test failures in
SYCL :: ESIMD/Stencil.cpp
SYCL :: ESIMD/stencil2.cpp
are expected and are fixed in test PR

@fineg74
Copy link
Contributor Author

fineg74 commented Dec 14, 2022

llvm-test-suite failures

HostInteropTask/host-task-failure.cpp
Basic/group_async_copy.cpp
DeviceLib/imf_fp16_trivial_test.cpp
DeviceLib/imf_fp32_test.cpp
DeviceLib/imf_half_type_cast.cpp
Reduction/reduction_big_data.cpp
Reduction/reduction_nd_N_vars.cpp
Reduction/reduction_nd_conditional.cpp
Reduction/reduction_nd_dw.cpp
Reduction/reduction_nd_ext_half.cpp
Reduction/reduction_nd_lambda.cpp
Reduction/reduction_nd_rw.cpp
Reduction/reduction_range_1d_dw.cpp
Reduction/reduction_range_1d_rw.cpp
Reduction/reduction_range_2d_dw.cpp
Reduction/reduction_range_2d_rw.cpp
Reduction/reduction_range_3d_dw.cpp
Reduction/reduction_range_3d_rw.cpp
Reduction/reduction_range_N_vars.cpp
Reduction/reduction_usm.cpp
Reduction/reduction_usm_dw.cpp
are not related to the change

@v-klochkov v-klochkov merged commit c63f802 into intel:sycl Dec 14, 2022
@fineg74 fineg74 deleted the 64bitOffset branch December 14, 2022 18:35
@fineg74 fineg74 restored the 64bitOffset branch May 17, 2023 17:03
@fineg74 fineg74 deleted the 64bitOffset branch May 17, 2023 17:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants