-
Notifications
You must be signed in to change notification settings - Fork 787
[SYCL][ESIMD] Introduce predicates for lsc_block_store/load #6688
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Complementary Test PR: intel/llvm-test-suite#1194 |
#ifdef __ESIMD_FORCE_STATELESS_MEM | ||
return lsc_block_load<T, NElts, DS, L1H, L3H>( | ||
__ESIMD_DNS::accessorToPointer<T>(acc, offset)); | ||
__ESIMD_DNS::accessorToPointer<T>(acc, offset, pred)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pred
should apparently be an argument of lsc_block_load
, not accessorToPointer
.
This should have been caught by the intel/llvm-test-suite#1194 test, right? Do you have idea why it wasn't? Probably, the test should also compile/run with forced stateless memory access
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Several reasons:
- We do not run PVC tests in CI pipeline
- The test doesn't use __ESIMD_FORCE_STATELESS_MEM so the issue went unnoticed
@@ -442,6 +442,7 @@ lsc_gather(AccessorTy acc, __ESIMD_NS::simd<uint32_t, N> offsets, | |||
/// @tparam AccessorTy is the \ref sycl::accessor type. | |||
/// @param acc is the SYCL accessor. | |||
/// @param offset is the zero-based offset in bytes. | |||
/// @param pred is predicate. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add description of the effect of the predicate
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
#ifdef __ESIMD_FORCE_STATELESS_MEM | ||
lsc_block_store<T, NElts, DS, L1H>( | ||
__ESIMD_DNS::accessorToPointer<T>(acc, offset), vals); | ||
__ESIMD_DNS::accessorToPointer<T>(acc, offset), vals, pred); | ||
#else | ||
detail::check_lsc_vector_size<NElts>(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please make sure new functionality also works on ESIMD emulator, as lsc
accesses work on emulator. @dongkyunahn-intel can help with logistics.
Upd: I'm not sure if CI runs tests on emulator - probably only after merge as post-commit. So please verify your new tests work on emulator.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I locally verified if two samples from intel/llvm-test-suite#1194 are working fine with this PR. While lsc_predicate.cpp
passed, lsc_predicate_stateless.cpp
failed during compilation.
Building dpcpp toolchain supporting ESIMD_EMULATOR and running ESIMD kernels on emulator requires some environment setup. Let me know if you need help in locally verifying your implementation with ESIMD_EMULATOR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After __ESIMD_FORCE_STATELESS_MEM
fix, both tests are passing.
$ SYCL/ESIMD/lsc/Output/lsc_predicate.cpp.tmp.out
Device name: ESIMD_EMULATOR
USM lsc predicate test passed
USM lsc predicate test passed
USM lsc predicate test passed
Accessor lsc predicate test passed
Accessor lsc predicate test passed
Accessor lsc predicate test passed
$ SYCL/ESIMD/lsc/Output/lsc_predicate_stateless.cpp.tmp.out
Device name: ESIMD_EMULATOR
Accessor lsc predicate test passed
Accessor lsc predicate test passed
Accessor lsc predicate test passed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please address comments.
lint failure occurs in files that are not part of this PR. |
/// @tparam L1H is L1 cache hint. | ||
/// @tparam L3H is L3 cache hint. | ||
/// @param p is the base pointer. | ||
/// @param pred is predicate to enable the operation |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/// @param pred is predicate to enable the operation | |
/// @param pred operation predicate. Zero means operation is skipped entirely, non-zero - operation is performed. The default is '1' - perform the operation. |
@@ -442,6 +527,8 @@ lsc_gather(AccessorTy acc, __ESIMD_NS::simd<uint32_t, N> offsets, | |||
/// @tparam AccessorTy is the \ref sycl::accessor type. | |||
/// @param acc is the SYCL accessor. | |||
/// @param offset is the zero-based offset in bytes. | |||
/// @param pred is predicate to enable the operation |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
here please also expand description a bit telling which values mean what
# Conflicts: # sycl/test/basic_tests/stdcpp_compat.cpp
@fineg74, please file internal ticket (if it does not exist yet) for CUDA testing failures. |
No description provided.