Skip to content

[SYCL][ESIMD][EMU] LSC support for ESIMD_EMULATOR backend #6099

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 6 commits into from
May 31, 2022

Conversation

dongkyunahn-intel
Copy link
Contributor

36 out of 40 LSC tests are passing

@dongkyunahn-intel dongkyunahn-intel requested a review from a team as a code owner May 4, 2022 21:15
static_assert(ImmOffset == 0);
static_assert(DS != __ESIMD_ENS::lsc_data_size::u16u32h);

constexpr uint MASK = loadstoreAlignMask<Ty, VS, DS, N>();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please move this down to the place of use. Same is true for Output, ElemCount

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

constexpr uint MASK = loadstoreAlignMask<Ty, VS, DS, N>();
__ESIMD_DNS::vector_type_t<Ty, N * __ESIMD_EDNS::to_int<VS>()> Output = 0;

constexpr int ElemCount = __ESIMD_EDNS::to_int<VS>();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think N's definition /// @tparam N is the number of channels (platform dependent). is wrong here and in many other places in this file. This is actually the simd size - the number of chunks of VS*sizeof(T) size. If we take rgba load/strore as reference, then VS would be the number of channels, not N. So ElemCount is actually the number of channels. It would be great if you could change N's definition to something like "the SIMD size of operation (the number of addresses to access)".

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment updated.

ByteDistance += rawAddressIncrement<Ty, DS>(),
VecIdx += vectorIndexIncrement<N, _Transposed>()) {

Output[VecIdx] = *((Ty *)(addrs[AddrIdx] + ByteDistance));
Copy link
Contributor

@kbobrovs kbobrovs May 5, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please hoist addrs[AddrIdx] out of the loop

BaseAddr = addrs[AddrIdx];
assert((BaseAddr  & MASK) == 0 && "Address Alignment Error!!");

    for (...) {
      Output[VecIdx] = *((Ty *)(BaseAddr + ByteDistance));
...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

- Updating comments for definition of 'N' : SIMD operationsize
- addrs[] to BaseAddr
- Relocating variables close to the place of its use
- Elem* to Chanl*
@@ -260,7 +657,8 @@ __ESIMD_INTRIN void __esimd_raw_send_nbarrier_signal(
/// @tparam DS is the data size.
/// @tparam VS is the number of elements to load per address.
/// @tparam Transposed indicates if the data is transposed during the transfer.
/// @tparam N is the number of channels (platform dependent).
/// @tparam N is the SIMD size of operation (the number of addresses to access,
/// platform dependent).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't quite understand what "platform-dependent" means for N, as this is what user passes in the code. Restrictions maybe platform-dependent, but that's normal situation for many other ESIMD API parameters, so I suggest to remove it here and on all other similar places

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed 'platform dependent' comment.

@dongkyunahn-intel
Copy link
Contributor Author

Unrelated failures from HIP AMDGPU LLVM Test Suite

Failed Tests (6):
  SYCL :: GroupAlgorithm/SYCL2020/exclusive_scan.cpp
  SYCL :: GroupAlgorithm/SYCL2020/inclusive_scan.cpp
  SYCL :: GroupAlgorithm/SYCL2020/reduce.cpp
  SYCL :: GroupAlgorithm/exclusive_scan.cpp
  SYCL :: GroupAlgorithm/inclusive_scan.cpp
  SYCL :: GroupAlgorithm/reduce.cpp

@pvchupin
Copy link
Contributor

pvchupin commented Jun 1, 2022

@dongkyunahn-intel, @kbobrovs, please check post-commit fails on Windows: https://github.com/intel/llvm/actions/runs/2416580740

@dongkyunahn-intel
Copy link
Contributor Author

Posted PR for fix - #6223

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants