Skip to content

[SYCL] Adjust for all Dims offset in accessor's device __init #6560

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 6 commits into from
Aug 18, 2022

Conversation

aelovikov-intel
Copy link
Contributor

@aelovikov-intel aelovikov-intel commented Aug 10, 2022

The optimization done for 1-dim accessor is suitable for all dimensions.

The test sycl/test/gdb/accessors-device.cpp had to be updated as its previous
implementation was fragile in capturing info it tried to verify.

The optimization done for 1-dim accessor is suitable for all dimensions.
@aelovikov-intel
Copy link
Contributor Author

@elizabethandrews , can you please look at the test's change?

@aelovikov-intel
Copy link
Contributor Author

Failures on

Failed Tests (1):
  SYCL :: Basic/code_location_e2e.cpp

are unrelated and have been addressed already.

@aelovikov-intel
Copy link
Contributor Author

@sergey-semenov , @elizabethandrews , gentle ping.

sergey-semenov
sergey-semenov previously approved these changes Aug 16, 2022
Copy link
Contributor

@sergey-semenov sergey-semenov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The header part looks good to me. @elizabethandrews Could you please take a look at the test?

@aelovikov-intel
Copy link
Contributor Author

OCL failures

********************
Failed Tests (1):
  SYCL :: XPTI/kernel/content.cpp

********************
Unexpectedly Passed Tests (2):

  SYCL :: SubGroup/reduce_spirv13.cpp
  SYCL :: SubGroup/reduce_spirv13_fp64.cpp

are known.

@aelovikov-intel
Copy link
Contributor Author

@intel/dpcpp-esimd-reviewers , ping.

Copy link
Contributor

@v-klochkov v-klochkov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok for ESIMD only part.
I still have some questions for general/sycl part of the changes, but relying on @sergey-semenov opinion for @intel/llvm-reviewers-runtime part of the changes.

@aelovikov-intel
Copy link
Contributor Author

Ok for ESIMD only part. I still have some questions for general/sycl part of the changes, but relying on @sergey-semenov opinion for @intel/llvm-reviewers-runtime part of the changes.

Resolved via a call.

@aelovikov-intel
Copy link
Contributor Author

@intel/llvm-gatekeepers , PR is ready.

@steffenlarsen steffenlarsen merged commit 7c58b9a into intel:sycl Aug 18, 2022
aelovikov-intel added a commit to aelovikov-intel/llvm-test-suite that referenced this pull request Aug 23, 2022
Internally, we run the test suite with optimizations disabled and this
test started to fail after intel/llvm#6560.
Adust it to pass again.
againull pushed a commit to intel/llvm-test-suite that referenced this pull request Aug 24, 2022
…O0 (#1174)

Internally, we run the test suite with optimizations disabled and this
test started to fail after intel/llvm#6560.
Adust it to pass again.
@aelovikov-intel aelovikov-intel deleted the accessor_init_offset branch August 25, 2022 20:10
aelovikov-intel added a commit to aelovikov-intel/llvm that referenced this pull request Oct 3, 2022
The utility was introduced in intel#6560 because
"#pragma unroll" doesn't always work and template-based solution is much more
reliable. Original PR only changed the loops that resulted in immediate
performance difference but other occurrences were missed. This PR updates
remaining ones. Note that I've found them by looking into the LLVM IR produced
by our device compiler and having the loop really unrolled improves readability
of such dumps (and most likely codesize/perf, although not significantly).
pvchupin pushed a commit that referenced this pull request Oct 3, 2022
The utility was introduced in #6560
because "#pragma unroll" doesn't always work and template-based solution
is much more reliable. Original PR only changed the loops that resulted
in immediate performance difference but other occurrences were missed.
This PR updates remaining ones. Note that I've found them by looking
into the LLVM IR produced by our device compiler and having the loop
really unrolled improves readability of such dumps (and most likely
codesize/perf, although not significantly).
aelovikov-intel added a commit to aelovikov-intel/llvm that referenced this pull request Mar 27, 2023
…O0 (intel/llvm-test-suite#1174)

Internally, we run the test suite with optimizations disabled and this
test started to fail after intel#6560.
Adust it to pass again.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants