[SYCL] Adjust for all Dims offset in accessor's device __init #6560
The optimization done for 1-dim accessor is suitable for all dimensions.
7c97e36 to de52645
@elizabethandrews, can you please look at the test's change?
f4c56cb to 77ee765
Failures on
are unrelated and have been addressed already.
@sergey-semenov, @elizabethandrews, gentle ping.
The header part looks good to me. @elizabethandrews Could you please take a look at the test?
OCL failures are known.
@intel/dpcpp-esimd-reviewers, ping.
OK for the ESIMD-only part.
I still have some questions about the general/SYCL part of the changes, but I'm relying on @sergey-semenov's opinion for the @intel/llvm-reviewers-runtime part.
Resolved via a call.
@intel/llvm-gatekeepers, the PR is ready.
…O0 (intel/llvm-test-suite#1174) Internally, we run the test suite with optimizations disabled, and this test started to fail after intel/llvm#6560. Adjust it to pass again.
The utility was introduced in intel#6560 because "#pragma unroll" doesn't always work and a template-based solution is much more reliable. The original PR only changed the loops that made an immediate performance difference; other occurrences were missed. This PR updates the remaining ones. Note that I found them by looking at the LLVM IR produced by our device compiler; having the loops really unrolled improves the readability of such dumps (and most likely code size/perf, although not significantly).
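For readers unfamiliar with the idea, here is a minimal sketch of what a template-based unrolled loop can look like. The names `unrolled_loop` and `sum` are hypothetical and chosen for illustration; this is not the utility added by the PR, only an example of the general technique of expanding an index pack at compile time instead of relying on "#pragma unroll".

```cpp
#include <array>
#include <cstddef>
#include <utility>

// Hypothetical helper, for illustration only (not the utility from the PR).
// Calls f(0), f(1), ..., f(N-1) by expanding an index pack, so the compiler
// sees N separate calls rather than a loop it may or may not unroll.
template <std::size_t... Is, typename F>
void unrolled_loop_impl(std::index_sequence<Is...>, F &&f) {
  // Braced initializer lists are evaluated left to right, so the calls
  // happen in index order. The leading 0 keeps the array non-empty for N == 0.
  int expand[] = {0, (f(Is), 0)...};
  (void)expand;
}

template <std::size_t N, typename F>
void unrolled_loop(F &&f) {
  unrolled_loop_impl(std::make_index_sequence<N>{}, std::forward<F>(f));
}

// Example use: sum the elements of a fixed-size array without a runtime loop.
template <std::size_t N>
std::size_t sum(const std::array<std::size_t, N> &a) {
  std::size_t total = 0;
  unrolled_loop<N>([&](std::size_t i) { total += a[i]; });
  return total;
}
```

Because the trip count is a template parameter, the expansion happens during template instantiation and does not depend on the optimization level, which fits the reliability argument made above.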
The optimization done for 1-dim accessor is suitable for all dimensions.
The test sycl/test/gdb/accessors-device.cpp had to be updated because its previous
implementation was fragile in how it captured the information it tried to verify.
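As a rough illustration of the idea in the description, the sketch below folds a multi-dimensional offset into the base pointer once at initialization, so later element accesses don't have to add it again. The type `AccessorSketch` and its members are hypothetical stand-ins, and the row-major linearization is an assumption; this is not the code from the PR.

```cpp
#include <cstddef>

// Hypothetical stand-in for an accessor's device-side state.
// All names here are illustrative assumptions, not the real implementation.
template <typename T, int Dims>
struct AccessorSketch {
  T *data;
  std::size_t offset[Dims];     // access offset in each dimension
  std::size_t mem_range[Dims];  // extent of the underlying buffer

  // Row-major linearization of the per-dimension offset.
  std::size_t linear_offset() const {
    std::size_t result = 0;
    for (int i = 0; i < Dims; ++i)
      result = result * mem_range[i] + offset[i];
    return result;
  }

  // Adjust the base pointer once, for any number of dimensions.
  void init(T *ptr) { data = ptr + linear_offset(); }
};
```

For `Dims == 1` this reduces to `ptr + offset[0]`, which is the one-dimensional case the description says the optimization already handled.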