
[NFC][SYCL] Replace #pragma unroll with dim_loop in accessor.hpp #6939


Merged: 2 commits merged into sycl from unroll on Oct 3, 2022

Conversation

aelovikov-intel (Contributor)

The utility was introduced in #6560 because `#pragma unroll` doesn't always work and a template-based solution is much more reliable. The original PR only changed the loops that showed an immediate performance difference; other occurrences were missed. This PR updates the remaining ones. Note that I found them by looking into the LLVM IR produced by our device compiler, and having the loops really unrolled improves the readability of such dumps (and most likely code size/perf, although not significantly).

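For context, here is a minimal sketch of how such a template-based unroller can be written. The names and exact signature of the real `sycl::detail::dim_loop` may differ; this is an illustration of the technique, not the library's code:

```cpp
#include <cstddef>
#include <utility>

// Expanding the index pack in a fold expression instantiates the body once
// per index, so the "loop" is unrolled by construction instead of relying
// on an optimizer hint like #pragma unroll.
template <std::size_t... Inds, class F>
constexpr void dim_loop_impl(std::index_sequence<Inds...>, F &&f) {
  (f(Inds), ...);
}

template <std::size_t Count, class F>
constexpr void dim_loop(F &&f) {
  dim_loop_impl(std::make_index_sequence<Count>{}, std::forward<F>(f));
}

// Usage: a guaranteed-unrolled three-element copy.
// int src[3] = {1, 2, 3}, dst[3] = {};
// dim_loop<3>([&](std::size_t i) { dst[i] = src[i]; });
```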
@aelovikov-intel aelovikov-intel requested a review from a team as a code owner October 3, 2022 15:46
bso-intel (Contributor) left a comment:


LGTM

aelovikov-intel (Contributor, Author)

@intel/llvm-gatekeepers PR is ready.

@pvchupin pvchupin merged commit fee486e into intel:sycl Oct 3, 2022
whitneywhtsang added a commit to whitneywhtsang/llvm that referenced this pull request Oct 28, 2022
whitneywhtsang added a commit to whitneywhtsang/llvm that referenced this pull request Nov 3, 2022
@aelovikov-intel aelovikov-intel deleted the unroll branch November 8, 2022 20:53
bader pushed a commit that referenced this pull request Feb 22, 2023
…backend. (#7948)

A performance regression was reported when using `reduce_over_group`
with `sycl::vec`. It was caused by a loop over calls to the scalar
`reduce_over_group` for each of the `sycl::vec` components; the loop was
not unrolled and led to register spills even at -O3.
It was initially possible to fix the performance by using `#pragma unroll`
and declaring `reduce_over_group` with `__attribute__((always_inline))`.
However, the `SYCL_UNROLL` macro that emits `#pragma unroll` has been
removed in favour of `dim_loop` (#6939).
I have used `dim_loop` to fix the loop unrolling. However, in the CUDA
backend, just using `dim_loop` in this way actually makes the performance
worse, because `dim_loop` introduces new non-inlined function calls in the
CUDA backend that lead to register spills. The fix for this coincides with
the fix for several user reports that the CUDA backend is not aggressive
enough with inlining. In this PR I have therefore also increased the
inlining threshold multiplier to 11.
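
For illustration, a sketch of what the `dim_loop`-based fix could look like. `reduce_over_group_vec` is a hypothetical helper name used here for exposition, not the actual library code, and it assumes a `dim_loop` like the sketch earlier in this thread:

```cpp
#include <sycl/sycl.hpp>

// Hypothetical helper: reduce each component of a sycl::vec over the group
// with one scalar reduce_over_group call per component. Because dim_loop is
// a compile-time loop, the per-component calls are fully unrolled, avoiding
// the register spills seen when an ordinary for-loop stays rolled.
template <typename Group, typename T, int N, typename BinaryOp>
sycl::vec<T, N> reduce_over_group_vec(Group g, sycl::vec<T, N> v, BinaryOp op) {
  sycl::vec<T, N> result{};
  dim_loop<N>([&](std::size_t idx) {
    const int i = static_cast<int>(idx);
    result[i] = sycl::reduce_over_group(g, v[i], op);
  });
  return result;
}
```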

See https://reviews.llvm.org/D142232/new/ for the corresponding upstream
patch (for the inlining threshold change), which includes much more detail
on benchmarking DPC++ CUDA with this change. In short, for DPC++ CUDA there
is no downside apart from a very small increase in compile time in some
cases, while a large number of applications benefit massively from the
increased inlining threshold.

Testing with the OpenCL CPU backend shows that this code change has no
effect there. The change is required for the CUDA backend but should have
no performance effect on other backends.

fixes #6583.

---------

Signed-off-by: JackAKirk <[email protected]>
Co-authored-by: JackAKirk <[email protected]>