[SYCL] Use dim_loop to unroll loops in reduce_over_group in cuda backend. #7948
Conversation
Signed-off-by: JackAKirk <[email protected]>
Use #pragma unroll and always_inline to stop reg spills in reduce_over_group.
Signed-off-by: JackAKirk <[email protected]>
Use dim_loop to stop reg spills in reduce_over_group in cuda backend.
Signed-off-by: JackAKirk <[email protected]>
Done. I've updated the PR description accordingly.
Title changed from "[SYCL] Use dim_loop to stop reg spills in reduce_over_group in cuda backend." to "[SYCL] Use dim_loop to unroll loops in reduce_over_group in cuda backend."
ping @intel/dpcpp-tools-reviewers
Fixes #6583. The esimd and opencl CI just didn't run this time and should be unrelated: they ran before and passed. There was an unrelated hip failure before, but this time the hip CI passed.
@intel/dpcpp-tools-reviewers Would it be possible to get a review?
Please fix the issue reference in the description: it is currently the id of this PR. NOTE: to link PRs with issues, keywords must be used in the description. See https://docs.github.com/en/issues/tracking-your-work-with-issues/linking-a-pull-request-to-an-issue#linking-a-pull-request-to-an-issue-using-a-keyword.
I see, thanks. Done.
A performance regression was reported when using reduce_over_group with sycl::vec. It was due to a loop over calls to the scalar reduce_over_group for each of the sycl::vec components; the loop was not unrolled and led to register spills even at -O3. It was initially possible to fix the performance by adding #pragma unroll to this loop and declaring reduce_over_group with __attribute__((always_inline)), roughly as sketched below.
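A minimal sketch of the pattern (not the actual library source; reduce_vec_over_group is a hypothetical stand-in for the sycl::vec overload):

```cpp
#include <sycl/sycl.hpp>

// Hypothetical stand-in for the sycl::vec overload of reduce_over_group.
// The per-component loop calls the scalar overload N times; without unrolling
// and inlining, those calls can spill registers even at -O3.
template <typename Group, typename T, int N, typename BinaryOperation>
__attribute__((always_inline)) sycl::vec<T, N>
reduce_vec_over_group(Group g, const sycl::vec<T, N> &x, BinaryOperation op) {
  sycl::vec<T, N> result{};
#pragma unroll // initial workaround: force the per-component loop to unroll
  for (int i = 0; i < N; ++i)
    result[i] = sycl::reduce_over_group(g, x[i], op); // scalar reduce per component
  return result;
}
```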
However, the SYCL_UNROLL macro that called #pragma unroll has been removed in favour of dim_loop (#6939), so I have used dim_loop to fix the loop unrolling, as sketched below.
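A rough sketch of the intent, assuming a compile-time-unrolled loop helper in the spirit of dim_loop (the exact signature of sycl::detail::dim_loop may differ; unrolled_loop below is written only for this example):

```cpp
#include <cstddef>
#include <type_traits>
#include <utility>

#include <sycl/sycl.hpp>

// Stand-in for a dim_loop-style helper: the body is expanded once per index at
// compile time via a fold expression, so there is no runtime loop left to unroll.
template <std::size_t... Is, typename F>
void unrolled_loop_impl(std::index_sequence<Is...>, F &&f) {
  (f(std::integral_constant<std::size_t, Is>{}), ...);
}

template <std::size_t Count, typename F>
void unrolled_loop(F &&f) {
  unrolled_loop_impl(std::make_index_sequence<Count>{}, std::forward<F>(f));
}

// Hypothetical sycl::vec overload using the unrolled helper instead of a runtime loop.
template <typename Group, typename T, int N, typename BinaryOperation>
sycl::vec<T, N> reduce_vec_over_group(Group g, const sycl::vec<T, N> &x,
                                      BinaryOperation op) {
  sycl::vec<T, N> result{};
  unrolled_loop<N>([&](auto i) {
    result[i] = sycl::reduce_over_group(g, x[i], op);
  });
  return result;
}
```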
However, in the cuda backend, just using dim_loop in this way actually makes the performance worse: dim_loop introduces new non-inlined function calls in the cuda backend that lead to register spills. The solution to this coincides with the solution to several user reports that the cuda backend is not aggressive enough with inlining, so in this PR I have also increased the inlining threshold multiplier value to 11. See https://reviews.llvm.org/D142232/new/ for the corresponding upstream patch (for the inlining threshold change), which includes much more detail on benchmarking dpc++ cuda with this change. In short, for dpc++ cuda there is no downside other than a very small increase in compile time in some cases, but there is a large benefit from increasing the inlining threshold across a wide range of applications.
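For context, the inlining-threshold change boils down to raising the value returned by a per-target TargetTransformInfo hook for the NVPTX backend; the snippet below only illustrates the shape of that hook (names and surrounding code are assumptions, not copied from the actual patch, see D142232 for the real change):

```cpp
// Illustrative only: LLVM scales its base inline threshold by a per-target
// multiplier queried through TargetTransformInfo. Returning 11 here mirrors
// the value this PR uses so the CUDA backend inlines more aggressively.
struct IllustrativeNVPTXTTI {
  unsigned getInliningThresholdMultiplier() const { return 11; }
};
```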
Testing with the opencl cpu backend shows that this code change has no effect on that backend. The change is required for the cuda backend but should have no performance effect on other backends.
Fixes #6583.
Signed-off-by: JackAKirk <[email protected]>