Skip to content

[SYCL] Fix two reduction bugs #7347

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Nov 10, 2022
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
6 changes: 2 additions & 4 deletions sycl/include/sycl/reduction.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -915,7 +915,7 @@ struct NDRangeReduction<
auto &PartialSumsBuf = Redu.getTempBuffer(NWorkGroups * NElements, CGH);
accessor PartialSums(PartialSumsBuf, CGH, sycl::read_write, sycl::no_init);

bool IsUpdateOfUserVar = !Reduction::is_usm && !Redu.initializeToIdentity();
bool IsUpdateOfUserVar = !Redu.initializeToIdentity();
auto Rest = [&](auto NWorkGroupsFinished) {
local_accessor<int, 1> DoReducePartialSumsInLastWG{1, CGH};

Expand Down Expand Up @@ -1480,8 +1480,6 @@ template <> struct NDRangeReduction<reduction::strategy::basic> {
// group size may be not power of two. Those two cases considered
// inefficient as they require additional code and checks in the kernel.
bool HasUniformWG = NWorkGroups * WGSize == NWorkItems;
if (!Reduction::has_fast_reduce)
HasUniformWG = HasUniformWG && (WGSize & (WGSize - 1)) == 0;

// Get read accessor to the buffer that was used as output
// in the previous kernel.
Expand All @@ -1493,7 +1491,7 @@ template <> struct NDRangeReduction<reduction::strategy::basic> {
!Redu.initializeToIdentity() &&
NWorkGroups == 1;

bool UniformPow2WG = HasUniformWG;
bool UniformPow2WG = HasUniformWG && (WGSize & (WGSize - 1)) == 0;
// Use local memory to reduce elements in work-groups into 0-th element.
// If WGSize is not power of two, then WGSize+1 elements are allocated.
// The additional last element is used to catch elements that could
Expand Down