Skip to content
This repository was archived by the owner on Mar 28, 2023. It is now read-only.

Fix tests after enabling rounding-up optimizations when this_item call is used #699

Closed
wants to merge 1 commit into from
Closed
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
11 changes: 6 additions & 5 deletions SYCL/Basic/free_function_queries/free_function_queries.cpp
Original file line number Diff line number Diff line change
@@ -1,5 +1,3 @@
// FIXME: Investigate OS-agnostic failures
// REQUIRES: TEMPORARY_DISABLED
// RUN: %clangxx -fsycl -fsycl-targets=%sycl_triple %s -o %t.out
// RUN: %ACC_RUN_PLACEHOLDER %t.out
// RUN: %CPU_RUN_PLACEHOLDER %t.out
Expand Down Expand Up @@ -48,7 +46,8 @@ int main() {

auto that_item = sycl::ext::oneapi::experimental::this_item<1>();
results_acc[1] = that_item.get_id() == i;
results_acc[2] = that_item.get_range() == sycl::range<1>(n);
results_acc[2] =
that_item.get_range() == ((sycl::range<1>(n) + (31)) & ~(31));

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does 31 mean here?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rounding up to the nearest divisible by 32. Formula: (x + 31) div 32

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But why 32? Why not 42?

My understanding is that you are trying to guess work group size here, but you can't do that in a reliable way. Moreover, I don't quite understand why get_range has changed when range rounding was re-enabled for that kernel (due to removed check for this_item call in intel/llvm@e4791d1).

According to 4.9.4.2.2. parallel_for invoke:

Below is an example of invoking a SYCL kernel function with parallel_for using a lambda function and passing a SYCL item parameter. In this case, both the global id and global range are queryable. This variant of parallel_for is designed for when it is necessary to query the global range of the index space being executed across.

free function queries spec doesn't say anything special about item returned through this_item, so I assume that it should return global range.

If our implementation of range rounding alters the range returned by item, then I would say we have a bug in that feature and it shouldn't be enabled by default as it is now. Alternatively, if we can detect cases when that feature can't be enabled, then we should do so. My understanding is that we used to detect presence of this_item to disable the feature in cases where it doesn't work correctly, but by some reason we removed that code.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tagging @Pennycook here for awareness as well

acc[i]++;
});
});
Expand All @@ -63,7 +62,7 @@ int main() {
}

{
constexpr int checks_number{2};
constexpr int checks_number{3};
int results[checks_number]{};
{
sycl::buffer<int> buf(data, sycl::range<1>(n));
Expand All @@ -82,7 +81,9 @@ int main() {
auto that_id = sycl::ext::oneapi::experimental::this_id<1>();
results_acc[0] = i.get_id() == that_id;
auto that_item = sycl::ext::oneapi::experimental::this_item<1>();
results_acc[1] = i == that_item;
results_acc[1] = i == that_item.get_id();
results_acc[2] =
that_item.get_range() == ((sycl::range<1>(n) + (31)) & ~(31));
acc[i]++;
});
});
Expand Down