Skip to content

[SYCL][CUDA] Update program manager and queue to resolve multi-targeting issues #4921

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Nov 30, 2021
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
6 changes: 3 additions & 3 deletions sycl/include/CL/sycl/queue.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -67,7 +67,7 @@

// Helper macro to identify if fallback assert is needed
// FIXME remove __NVPTX__ condition once devicelib supports CUDA
#if !defined(SYCL_DISABLE_FALLBACK_ASSERT) && !defined(__NVPTX__)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This condition shouldn't be modified at the moment as it's due CUDA native support of assertions.

Copy link
Contributor Author

@AidanBeltonS AidanBeltonS Nov 9, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was something I was struggling with for a while. When building for both cuda and opencl it can fail, based on the target order (#3631), because there is a kernel in spirv64 which is not in nvptx64 _ZTSN2cl4sycl6detail23__sycl_service_kernel__16AssertInfoCopierE. The program_manager requires binary images to either be identical or disjoint. So I am proposing removing this to create a version of this kernel which does not perform any action; similar to amdgcn which will support native asserts but generates this kernel. I am not seeing any failing tests as a result of this change. Please let me know if there is a better way of achieving this. Many thanks!

#if !defined(SYCL_DISABLE_FALLBACK_ASSERT)
#define __SYCL_USE_FALLBACK_ASSERT 1
#else
#define __SYCL_USE_FALLBACK_ASSERT 0
Expand Down Expand Up @@ -1188,11 +1188,11 @@ event submitAssertCapture(queue &Self, event &Event, queue *SecondaryQueue,
auto Acc = Buffer.get_access<access::mode::write>(CGH);

CGH.single_task<__sycl_service_kernel__::AssertInfoCopier>([Acc] {
#ifdef __SYCL_DEVICE_ONLY__
#if defined(__SYCL_DEVICE_ONLY__) && !defined(__NVPTX__)
__devicelib_assert_read(&Acc[0]);
#else
(void)Acc;
#endif // __SYCL_DEVICE_ONLY__
#endif // defined(__SYCL_DEVICE_ONLY__) && !defined(__NVPTX__)
});
};
auto CheckerCGF = [&CopierEv, &Buffer](handler &CGH) {
Expand Down
5 changes: 4 additions & 1 deletion sycl/source/detail/program_manager/program_manager.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -1313,9 +1313,12 @@ static bool compatibleWithDevice(RTDeviceBinaryImage *BinImage,
pi_uint32 SuitableImageID = std::numeric_limits<pi_uint32>::max();
pi_device_binary DevBin =
const_cast<pi_device_binary>(&BinImage->getRawData());
Plugin.call<PiApiKind::piextDeviceSelectBinary>(
RT::PiResult Error = Plugin.call_nocheck<PiApiKind::piextDeviceSelectBinary>(
PIDeviceHandle, &DevBin,
/*num bin images = */ (cl_uint)1, &SuitableImageID);
if (Error != PI_SUCCESS && Error != PI_INVALID_BINARY)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What are effects of allowing invalid binary for the caller?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PI_INVALID_BINARY is returned when piextDeviceSelectBinary cannot find any suitable image within the passed list. This is a valid response, in this case, as it is checking if the binary is suitable for the plugin.

throw runtime_error("Invalid binary image or device", PI_INVALID_VALUE);

return (0 == SuitableImageID);
}

Expand Down
40 changes: 40 additions & 0 deletions sycl/test/regression/multi_targeting.cpp
Original file line number Diff line number Diff line change
@@ -0,0 +1,40 @@
// REQUIRES: cuda || hip_be
// RUN: %clangxx -fsycl -fsycl-targets=%sycl_triple,spir64 %s -o -
// RUN: %clangxx -fsycl -fsycl-targets=spir64,%sycl_triple %s -o -
//
// RUN: %clangxx -fsycl -fsycl-device-code-split=per_kernel -fsycl-targets=%sycl_triple,spir64 %s -o -
// RUN: %clangxx -fsycl -fsycl-device-code-split=per_kernel -fsycl-targets=spir64,%sycl_triple %s -o -
//
// Test checks that compiling for multiple devices works regardless of target
// order.

#include <sycl/sycl.hpp>

using namespace cl::sycl;

int main() {
sycl::queue q;

float A_Data[5] = {1.1};
float B_Data[5] = {0};
int C_Data[10] = {0};

{
buffer<float, 1> A_buff(A_Data, range<1>(5));
buffer<float, 1> B_buff(B_Data, range<1>(5));
q.submit([&](handler &cgh) {
auto A_acc = A_buff.get_access<access::mode::read>(cgh);
auto B_acc = B_buff.get_access<access::mode::write>(cgh);
cgh.parallel_for(range<1>{5},
[=](id<1> index) { B_acc[index] = A_acc[index]; });
}).wait();
}

{
buffer<int, 1> C_buff(C_Data, range<1>(10));
q.submit([&](handler &cgh) {
auto C_acc = C_buff.get_access<access::mode::write>(cgh);
cgh.parallel_for(range<1>{10}, [=](id<1> index) { C_acc[index] = 15; });
}).wait();
}
}