-
Notifications
You must be signed in to change notification settings - Fork 787
[SYCL][CUDA] Update program manager and queue to resolve multi-targeting issues #4921
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -1313,9 +1313,12 @@ static bool compatibleWithDevice(RTDeviceBinaryImage *BinImage, | |
pi_uint32 SuitableImageID = std::numeric_limits<pi_uint32>::max(); | ||
pi_device_binary DevBin = | ||
const_cast<pi_device_binary>(&BinImage->getRawData()); | ||
Plugin.call<PiApiKind::piextDeviceSelectBinary>( | ||
RT::PiResult Error = Plugin.call_nocheck<PiApiKind::piextDeviceSelectBinary>( | ||
PIDeviceHandle, &DevBin, | ||
/*num bin images = */ (cl_uint)1, &SuitableImageID); | ||
if (Error != PI_SUCCESS && Error != PI_INVALID_BINARY) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. What are effects of allowing invalid binary for the caller? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
|
||
throw runtime_error("Invalid binary image or device", PI_INVALID_VALUE); | ||
|
||
return (0 == SuitableImageID); | ||
} | ||
|
||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,40 @@ | ||
// REQUIRES: cuda || hip_be | ||
// RUN: %clangxx -fsycl -fsycl-targets=%sycl_triple,spir64 %s -o - | ||
// RUN: %clangxx -fsycl -fsycl-targets=spir64,%sycl_triple %s -o - | ||
// | ||
// RUN: %clangxx -fsycl -fsycl-device-code-split=per_kernel -fsycl-targets=%sycl_triple,spir64 %s -o - | ||
// RUN: %clangxx -fsycl -fsycl-device-code-split=per_kernel -fsycl-targets=spir64,%sycl_triple %s -o - | ||
// | ||
// Test checks that compiling for multiple devices works regardless of target | ||
// order. | ||
|
||
#include <sycl/sycl.hpp> | ||
|
||
using namespace cl::sycl; | ||
|
||
int main() { | ||
sycl::queue q; | ||
|
||
float A_Data[5] = {1.1}; | ||
float B_Data[5] = {0}; | ||
int C_Data[10] = {0}; | ||
|
||
{ | ||
buffer<float, 1> A_buff(A_Data, range<1>(5)); | ||
buffer<float, 1> B_buff(B_Data, range<1>(5)); | ||
q.submit([&](handler &cgh) { | ||
auto A_acc = A_buff.get_access<access::mode::read>(cgh); | ||
auto B_acc = B_buff.get_access<access::mode::write>(cgh); | ||
cgh.parallel_for(range<1>{5}, | ||
[=](id<1> index) { B_acc[index] = A_acc[index]; }); | ||
}).wait(); | ||
} | ||
|
||
{ | ||
buffer<int, 1> C_buff(C_Data, range<1>(10)); | ||
q.submit([&](handler &cgh) { | ||
auto C_acc = C_buff.get_access<access::mode::write>(cgh); | ||
cgh.parallel_for(range<1>{10}, [=](id<1> index) { C_acc[index] = 15; }); | ||
}).wait(); | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This condition shouldn't be modified at the moment as it's due CUDA native support of assertions.
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was something I was struggling with for a while. When building for both cuda and opencl it can fail, based on the target order (#3631), because there is a kernel in
spirv64
which is not innvptx64
_ZTSN2cl4sycl6detail23__sycl_service_kernel__16AssertInfoCopierE
. The program_manager requires binary images to either be identical or disjoint. So I am proposing removing this to create a version of this kernel which does not perform any action; similar toamdgcn
which will support native asserts but generates this kernel. I am not seeing any failing tests as a result of this change. Please let me know if there is a better way of achieving this. Many thanks!