Skip to content
This repository was archived by the owner on Mar 28, 2023. It is now read-only.

[SYCL][CUDA] Add test for a multi-device context #1102

Open
wants to merge 13 commits into
base: intel
Choose a base branch
from
Open
1 change: 1 addition & 0 deletions SYCL/Basic/alloc_pinned_host_memory.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -39,4 +39,5 @@ int main() {
// CHECK:---> piMemBufferCreate
// CHECK:---> piMemBufferCreate
// CHECK-NEXT: {{.*}} : {{.*}}
// CHECK-NEXT: {{.*}} : {{.*}}
// CHECK-NEXT: {{.*}} : 17
1 change: 1 addition & 0 deletions SYCL/Basic/buffer/native_buffer_creation_flags.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -21,6 +21,7 @@ int main() {
// buffer is created with the PI_MEM_FLAGS_HOST_PTR_USE flag.
// CHECK: piMemBufferCreate
// CHECK-NEXT: {{.*}} : {{.*}}
// CHECK-NEXT: {{.*}} : {{.*}}
// CHECK-NEXT: {{.*}} : 9
auto BufAcc = Buf.get_access<access::mode::read>(Cgh);
Cgh.single_task<Foo>([=]() { int A = BufAcc[0]; });
Expand Down
64 changes: 64 additions & 0 deletions SYCL/Basic/multi_device_context.cpp
Original file line number Diff line number Diff line change
@@ -0,0 +1,64 @@
// RUN: %clangxx -fsycl -fsycl-targets=%sycl_triple %s -o %t.out
// RUN: %CPU_RUN_PLACEHOLDER %t.out
// RUN: %GPU_RUN_PLACEHOLDER %t.out

// REQUIRES: cuda

#include <stdlib.h>
#include <sycl/sycl.hpp>

sycl::event add(sycl::queue &q, sycl::buffer<int> &buff, int *usm,
sycl::event &e) {
return q.submit([&](sycl::handler &cgh) {
auto acc = buff.get_access<sycl::access::mode::read_write>(cgh);
cgh.depends_on(e);
cgh.single_task([=]() { acc[0] += *usm; });
});
}

int main() {
sycl::platform plat = sycl::platform::get_platforms()[0];
auto devices = plat.get_devices();
if (devices.size() < 2) {
std::cout << "Need two devices for the test!" << std::endl;
return 0;
}

sycl::device dev1 = devices[0];
sycl::device dev2 = devices[1];

sycl::context ctx{{dev1, dev2}};

sycl::queue q1{ctx, dev1};
sycl::queue q2{ctx, dev2};

int a = 1;
int b = 2;
{
sycl::buffer<int> buff1(&a, 1);
sycl::buffer<int> buff2(&b, 1);

// Test copying usm.
int *usm1 = sycl::malloc_device<int>(1, q1);
int *usm2 = sycl::malloc_device<int>(1, q2);
sycl::event e1 = q1.fill(usm1, 4, 1);
sycl::event e2 = q2.fill(usm2, 5, 1);

// Test combination of usm and buffers in a kernel.
sycl::event e3 = add(q1, buff1, usm1, e1);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the main purpose of the test? Can we use explict q.wait() to eliminate the need for all the events?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To check that common operations work when using two devices within the same context, including transferring buffer data between the devices and waiting on event , associated with an operation on another device. So while we could change each event synchronization with a wait on one or both of the queues, that would reduce the scope of the test.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this case, would the test somewhat reliably fail if we didn't pass the events around?

In other words, would race condition manifest itself on a reasonable HW with such small workloads?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe a race condition would be relatively unlikely to manifest in such a test. But I still prefer events, as a race condition is not the only possible mode of failure.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also using waits would not make code much more readable, as there would need to be a wait call after every operation.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Every other, I think. Anyway, I'm fine with events.

as a race condition is not the only possible mode of failure

For my education, what are the others here?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could imagine a deadlock.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something completely unexpected is also always an option.

sycl::event e4 = add(q2, buff2, usm2, e2);

// Change values in usm to ensure results are distinct.
sycl::event e5 = q1.fill(usm1, 5, 1, e3);
sycl::event e6 = q2.fill(usm2, 4, 1, e4);

// Use each buffer on the other device than before - tests that copying
// between devices works.
add(q1, buff2, usm1, e5);
add(q2, buff1, usm2, e6);
}
assert(a == 1 + 2 * 4);
assert(b == 2 + 2 * 5);

return 0;
}
1 change: 1 addition & 0 deletions SYCL/Basic/use_pinned_host_memory.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -43,4 +43,5 @@ int main() {

// CHECK:---> piMemBufferCreate
// CHECK-NEXT: {{.*}} : {{.*}}
// CHECK-NEXT: {{.*}} : {{.*}}
// CHECK-NEXT: {{.*}} : 17
1 change: 1 addition & 0 deletions SYCL/Tracing/pi_tracing_test.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -13,6 +13,7 @@
// CHECK: <nullptr>
// CHECK: ---> piMemBufferCreate(
// CHECK-NEXT: <unknown> : {{0[xX]?[0-9a-fA-F]*}}
// CHECK-NEXT: <unknown> : {{0[xX]?[0-9a-fA-F]*}}
// CHECK-NEXT: <unknown> : 1
// CHECK-NEXT: <unknown> : 40
// CHECK-NEXT: <unknown> : 0
Expand Down