
Commit c541c22

[SYCL][Graph][CUDA] Skip unsupported Windows E2E tests (#13764)
On a CUDA and Windows setup that uses shared USM, allocations cannot be accessed concurrently by device commands and host-tasks. This is due to an underlying CUDA restriction described at https://forums.developer.nvidia.com/t/cudamallocmanaged-clarification-needed/67611:

> "Applications running on Windows (whether in TCC or WDDM mode) or macOS will use the basic Unified Memory model as on pre-6.x architectures even when they are running on hardware with compute capability 6.x or higher."

> "Simultaneous access to managed memory on devices of compute capability lower than 6.x is not possible."

Therefore, simultaneous access to managed memory on Windows is not possible.

This shows up in SYCL-Graph tests where the graph has multiple roots, which allows host-tasks branching from one root to run concurrently with device commands from the other root. The issue manifests as a page fault in the host-task when it tries to access a USM allocation.

I've created a more minimal test, `test-e2e/USM/host_task.cpp`, which exhibits the same issue.
1 parent 761a5f2 commit c541c22


5 files changed: 61 additions, 0 deletions

sycl/test-e2e/Graph/Explicit/host_task2_multiple_roots.cpp

Lines changed: 4 additions & 0 deletions
@@ -7,6 +7,10 @@

 // REQUIRES: aspect-usm_shared_allocations

+// Concurrent access to shared USM allocations is not supported by CUDA on
+// Windows
+// UNSUPPORTED: cuda && windows
+
 #define GRAPH_E2E_EXPLICIT

 #include "../Inputs/host_task2_multiple_roots.cpp"

sycl/test-e2e/Graph/Explicit/host_task_multiple_roots.cpp

Lines changed: 4 additions & 0 deletions
@@ -7,6 +7,10 @@

 // REQUIRES: aspect-usm_shared_allocations

+// Concurrent access to shared USM allocations is not supported by CUDA on
+// Windows
+// UNSUPPORTED: cuda && windows
+
 #define GRAPH_E2E_EXPLICIT

 #include "../Inputs/host_task_multiple_roots.cpp"

sycl/test-e2e/Graph/RecordReplay/host_task2_multiple_roots.cpp

Lines changed: 4 additions & 0 deletions
@@ -7,6 +7,10 @@

 // REQUIRES: aspect-usm_shared_allocations

+// Concurrent access to shared USM allocations is not supported by CUDA on
+// Windows
+// UNSUPPORTED: cuda && windows
+
 #define GRAPH_E2E_RECORD_REPLAY

 #include "../Inputs/host_task2_multiple_roots.cpp"

sycl/test-e2e/Graph/RecordReplay/host_task_multiple_roots.cpp

Lines changed: 4 additions & 0 deletions
@@ -7,6 +7,10 @@

 // REQUIRES: aspect-usm_shared_allocations

+// Concurrent access to shared USM allocations is not supported by CUDA on
+// Windows
+// UNSUPPORTED: cuda && windows
+
 #define GRAPH_E2E_RECORD_REPLAY

 #include "../Inputs/host_task_multiple_roots.cpp"

sycl/test-e2e/USM/host_task.cpp

Lines changed: 45 additions & 0 deletions
@@ -0,0 +1,45 @@
+// RUN: %{build} -o %t.out
+// RUN: %{run} %t.out
+
+// Concurrent access to shared USM allocations is not supported by CUDA on
+// Windows, this occurs when the host-task and device kernel both access
+// USM without a dependency between the commands.
+// UNSUPPORTED: cuda && windows
+
+// REQUIRES: aspect-usm_shared_allocations
+
+#include <sycl/sycl.hpp>
+
+int main() {
+  using namespace sycl;
+  queue Queue{};
+
+  constexpr size_t Size = 1024;
+  int *PtrA = malloc_shared<int>(Size, Queue);
+  int *PtrB = malloc_shared<int>(Size, Queue);
+
+  Queue.submit([&](handler &CGH) {
+    CGH.parallel_for(range<1>(Size), [=](item<1> id) { PtrA[id] = id; });
+  });
+
+  const int ConstValue = 42;
+  Queue.submit([&](handler &CGH) {
+    CGH.host_task([=]() {
+      for (size_t i = 0; i < Size; i++) {
+        PtrB[i] = ConstValue;
+      }
+    });
+  });
+
+  Queue.wait_and_throw();
+
+  for (size_t i = 0; i < Size; i++) {
+    assert(i == PtrA[i]);
+    assert(ConstValue == PtrB[i]);
+  }
+
+  free(PtrA, Queue);
+  free(PtrB, Queue);
+
+  return 0;
+}
