Skip to content

Commit 08feb64

Browse files
[SYCL] Overlapping subbuffers with different sizes need separate piMemBufferPartition (#10890)
Before this PR we were creating a single PI sub-buffer (of a smaller size for the sycl/test-e2e/Basic/buffer/subbuffer_overlap.cpp added here). Suprisingly, on many platforms an "out-of-partition" access didn't cause any issues and all the modified memory was copied between host/device. However, that wasn't guaranteed by the PI calls SYCL RT was performing and for some scenarios/HW the device-to-host copy after the second kernel only brought back the smaller part of the original buffer corresponding to the first smaller subbuffer.
1 parent 093dae1 commit 08feb64

File tree

2 files changed

+45
-2
lines changed

2 files changed

+45
-2
lines changed

sycl/source/detail/scheduler/graph_builder.cpp

Lines changed: 1 addition & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -654,8 +654,7 @@ AllocaCommandBase *Scheduler::GraphBuilder::findAllocaForReq(
654654
const Requirement *TmpReq = AllocaCmd->getRequirement();
655655
Res &= AllocaCmd->getType() == Command::CommandType::ALLOCA_SUB_BUF;
656656
Res &= TmpReq->MOffsetInBytes == Req->MOffsetInBytes;
657-
Res &= TmpReq->MSYCLMemObj->getSizeInBytes() ==
658-
Req->MSYCLMemObj->getSizeInBytes();
657+
Res &= TmpReq->MAccessRange == Req->MAccessRange;
659658
Res &= AllowConst || !AllocaCmd->MIsConst;
660659
}
661660
return Res;
Lines changed: 44 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,44 @@
1+
// RUN: %{build} -o %t.out
2+
// RUN: %{run} %t.out
3+
// RUN: env SYCL_PI_TRACE=-1 %{run} %t.out 2>&1 | FileCheck %s
4+
5+
#include <sycl/sycl.hpp>
6+
7+
int main() {
8+
sycl::queue q;
9+
sycl::buffer<int, 1> b{1024};
10+
sycl::id<1> start_offset{64};
11+
size_t size = 16;
12+
sycl::buffer<int, 1> sub1{b, start_offset, sycl::range<1>{size}};
13+
sycl::buffer<int, 1> sub2{b, start_offset, sycl::range<1>{size * 2}};
14+
15+
int idx = 0;
16+
for (auto &e : sycl::host_accessor{b})
17+
e = idx++ % size;
18+
19+
// CHECK: piMemBufferPartition
20+
// CHECK: pi_buffer_region origin/size : 256/64
21+
q.submit([&](sycl::handler &cgh) {
22+
sycl::accessor acc{sub1, cgh};
23+
cgh.parallel_for(size, [=](auto id) { acc[id] += 1; });
24+
});
25+
// CHECK: piMemBufferPartition
26+
// CHECK: pi_buffer_region origin/size : 256/128
27+
q.submit([&](sycl::handler &cgh) {
28+
sycl::accessor acc{sub2, cgh};
29+
cgh.parallel_for(size * 2, [=](auto id) { acc[id] -= 1; });
30+
});
31+
32+
// Print before asserts to ensure stream is flushed.
33+
for (auto &e : sycl::host_accessor{sub2})
34+
std::cout << e << " ";
35+
std::cout << std::endl;
36+
37+
idx = 0;
38+
for (auto &e : sycl::host_accessor{sub2}) {
39+
assert(e == idx % size - idx / size);
40+
++idx;
41+
}
42+
43+
return 0;
44+
}

0 commit comments

Comments
 (0)