Skip to content

Commit a1d0cc7

Browse files
author
iclsrc
committed
Merge from 'sycl' to 'sycl-web'
2 parents de88506 + 929a764 commit a1d0cc7

File tree

6 files changed

+127
-21
lines changed

6 files changed

+127
-21
lines changed

sycl/doc/LinkedAllocations.md

Lines changed: 76 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,76 @@
1+
# Linked allocations
2+
3+
## Brief overview of allocations for memory objects
4+
5+
A SYCL memory object (`buffer`/`image`) can be accessed in multiple contexts
6+
throughout its lifetime. Since this is dynamic information that is unknown
7+
during memory object construction, no allocation takes place at that point.
8+
Instead, memory is allocated in each context whenever the SYCL memory object
9+
is first accessed there:
10+
11+
```
12+
cl::sycl::buffer<int, 1> buf{cl::sycl::range<1>(1)}; // No allocation here
13+
14+
cl::sycl::queue q;
15+
q.submit([&](cl::sycl::handler &cgh){
16+
// First access to buf in q's context: allocate memory
17+
auto acc = buf.get_access<cl::sycl::access::mode::read_write>(cgh);
18+
...
19+
});
20+
21+
// First access to buf on host (assuming q is not host): allocate memory
22+
auto acc = buf.get_access<cl::sycl::access::mode::read_write>();
23+
```
24+
25+
In the DPCPP execution graph these allocations are represented by allocation
26+
command nodes (`cl::sycl::detail::AllocaCommand`). A finished allocation
27+
command means that the associated memory object is ready for its first use in
28+
that context, but for host allocation commands it might be the case that no
29+
actual memory allocation takes place: either because it is possible to reuse the
30+
data pointer provided by the user:
31+
32+
```
33+
int val;
34+
cl::sycl::buffer<int, 1> buf{&val, cl::sycl::range<1>(1)};
35+
36+
// An alloca command is created, but it does not allocate new memory: &val
37+
// is reused instead.
38+
auto acc = buf.get_access<cl::sycl::access::mode::read_write>();
39+
```
40+
41+
Or because a mapped host pointer obtained from a native device memory object
42+
is used in its place (as is the case for linked commands, covered below).
43+
44+
## Linked allocation commands
45+
46+
Whenever an allocation command is created for a memory object, it can be created
47+
as "linked" to another one if they satisfy these requirements:
48+
- Both allocation commands are associated with the same memory object.
49+
- Exactly one of the two commands is associated with a host context.
50+
- Neither of the commands is already linked.
51+
52+
The idea behind linked commands is that the device allocation of the pair is
53+
supposed to reuse the host allocation, i.e. the host memory is requested to be
54+
shared between the two (the underlying backend is still free to ignore that
55+
request and allocate additional memory if needed). The difference in handling
56+
linked and unlinked allocations is summarized in the table below.
57+
58+
| | Unlinked | Linked |
59+
| - | -------- | ------ |
60+
| Native memory object creation | Created with COPY_HOST_PTR if a host pointer is available and the first access mode does not discard the data. | Created with USE_HOST_PTR if a suitable host pointer is available, regardless of the first access mode. |
61+
| Host allocation command behaviour | Skipped if a suitable user host pointer is available. | In addition to skipping the allocation if a suitable user pointer is provided, the allocation is also skipped if the host command is created after its linked counterpart (it's retrieved via map operation instead). |
62+
| Memory transfer | Performed with read/write operations, device-to-device transfer is done with a host allocation as an intermediary (direct transfer is not supported by PI). | Only one allocation from the pair can be active at a time, the switch is done with map/unmap operations. Device-to-device transfer where one of the device allocations is linked is done with the host allocation from the pair as an intermediary (e.g. for transfer from unlinked device allocation A to linked device allocation B: map B -> read A to the host allocation -> unmap B). |
63+
64+
## Command linking approach
65+
66+
Whenever two allocation commands are considered for linking, the decision is
67+
made based on the following criterion: the commands are linked if and only if
68+
the non-host device of the pair supports host unified memory (i.e. the device
69+
and host share the same physical memory). The motivation for this is two-fold:
70+
- If the non-host device supports host unified memory, the USE_HOST_PTR flag
71+
should not result in any additional device memory allocation or copying between
72+
the two during map/unmap operations.
73+
- Even if the point above makes no difference for a particular pair of
74+
allocations (e.g. no host pointer is available for the device allocation),
75+
it might be possible to exploit that later in the application for another device
76+
that does support host unified memory.

sycl/doc/index.rst

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -30,3 +30,4 @@ Developing oneAPI DPC++ Compiler
3030
SpecializationConstants
3131
KernelProgramCache
3232
GlobalObjectsInRuntime
33+
LinkedAllocations

sycl/plugins/level_zero/pi_level_zero.cpp

Lines changed: 13 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -448,12 +448,6 @@ pi_result _pi_context::finalize() {
448448

449449
// Destroy the command list used for initializations
450450
ZE_CALL(zeCommandListDestroy(ZeCommandListInit));
451-
452-
// Destruction of some members of pi_context uses L0 context
453-
// and therefore it must be valid at that point.
454-
// Technically it should be placed to the destructor of pi_context
455-
// but this makes API error handling more complex.
456-
ZE_CALL(zeContextDestroy(ZeContext));
457451
return PI_SUCCESS;
458452
}
459453

@@ -1830,9 +1824,22 @@ pi_result piContextRelease(pi_context Context) {
18301824

18311825
assert(Context);
18321826
if (--(Context->RefCount) == 0) {
1827+
auto ZeContext = Context->ZeContext;
1828+
18331829
// Clean up any live memory associated with Context
18341830
pi_result Result = Context->finalize();
1831+
1832+
// We must delete Context first and then destroy zeContext because
1833+
// Context deallocation requires ZeContext in some member deallocation of
1834+
// pi_context.
18351835
delete Context;
1836+
1837+
// Destruction of some members of pi_context uses L0 context
1838+
// and therefore it must be valid at that point.
1839+
// Technically it should be placed to the destructor of pi_context
1840+
// but this makes API error handling more complex.
1841+
ZE_CALL(zeContextDestroy(ZeContext));
1842+
18361843
return Result;
18371844
}
18381845
return PI_SUCCESS;

sycl/source/detail/error_handling/enqueue_kernel.cpp

Lines changed: 22 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -83,8 +83,14 @@ bool handleInvalidWorkGroupSize(const device_impl &DeviceImpl, pi_kernel Kernel,
8383
NDRDesc.LocalSize[1] != CompileWGSize[1] ||
8484
NDRDesc.LocalSize[2] != CompileWGSize[2])
8585
throw sycl::nd_range_error(
86-
"Specified local size doesn't match the required work-group size "
87-
"specified in the program source",
86+
"The specified local size {" + std::to_string(NDRDesc.LocalSize[0]) +
87+
", " + std::to_string(NDRDesc.LocalSize[1]) + ", " +
88+
std::to_string(NDRDesc.LocalSize[2]) +
89+
"} doesn't match the required work-group size specified "
90+
"in the program source {" +
91+
std::to_string(CompileWGSize[0]) + ", " +
92+
std::to_string(CompileWGSize[1]) + ", " +
93+
std::to_string(CompileWGSize[2]) + "}",
8894
PI_INVALID_WORK_GROUP_SIZE);
8995
}
9096
if (IsOpenCL) {
@@ -185,11 +191,22 @@ bool handleInvalidWorkGroupSize(const device_impl &DeviceImpl, pi_kernel Kernel,
185191
Opts.find("-cl-std=CL2.0") != string_class::npos;
186192
const bool RequiresUniformWGSize =
187193
Opts.find("-cl-uniform-work-group-size") != string_class::npos;
194+
std::string LocalWGSize = std::to_string(NDRDesc.LocalSize[0]) +
195+
", " +
196+
std::to_string(NDRDesc.LocalSize[1]) +
197+
", " + std::to_string(NDRDesc.LocalSize[2]);
198+
std::string GlobalWGSize =
199+
std::to_string(NDRDesc.GlobalSize[0]) + ", " +
200+
std::to_string(NDRDesc.GlobalSize[1]) + ", " +
201+
std::to_string(NDRDesc.GlobalSize[2]);
188202
std::string message =
189203
LocalExceedsGlobal
190-
? "Local workgroup size greater than global range size. "
191-
: "Global_work_size not evenly divisible by "
192-
"local_work_size. ";
204+
? "Local work-group size {" + LocalWGSize +
205+
"} is greater than global range size {" + GlobalWGSize +
206+
"}. "
207+
: "Global work size {" + GlobalWGSize +
208+
"} is not evenly divisible by local work-group size {" +
209+
LocalWGSize + "}. ";
193210
if (!HasStd20)
194211
throw sycl::nd_range_error(
195212
message.append(

sycl/test/on-device/basic_tests/parallel_for_range.cpp

Lines changed: 12 additions & 8 deletions
Original file line numberDiff line numberDiff line change
@@ -57,10 +57,10 @@ int main() {
5757
"thrown\n";
5858
return 1; // We shouldn't be here, exception is expected
5959
} catch (nd_range_error &E) {
60-
if (string_class(E.what()).find("Specified local size doesn't match "
61-
"the required work-group size "
62-
"specified in the program source") ==
63-
string_class::npos) {
60+
if (string_class(E.what()).find("The specified local size {8, 8, 8} "
61+
"doesn't match the required work-group "
62+
"size specified in the program source "
63+
"{4, 4, 4}") == string_class::npos) {
6464
std::cerr
6565
<< "Test case ReqdWGSizeNegativeA failed: unexpected exception: "
6666
<< E.what() << std::endl;
@@ -670,7 +670,8 @@ int main() {
670670
}
671671
} catch (nd_range_error &E) {
672672
if (string_class(E.what()).find(
673-
"Global_work_size not evenly divisible by local_work_size. "
673+
"Global work size {100, 1, 1} is not evenly divisible "
674+
"by local work-group size {3, 1, 1}. "
674675
"Non-uniform work-groups are not allowed by when "
675676
"-cl-uniform-work-group-size flag is used. Underlying "
676677
"OpenCL 2.x implementation supports this feature, but it is "
@@ -720,7 +721,8 @@ int main() {
720721
}
721722
} catch (nd_range_error &E) {
722723
if (string_class(E.what()).find(
723-
"Global_work_size not evenly divisible by local_work_size. "
724+
"Global work size {16, 33, 100} is not evenly divisible by "
725+
"local work-group size {5, 3, 2}. "
724726
"Non-uniform work-groups are not allowed by when "
725727
"-cl-uniform-work-group-size flag is used. Underlying "
726728
"OpenCL 2.x implementation supports this feature, but it is "
@@ -773,7 +775,8 @@ int main() {
773775
}
774776
} catch (nd_range_error &E) {
775777
if (string_class(E.what()).find(
776-
"Local workgroup size greater than global range size. "
778+
"Local work-group size {17, 1, 1} is greater than global range "
779+
"size {16, 1, 1}. "
777780
"Non-uniform work-groups are not allowed by when "
778781
"-cl-uniform-work-group-size flag is used. Underlying "
779782
"OpenCL 2.x implementation supports this feature, but it is "
@@ -824,7 +827,8 @@ int main() {
824827
}
825828
} catch (nd_range_error &E) {
826829
if (string_class(E.what()).find(
827-
"Local workgroup size greater than global range size. "
830+
"Local work-group size {7, 2, 2} is greater than global range "
831+
"size {6, 6, 6}. "
828832
"Non-uniform work-groups are not allowed by when "
829833
"-cl-uniform-work-group-size flag is used. Underlying "
830834
"OpenCL 2.x implementation supports this feature, but it is "

sycl/test/on-device/basic_tests/reqd_work_group_size.cpp

Lines changed: 3 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -99,8 +99,9 @@ int main() {
9999
return 1; // We shouldn't be here, exception is expected
100100
} catch (nd_range_error &E) {
101101
if (string_class(E.what()).find(
102-
"Specified local size doesn't match the required work-group size "
103-
"specified in the program source") == string_class::npos) {
102+
"The specified local size {8, 8, 8} doesn't match the required "
103+
"work-group size specified in the program source {4, 4, 4}") ==
104+
string_class::npos) {
104105
std::cerr
105106
<< "Test case ReqdWGSizeNegativeA failed: unexpected nd_range_error "
106107
"exception: "

0 commit comments

Comments
 (0)