Skip to content

Commit 929a764

Browse files
[SYCL][DOC] Add a document covering linked allocation commands (#2863)
1 parent 6cfa921 commit 929a764

File tree

2 files changed

+77
-0
lines changed

2 files changed

+77
-0
lines changed

sycl/doc/LinkedAllocations.md

Lines changed: 76 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,76 @@
1+
# Linked allocations
2+
3+
## Brief overview of allocations for memory objects
4+
5+
A SYCL memory object (`buffer`/`image`) can be accessed in multiple contexts
6+
throughout its lifetime. Since this is dynamic information that is unknown
7+
during memory object construction, no allocation takes place at that point.
8+
Instead, memory is allocated in each context whenever the SYCL memory object
9+
is first accessed there:
10+
11+
```
12+
cl::sycl::buffer<int, 1> buf{cl::sycl::range<1>(1)}; // No allocation here
13+
14+
cl::sycl::queue q;
15+
q.submit([&](cl::sycl::handler &cgh){
16+
// First access to buf in q's context: allocate memory
17+
auto acc = buf.get_access<cl::sycl::access::mode::read_write>(cgh);
18+
...
19+
});
20+
21+
// First access to buf on host (assuming q is not host): allocate memory
22+
auto acc = buf.get_access<cl::sycl::access::mode::read_write>();
23+
```
24+
25+
In the DPCPP execution graph these allocations are represented by allocation
26+
command nodes (`cl::sycl::detail::AllocaCommand`). A finished allocation
27+
command means that the associated memory object is ready for its first use in
28+
that context, but for host allocation commands it might be the case that no
29+
actual memory allocation takes place: either because it is possible to reuse the
30+
data pointer provided by the user:
31+
32+
```
33+
int val;
34+
cl::sycl::buffer<int, 1> buf{&val, cl::sycl::range<1>(1)};
35+
36+
// An alloca command is created, but it does not allocate new memory: &val
37+
// is reused instead.
38+
auto acc = buf.get_access<cl::sycl::access::mode::read_write>();
39+
```
40+
41+
Or because a mapped host pointer obtained from a native device memory object
42+
is used in its place (as is the case for linked commands, covered below).
43+
44+
## Linked allocation commands
45+
46+
Whenever an allocation command is created for a memory object, it can be created
47+
as "linked" to another one if they satisfy these requirements:
48+
- Both allocation commands are associated with the same memory object.
49+
- Exactly one of the two commands is associated with a host context.
50+
- Neither of the commands is already linked.
51+
52+
The idea behind linked commands is that the device allocation of the pair is
53+
supposed to reuse the host allocation, i.e. the host memory is requested to be
54+
shared between the two (the underlying backend is still free to ignore that
55+
request and allocate additional memory if needed). The difference in handling
56+
linked and unlinked allocations is summarized in the table below.
57+
58+
| | Unlinked | Linked |
59+
| - | -------- | ------ |
60+
| Native memory object creation | Created with COPY_HOST_PTR if a host pointer is available and the first access mode does not discard the data. | Created with USE_HOST_PTR if a suitable host pointer is available, regardless of the first access mode. |
61+
| Host allocation command behaviour | Skipped if a suitable user host pointer is available. | In addition to skipping the allocation if a suitable user pointer is provided, the allocation is also skipped if the host command is created after its linked counterpart (it's retrieved via map operation instead). |
62+
| Memory transfer | Performed with read/write operations, device-to-device transfer is done with a host allocation as an intermediary (direct transfer is not supported by PI). | Only one allocation from the pair can be active at a time, the switch is done with map/unmap operations. Device-to-device transfer where one of the device allocations is linked is done with the host allocation from the pair as an intermediary (e.g. for transfer from unlinked device allocation A to linked device allocation B: map B -> read A to the host allocation -> unmap B). |
63+
64+
## Command linking approach
65+
66+
Whenever two allocation commands are considered for linking, the decision is
67+
made based on the following criterion: the commands are linked if and only if
68+
the non-host device of the pair supports host unified memory (i.e. the device
69+
and host share the same physical memory). The motivation for this is two-fold:
70+
- If the non-host device supports host unified memory, the USE_HOST_PTR flag
71+
should not result in any additional device memory allocation or copying between
72+
the two during map/unmap operations.
73+
- Even if the point above makes no difference for a particular pair of
74+
allocations (e.g. no host pointer is available for the device allocation),
75+
it might be possible to exploit that later in the application for another device
76+
that does support host unified memory.

sycl/doc/index.rst

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -30,3 +30,4 @@ Developing oneAPI DPC++ Compiler
3030
SpecializationConstants
3131
KernelProgramCache
3232
GlobalObjectsInRuntime
33+
LinkedAllocations

0 commit comments

Comments
 (0)