Skip to content

Commit 64640f2

Browse files
[SYCL][DOC] Add a document covering linked allocation commands
1 parent 8905a8c commit 64640f2

File tree

1 file changed

+74
-0
lines changed

1 file changed

+74
-0
lines changed

sycl/doc/LinkedAllocations.md

Lines changed: 74 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,74 @@
1+
## Brief overview of allocations for memory objects
2+
3+
A SYCL memory object (`buffer`/`image`) can be accessed in multiple contexts
4+
throughout its lifetime. Since this is dynamic information that is unknown
5+
during memory object construction, no allocation takes place at that point.
6+
Instead, memory is allocated in each context whenever the SYCL memory object
7+
is first accessed there:
8+
9+
```C++
10+
cl::sycl::buffer<int, 1> buf{cl::sycl::range<1>(1)}; // No allocation here
11+
12+
cl::sycl::queue q;
13+
q.submit([&](cl::sycl::handler &cgh){
14+
// First access to buf in q's context: allocate memory
15+
auto acc = buf.get_access<cl::sycl::access::mode::read_write>(cgh);
16+
...
17+
});
18+
19+
// First access to buf on host (assuming q is not host): allocate memory
20+
auto acc = buf.get_access<cl::sycl::access::mode::read_write>();
21+
```
22+
23+
In the DPCPP execution graph these allocations are represented by allocation
24+
command nodes (`cl::sycl::detail::AllocaCommand`). A finished allocation
25+
command means that the associated memory object is ready for its first use in
26+
that context, but for host allocation commands it might be the case that no
27+
actual memory allocation takes place: either because it is possible to reuse the
28+
data pointer provided by the user:
29+
30+
```C++
31+
int val;
32+
cl::sycl::buffer<int, 1> buf{&val, cl::sycl::range<1>(1)};
33+
34+
// An alloca command is created, but it does not allocate new memory: &val
35+
// is reused instead.
36+
auto acc = buf.get_access<cl::sycl::access::mode::read_write>();
37+
```
38+
39+
Or because a mapped host pointer obtained from a non-host native memory object
40+
is used in its place (as is the case for linked commands, covered below).
41+
42+
## Linked allocation commands
43+
44+
Whenever an allocation command is created for a memory object, it can be created
45+
as "linked" to another one if they satisfy these requirements:
46+
- Both allocation commands are associated with the same memory object.
47+
- Exactly one of the two commands is associated with a host context.
48+
- Neither of the commands is already linked.
49+
50+
The idea behind linked commands is that the non-host allocation of the pair is
51+
supposed to reuse the host allocation, i. e. the host memory is requested to be
52+
shared between the two (the underlying backend is still free to ignore that
53+
request and allocate additional memory if needed). The difference in handling
54+
linked and unlinked allocations is summarized in the table below.
55+
56+
| | Unlinked | Linked |
57+
| - | -------- | ------ |
58+
| Native memory object creation | Created with COPY_HOST_PTR if a host pointer is available and the first access mode does not discard the data. | Created with USE_HOST_PTR if a suitable host pointer is available, regardless of the first access mode. |
59+
| Host allocation command behaviour | Skipped if a suitable user host pointer is available. | In addition to skipping the allocation if a suitable user pointer is provided, the allocation is also skipped if the host command is created after its linked counterpart (it's retrieved via map operation instead). |
60+
| Memory transfer | Performed with read/write operations, non-host to non-host transfer is done with a host allocation as an intermediary (direct transfer is not supported by PI). | Only one allocation from the pair can be active at a time, the switch is done with map/unmap operations. |
61+
62+
## Command linking approach
63+
64+
Whenever two allocation commands are considered for linking, the decision is
65+
made based on the following criterion: the commands are linked if and only if
66+
the non-host device of the pair supports host unified memory (i. e. the device
67+
and host share the same physical memory). The motivation for this is two-fold:
68+
- If the non-host device supports host unified memory, the USE_HOST_PTR flag
69+
should not result in any additional device memory allocation or copying between
70+
the two during map/unmap operations.
71+
- Even if the point above makes no difference for a particular pair of
72+
allocations (e. g. no host pointer is available for the non-host allocation),
73+
it might be possible to exploit that later in the application for another device
74+
that does support host unified memory.

0 commit comments

Comments
 (0)