|
| 1 | +# Linked allocations |
| 2 | + |
| 3 | +## Brief overview of allocations for memory objects |
| 4 | + |
| 5 | +A SYCL memory object (`buffer`/`image`) can be accessed in multiple contexts |
| 6 | +throughout its lifetime. Since this is dynamic information that is unknown |
| 7 | +during memory object construction, no allocation takes place at that point. |
| 8 | +Instead, memory is allocated in each context whenever the SYCL memory object |
| 9 | +is first accessed there: |
| 10 | + |
| 11 | +``` |
| 12 | + cl::sycl::buffer<int, 1> buf{cl::sycl::range<1>(1)}; // No allocation here |
| 13 | +
|
| 14 | + cl::sycl::queue q; |
| 15 | + q.submit([&](cl::sycl::handler &cgh){ |
| 16 | + // First access to buf in q's context: allocate memory |
| 17 | + auto acc = buf.get_access<cl::sycl::access::mode::read_write>(cgh); |
| 18 | + ... |
| 19 | + }); |
| 20 | +
|
| 21 | + // First access to buf on host (assuming q is not host): allocate memory |
| 22 | + auto acc = buf.get_access<cl::sycl::access::mode::read_write>(); |
| 23 | +``` |
| 24 | + |
| 25 | +In the DPCPP execution graph these allocations are represented by allocation |
| 26 | +command nodes (`cl::sycl::detail::AllocaCommand`). A finished allocation |
| 27 | +command means that the associated memory object is ready for its first use in |
| 28 | +that context, but for host allocation commands it might be the case that no |
| 29 | +actual memory allocation takes place: either because it is possible to reuse the |
| 30 | +data pointer provided by the user: |
| 31 | + |
| 32 | +``` |
| 33 | + int val; |
| 34 | + cl::sycl::buffer<int, 1> buf{&val, cl::sycl::range<1>(1)}; |
| 35 | +
|
| 36 | + // An alloca command is created, but it does not allocate new memory: &val |
| 37 | + // is reused instead. |
| 38 | + auto acc = buf.get_access<cl::sycl::access::mode::read_write>(); |
| 39 | +``` |
| 40 | + |
| 41 | +Or because a mapped host pointer obtained from a native device memory object |
| 42 | +is used in its place (as is the case for linked commands, covered below). |
| 43 | + |
| 44 | +## Linked allocation commands |
| 45 | + |
| 46 | +Whenever an allocation command is created for a memory object, it can be created |
| 47 | +as "linked" to another one if they satisfy these requirements: |
| 48 | +- Both allocation commands are associated with the same memory object. |
| 49 | +- Exactly one of the two commands is associated with a host context. |
| 50 | +- Neither of the commands is already linked. |
| 51 | + |
| 52 | +The idea behind linked commands is that the device allocation of the pair is |
| 53 | +supposed to reuse the host allocation, i.e. the host memory is requested to be |
| 54 | +shared between the two (the underlying backend is still free to ignore that |
| 55 | +request and allocate additional memory if needed). The difference in handling |
| 56 | +linked and unlinked allocations is summarized in the table below. |
| 57 | + |
| 58 | +| | Unlinked | Linked | |
| 59 | +| - | -------- | ------ | |
| 60 | +| Native memory object creation | Created with COPY_HOST_PTR if a host pointer is available and the first access mode does not discard the data. | Created with USE_HOST_PTR if a suitable host pointer is available, regardless of the first access mode. | |
| 61 | +| Host allocation command behaviour | Skipped if a suitable user host pointer is available. | In addition to skipping the allocation if a suitable user pointer is provided, the allocation is also skipped if the host command is created after its linked counterpart (it's retrieved via map operation instead). | |
| 62 | +| Memory transfer | Performed with read/write operations, device-to-device transfer is done with a host allocation as an intermediary (direct transfer is not supported by PI). | Only one allocation from the pair can be active at a time, the switch is done with map/unmap operations. Device-to-device transfer where one of the device allocations is linked is done with the host allocation from the pair as an intermediary (e.g. for transfer from unlinked device allocation A to linked device allocation B: map B -> read A to the host allocation -> unmap B). | |
| 63 | + |
| 64 | +## Command linking approach |
| 65 | + |
| 66 | +Whenever two allocation commands are considered for linking, the decision is |
| 67 | +made based on the following criterion: the commands are linked if and only if |
| 68 | +the non-host device of the pair supports host unified memory (i.e. the device |
| 69 | +and host share the same physical memory). The motivation for this is two-fold: |
| 70 | +- If the non-host device supports host unified memory, the USE_HOST_PTR flag |
| 71 | +should not result in any additional device memory allocation or copying between |
| 72 | +the two during map/unmap operations. |
| 73 | +- Even if the point above makes no difference for a particular pair of |
| 74 | +allocations (e.g. no host pointer is available for the device allocation), |
| 75 | +it might be possible to exploit that later in the application for another device |
| 76 | +that does support host unified memory. |
0 commit comments