Skip to content

[SYCL][DOC] Add a document covering linked allocation commands #2863

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 6 commits into from
Dec 8, 2020
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
76 changes: 76 additions & 0 deletions sycl/doc/LinkedAllocations.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,76 @@
# Linked allocations

## Brief overview of allocations for memory objects

A SYCL memory object (`buffer`/`image`) can be accessed in multiple contexts
throughout its lifetime. Since this is dynamic information that is unknown
during memory object construction, no allocation takes place at that point.
Instead, memory is allocated in each context whenever the SYCL memory object
is first accessed there:

```
cl::sycl::buffer<int, 1> buf{cl::sycl::range<1>(1)}; // No allocation here

cl::sycl::queue q;
q.submit([&](cl::sycl::handler &cgh){
// First access to buf in q's context: allocate memory
auto acc = buf.get_access<cl::sycl::access::mode::read_write>(cgh);
...
});

// First access to buf on host (assuming q is not host): allocate memory
auto acc = buf.get_access<cl::sycl::access::mode::read_write>();
```

In the DPCPP execution graph these allocations are represented by allocation
command nodes (`cl::sycl::detail::AllocaCommand`). A finished allocation
command means that the associated memory object is ready for its first use in
that context, but for host allocation commands it might be the case that no
actual memory allocation takes place: either because it is possible to reuse the
data pointer provided by the user:

```
int val;
cl::sycl::buffer<int, 1> buf{&val, cl::sycl::range<1>(1)};

// An alloca command is created, but it does not allocate new memory: &val
// is reused instead.
auto acc = buf.get_access<cl::sycl::access::mode::read_write>();
```

Or because a mapped host pointer obtained from a native device memory object
is used in its place (as is the case for linked commands, covered below).

## Linked allocation commands

Whenever an allocation command is created for a memory object, it can be created
as "linked" to another one if they satisfy these requirements:
- Both allocation commands are associated with the same memory object.
- Exactly one of the two commands is associated with a host context.
- Neither of the commands is already linked.

The idea behind linked commands is that the device allocation of the pair is
supposed to reuse the host allocation, i.e. the host memory is requested to be
shared between the two (the underlying backend is still free to ignore that
request and allocate additional memory if needed). The difference in handling
linked and unlinked allocations is summarized in the table below.

| | Unlinked | Linked |
| - | -------- | ------ |
| Native memory object creation | Created with COPY_HOST_PTR if a host pointer is available and the first access mode does not discard the data. | Created with USE_HOST_PTR if a suitable host pointer is available, regardless of the first access mode. |
| Host allocation command behaviour | Skipped if a suitable user host pointer is available. | In addition to skipping the allocation if a suitable user pointer is provided, the allocation is also skipped if the host command is created after its linked counterpart (it's retrieved via map operation instead). |
| Memory transfer | Performed with read/write operations, device-to-device transfer is done with a host allocation as an intermediary (direct transfer is not supported by PI). | Only one allocation from the pair can be active at a time, the switch is done with map/unmap operations. Device-to-device transfer where one of the device allocations is linked is done with the host allocation from the pair as an intermediary (e.g. for transfer from unlinked device allocation A to linked device allocation B: map B -> read A to the host allocation -> unmap B). |

## Command linking approach

Whenever two allocation commands are considered for linking, the decision is
made based on the following criterion: the commands are linked if and only if
the non-host device of the pair supports host unified memory (i.e. the device
and host share the same physical memory). The motivation for this is two-fold:
- If the non-host device supports host unified memory, the USE_HOST_PTR flag
should not result in any additional device memory allocation or copying between
the two during map/unmap operations.
- Even if the point above makes no difference for a particular pair of
allocations (e.g. no host pointer is available for the device allocation),
it might be possible to exploit that later in the application for another device
that does support host unified memory.
1 change: 1 addition & 0 deletions sycl/doc/index.rst
Original file line number Diff line number Diff line change
Expand Up @@ -30,3 +30,4 @@ Developing oneAPI DPC++ Compiler
SpecializationConstants
KernelProgramCache
GlobalObjectsInRuntime
LinkedAllocations