-
Notifications
You must be signed in to change notification settings - Fork 787
[SYCL][Scheduler] Rework of host accessor and host allocation #724
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
96e0cce
to
647b076
Compare
It turned out that there is additional problem which blocked first attempt to reimplement host accessor. |
cb45f05
to
d112c01
Compare
Will rebase tomorrow so there will be only on commit for review. |
c937935
to
900e41c
Compare
@@ -1,44 +0,0 @@ | |||
// RUN: %clangxx -fsycl %s -o %t.out | |||
// RUN: env SYCL_PI_TRACE=1 %CPU_RUN_PLACEHOLDER %t.out 2>&1 %CPU_CHECK_PLACEHOLDER |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed the test because it expects line(piEnqueueMemBufferWrite) which is not generated for the code in the test anymore. Attempt to rewrite the test was not successful as the problem this test was written for cannot happen anymore.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM overall, just some minor comments
8de61c3
to
2f0d953
Compare
Other than the last comment, LGTM |
Background for use_host_ptr property: The SYCL specification defines use_host_ptr property which can be passed into the buffer or image constructors. The SYCL specification says the following: The use_host_ptr property adds the requirement that the SYCL runtime must not allocate any memory for the SYCL buffer and instead uses the provided host pointer directly. This prevents the SYCL runtime from allocating additional temporary storage on the host. Current implementation violates this rule in some scenarios, such as moving the latest data of the buffer from one OCL device context to another and on update_host type of explicit memory operation. Background for host accessor: Current implementation of host accessor requires that the underlying backend supports async/delayed execution: SYCL RT calls Unmap or Write commands that are initially blocked by an user event. Because of that, for instance, host accessors work incorrectly if the latest memory are located on the host device. Also, user events are usually not very well supported, there is "UseExclusiveQueue" workaround because of that. Solution: The patch eliminates the need to have more then one host allocation by introducing concept of linked alloca commands. One device and one host alloca commands can be linked together so they share* the same host memory. Such commands have "active" state which indicates that it's valid to work with corresponding memory allocation. Only one alloca command from a pair can be active at the same time. Memory transition between them is performed using map/unmap commands. * share from the SYCL point of view, because even if SYCL RT asks underlying backends to reuse host pointer they are free to allocate additional memory on their own. In order to resolve the limitations of current implementation of the host accessor the patch updates addHostAccessor API to emit map or read operations along with an empty command which is in blocked state(cannot be enqueued). An empty command is unblocked inside in the new releaseHostAccessor API which is called from host accessor destructor. As bonuses/side effects the patch: 1. Allows having multiple host accessor with read access mode at the same time in all cases. 2. Adds environment variable SYCL_THROW_ON_BLOCK which if set makes SYCL RT throw an exception on attempt to wait for a blocked command. 3. Makes so the latest state of memory object is still on the host after host accessor destructor in all cases, so when multiple host accessors are constructed in a row no map/read operations are performed starting from the second host accessor. Before the patch memory was moved back to the source(for example, to OpenCL memory allocation) so SYCL RT had to make map/read each time. 4. Makes Scheduler considering host and different instances of the host device to be in the same context when it looks for a suitable allocation. Signed-off-by: Vlad Romanov <[email protected]>
Signed-off-by: Vlad Romanov <[email protected]>
Signed-off-by: Vlad Romanov <[email protected]>
Signed-off-by: Vlad Romanov <[email protected]>
2f0d953
to
1e900b3
Compare
Background for use_host_ptr property:
The SYCL specification defines use_host_ptr property which can be passed
into the buffer or image constructors. The SYCL specification says the
following:
The use_host_ptr property adds the requirement that the SYCL runtime
must not allocate any memory for the SYCL buffer and instead uses the
provided host pointer directly. This prevents the SYCL runtime from
allocating additional temporary storage on the host.
Current implementation violates this rule in some scenarios, such as
moving the latest data of the buffer from one OCL device context to
another and on update_host type of explicit memory operation.
Background for host accessor:
Current implementation of host accessor requires that the underlying
backend supports async/delayed execution: SYCL RT calls Unmap or Write
commands that are initially blocked by an user event. Because of that,
for instance, host accessors work incorrectly if the latest memory are
located on the host device. Also, user events are usually not very well
supported, there is "UseExclusiveQueue" workaround because of that.
Solution:
The patch eliminates the need to have more then one host allocation by
introducing concept of linked alloca commands. One device and one host
alloca commands can be linked together so they share* the same host
memory. Such commands have "active" state which indicates that it's
valid to work with corresponding memory allocation. Only one alloca
command from a pair can be active at the same time. Memory transition
between them is performed using map/unmap commands.
underlying backends to reuse host pointer they are free to allocate
additional memory on their own.
In order to resolve the limitations of current implementation of the
host accessor the patch updates addHostAccessor API to emit map or
read operations along with an empty command which is in blocked
state(cannot be enqueued). An empty command is unblocked inside in the
new releaseHostAccessor API which is called from host accessor
destructor.
As bonuses/side effects the patch:
same time in all cases.
RT throw an exception on attempt to wait for a blocked command.
host accessor destructor in all cases, so when multiple host
accessors are constructed in a row no map/read operations are
performed starting from the second host accessor. Before the patch
memory was moved back to the source(for example, to OpenCL memory
allocation) so SYCL RT had to make map/read each time.
device to be in the same context when it looks for a suitable
allocation.
Signed-off-by: Vlad Romanov [email protected]