Skip to content

[SYCL][Scheduler] Rework of host accessor and host allocation #724

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Nov 22, 2019

Conversation

romanovvlad
Copy link
Contributor

@romanovvlad romanovvlad commented Oct 14, 2019

Background for use_host_ptr property:

The SYCL specification defines use_host_ptr property which can be passed
into the buffer or image constructors. The SYCL specification says the
following:

The use_host_ptr property adds the requirement that the SYCL runtime
must not allocate any memory for the SYCL buffer and instead uses the
provided host pointer directly. This prevents the SYCL runtime from
allocating additional temporary storage on the host.

Current implementation violates this rule in some scenarios, such as
moving the latest data of the buffer from one OCL device context to
another and on update_host type of explicit memory operation.

Background for host accessor:

Current implementation of host accessor requires that the underlying
backend supports async/delayed execution: SYCL RT calls Unmap or Write
commands that are initially blocked by an user event. Because of that,
for instance, host accessors work incorrectly if the latest memory are
located on the host device. Also, user events are usually not very well
supported, there is "UseExclusiveQueue" workaround because of that.

Solution:

The patch eliminates the need to have more then one host allocation by
introducing concept of linked alloca commands. One device and one host
alloca commands can be linked together so they share* the same host
memory. Such commands have "active" state which indicates that it's
valid to work with corresponding memory allocation. Only one alloca
command from a pair can be active at the same time. Memory transition
between them is performed using map/unmap commands.

  • share from the SYCL point of view, because even if SYCL RT asks
    underlying backends to reuse host pointer they are free to allocate
    additional memory on their own.

In order to resolve the limitations of current implementation of the
host accessor the patch updates addHostAccessor API to emit map or
read operations along with an empty command which is in blocked
state(cannot be enqueued). An empty command is unblocked inside in the
new releaseHostAccessor API which is called from host accessor
destructor.

As bonuses/side effects the patch:

  1. Allows having multiple host accessor with read access mode at the
    same time in all cases.
  2. Adds environment variable SYCL_THROW_ON_BLOCK which if set makes SYCL
    RT throw an exception on attempt to wait for a blocked command.
  3. Makes so the latest state of memory object is still on the host after
    host accessor destructor in all cases, so when multiple host
    accessors are constructed in a row no map/read operations are
    performed starting from the second host accessor. Before the patch
    memory was moved back to the source(for example, to OpenCL memory
    allocation) so SYCL RT had to make map/read each time.
  4. Makes Scheduler considering host and different instances of the host
    device to be in the same context when it looks for a suitable
    allocation.

Signed-off-by: Vlad Romanov [email protected]

@romanovvlad romanovvlad changed the title Reimplement host accessor without using user event. [WIP] Reimplement host accessor without using user event. Oct 14, 2019
@romanovvlad
Copy link
Contributor Author

It turned out that there is additional problem which blocked first attempt to reimplement host accessor.
The problem is that sometime ago we have started ignoring use_host_ptr property. So, new raw diff contains the fix for that problem too.
Separating these two things would require to spend additional time to update host accessor implementation which is anyway going to be replaced by this one that why I'm going to try to commit host accessor re implementation and support for use_host_ptr in the patch.

@romanovvlad romanovvlad force-pushed the private/vromanov/HostAcc branch 3 times, most recently from cb45f05 to d112c01 Compare November 16, 2019 13:09
@romanovvlad romanovvlad changed the title [WIP] Reimplement host accessor without using user event. [SYCL][Scheduler] Rework of host accessor and host allocation Nov 18, 2019
@romanovvlad romanovvlad marked this pull request as ready for review November 18, 2019 20:25
@romanovvlad
Copy link
Contributor Author

Will rebase tomorrow so there will be only on commit for review.

@romanovvlad romanovvlad force-pushed the private/vromanov/HostAcc branch 2 times, most recently from c937935 to 900e41c Compare November 19, 2019 11:36
@@ -1,44 +0,0 @@
// RUN: %clangxx -fsycl %s -o %t.out
// RUN: env SYCL_PI_TRACE=1 %CPU_RUN_PLACEHOLDER %t.out 2>&1 %CPU_CHECK_PLACEHOLDER
Copy link
Contributor Author

@romanovvlad romanovvlad Nov 19, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed the test because it expects line(piEnqueueMemBufferWrite) which is not generated for the code in the test anymore. Attempt to rewrite the test was not successful as the problem this test was written for cannot happen anymore.

Copy link
Contributor

@sergey-semenov sergey-semenov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM overall, just some minor comments

@romanovvlad romanovvlad force-pushed the private/vromanov/HostAcc branch from 8de61c3 to 2f0d953 Compare November 20, 2019 17:51
@sergey-semenov
Copy link
Contributor

Other than the last comment, LGTM

Background for use_host_ptr property:

The SYCL specification defines use_host_ptr property which can be passed
into the buffer or image constructors. The SYCL specification says the
following:

The use_host_ptr property adds the requirement that the SYCL runtime
must not allocate any memory for the SYCL buffer and instead uses the
provided host pointer directly. This prevents the SYCL runtime from
allocating additional temporary storage on the host.

Current implementation violates this rule in some scenarios, such as
moving the latest data of the buffer from one OCL device context to
another and on update_host type of explicit memory operation.

Background for host accessor:

Current implementation of host accessor requires that the underlying
backend supports async/delayed execution: SYCL RT calls Unmap or Write
commands that are initially blocked by an user event. Because of that,
for instance, host accessors work incorrectly if the latest memory are
located on the host device. Also, user events are usually not very well
supported, there is "UseExclusiveQueue" workaround because of that.

Solution:

The patch eliminates the need to have more then one host allocation by
introducing concept of linked alloca commands. One device and one host
alloca commands can be linked together so they share* the same host
memory. Such commands have "active" state which indicates that it's
valid to work with corresponding memory allocation. Only one alloca
command from a pair can be active at the same time. Memory transition
between them is performed using map/unmap commands.

* share from the SYCL point of view, because even if SYCL RT asks
underlying backends to reuse host pointer they are free to allocate
additional memory on their own.

In order to resolve the limitations of current implementation of the
host accessor the patch updates addHostAccessor API to emit map or
read operations along with an empty command which is in blocked
state(cannot be enqueued). An empty command is unblocked inside in the
new releaseHostAccessor API which is called from host accessor
destructor.

As bonuses/side effects the patch:
1. Allows having multiple host accessor with read access mode at the
   same time in all cases.
2. Adds environment variable SYCL_THROW_ON_BLOCK which if set makes SYCL
   RT throw an exception on attempt to wait for a blocked command.
3. Makes so the latest state of memory object is still on the host after
   host accessor destructor in all cases, so when multiple host
   accessors are constructed in a row no map/read operations are
   performed starting from the second host accessor. Before the patch
   memory was moved back to the source(for example, to OpenCL memory
   allocation) so SYCL RT had to make map/read each time.
4. Makes Scheduler considering host and different instances of the host
   device to be in the same context when it looks for a suitable
   allocation.

Signed-off-by: Vlad Romanov <[email protected]>
Signed-off-by: Vlad Romanov <[email protected]>
Signed-off-by: Vlad Romanov <[email protected]>
Signed-off-by: Vlad Romanov <[email protected]>
@romanovvlad romanovvlad force-pushed the private/vromanov/HostAcc branch from 2f0d953 to 1e900b3 Compare November 21, 2019 19:13
@romanovvlad romanovvlad merged commit 16ae15a into intel:sycl Nov 22, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants