-
Notifications
You must be signed in to change notification settings - Fork 787
[SYCL] Support make_buffer for the Level Zero backend #5229
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
23bedc7
to
5e2dff1
Compare
/verify with intel/llvm-test-suite#708 |
* Check type of the allocation in the piextMemCreateWithNativeHandle and disallow multi-device contexts for device and shared allocations because of the lack of support in the plugin. * Describe restrictions in the doc. * Describe synchronization rules in the doc. * Other minor fixes.
|
||
Construct a SYCL buffer instance from a pointer to a Level Zero memory buffer. The pointer must be the value returned from a previous call to <code>zeMemAllocShared()</code>, <code>zeMemAllocDevice()</code>, or <code>zeMemAllocHost()</code>. If pointer is the value returned from <code>zeMemAllocDevice()</code> or <code>zeMemAllocShared()</code> then SYCL context <code>Context</code> must be associated with a single device. | ||
The <code>Context</code> argument must be a valid SYCL context encapsulating a Level-Zero context, and the Level-Zero memory must be allocated on the same context. Created SYCL buffer is | ||
associated only with the single SYCL context <code>Context</code>, the buffer can't be accessed on another contexts. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the buffer can't be accessed on another contexts
Do we really need this strict limitation, or can we say instead "the buffer can only be accessed from the devices contained by Context
"? I'm thinking about the case when the buffer is created from context C1, and the user creates a sycl::queue
for context C2 and device D. Device D is contained by both C1 and C2. Is it legal to access the buffer from this kernel? With your wording, this is not legal because the contexts are different. However, it is legal for my proposed wording.
Whichever you choose, I think violations of this restriction should be diagnosed at runtime with an exception. This will make it easier for users to know when they've done something wrong. I think you can diagnose this error in two places:
-
If the user constructs a non-placeholder accessor, you can throw an exception (probably
errc::invalid
) if the handler's device/context does not match the buffer's context. -
If the user constructs a placeholder accessor, you can do the same check in the
handler::rquires()
function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, I've realized that we don't need this limitation at all.
Initially I made this restriction because memory buffer can be used only in the context where it is created:
https://spec.oneapi.io/level-zero/latest/core/api.html?highlight=zememallocdevice#zememallocdevice
The application must only use the memory allocation for the context and device, or its sub-devices, which was provided during allocation.
But it turns out SYCL RT handles the case when SYCL buffer is used in different context (opencl buffer is also isolated in the context). So, if buffer encapsulates allocation on ctx1 and this buffer is accessed on ctx2 then SYCL RT creates a new allocation on ctx2 and copies contents of the first allocation to the second allocation through host (i.e. 2 copy operations - device to host and host to device)
But there is one problem. After buffer destruction data is not automatically copied back to original allocation on ctx1 (to follow keep semantics that you described). Currently contents of the buffer can be accessed through host accessor (or probably somehow artificially by submitting buffer to the original context with empty kernel).
Copy back logic requires additional changes in buffer and/or scheduler. I am not sure if we can somehow describe this as a limitation and implement later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should avoid documenting a feature that we don't implement. This will just lead to customer confusion and bug reports. Here are a couple of options:
-
Add the support now to copy back the data to the original buffer for this case.
-
Support only the
transfer
case in this version of the extension. This is the default behavior anyway, so presumably we think this is the common case. We could retain theOwnership
field, butmake_buffer
would throw an exception if the user passeskeep
. Since the issue about copying data back only exists in the "keep" case, we don't need to implement that now. We have the option in the future to add support forkeep
and implement the copy-back behavior in the future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But "keep" is only "broken" on case of access to the buffer from multiple contexts, which would be a very rare case, I believe. I'd vote for documenting the "keep" and make it work while in the same context, and create a bug about missing copy for multiple contexts.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we think the "keep" case is important, we should implement it for all cases. How hard is it to do the write-back?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Implemented write-back and added test to intel/llvm-test-suite#708
|
||
* If the ownership is keep (i.e. the application retains ownership of the Level Zero memory buffer), then the SYCL buffer destructor blocks until all work in queues on the buffer have completed. The buffer's contents are written to the Level Zero memory buffer by the time the destructor completes. | ||
|
||
* If the ownership is transfer (i.e. the SYCL runtime has ownership of the Level Zero memory buffer), then the SYCL buffer destructor blocks until all work in queues on the buffer have completed. The SYCL runtime frees the Level Zero memory buffer asynchronously when it is no longer in use in queues. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's up to us to decide whether the destructor blocks in this case, but I think it might be surprising to users if it blocks. The core SYCL spec says:
The basic rule for the blocking behavior of a buffer destructor is that it blocks if there is some data to write back because a write accessor on it has been created, or if the buffer was constructed with attached host memory and is still in use.
For example, the destructor for a buffer that is created from just a size (and no host pointer) does not block because it does not write any data back to the host.
The case when ownership is transfer
seems more similar this case because no data is available on the host after the destructor finishes. Therefore, it seems like there is no need to block.
What is your reasoning for thinking it should block?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, indeed spec says that destructor is not necessarily blocking:
A buffer can be constructed with just a size and using the default buffer allocator. ... The destructor for this type of buffer does not need to block, even if work on the buffer has not completed.
But problem is that currently implementation is blocking in this case, even if I do set_write_back(false) explicitly. Implementation of buffer destructor is blocking in all cases now.
I propose to use the same wording for interop buffer - "does not need to block" instead of "doesn't block". And in the future when non-blocking version of the buffer will be implemented, we will use it for "transfer" semantics as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I propose to use the same wording for interop buffer - "does not need to block" instead of "doesn't block".
Given that we use this wording also in the core spec, I do not object to using it in this extension spec.
/verify with intel/llvm-test-suite#708 |
This reverts commit 2a21b30.
… to enforce copy back
/verify with intel/llvm-test-suite#708 |
|
||
The behavior of the SYCL buffer destructor depends on the Ownership flag. As with other SYCL buffers, this behavior is triggered only when the last reference count to the buffer is dropped, as described in the core SYCL specification section 4.7.2.3, "Buffer synchronization rules". | ||
|
||
* If the ownership is keep (i.e. the application retains ownership of the Level Zero memory allocation), then the SYCL buffer destructor blocks until all work in queues on the buffer have completed. The buffer's contents is not copied back to the Level Zero memory allocation. This means that contents of the Level Zero memory allocation is not up-to-date after destruction if the SYCL buffer was used in a SYCL context which is different from the SYCL context provided to ```make_buffer``` or if the SYCL buffer was used on devices different from the device where the Level Zero memory was allocated. SYCL RT can be enforced to write data back to the original allocation by means of the ```get_native_mem()``` method of the ```interop_handle``` SYCL class which is described in the core SYCL specification section 4.10.2, "Class interop_handle" and in the Section 4.2 of this document. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It does not make sense for the destructor to block unless there is also a guarantee that the buffer contents are written back to the USM memory. We should either:
- Block and provide a guarantee that the buffer contents are copied back to the USM memory, or
- Do not block and provide no guarantee about the contents of the USM memory after the buffer is destroyed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought we need to block in case of "keep" ownership even if we don't copy back. User may start using device pointer (for example, write something to this memory allocation or free this allocation) as soon as buffer is destroyed. So, if I get it right, we must be sure that all work on the buffer must be completed after buffer destruction. Or probably I miss something.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, that's a good point. I agree that the destructor needs to block.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, one more question:
SYCL RT can be enforced to write data back to the original allocation by means of the
get_native_mem()
method of theinterop_handle
SYCL class which is described in the core SYCL specification section 4.10.2
Is this statement always true? For example, if you called get_native_mem
from a different device (not the device associated with the L0 memory), then this would not guarantee that the data is written to the original L0 memory allocation. Even if you call get_native_mem
from the same device, we do not currently guarantee that the native memory handle will be the same as the one the user passed to make_buffer
(though I presume it would be the same in a reasonable implementation).
Do we need this sentence explaining how to cause the buffer content to be written back?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I was trying to say the the data can be copied back if you call get_native_mem from the same device.
But after our discussion I agree that even though our current implementation is using the same pointer as the one the user passed to make_buffer, in the future it will change. So I removed the confusing statement.
Users may refer to core sycl spec where get_native_mem behaviour is described.
|
||
``` C++ | ||
Queue.submit([&](handler &CGH) { | ||
auto BufferAcc = BufferInterop.get_access<access::mode::write>(CGH); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
auto BufferAcc = BufferInterop.get_access<access::mode::write>(CGH); | |
auto BufferAcc = Buffer.get_access<access::mode::write>(CGH); |
I don't want to give the impression that get_native_mem
only works for interop buffers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
'SYCL :: ESIMD/accessor.cpp' failure is unrelated. |
@gmlueck Could you please review this PR. It can't be merged without your approval. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Spec looks good.
Thanks, only doc change after last testing, merging. |
CUDA and Windows LIT tested locally - passed (buildbots aren't working) |
Test for the feature: intel/llvm-test-suite#708