[SYCL] Support make_buffer for the Level Zero backend #5229

againull · 2021-12-27T20:18:09Z

Test for the feature: intel/llvm-test-suite#708

againull · 2022-01-06T21:35:51Z

/verify with intel/llvm-test-suite#708

sycl/doc/extensions/LevelZeroBackend/LevelZeroBackend.md

* Check type of the allocation in the piextMemCreateWithNativeHandle and disallow multi-device contexts for device and shared allocations because of the lack of support in the plugin. * Describe restrictions in the doc. * Describe synchronization rules in the doc. * Other minor fixes.

gmlueck · 2022-01-10T21:40:36Z

sycl/doc/extensions/LevelZeroBackend/LevelZeroBackend.md

+
+Construct a SYCL buffer instance from a pointer to a Level Zero memory buffer. The pointer must be the value returned from a previous call to <code>zeMemAllocShared()</code>, <code>zeMemAllocDevice()</code>, or <code>zeMemAllocHost()</code>. If pointer is the value returned from <code>zeMemAllocDevice()</code> or <code>zeMemAllocShared()</code> then SYCL context <code>Context</code> must be associated with a single device.
+The <code>Context</code> argument must be a valid SYCL context encapsulating a Level-Zero context, and the Level-Zero memory must be allocated on the same context. Created SYCL buffer is
+associated only with the single SYCL context <code>Context</code>, the buffer can't be accessed on another contexts.


the buffer can't be accessed on another contexts

Do we really need this strict limitation, or can we say instead "the buffer can only be accessed from the devices contained by Context"? I'm thinking about the case when the buffer is created from context C1, and the user creates a sycl::queue for context C2 and device D. Device D is contained by both C1 and C2. Is it legal to access the buffer from this kernel? With your wording, this is not legal because the contexts are different. However, it is legal for my proposed wording.

Whichever you choose, I think violations of this restriction should be diagnosed at runtime with an exception. This will make it easier for users to know when they've done something wrong. I think you can diagnose this error in two places:

If the user constructs a non-placeholder accessor, you can throw an exception (probably errc::invalid) if the handler's device/context does not match the buffer's context.

If the user constructs a placeholder accessor, you can do the same check in the handler::rquires() function.

Actually, I've realized that we don't need this limitation at all.

Initially I made this restriction because memory buffer can be used only in the context where it is created:
https://spec.oneapi.io/level-zero/latest/core/api.html?highlight=zememallocdevice#zememallocdevice

The application must only use the memory allocation for the context and device, or its sub-devices, which was provided during allocation.

But it turns out SYCL RT handles the case when SYCL buffer is used in different context (opencl buffer is also isolated in the context). So, if buffer encapsulates allocation on ctx1 and this buffer is accessed on ctx2 then SYCL RT creates a new allocation on ctx2 and copies contents of the first allocation to the second allocation through host (i.e. 2 copy operations - device to host and host to device)

But there is one problem. After buffer destruction data is not automatically copied back to original allocation on ctx1 (to follow keep semantics that you described). Currently contents of the buffer can be accessed through host accessor (or probably somehow artificially by submitting buffer to the original context with empty kernel).

Copy back logic requires additional changes in buffer and/or scheduler. I am not sure if we can somehow describe this as a limitation and implement later.

I think we should avoid documenting a feature that we don't implement. This will just lead to customer confusion and bug reports. Here are a couple of options:

Add the support now to copy back the data to the original buffer for this case.

Support only the transfer case in this version of the extension. This is the default behavior anyway, so presumably we think this is the common case. We could retain the Ownership field, but make_buffer would throw an exception if the user passes keep. Since the issue about copying data back only exists in the "keep" case, we don't need to implement that now. We have the option in the future to add support for keep and implement the copy-back behavior in the future.

But "keep" is only "broken" on case of access to the buffer from multiple contexts, which would be a very rare case, I believe. I'd vote for documenting the "keep" and make it work while in the same context, and create a bug about missing copy for multiple contexts.

If we think the "keep" case is important, we should implement it for all cases. How hard is it to do the write-back?

Implemented write-back and added test to intel/llvm-test-suite#708

gmlueck · 2022-01-10T21:59:56Z

sycl/doc/extensions/LevelZeroBackend/LevelZeroBackend.md

+
+* If the ownership is keep (i.e. the application retains ownership of the Level Zero memory buffer), then the SYCL buffer destructor blocks until all work in queues on the buffer have completed. The buffer's contents are written to the Level Zero memory buffer by the time the destructor completes.
+
+* If the ownership is transfer (i.e. the SYCL runtime has ownership of the Level Zero memory buffer), then the SYCL buffer destructor blocks until all work in queues on the buffer have completed. The SYCL runtime frees the Level Zero memory buffer asynchronously when it is no longer in use in queues.


It's up to us to decide whether the destructor blocks in this case, but I think it might be surprising to users if it blocks. The core SYCL spec says:

The basic rule for the blocking behavior of a buffer destructor is that it blocks if there is some data to write back because a write accessor on it has been created, or if the buffer was constructed with attached host memory and is still in use.

For example, the destructor for a buffer that is created from just a size (and no host pointer) does not block because it does not write any data back to the host.

The case when ownership is transfer seems more similar this case because no data is available on the host after the destructor finishes. Therefore, it seems like there is no need to block.

What is your reasoning for thinking it should block?

I see, indeed spec says that destructor is not necessarily blocking:

A buffer can be constructed with just a size and using the default buffer allocator. ... The destructor for this type of buffer does not need to block, even if work on the buffer has not completed.

But problem is that currently implementation is blocking in this case, even if I do set_write_back(false) explicitly. Implementation of buffer destructor is blocking in all cases now.

I propose to use the same wording for interop buffer - "does not need to block" instead of "doesn't block". And in the future when non-blocking version of the buffer will be implemented, we will use it for "transfer" semantics as well.

I propose to use the same wording for interop buffer - "does not need to block" instead of "doesn't block".

Given that we use this wording also in the core spec, I do not object to using it in this extension spec.

sycl/doc/extensions/LevelZeroBackend/LevelZeroBackend.md

sycl/include/CL/sycl/backend.hpp

againull · 2022-04-05T18:01:57Z

/verify with intel/llvm-test-suite#708

This reverts commit 2a21b30.

…is "keep""

… to enforce copy back

againull · 2022-04-06T06:51:16Z

/verify with intel/llvm-test-suite#708

gmlueck · 2022-04-06T14:39:18Z

sycl/doc/extensions/supported/sycl_ext_oneapi_backend_level_zero.md

+
+The behavior of the SYCL buffer destructor depends on the Ownership flag. As with other SYCL buffers, this behavior is triggered only when the last reference count to the buffer is dropped, as described in the core SYCL specification section 4.7.2.3, "Buffer synchronization rules".
+
+* If the ownership is keep (i.e. the application retains ownership of the Level Zero memory allocation), then the SYCL buffer destructor blocks until all work in queues on the buffer have completed. The buffer's contents is not copied back to the Level Zero memory allocation. This means that contents of the Level Zero memory allocation is not up-to-date after destruction if the SYCL buffer was used in a SYCL context which is different from the SYCL context provided to ```make_buffer``` or if the SYCL buffer was used on devices different from the device where the Level Zero memory was allocated. SYCL RT can be enforced to write data back to the original allocation by means of the ```get_native_mem()``` method of the ```interop_handle``` SYCL class which is described in the core SYCL specification section 4.10.2, "Class interop_handle" and in the Section 4.2 of this document. 


It does not make sense for the destructor to block unless there is also a guarantee that the buffer contents are written back to the USM memory. We should either:

Block and provide a guarantee that the buffer contents are copied back to the USM memory, or

Do not block and provide no guarantee about the contents of the USM memory after the buffer is destroyed.

I thought we need to block in case of "keep" ownership even if we don't copy back. User may start using device pointer (for example, write something to this memory allocation or free this allocation) as soon as buffer is destroyed. So, if I get it right, we must be sure that all work on the buffer must be completed after buffer destruction. Or probably I miss something.

OK, that's a good point. I agree that the destructor needs to block.

Sorry, one more question:

SYCL RT can be enforced to write data back to the original allocation by means of the get_native_mem() method of the interop_handle SYCL class which is described in the core SYCL specification section 4.10.2

Is this statement always true? For example, if you called get_native_mem from a different device (not the device associated with the L0 memory), then this would not guarantee that the data is written to the original L0 memory allocation. Even if you call get_native_mem from the same device, we do not currently guarantee that the native memory handle will be the same as the one the user passed to make_buffer (though I presume it would be the same in a reasonable implementation).

Do we need this sentence explaining how to cause the buffer content to be written back?

Yes, I was trying to say the the data can be copied back if you call get_native_mem from the same device.
But after our discussion I agree that even though our current implementation is using the same pointer as the one the user passed to make_buffer, in the future it will change. So I removed the confusing statement.
Users may refer to core sycl spec where get_native_mem behaviour is described.

gmlueck · 2022-04-06T14:40:24Z

sycl/doc/extensions/supported/sycl_ext_oneapi_backend_level_zero.md


+``` C++
+    Queue.submit([&](handler &CGH) {
+        auto BufferAcc = BufferInterop.get_access<access::mode::write>(CGH);


Suggested change

auto BufferAcc = BufferInterop.get_access<access::mode::write>(CGH);

auto BufferAcc = Buffer.get_access<access::mode::write>(CGH);

I don't want to give the impression that get_native_mem only works for interop buffers.

againull · 2022-04-06T20:16:01Z

'SYCL :: ESIMD/accessor.cpp' failure is unrelated.

againull · 2022-04-07T17:53:22Z

@gmlueck Could you please review this PR. It can't be merged without your approval.

gmlueck

Spec looks good.

againull · 2022-04-07T19:16:39Z

Thanks, only doc change after last testing, merging.

againull · 2022-04-07T20:30:23Z

CUDA and Windows LIT tested locally - passed (buildbots aren't working)

againull force-pushed the interop branch 2 times, most recently from 23bedc7 to 5e2dff1 Compare December 27, 2021 23:15

againull force-pushed the interop branch from 5e2dff1 to 62b9142 Compare January 5, 2022 23:48

[SYCL] Support make_buffer for the Level Zero backend

ae8bb7f

againull force-pushed the interop branch from 62b9142 to ae8bb7f Compare January 6, 2022 07:59

Clean up and update doc

95a8255

againull marked this pull request as ready for review January 6, 2022 21:34

againull requested review from a team as code owners January 6, 2022 21:34

againull requested a review from sergey-semenov January 6, 2022 21:34

againull requested a review from smaslov-intel January 6, 2022 22:04

smaslov-intel reviewed Jan 7, 2022

View reviewed changes

sycl/doc/extensions/LevelZeroBackend/LevelZeroBackend.md Outdated Show resolved Hide resolved

againull requested a review from smaslov-intel January 7, 2022 02:52

gmlueck reviewed Jan 7, 2022

View reviewed changes

againull added 2 commits January 10, 2022 11:11

Create 2 version of make_buffer: with event and without event

d18bd6d

againull requested a review from gmlueck January 10, 2022 20:34

gmlueck reviewed Jan 10, 2022

View reviewed changes

againull mentioned this pull request Jan 10, 2022

[SYCL] Test buffer interop for the Level Zero backend intel/llvm-test-suite#708

Merged

againull added 3 commits January 10, 2022 16:13

[SYCL] Use zeMemGetAddressRange to get a size of a buffer

fb0f959

Fix synchronization rules

e5db872

[SYCL] Created buffer can be accessed on another contexts

6fe6c01