Skip to content

[SYCL] Support make_buffer for the Level Zero backend #5229

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 33 commits into from
Apr 7, 2022

Conversation

againull
Copy link
Contributor

@againull againull commented Dec 27, 2021

Test for the feature: intel/llvm-test-suite#708

@againull againull marked this pull request as ready for review January 6, 2022 21:34
@againull againull requested review from a team as code owners January 6, 2022 21:34
@againull
Copy link
Contributor Author

againull commented Jan 6, 2022

/verify with intel/llvm-test-suite#708

@againull againull requested a review from smaslov-intel January 6, 2022 22:04
@againull againull requested a review from smaslov-intel January 7, 2022 02:52
* Check type of the allocation in the piextMemCreateWithNativeHandle and
disallow multi-device contexts for device and shared allocations because
of the lack of support in the plugin.
* Describe restrictions in the doc.
* Describe synchronization rules in the doc.
* Other minor fixes.
@againull againull requested a review from gmlueck January 10, 2022 20:34

Construct a SYCL buffer instance from a pointer to a Level Zero memory buffer. The pointer must be the value returned from a previous call to <code>zeMemAllocShared()</code>, <code>zeMemAllocDevice()</code>, or <code>zeMemAllocHost()</code>. If pointer is the value returned from <code>zeMemAllocDevice()</code> or <code>zeMemAllocShared()</code> then SYCL context <code>Context</code> must be associated with a single device.
The <code>Context</code> argument must be a valid SYCL context encapsulating a Level-Zero context, and the Level-Zero memory must be allocated on the same context. Created SYCL buffer is
associated only with the single SYCL context <code>Context</code>, the buffer can't be accessed on another contexts.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the buffer can't be accessed on another contexts

Do we really need this strict limitation, or can we say instead "the buffer can only be accessed from the devices contained by Context"? I'm thinking about the case when the buffer is created from context C1, and the user creates a sycl::queue for context C2 and device D. Device D is contained by both C1 and C2. Is it legal to access the buffer from this kernel? With your wording, this is not legal because the contexts are different. However, it is legal for my proposed wording.

Whichever you choose, I think violations of this restriction should be diagnosed at runtime with an exception. This will make it easier for users to know when they've done something wrong. I think you can diagnose this error in two places:

  • If the user constructs a non-placeholder accessor, you can throw an exception (probably errc::invalid) if the handler's device/context does not match the buffer's context.

  • If the user constructs a placeholder accessor, you can do the same check in the handler::rquires() function.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, I've realized that we don't need this limitation at all.

Initially I made this restriction because memory buffer can be used only in the context where it is created:
https://spec.oneapi.io/level-zero/latest/core/api.html?highlight=zememallocdevice#zememallocdevice

The application must only use the memory allocation for the context and device, or its sub-devices, which was provided during allocation.

But it turns out SYCL RT handles the case when SYCL buffer is used in different context (opencl buffer is also isolated in the context). So, if buffer encapsulates allocation on ctx1 and this buffer is accessed on ctx2 then SYCL RT creates a new allocation on ctx2 and copies contents of the first allocation to the second allocation through host (i.e. 2 copy operations - device to host and host to device)

But there is one problem. After buffer destruction data is not automatically copied back to original allocation on ctx1 (to follow keep semantics that you described). Currently contents of the buffer can be accessed through host accessor (or probably somehow artificially by submitting buffer to the original context with empty kernel).

Copy back logic requires additional changes in buffer and/or scheduler. I am not sure if we can somehow describe this as a limitation and implement later.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should avoid documenting a feature that we don't implement. This will just lead to customer confusion and bug reports. Here are a couple of options:

  1. Add the support now to copy back the data to the original buffer for this case.

  2. Support only the transfer case in this version of the extension. This is the default behavior anyway, so presumably we think this is the common case. We could retain the Ownership field, but make_buffer would throw an exception if the user passes keep. Since the issue about copying data back only exists in the "keep" case, we don't need to implement that now. We have the option in the future to add support for keep and implement the copy-back behavior in the future.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But "keep" is only "broken" on case of access to the buffer from multiple contexts, which would be a very rare case, I believe. I'd vote for documenting the "keep" and make it work while in the same context, and create a bug about missing copy for multiple contexts.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we think the "keep" case is important, we should implement it for all cases. How hard is it to do the write-back?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Implemented write-back and added test to intel/llvm-test-suite#708


* If the ownership is keep (i.e. the application retains ownership of the Level Zero memory buffer), then the SYCL buffer destructor blocks until all work in queues on the buffer have completed. The buffer's contents are written to the Level Zero memory buffer by the time the destructor completes.

* If the ownership is transfer (i.e. the SYCL runtime has ownership of the Level Zero memory buffer), then the SYCL buffer destructor blocks until all work in queues on the buffer have completed. The SYCL runtime frees the Level Zero memory buffer asynchronously when it is no longer in use in queues.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's up to us to decide whether the destructor blocks in this case, but I think it might be surprising to users if it blocks. The core SYCL spec says:

The basic rule for the blocking behavior of a buffer destructor is that it blocks if there is some data to write back because a write accessor on it has been created, or if the buffer was constructed with attached host memory and is still in use.

For example, the destructor for a buffer that is created from just a size (and no host pointer) does not block because it does not write any data back to the host.

The case when ownership is transfer seems more similar this case because no data is available on the host after the destructor finishes. Therefore, it seems like there is no need to block.

What is your reasoning for thinking it should block?

Copy link
Contributor Author

@againull againull Jan 11, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, indeed spec says that destructor is not necessarily blocking:

A buffer can be constructed with just a size and using the default buffer allocator. ... The destructor for this type of buffer does not need to block, even if work on the buffer has not completed.

But problem is that currently implementation is blocking in this case, even if I do set_write_back(false) explicitly. Implementation of buffer destructor is blocking in all cases now.

I propose to use the same wording for interop buffer - "does not need to block" instead of "doesn't block". And in the future when non-blocking version of the buffer will be implemented, we will use it for "transfer" semantics as well.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I propose to use the same wording for interop buffer - "does not need to block" instead of "doesn't block".

Given that we use this wording also in the core spec, I do not object to using it in this extension spec.

@againull
Copy link
Contributor Author

againull commented Apr 5, 2022

/verify with intel/llvm-test-suite#708

smaslov-intel
smaslov-intel previously approved these changes Apr 6, 2022
@againull
Copy link
Contributor Author

againull commented Apr 6, 2022

/verify with intel/llvm-test-suite#708


The behavior of the SYCL buffer destructor depends on the Ownership flag. As with other SYCL buffers, this behavior is triggered only when the last reference count to the buffer is dropped, as described in the core SYCL specification section 4.7.2.3, "Buffer synchronization rules".

* If the ownership is keep (i.e. the application retains ownership of the Level Zero memory allocation), then the SYCL buffer destructor blocks until all work in queues on the buffer have completed. The buffer's contents is not copied back to the Level Zero memory allocation. This means that contents of the Level Zero memory allocation is not up-to-date after destruction if the SYCL buffer was used in a SYCL context which is different from the SYCL context provided to ```make_buffer``` or if the SYCL buffer was used on devices different from the device where the Level Zero memory was allocated. SYCL RT can be enforced to write data back to the original allocation by means of the ```get_native_mem()``` method of the ```interop_handle``` SYCL class which is described in the core SYCL specification section 4.10.2, "Class interop_handle" and in the Section 4.2 of this document.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does not make sense for the destructor to block unless there is also a guarantee that the buffer contents are written back to the USM memory. We should either:

  1. Block and provide a guarantee that the buffer contents are copied back to the USM memory, or
  2. Do not block and provide no guarantee about the contents of the USM memory after the buffer is destroyed.

Copy link
Contributor Author

@againull againull Apr 6, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought we need to block in case of "keep" ownership even if we don't copy back. User may start using device pointer (for example, write something to this memory allocation or free this allocation) as soon as buffer is destroyed. So, if I get it right, we must be sure that all work on the buffer must be completed after buffer destruction. Or probably I miss something.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, that's a good point. I agree that the destructor needs to block.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, one more question:

SYCL RT can be enforced to write data back to the original allocation by means of the get_native_mem() method of the interop_handle SYCL class which is described in the core SYCL specification section 4.10.2

Is this statement always true? For example, if you called get_native_mem from a different device (not the device associated with the L0 memory), then this would not guarantee that the data is written to the original L0 memory allocation. Even if you call get_native_mem from the same device, we do not currently guarantee that the native memory handle will be the same as the one the user passed to make_buffer (though I presume it would be the same in a reasonable implementation).

Do we need this sentence explaining how to cause the buffer content to be written back?

Copy link
Contributor Author

@againull againull Apr 7, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I was trying to say the the data can be copied back if you call get_native_mem from the same device.
But after our discussion I agree that even though our current implementation is using the same pointer as the one the user passed to make_buffer, in the future it will change. So I removed the confusing statement.
Users may refer to core sycl spec where get_native_mem behaviour is described.


``` C++
Queue.submit([&](handler &CGH) {
auto BufferAcc = BufferInterop.get_access<access::mode::write>(CGH);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
auto BufferAcc = BufferInterop.get_access<access::mode::write>(CGH);
auto BufferAcc = Buffer.get_access<access::mode::write>(CGH);

I don't want to give the impression that get_native_mem only works for interop buffers.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

@againull
Copy link
Contributor Author

againull commented Apr 6, 2022

'SYCL :: ESIMD/accessor.cpp' failure is unrelated.

smaslov-intel
smaslov-intel previously approved these changes Apr 7, 2022
@againull
Copy link
Contributor Author

againull commented Apr 7, 2022

@gmlueck Could you please review this PR. It can't be merged without your approval.

Copy link
Contributor

@gmlueck gmlueck left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Spec looks good.

@againull againull requested a review from smaslov-intel April 7, 2022 19:14
@againull
Copy link
Contributor Author

againull commented Apr 7, 2022

Thanks, only doc change after last testing, merging.

@againull
Copy link
Contributor Author

againull commented Apr 7, 2022

CUDA and Windows LIT tested locally - passed (buildbots aren't working)

@againull againull merged commit 7c49984 into intel:sycl Apr 7, 2022
@againull againull deleted the interop branch December 3, 2022 00:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants