-
Notifications
You must be signed in to change notification settings - Fork 788
Draft: [SYCL] Implement resource pool for implementation allocations #5662
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Draft: [SYCL] Implement resource pool for implementation allocations #5662
Conversation
These changes add a "resource pool" used by the SYCL implementation to get auxiliary device memory resources. When a resource is deleted it returns its memory to the pool to be reacquired by other internal logic. Additionally, reductions that need additional resources are made to use the pool when getting these. Signed-off-by: Steffen Larsen <[email protected]>
Signed-off-by: Steffen Larsen <[email protected]>
/summary:run |
Functionality tested by intel/llvm-test-suite#873. Verification with this requires #5653. |
Signed-off-by: Steffen Larsen <[email protected]>
Signed-off-by: Steffen Larsen <[email protected]>
Signed-off-by: Steffen Larsen <[email protected]>
+@smaslov-intel. As far as I remember where is some memory pool in L0 plugin/BE. Are there any drawbacks of having one more pool of memory in the SW stack? |
We need to make the current L0-only pooling serve the needs of other SYCL backends (and other language runtimes). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sycl/doc/EnvironmentVariables.md looks okay, but there are merge conflicts in other files.
/verify with intel/llvm-test-suite#873 |
Signed-off-by: Steffen Larsen <[email protected]>
Signed-off-by: Steffen Larsen <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please, fix pre-commit tests.
Signed-off-by: Steffen Larsen <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it feasible to have resource pool unit-tested?
SYCLMemObjT(RT::PiMem MemObject, const context &SyclContext, | ||
const size_t SizeInBytes, event AvailableEvent, | ||
std::unique_ptr<SYCLMemObjAllocator> Allocator); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will this c-tor supersede the other one? That is, should the other c-tor be marked with TODO for removal when ABI break is allowed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't believe so. The other constructor is intended to be user-facing and as such has additional checks for argument validity. The new constructor is meant for internal use so we do not need the additional checks.
Signed-off-by: Steffen Larsen <[email protected]>
There is not currently a way to test reductions in unittests, but it may be possible to use the resource pool directly. I see there is a failure in testing that is likely related, so I will be investigating that and will see what is possible with unittests. |
Signed-off-by: Steffen Larsen <[email protected]>
Signed-off-by: Steffen Larsen <[email protected]>
That's exactly what I mean. Thanks. |
Signed-off-by: Steffen Larsen <[email protected]>
Signed-off-by: Steffen Larsen <[email protected]>
Absolutely. I have added such tests. Please let me know if there are test cases I've missed. |
Signed-off-by: Steffen Larsen <[email protected]>
@dm-vodopyanov | @vladimirlaz | @smaslov-intel - Friendly ping. |
@steffenlarsen: could you react on my previous comment:
We already have a memory pool underneath, and it looks that this new pooling adds overheads without clear benefits. |
@smaslov-intel - Sorry, I think I misunderstood your comment. We may be able to generalize the pooling, but it seems that the L0-only implementation is primarily focused on USM memory allocations. Originally, the plan for this pool was to use USM memory as well, but it would make reductions dependent on USM support. That would mean any backend that does not support USM would not be able to support reductions either. Do you think we should make the pooling work for non-USM allocations? Should we consider using this as an intermediate solution and retire it when we have generalized the L0-only pooling? Correction: Reading it, it seem to be possible to enable it for buffers as well. Is that correct? If so, what would be needed to make it not L0-only? Is this enabled for buffers by default? If not, what would be the cost of enabling it? Could we enable it for certain allocations and not others? E.g. enable it when allocating auxiliary resource but not when creating other buffers? |
You mean SYCL buffers specifically? Or pi_mem objects? Or something else?
In Level-Zero backend SYCL buffers are backed by regular USM allocations, and those USM allocations are already pooled (but not pi_mem objects enclosing them). For backends that don't have USM and maintain distinct buffer objects the pooling needs to be extended. And your changes are probably good for that. But I think it should eventually be outside of SYCL RT, but in the code shared between plugins for different backends. |
I just meant pi_mem in general. This PR was mostly motivated by L0, but the pooling may already be solving the issue now that the leak fix is in. I will investigate to see if that is the case. |
I am currently investigating whether the existing pooling in the L0 plugin is enough to solve the problem. Converted this back to draft. |
The L0 pooling has shown to be enough to achieve this. We should consider expanding it to other backends, but as such we do not need this pooling patch. |
These changes add a "resource pool" used by the SYCL implementation to get auxiliary device memory resources. When a resource is deleted it returns its memory to the pool to be reacquired by other internal logic. Additionally, reductions that need additional resources are made to use the pool when getting these.