Skip to content

Update Level Zero provider config API #920

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Nov 25, 2024

Conversation

vinser52
Copy link
Contributor

@vinser52 vinser52 commented Nov 20, 2024

Description

Change the L0 provider config as we agreed in #844

Checklist

  • Code compiles without errors locally
  • All tests pass locally
  • CI workflows execute properly
  • New tests added, especially if they will fail without my changes
  • Added/extended example(s) to cover this functionality

@vinser52 vinser52 requested a review from a team as a code owner November 20, 2024 09:57
@vinser52 vinser52 force-pushed the svinogra_l0_params branch 3 times, most recently from 863a73f to eb215fe Compare November 20, 2024 12:44
@vinser52 vinser52 force-pushed the svinogra_l0_params branch 2 times, most recently from 2da4641 to 7d704a0 Compare November 20, 2024 23:11
struct DlHandleCloser {
void operator()(void *dlHandle) {
if (dlHandle) {
// Reset all function pointers to no-op stubs in case the library
// but some other global object still try to call Level Zero functions.
libze_ops.zeInit = [](auto... args) { return noop_stub(args...); };
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I understand, the issue here is that the library is unloaded (when we use dlopen) before we call destroyContext.

Should we also add something similar to the L0 provider itself? For example, I can imagine someone (SYCL) trying to free memory in the global object destructor. @vinser52 could you extend the test to see if calling umfMemoryProviderFree() in the global object dtor works? (assuming the provider itself is also handled by that global object)

@pbalcer any thoughts on that? do you know if there were any changes in how SYCL destroys global objects on windows?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I understand, the issue here is that the library is unloaded (when we use dlopen) before we call destroyContext.

Correct. Our test creates the L0 context and stores it in a global object. The test also opens L0 library via dlopen and stores the handle to it in a global std::unique_ptr'. When the test ends, the global objects start destructing. The unique_ptr' with dlHandle' destructs first and calls dlclose' for the L0 library. Then the global object with the L0 context is destroyed and calls the libze_ops.zeContextDestroy which is already invalid. Such order of destruction happens in our CI only on Windows, but I believe it is also possible on Linux.

Should we also add something similar to the L0 provider itself? For example, I can imagine someone (SYCL) trying to free memory in the global object destructor.

Yeah, I also thought about such scenario. But L0 provider does not control load/unload of L0 library, the L0 provider just calls dlsym. So in general it is a client responsibility to make sure that L0 library is unloaded after L0 provider is destroyed.
When a user uses L0 Provider there is the following sequence of actions to create L0 provider:

  1. L0 library should be loaded.
  2. Create L0 context.
  3. Create L0 provider.
  4. allocate memory from L0 provider

The destruction should happens in the reverse order:

  1. Deallocate all memory to L0 provider.
  2. Destroy L0 provider.
  3. Destroy L0 context.
  4. L0 library is unloaded.

If the destruction order is broken it is an issue on the client side. On the UMF level we cannot control that. The L0 provider does not know when L0 library is unloaded, as a result, we do not know when to substitute real callbacks with stubs.

@vinser52 could you extend the test to see if calling umfMemoryProviderFree() in the global object dtor works? (assuming the provider itself is also handled by that global object)

We can add such test, but as I described below there is nothing we can do on the UMF side. The test will just test itself (not the UMF implementation) how well it is handle destruction order. Anyway I think it is should be a separate PR.

@lukaszstolarczuk lukaszstolarczuk merged commit a93d93d into oneapi-src:main Nov 25, 2024
78 checks passed
@vinser52 vinser52 deleted the svinogra_l0_params branch February 4, 2025 11:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants