-
Notifications
You must be signed in to change notification settings - Fork 35
Update Level Zero provider config API #920
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
f39c009
to
222fd7b
Compare
863a73f
to
eb215fe
Compare
2da4641
to
7d704a0
Compare
7d704a0
to
a0fac94
Compare
a0fac94
to
62503c6
Compare
struct DlHandleCloser { | ||
void operator()(void *dlHandle) { | ||
if (dlHandle) { | ||
// Reset all function pointers to no-op stubs in case the library | ||
// but some other global object still try to call Level Zero functions. | ||
libze_ops.zeInit = [](auto... args) { return noop_stub(args...); }; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As I understand, the issue here is that the library is unloaded (when we use dlopen) before we call destroyContext.
Should we also add something similar to the L0 provider itself? For example, I can imagine someone (SYCL) trying to free memory in the global object destructor. @vinser52 could you extend the test to see if calling umfMemoryProviderFree() in the global object dtor works? (assuming the provider itself is also handled by that global object)
@pbalcer any thoughts on that? do you know if there were any changes in how SYCL destroys global objects on windows?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As I understand, the issue here is that the library is unloaded (when we use dlopen) before we call destroyContext.
Correct. Our test creates the L0 context and stores it in a global object. The test also opens L0 library via dlopen and stores the handle to it in a global std::unique_ptr'. When the test ends, the global objects start destructing. The
unique_ptr' with dlHandle' destructs first and calls
dlclose' for the L0 library. Then the global object with the L0 context is destroyed and calls the libze_ops.zeContextDestroy
which is already invalid. Such order of destruction happens in our CI only on Windows, but I believe it is also possible on Linux.
Should we also add something similar to the L0 provider itself? For example, I can imagine someone (SYCL) trying to free memory in the global object destructor.
Yeah, I also thought about such scenario. But L0 provider does not control load/unload of L0 library, the L0 provider just calls dlsym
. So in general it is a client responsibility to make sure that L0 library is unloaded after L0 provider is destroyed.
When a user uses L0 Provider there is the following sequence of actions to create L0 provider:
- L0 library should be loaded.
- Create L0 context.
- Create L0 provider.
- allocate memory from L0 provider
The destruction should happens in the reverse order:
- Deallocate all memory to L0 provider.
- Destroy L0 provider.
- Destroy L0 context.
- L0 library is unloaded.
If the destruction order is broken it is an issue on the client side. On the UMF level we cannot control that. The L0 provider does not know when L0 library is unloaded, as a result, we do not know when to substitute real callbacks with stubs.
@vinser52 could you extend the test to see if calling umfMemoryProviderFree() in the global object dtor works? (assuming the provider itself is also handled by that global object)
We can add such test, but as I described below there is nothing we can do on the UMF side. The test will just test itself (not the UMF implementation) how well it is handle destruction order. Anyway I think it is should be a separate PR.
62503c6
to
a2a812a
Compare
a2a812a
to
8462487
Compare
8462487
to
447093d
Compare
Description
Change the L0 provider config as we agreed in #844
Checklist