-
Notifications
You must be signed in to change notification settings - Fork 35
add Level Zero memory provider #237
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@bratpiorka can you describe what's your plan regarding integration with UR? Do we want to have a duplicate provider in UMF for now or are you aiming to use this provider in UR? If it's the latter, there are more features you need to support: see the original implementation here https://github.com/oneapi-src/unified-runtime/blob/main/source/adapters/level_zero/usm.cpp#L170 |
Also, the L0 provider should be a separate static library: see #225 |
84baf5b
to
5957ace
Compare
@igchor thanks for the links. What do you think about creating a basic functionality first and then adding more features (in separate PRs) so it could finally replace the providers used in UR? |
Sure, that sounds reasonable, I just wanted to highlight that there might be some gaps with the current implementation. |
55a88e9
to
eddecb1
Compare
But what is the issue of migrating the whole UR implementation as is? Probably with some minor tweaks |
This version of the Level Zero provider is very simple and is intended to be used by oneCCL, but does not contain any oneCCL-specific code. On the other hand, the UR provider contains a lot of UR-specific code and is quite complex. Making it generic would require a lot of work. Also keep in mind that most of the contents of this PR are additional files needed by umf: workflows, cmake files, def/map etc. So even with this PR porting is still an option - it wouldn't change a lot of code. |
My point is since UR already uses UMF and we agreed to move L0 provider impl from UR repo to UMF we should migrate it first (of course UR specific code should be cleaned up) and make sure we are not breaking UR flow. And then we can test if it fits for oneCCL needs. |
Will it introduce any runtime dependencies to some L0 components? It is important in context of MPI, today they have no link dependency to L0 they try to load L0 at runtime (using dlopen) if I am correct. I will double-check. |
oneCCL also loads L0 at runtime |
@vinser52 Okay, then in that case we'll need a dynamic version I guess. Can you add this info to the issue so that we don't miss it? |
I looked into the code more deeply + consulted with @igchor and we could break the existing L0 provider in UR into two providers: a UR-specific top-level provider (implemented in UR) and a low-level generic L0 provider that just translates UMF provider API to ZE calls. With this approach, the provider implementation in this PR should be really simple and generic so it could be used by both UR and oneCCL |
eddecb1
to
981487d
Compare
I have the same idea and wanted to discuss it in our sync today. Looks like we are on the same page. |
Sure, I will update the issue. But, the dynamic version can be omitted if our static version will try to dlopen/dlsym |
9ab1e48
to
15e3522
Compare
cf34132
to
e66b0bc
Compare
e66b0bc
to
5e35703
Compare
5e35703
to
92b4420
Compare
Can we also have a similar test for when the application links with Level Zero normally (not using dlopen)? I.e. just have a second test that just assign all those functions pointers in libze_ops to &zeInit, &zeDriverGet, etc? |
92b4420
to
5ad6edb
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
@igchor done |
5ad6edb
to
1dd0cb7
Compare
d8e07c3
to
799e4fc
Compare
@vinser52 I have resolved the discussions started by you, as we agreed |
799e4fc
to
5330b56
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It does not build on WSL Ubuntu:
/home/ldorau/work/unified-memory-framework/src/provider/provider_level_zero.c: In function ‘init_ze_global_state’:
/home/ldorau/work/unified-memory-framework/src/provider/provider_level_zero.c:48:48: error: ‘RTLD_DEFAULT’ undeclared (first use in this function)
48 | *(void **)&g_ze_ops.zeMemAllocHost = dlsym(RTLD_DEFAULT, "zeMemAllocHost");
| ^~~~~~~~~~~~
/home/ldorau/work/unified-memory-framework/src/provider/provider_level_zero.c:48:48: note: each undeclared identifier is reported only once for each function it appears in
make[2]: *** [src/CMakeFiles/umf.dir/build.make:328: src/CMakeFiles/umf.dir/provider/provider_level_zero.c.o] Error 1
make[1]: *** [CMakeFiles/Makefile2:1153: src/CMakeFiles/umf.dir/all] Error 2
make: *** [Makefile:146: all] Error 2
5330b56
to
d5c014e
Compare
@ldorau please check now. BTW we definitely should add WSL to our CI |
d5c014e
to
cdf96c4
Compare
cdf96c4
to
ba84ef3
Compare
This PR adds a Level Zero memory provider that supports #225 enhancement. TODO:
Please note that shared memory would be supported in separate PR