-
Notifications
You must be signed in to change notification settings - Fork 787
[SYCL][E2E] Enable llvm-lit to accept device architecture #18197
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SYCL][E2E] Enable llvm-lit to accept device architecture #18197
Conversation
What problem are you trying to solve? Empty PR description and no comments in the change don't help to understand the idea behind this. |
Ah sorry. You are right. The issue is that on devices with say a battlemage GPU and an intel intergrated iGPU, when running llvm-lit with This is a problem with bindless images as the aspect Currently, sycl/test-e2e/lit.cfg.py does not support the |
First, that should go to the PR description, not the comment. Second, please add an example usage, e.g., sycl-ls shows this, we run llvm-lit like that, and it does the following (and print detected aspects from its output). |
I am putting this as a comment here as it feels like a lot to put in the commit message. Though I can put it there if you prefer or a cut down version. I might be going a bit too verbose. On the system I am using, when running sycl-ls like this it shows these devices as available: ./bin/sycl-ls`
I am abbreviating this, but adding
And the intel Intel(R) UHD Graphics 770:
When running llvm-lit via this command: I get the following output before the tests start running:
Then, running llvm-lit using Get the following output:
As can be seen with the llvm-lit command using As far as I am aware, it is not currently possible to make llvm-lit sycl run tests that require an aspect that is available on a device selectable via Enabling the |
@aelovikov-intel I apologize for taking so long. Other work got in the way. I might have gone overboard with all the aspects I am showing but I felt I would be leaving out context without them. I appreciate your feedback if you want anything else. |
I don't like I'd like to see the following logic: if we have multiple However, that won't work with multi-tile GPUs that are seen as separate root devices. Maybe "only one" above is unnecessary and we should always select a numbered device out of several options matching a given WDYT? + @sarnex , @uditagarwal97 |
I agree, I don't like using the device ID number, totally agree we could silently loose coverage. I like the But since we don't have that, a LIT param is fine. For the multi-tile case, I'm sure we can figure it out. The devices will probably have the same UUID or PCI bus address or something so we can detect it's multi-tile. |
This seems like a good solution. The only thing is because I am thinking to not require to put as many |
I think we can do a sensible simple default and then expand it in the future if we need it. Maybe we could just have a way to specify Maybe something like
and we would apply the |
I think CPU arch isn't useful. Can you imagine a system with two different CPUs on it? We aren't MPI... |
Yeah I don't have a strong opinion, for now we could only support GPUs. Maybe name the option |
I have been investigating implementing this and I believe internally I will need to make it so that when |
That's what I was thinking. Maybe even go further and do a switch anytime there are multiple devices matching |
…cture Allow for selection of spercific devices via 'require_device_arch' which accepts device architectures listed under 'sycl-ls --verbose' and filters devices selected uisng 'sycl_devices'. Devices with multiple GPU's under the devices with multiple GPU's under the device selection style of 'backend:device type' such as 'level_zero:gpu' can cause incorrect aspects to be marked as unsupported in the tests ran through 'llvm-lit' using '// REQUIRES: aspect-'. In addition, only the first device listed via 'sycl-ls' under the label is ran. This occurs because using the device selection style of 'level_zero:gpu' there is currently no way to choose a spercific device that falls under that category. Such as a battlemage GPU and an intel iGPU. 'sycl/test-e2e/lit.cfg.py' takes all available device aspects and marks aspects as supported that are available for all the devices. This is problematic for aspects such as 'ext_oneapi_bindless_images' which is supported on battlemage but not Intel iGPU's. Even if battlemage, which supports bindless images is used to run all the tests as it comes first in the device list shown under 'level_zero:gpu', tests requiring that aspect will not run.
@sarnex @aelovikov-intel Here is the new version using I designed it so you can only filter devices on a one to one basis. Meaning that if say you want to run both Feedback is appreciated! This design is perhaps slightly different then the design discussed in this thread before but should be a good starting point. Apologies if the python code is not very pythonic. Have not written in it for a while but is nice to get some more experience. |
Brainstorming... What if we do it like this: |
yeah i like andrei's idea, today with |
Ooh, that actually could work. Internally would have to check for devices via |
I think I would have to make it so that if the second part of available_devices = {
"opencl": ("cpu", "gpu", "fpga"),
"cuda": "gpu",
"level_zero": "gpu",
"hip": "gpu",
"native_cpu": "cpu",
} then assume it is a device architecture specific filter. |
My assumption was that |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good enough to me, none of us are python experts (to the best of my knowledge).
@sarnex ?
I missed a third |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also not a python export but lgtm minus a minor question
|
||
detected_architectures = [] | ||
|
||
for line in get_sycl_ls_verbose(backend + ":*", env): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i really double this actually matters but is it possible to prevent a second call to sycl-ls
here and capture what we need in the first call?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't we use different device filter?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah in this one we set a filter but it seems the other one has no device filter, so it's output should contain the info we need here, but if its not trivial to implement its not worth it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The first sycl-ls, line 638 in the code is used to get all available devices if all
is used so no filter is passed.
The second sycl-ls, line 882 is only called if the string arch-
appears in the device filter. We could also grab the aspects at the same time but that feels messy.
Lastly, sycl-ls at line 938 is called again with all the filtered device filters to set available aspects and other properties.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We prob could remove one of the calls to sycl-ls in the code but I think the separation of concerns with filtering the device filters and then using them to query sycl-ls again to set test parameters makes things less tangled.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure, thanks
Once you fix the code formatting CI fail we can merge this IMO |
I knew I forgot something. Had to install the black formatter extension. Should be good now. |
thanks, this is a really useful feature! |
Thanks! There is a very minor documentation update i missed at line 281 of sycl/test-e2e/README.md just to explain that device selection like |
Np, thanks, take it easy! |
The refactor to call
|
Sorry Harald, just got back from holiday. Will fix this. Need to be more careful jumping from a compiled to interpreted language with things like this... |
@RossBrunton also separately encountered this and already put up a fix as #18535 🙂 |
Ah, ok lol |
Allow for selection of specific devices via
level_zero:arch-intel_gpu_bmg_g21
which accepts device architectures listed undersycl-ls --verbose
. Devices with multiple GPU's under the devices with multiple GPU's under the device selection style ofbackend:*device type*
such aslevel_zero:gpu
can cause incorrect aspects to be marked as unsupported in the tests ran throughllvm-lit
using// REQUIRES: aspect-
. In addition, only the first device listed viasycl-ls
under the label is ran.This occurs because using the device selection style of
level_zero:gpu
there is currently no way to choose a specific device that falls under that category. Such as a battlemage GPU and an intel iGPU.sycl/test-e2e/lit.cfg.py
takes all available device aspects and marks aspects as supported that are available for all the devices. This is problematic for aspects such asext_oneapi_bindless_images
which is supported on battlemage but not Intel iGPU's. Even if battlemage, which supports bindless images is used to run all the tests as it comes first in the device list shown underlevel_zero:gpu
, tests requiring that aspect will not run.