Skip to content

[SYCL][Bindless] Fix bindless vulkan interop tests selecting wrong vulkan device #14386

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

DBDuncan
Copy link
Contributor

@DBDuncan DBDuncan commented Jul 2, 2024

The incorrect vulkan device can be selected when both Intel and Nvidia devices are available on the same system. Change tests to use the same device as SYCL default to.

…lkan device

The incorrect vulkan device can be selected when both Intel and Nvidia devices are available on the same system. Change tests to use the same device as SYCL default to.
@DBDuncan DBDuncan requested a review from a team as a code owner July 2, 2024 15:18
@DBDuncan DBDuncan temporarily deployed to WindowsCILock July 2, 2024 15:20 — with GitHub Actions Inactive
@DBDuncan
Copy link
Contributor Author

DBDuncan commented Jul 2, 2024

@wenju-he Hey Wenju. Added this fix to the vulkan tests to prevent the wrong vulkan device being selected. Can you test this using the Level Zero backend. Want to make sure that there is no weirdness with the sycl and vulkan devices having slightly different naming causing issues when using an Intel device.

@wenju-he
Copy link
Contributor

wenju-he commented Jul 2, 2024

@wenju-he Hey Wenju. Added this fix to the vulkan tests to prevent the wrong vulkan device being selected. Can you test this using the Level Zero backend.

@DBDuncan the change LGTM. I've tested it on DG2 linux. There is device name mismatch due to lower/upper case difference:
dev.get_info<sycl::info::device::name>() returns Intel(R) Arc(TM) A770 Graphics, while vkGetPhysicalDeviceProperties props.deviceName are:
Intel(R) Arc(tm) A770 Graphics (DG2)
llvmpipe (LLVM 15.0.7, 256 bits)
So it is probably necessary to do case insensitive compare. Then 4 subtests in sampled_images.cpp are passing on DG2 linux, same as before this PR.

@DBDuncan
Copy link
Contributor Author

DBDuncan commented Jul 3, 2024

@wenju-he Dang. TBH thought something annoying like that might happen. Thankfully yeah making the comparison case insensitive should fix the issue. Pushed a patch. Please verify that it works.

@wenju-he
Copy link
Contributor

wenju-he commented Jul 3, 2024

@DBDuncan it is working now, following is log of sampled_images.cpp (only 4 subtests are enabled):

Found suitable Vulkan device: Intel(R) Arc(tm) A770 Graphics (DG2)
All tests passed!

Copy link
Contributor

@wenju-he wenju-he left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@DBDuncan
Copy link
Contributor Author

DBDuncan commented Jul 4, 2024

@intel/llvm-gatekeepers Should be ready to merge.

@sommerlukas sommerlukas merged commit b249afc into intel:sycl Jul 4, 2024
13 checks passed
@ph0b
Copy link
Contributor

ph0b commented Jul 4, 2024

@DBDuncan @wenju-he a mismatch could also happen with two Intel dGPUs.
Implementations should use ext::intel::info::device::uuid and VkPhysicalDeviceIDProperties::deviceUUID as https://registry.khronos.org/vulkan/specs/1.3-extensions/man/html/VK_KHR_external_memory_capabilities.html proposes.
(and for DX12, it'd be with LUID, that L0 exposes but not SYCL)

@DBDuncan
Copy link
Contributor Author

DBDuncan commented Jul 4, 2024

@DBDuncan @wenju-he a mismatch could also happen with two Intel dGPUs. Implementations should use ext::intel::info::device::uuid and VkPhysicalDeviceIDProperties::deviceUUID as https://registry.khronos.org/vulkan/specs/1.3-extensions/man/html/VK_KHR_external_memory_capabilities.html proposes. (and for DX12, it'd be with LUID)

Thanks for letting me know. The test is currently disabled running on intel devices so should not be an issue for now. I don't think CI has the updated drivers yet? I could be wrong. At least should prevent vulkan picking an intel device when sycl is using an nvidia device for now.

@ph0b
Copy link
Contributor

ph0b commented Jul 4, 2024

Sure it makes the test pass now but it's fundamentally not the correct approach. if one day we add CI instances with 2x nvidia GPUs it could break.
Using UUID is what applications will do so it's also relevant to have this done by the tests, in case UUID comparison would break for any reason.

@DBDuncan
Copy link
Contributor Author

DBDuncan commented Jul 4, 2024

You are right. I should had also been a lot more pessimistic about constant naming of devices between vulkan and sycl in the first place and considered multiple GPUs. I will look at updating this on Monday. Not working this Friday.

@wenju-he
Copy link
Contributor

wenju-he commented Jul 5, 2024

@DBDuncan @wenju-he a mismatch could also happen with two Intel dGPUs. Implementations should use ext::intel::info::device::uuid and VkPhysicalDeviceIDProperties::deviceUUID as https://registry.khronos.org/vulkan/specs/1.3-extensions/man/html/VK_KHR_external_memory_capabilities.html proposes. (and for DX12, it'd be with LUID)

Thanks for letting me know. The test is currently disabled running on intel devices so should not be an issue for now. I don't think CI has the updated drivers yet? I could be wrong. At least should prevent vulkan picking an intel device when sycl is using an nvidia device for now.

@DBDuncan you're right that CI isn't enabled for bindless tests because drivers are not ready. SPIRV-LLVM-Translator commit used in latest IGC release igc-1.0.16900.23 doesn't contains llvm-spirv support for bindless image. Fortunately latest tag igc-1.0.17193.3 has the llvm-spirv commit for bindless image, so we just need to wait for the next IGC release.

@ph0b thanks for the comment. I can test following-up PR from @DBDuncan on DG2.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants