[SYCL][UR][L0] Add support for zeCommandListHostSynchronize #10003

Closed
wants to merge 1 commit into from

jandres742
Contributor

Instead of creating an event, appending it, and synchronizing on it to wait for all commands in an immediate command list, call zeCommandListHostSynchronize directly.
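
The difference between the two approaches can be sketched as below. This is only an illustration under stated assumptions, not the code from the patch: everything in the `mock` namespace is a hypothetical stand-in that counts calls, so the sketch compiles without `ze_api.h`. The function names `syncViaEvent` and `syncViaList` are likewise invented for this example. In the real API, the host-synchronize calls take a timeout, and `UINT64_MAX` means block until completion.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical stand-ins for Level Zero objects and entry points.
// They only count calls and complete work immediately.
namespace mock {
inline int DriverCalls = 0; // number of stand-in "driver" calls made

struct CommandList {
  int Pending; // number of unfinished commands
};
struct Event {
  bool Signaled = false;
};

// Plays the role of creating an event from an event pool.
inline Event createEvent() {
  ++DriverCalls;
  return Event{};
}

// Plays the role of appending a command that signals Ev once all
// previously submitted commands have finished.
inline void appendSignalAfterPending(CommandList &List, Event &Ev) {
  ++DriverCalls;
  List.Pending = 0;
  Ev.Signaled = true;
}

// Plays the role of zeEventHostSynchronize.
inline bool eventHostSynchronize(const Event &Ev, uint64_t /*Timeout*/) {
  ++DriverCalls;
  return Ev.Signaled;
}

// Plays the role of zeCommandListHostSynchronize.
inline bool commandListHostSynchronize(CommandList &List,
                                       uint64_t /*Timeout*/) {
  ++DriverCalls;
  List.Pending = 0;
  return true;
}
} // namespace mock

// Wait without a timeout, mirroring UINT64_MAX in the real API.
constexpr uint64_t NoTimeout = std::numeric_limits<uint64_t>::max();

// Old approach: create an event, append it, then wait on it.
inline bool syncViaEvent(mock::CommandList &List) {
  mock::Event Ev = mock::createEvent();
  mock::appendSignalAfterPending(List, Ev);
  return mock::eventHostSynchronize(Ev, NoTimeout);
}

// New approach: wait on the immediate command list itself.
inline bool syncViaList(mock::CommandList &List) {
  return mock::commandListHostSynchronize(List, NoTimeout);
}
```

With these stand-ins, the event-based path makes three calls per synchronization and the list-based path makes one, which is the simplification the description refers to.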

@jandres742
Contributor Author

@bader: I'm seeing this error. Is it infra-related or caused by my patch?

2023-06-21T21:01:53.2646219Z [0/1] Running SYCL End-to-End tests
2023-06-21T21:01:53.3650135Z lit.py: D:\github\_work\llvm\llvm\llvm\llvm\utils\lit\lit\llvm\config.py:46: note: using lit tools: C:\Program Files\Git\usr\bin
2023-06-21T21:01:54.0382035Z lit.py: D:\github\_work\llvm\llvm\llvm\sycl\test-e2e\lit.cfg.py:229: note: Targeted devices: ext_oneapi_level_zero:gpu
2023-06-21T21:01:54.0396912Z lit.py: D:\github\_work\llvm\llvm\llvm\sycl\test-e2e\lit.cfg.py:345: warning: Couldn't find pre-installed AOT device compiler ocloc
2023-06-21T21:01:54.0402609Z lit.py: D:\github\_work\llvm\llvm\llvm\sycl\test-e2e\lit.cfg.py:345: warning: Couldn't find pre-installed AOT device compiler opencl-aot
2023-06-21T21:01:56.5082228Z lit.py: D:\github\_work\llvm\llvm\llvm\sycl\test-e2e\lit.cfg.py:397: error: Cannot detect device aspect for ext_oneapi_level_zero:gpu
2023-06-21T21:01:56.5082565Z stdout:
2023-06-21T21:01:56.5082655Z 
2023-06-21T21:01:56.5082720Z Platforms: 0
2023-06-21T21:01:56.5085720Z default_selector()      : No device of requested type available. -1 (PI_ERRO...
2023-06-21T21:01:56.5086044Z accelerator_selector()  : No device of requested type available. -1 (PI_ERRO...
2023-06-21T21:01:56.5086506Z cpu_selector()          : No device of requested type available. -1 (PI_ERRO...
2023-06-21T21:01:56.5086998Z gpu_selector()          : No device of requested type available. -1 (PI_ERRO...
2023-06-21T21:01:56.5087299Z custom_selector(gpu)    : No device of requested type available. -1 (PI_ERRO...
2023-06-21T21:01:56.5087665Z custom_selector(cpu)    : No device of requested type available. -1 (PI_ERRO...
2023-06-21T21:01:56.5087950Z custom_selector(acc)    : No device of requested type available. -1 (PI_ERRO...

@bader
Contributor

bader commented Jun 21, 2023

@jandres742, I'm not sure what your patch is doing, but sycl-ls doesn't see any devices. It might be that the system got broken. @intel/dpcpp-devops-reviewers, can you confirm that the GPU drivers are installed correctly on the intel_sycl119891 runner, please?

@cperkinsintel
Contributor

@bader - I logged into that Win machine and the GPU drivers are set up correctly.

$ sycl-ls
[opencl:gpu:0] Intel(R) OpenCL HD Graphics, Intel(R) Iris(R) Xe Graphics OpenCL 3.0 NEO  [31.0.101.4255]
[ext_oneapi_level_zero:gpu:0] Intel(R) Level-Zero, Intel(R) Iris(R) Xe Graphics 1.3 [1.3.25640]

I was not using the branch from this PR, just the latest head of the SYCL branch. Note that in the output of the test run, sycl-ls complains that ONEAPI_DEVICE_SELECTOR was set, which severely limits what it lists. My guess is that the test driver sets ONEAPI_DEVICE_SELECTOR=level_zero:gpu when it runs sycl-ls, and this PR somehow makes the level_zero plugin/loader unresponsive, which would result in no devices being listed at all.

@@ -1358,6 +1358,10 @@ ur_result_t ur_queue_handle_t_::synchronize() {
if (ImmCmdList == Queue->CommandListMap.end())
return UR_RESULT_SUCCESS;

#if (L0_USE_ZECOMMANDLISTHOSTSYNCHRONIZE == 1)
Contributor

Why would this be a build-time decision rather than a run-time one? Can't we check the version of the loaded L0 and run one code path or the other?

Contributor Author

This was a workaround while we upgrade the GPU driver, but I now see there's a way to do that. We need:

ci-neo-master-026152: Update to L0 Loader v1.10.0 or higher.

Current PR for uplift in #10087
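
The run-time check suggested in the review comment could look roughly like the sketch below. It is an assumption-laden illustration, not code from this PR: `LoaderVersion`, `atLeast`, `SyncPath`, and `chooseSyncPath` are hypothetical names, and how the version of the loaded L0 loader is actually queried is left out. The only value taken from the thread is the threshold, "L0 Loader v1.10.0 or higher".

```cpp
#include <cstdint>
#include <tuple>

// Hypothetical version triple for the loaded L0 loader. Querying it
// from the real loader is outside the scope of this sketch.
struct LoaderVersion {
  uint32_t Major;
  uint32_t Minor;
  uint32_t Patch;
};

// Threshold quoted in the thread: L0 Loader v1.10.0 or higher.
constexpr LoaderVersion MinHostSyncLoader{1, 10, 0};

// Lexicographic comparison on (Major, Minor, Patch).
inline bool atLeast(const LoaderVersion &Have, const LoaderVersion &Need) {
  return std::tie(Have.Major, Have.Minor, Have.Patch) >=
         std::tie(Need.Major, Need.Minor, Need.Patch);
}

// Which synchronization path the queue would take (hypothetical enum).
enum class SyncPath { CommandListHostSynchronize, EventFallback };

// Decide at run time instead of through a build-time macro.
inline SyncPath chooseSyncPath(const LoaderVersion &Loaded) {
  return atLeast(Loaded, MinHostSyncLoader)
             ? SyncPath::CommandListHostSynchronize
             : SyncPath::EventFallback;
}
```

A decision like this would typically be made once, for example when the platform is initialized, and cached, so that `synchronize()` only branches on a stored value.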

@jandres742 jandres742 force-pushed the listhostsync branch 2 times, most recently from 345ab95 to bcf389b Compare July 12, 2023 01:46
Instead of creating an event, appending it, and synchronizing on it
to wait for all commands in an immediate command list, call
zeCommandListHostSynchronize directly.

Signed-off-by: Jaime Arteaga <[email protected]>
@jandres742
Contributor Author

Failures on "SYCL E2E AWS CUDA" are unrelated, since the changes are specific to L0. Those errors seem to be infra-related.

@kbenzie
Contributor

kbenzie commented Nov 8, 2023

Superseded by #11811

@kbenzie kbenzie closed this Nov 8, 2023