[UR][L0] Add UR_L0_LEAKS_DEBUG key #11820

jandres742 · 2023-11-08T16:47:47Z

Use a new environment variable, UR_L0_LEAKS_DEBUG, to check for leaks in the UR L0 adapter, instead of relying on a specific value being set in UR_L0_DEBUG.
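
To illustrate the change described above, here is a minimal sketch of the difference between keying leak checks off one value of a shared debug variable and giving them a dedicated switch. It is not the adapter's code: both function names are hypothetical, the value 4 is taken from the `ZE_DEBUG=4` runs mentioned later in the review, and the "any non-zero integer enables it" rule is an assumption.

```python
def leak_check_requested_before(env):
    """Hypothetical old scheme: leak checking rides on one value of a shared debug variable."""
    # Assumption: 4 is the magic value, as in the ZE_DEBUG=4 test runs.
    try:
        return int(env.get("UR_L0_DEBUG", "0")) == 4
    except ValueError:
        return False


def leak_check_requested_after(env):
    """Hypothetical new scheme: a dedicated variable controls leak checking."""
    # Assumption: any non-zero integer enables the check.
    try:
        return int(env.get("UR_L0_LEAKS_DEBUG", "0")) != 0
    except ValueError:
        return False
```

With a dedicated key, asking for debug traces through `UR_L0_DEBUG` no longer changes whether leaks are checked, and the reverse also holds.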

smaslov-intel · 2023-11-08T19:09:13Z

sycl/test-e2e/Graph/Explicit/add_node_while_recording.cpp

+// Extra run to check for leaks in Level Zero using UR_L0_LEAKS_DEBUG
+// RUN: %if ext_oneapi_level_zero %{env UR_L0_LEAKS_DEBUG=1 %{run} %t.out 2>&1 | FileCheck %s %}
 //
 // CHECK-NOT: LEAK
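
For readers unfamiliar with lit, the extra RUN line above can be approximated outside the test harness roughly as follows. This is only a sketch: `run_with_leak_check` is a hypothetical helper, and the only thing assumed about the adapter's output is that a leak report contains the text `LEAK`, which is what the `CHECK-NOT` pattern looks for.

```python
import os
import subprocess
import sys


def run_with_leak_check(cmd, env_var="UR_L0_LEAKS_DEBUG"):
    """Run cmd with the leak-check variable set and report whether 'LEAK' appears.

    Mirrors the RUN line: set the variable, fold stderr into stdout (2>&1),
    then apply the equivalent of `CHECK-NOT: LEAK` to the combined output.
    """
    env = dict(os.environ)
    env[env_var] = "1"
    result = subprocess.run(
        cmd,
        env=env,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # same effect as 2>&1
        text=True,
        check=False,
    )
    leaked = "LEAK" in result.stdout
    return leaked, result.stdout


if __name__ == "__main__":
    # Stand-in for %t.out: a child process that just echoes the variable.
    child = [sys.executable, "-c",
             "import os; print(os.environ['UR_L0_LEAKS_DEBUG'])"]
    leaked, output = run_with_leak_check(child)
    print("leak reported:", leaked)
```

In the real tests the command would be the compiled test binary, and FileCheck performs the pattern check.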


Why are we doing this explicitly in some (many) tests? I thought we run all tests twice, and one of the runs is with ZE_DEBUG=4, this way detecting leak in all tests.

@smaslov-intel : I guess that is a question for the team running or designing the tests? Here I have only changed the references from ZE_DEBUG=4 to UR_L0_LEAKS_DEBUG=1; whether they should be used at all is something they are better positioned to answer. Let me know what you think.

Fair point, I will approve this change. But I'd like to hear from @intel/sycl-graphs-reviewers I guess about the motivation here.

Another point is that we should synchronize with how we run this testing for leaks, it should now use UR_L0_DEBUG instead of ZE_DEBUG.

Agree, this will help us start moving to UR env vars.

I thought we run all tests twice, and one of the runs is with ZE_DEBUG=4, this way detecting leak in all tests.

We introduced this leak checking when we previously had a leak of L0 events in our UR command-buffer backend for graphs, and wanted to prevent this from regressing (or issues being introduced in new tests). Crucially, this was also at a time when we were developing on a fork of DPC++, so didn't have CI to catch non-default test configurations and wanted to catch the leaks during local development.

Could you point me to where we are running the tests a second time with ZE_DEBUG=4 (or setting ur_l0_leaks_debug to pass to lit)? I think the suspicion that the graphs tests don't need their own extra leak checks is probably correct if that's the case.

thanks @EwanC .

@smaslov-intel : I think that question would be for you:

Could you point me to where we are running the tests a second time with ZE_DEBUG=4 (or setting ur_l0_leaks_debug to pass to lit)? I think the suspicion that the graphs tests don't need their own extra leak checks is probably correct if that's the case.

To help unblock this discussion, if you know where this second run is happening @smaslov-intel, then create a GitHub Issue with that information and assign it to me, and I'll make this change to the graphs to remove the leak checking run from the graphs E2E tests.

It appears there is no "second run" currently in this CI.

Use a new environment variable, UR_L0_LEAKS_DEBUG, to check for leaks in the UR L0 adapter, instead of relying on a specific value being set in UR_L0_DEBUG. Signed-off-by: Jaime Arteaga <[email protected]>

kbenzie · 2023-11-22T11:28:27Z

I've merged oneapi-src/unified-runtime#1053 now, due to timezones I'll update the UR tag in this PR myself so we can run testing earlier.

aarongreig

UR LGTM

kbenzie · 2023-11-22T13:14:03Z

The graph test failures here were added recently so likely need updated in the same way the other tests have been in this PR. I'll try reproducing locally and applying the fix here if I'm able to.

kbenzie · 2023-11-22T16:12:19Z

@intel/llvm-reviewers-runtime please review

jandres742

+1

jandres742 · 2023-11-22T16:15:50Z

The graph test failures here were added recently so likely need updated in the same way the other tests have been in this PR. I'll try reproducing locally and applying the fix here if I'm able to.

thanks @kbenzie !

aelovikov-intel

I think proper separation would be UR_L0_TRACE vs UR_L0_VERIFY so that validation layer could be enabled in the latter instead of UR_L0_DEBUG which would have the same issues as before this PR. I won't insist on that part though because I don't remember a single SYCL RT issue caught by such validation and I can live without it in our testing.

aelovikov-intel · 2023-11-22T17:17:32Z

sycl/test-e2e/Plugin/interop-level-zero-buffer.cpp

@@ -3,7 +3,7 @@
 // account direct calls to L0 API.
 // UNSUPPORTED: ze_debug


This and same in other files needs to be removed (or rather changed to ur_l0_debug) as part of this PR.

I can remove that.

aelovikov-intel · 2023-11-22T17:18:03Z

sycl/test-e2e/lit.cfg.py

+if lit_config.params.get('ur_l0_debug'):
+    config.ur_l0_debug = lit_config.params.get('ur_l0_debug')
+    lit_config.note("UR_L0_DEBUG: "+config.ur_l0_debug)


I don't think we need this, because it would result in the same issue we're trying to fix.

Thanks @aelovikov-intel. I'm not familiar with this code, so I will remove it if wanted.

@smaslov-intel , @EwanC : what do you think?

I think it is good to have the ability to request debug traces with an option

I don't really have an opinion on this. Presumably a user could pass lit --param extra_environment="UR_L0_DEBUG=1" to request debug traces regardless, but I don't object to having a convenience option.
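
The two routes discussed here, a dedicated `ur_l0_debug` parameter versus the generic `extra_environment` parameter, can be compared with a small sketch. It uses a plain dict as a stand-in for `lit_config.params`; the function name is hypothetical, and the comma-separated `NAME=VALUE` format for `extra_environment` is an assumption, not a statement of how `lit.cfg.py` parses it.

```python
def environment_from_params(params):
    """Build extra environment entries from lit-style --param values.

    `params` is a plain dict standing in for lit_config.params.
    """
    env = {}
    # Convenience option, as in the diff above: --param ur_l0_debug=<value>
    value = params.get("ur_l0_debug")
    if value:
        env["UR_L0_DEBUG"] = value
    # Generic route: --param extra_environment="A=1,B=2" (format assumed)
    extra = params.get("extra_environment", "")
    for item in filter(None, extra.split(",")):
        name, _, val = item.partition("=")
        env[name] = val  # in this sketch the generic route wins on conflict
    return env
```

Either `{"ur_l0_debug": "1"}` or `{"extra_environment": "UR_L0_DEBUG=1"}` yields the same environment in this sketch, which is why the dedicated parameter is described as a convenience rather than a necessity.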

jandres742 · 2023-11-22T17:53:14Z

I think proper separation would be UR_L0_TRACE vs UR_L0_VERIFY so that validation layer could be enabled in the latter instead of UR_L0_DEBUG which would have the same issues as before this PR. I won't insist on that part though because I don't remember a single SYCL RT issue caught by such validation and I can live without it in our testing.

that's a good idea. I think we could do that in a follow-up PR to further separate the functionality of UR_L0_DEBUG.

kbenzie · 2023-11-23T11:32:23Z

Seems like some changes are needed on this PR. I'd really appreciate it if we could get this into a ready-to-merge state ASAP though, as this is currently a blocker for #11718 and then #11893, which are also both intended for the next release.

jandres742 · 2023-11-26T02:28:04Z

@againull, @intel/llvm-gatekeepers : could you help us with the approval and merge of this PR? We cannot bring any change from UR until this is merged.

steffenlarsen · 2023-11-27T08:46:45Z

@aelovikov-intel has open topics on this PR. I will let him decide if his concerns have been addressed.

kbenzie · 2023-11-27T10:33:29Z

@aelovikov-intel has open topics on this PR. I will let him decide if his concerns have been addressed.

If there are unresolved things, could we create and assign issues to be followed up on later so we can get UR PRs moving again?

steffenlarsen

Changes look fine to me. I don't understand the exact concern in #11820 (comment) but my reading is that it doesn't cause any new issues, so in the name of unblocking UR I am okay with a post-commit review and corresponding issues.

jandres742 · 2023-11-27T23:23:55Z

thanks @steffenlarsen ! Correct, those comments can be done on follow-ups, as @kbenzie indicated.

intel-llvm CI run for adding Command Buffers to the OpenCL Adapter in Unified Runtime - oneapi-src/unified-runtime#966 Also completes follow-on work identified in #11599 to add an OpenCL section to the SYCL-Graphs docs and update the e2e Graph tests. Updating the tests has since been completed in a separate PR - #11877 Depends on #11820 merging first. --------- Co-authored-by: Pablo Reble <[email protected]> Co-authored-by: Ewan Crawford <[email protected]> Co-authored-by: Kenneth Benzie (Benie) <[email protected]>

Use a new environment variable, UR_L0_LEAKS_DEBUG, to check for leaks in the UR L0 adapter, instead of relying on a specific value being set in UR_L0_DEBUG. --------- Signed-off-by: Jaime Arteaga <[email protected]> Co-authored-by: Kenneth Benzie (Benie) <[email protected]>

jandres742 mentioned this pull request Nov 8, 2023

[UR][L0] Add UR_L0_LEAKS_DEBUG key oneapi-src/unified-runtime#1053

Merged

smaslov-intel reviewed Nov 8, 2023

jandres742 force-pushed the url0leakkey branch from 952d48e to 5bfbbfc Compare November 8, 2023 20:37

jandres742 force-pushed the url0leakkey branch from 5bfbbfc to d4f4951 Compare November 8, 2023 23:52

[UR][L0] Add UR_L0_LEAKS_DEBUG key

d4f4951

Use a new environment variable, UR_L0_LEAKS_DEBUG, to check for leaks in the UR L0 adapter, instead of relying on a specific value being set in UR_L0_DEBUG. Signed-off-by: Jaime Arteaga <[email protected]>

kbenzie added 2 commits November 22, 2023 11:35

Merge remote-tracking branch 'origin/sycl' into url0leakkey

7ebc30f

[UR] Bump tag to 31b654f

14031f6

kbenzie marked this pull request as ready for review November 22, 2023 11:38

kbenzie requested review from a team as code owners November 22, 2023 11:38

kbenzie requested a review from againull November 22, 2023 11:38

aarongreig approved these changes Nov 22, 2023

EwanC approved these changes Nov 22, 2023

[Graph] Update more tests to use UR_L0_LEAKS_DEBUG

4773297

jandres742 commented Nov 22, 2023

kbenzie mentioned this pull request Nov 22, 2023

[SYCL][OpenCL] Enable graph extension on OpenCL backend #11718

Merged

aelovikov-intel reviewed Nov 22, 2023

smaslov-intel approved these changes Nov 22, 2023

EwanC mentioned this pull request Nov 27, 2023

[SYCL][Graph] Add support for host task reble/llvm#344

Closed

steffenlarsen approved these changes Nov 27, 2023

steffenlarsen merged commit 1e1801d into intel:sycl Nov 27, 2023

kbenzie mentioned this pull request Dec 15, 2023

Unified Runtime v0.8.2 changes combine #12192

Closed
