-
Notifications
You must be signed in to change notification settings - Fork 787
[UR][L0] Add UR_L0_LEAKS_DEBUG key #11820
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
// Extra run to check for leaks in Level Zero using UR_L0_LEAKS_DEBUG | ||
// RUN: %if ext_oneapi_level_zero %{env UR_L0_LEAKS_DEBUG=1 %{run} %t.out 2>&1 | FileCheck %s %} | ||
// | ||
// CHECK-NOT: LEAK |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are we doing this explicitly in some (many) tests? I thought we run all tests twice, and one of the runs is with ZE_DEBUG=4, this way detecting leak in all tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@smaslov-intel : I guess that is a question for the team running or designing the tests? Here I have changed only the references to ZE_DEBUG=4 to UR_L0_LEAKS_DEBUG=1, but whether they should be being used at all is something they are better positioned to answer. Let me know what you think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fair point, I will approve this change. But I'd like to hear from @intel/sycl-graphs-reviewers I guess about the motivation here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another point is that we should synchronize with how we run this testing for leaks, it should now use UR_L0_DEBUG instead of ZE_DEBUG.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree, this will help us start moving to UR env vars.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought we run all tests twice, and one of the runs is with ZE_DEBUG=4, this way detecting leak in all tests.
We introduced this leak checking when we previously had a leak of L0 events in our UR command-buffer backend for graphs, and wanted to prevent this from regressing (or issues being introduced in new tests). Crucially, this was also at a time when we were developing on a fork of DPC++, so didn't have CI to catch non-default test configurations and wanted to catch the leaks during local development.
Could you point me to where we are running the tests a second time with ZE_DEBUG=4
(or setting ur_l0_leaks_debug
to pass to lit)? I think the suspicion that the graphs tests don't need their own extra leak checks is probably correct if that's the case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks @EwanC .
@smaslov-intel : i think that question would be for you:
Could you point me to where we are running the tests a second time with ZE_DEBUG=4 (or setting ur_l0_leaks_debug to pass to lit)? I think the suspicion that the graphs tests don't need their own extra leak checks is probably correct if that's the case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To help unblock this discussion, if you know where this second run is happening @smaslov-intel, then create a GitHub Issue with that information and assign it to me, and I'll make this change to the graphs to remove the leak checking run from the graphs E2E tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It appears there is no "second run" currently in this CI.
952d48e
to
5bfbbfc
Compare
5bfbbfc
to
d4f4951
Compare
Use a new environment variable, UR_L0_LEAKS_DEBUG, to check for leaks in the UR L0 adapter, instead of relying on a specific value being set in UR_L0_DEBUG. Signed-off-by: Jaime Arteaga <[email protected]>
I've merged oneapi-src/unified-runtime#1053 now, due to timezones I'll update the UR tag in this PR myself so we can run testing earlier. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
UR LGTM
The graph test failures here were added recently so likely need updated in the same way the other tests have been in this PR. I'll try reproducing locally and applying the fix here if I'm able to. |
@intel/llvm-reviewers-runtime please review |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think proper separation would be UR_L0_TRACE
vs UR_L0_VERIFY
so that validation layer could be enabled in the latter instead of UR_L0_DEBUG
which would have the same issues as before this PR. I won't insist on that part though because I don't remember a single SYCL RT issue caught by such validation and I can live without it in our testing.
@@ -3,7 +3,7 @@ | |||
// account direct calls to L0 API. | |||
// UNSUPPORTED: ze_debug |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This and same in other files needs to be removed (or rather changed to ur_l0_debug) as part of this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can remove that.
if lit_config.params.get('ur_l0_debug'): | ||
config.ur_l0_debug = lit_config.params.get('ur_l0_debug') | ||
lit_config.note("UR_L0_DEBUG: "+config.ur_l0_debug) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we need this, because it would result in the same issue we're trying to fix.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thansk @aelovikov-intel . I'm not familiar with this code, so I will remove it if wanted.
@smaslov-intel , @EwanC : what do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it is good to have the ability to request debug traces with an option
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't really have an opinion on this. Presumably a user could pass lit --param extra_environment="UR_L0_DEBUG=1"
to request debug traces regardless, but I don't object to having a convenience option.
that's a good idea. I think we could do that in a follow-up PR to further separate the functionality of UR_L0_DEBUG. |
@againull, @intel/llvm-gatekeepers : could you help us with the approval and merge of this PR? We cannot bring any change from UR until this is merged. |
@aelovikov-intel has open topics on this PR. I will let him decide if his concerns have been addressed. |
If there are unresolved things could we create and assign issues to be followed up on later so we can get UR PR's moving again? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changes look fine to me. I don't understand the exact concern in #11820 (comment) but my reading is that it doesn't cause any new issues, so in the name of unblocking UR I am okay with a post-commit review and corresponding issues.
thanks @steffenlarsen ! Correct, those comments can be done on follow-ups, as @kbenzie indicated. |
intel-llvm CI run for adding Command Buffers to the OpenCL Adapter in Unified Runtime - oneapi-src/unified-runtime#966 Also completes follow-on work identified in #11599 to add an OpenCL section to the SYCL-Graphs docs and update the e2e Graph tests. Updating the tests has since been completed in a separate PR - #11877 Depends on #11820 merging first. --------- Co-authored-by: Pablo Reble <[email protected]> Co-authored-by: Ewan Crawford <[email protected]> Co-authored-by: Kenneth Benzie (Benie) <[email protected]>
Use a new environment variable, UR_L0_LEAKS_DEBUG, to check for leaks in the UR L0 adapter, instead of relying on a specific value being set in UR_L0_DEBUG. --------- Signed-off-by: Jaime Arteaga <[email protected]> Co-authored-by: Kenneth Benzie (Benie) <[email protected]>
intel-llvm CI run for adding Command Buffers to the OpenCL Adapter in Unified Runtime - oneapi-src/unified-runtime#966 Also completes follow-on work identified in intel#11599 to add an OpenCL section to the SYCL-Graphs docs and update the e2e Graph tests. Updating the tests has since been completed in a separate PR - intel#11877 Depends on intel#11820 merging first. --------- Co-authored-by: Pablo Reble <[email protected]> Co-authored-by: Ewan Crawford <[email protected]> Co-authored-by: Kenneth Benzie (Benie) <[email protected]>
Use a new environment variable, UR_L0_LEAKS_DEBUG, to check for leaks in the UR L0 adapter, instead of relying on a specific value being set in UR_L0_DEBUG.