Skip to content

[UR][L0] Add UR_L0_LEAKS_DEBUG key #11820

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Nov 27, 2023
Merged

Conversation

jandres742
Copy link
Contributor

Use a new environment variable, UR_L0_LEAKS_DEBUG, to check for leaks in the UR L0 adapter, instead of relying on a specific value being set in UR_L0_DEBUG.

Comment on lines +4 to 7
// Extra run to check for leaks in Level Zero using UR_L0_LEAKS_DEBUG
// RUN: %if ext_oneapi_level_zero %{env UR_L0_LEAKS_DEBUG=1 %{run} %t.out 2>&1 | FileCheck %s %}
//
// CHECK-NOT: LEAK
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we doing this explicitly in some (many) tests? I thought we run all tests twice, and one of the runs is with ZE_DEBUG=4, this way detecting leak in all tests.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@smaslov-intel : I guess that is a question for the team running or designing the tests? Here I have changed only the references to ZE_DEBUG=4 to UR_L0_LEAKS_DEBUG=1, but whether they should be being used at all is something they are better positioned to answer. Let me know what you think.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair point, I will approve this change. But I'd like to hear from @intel/sycl-graphs-reviewers I guess about the motivation here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another point is that we should synchronize with how we run this testing for leaks, it should now use UR_L0_DEBUG instead of ZE_DEBUG.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree, this will help us start moving to UR env vars.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought we run all tests twice, and one of the runs is with ZE_DEBUG=4, this way detecting leak in all tests.

We introduced this leak checking when we previously had a leak of L0 events in our UR command-buffer backend for graphs, and wanted to prevent this from regressing (or issues being introduced in new tests). Crucially, this was also at a time when we were developing on a fork of DPC++, so didn't have CI to catch non-default test configurations and wanted to catch the leaks during local development.

Could you point me to where we are running the tests a second time with ZE_DEBUG=4 (or setting ur_l0_leaks_debug to pass to lit)? I think the suspicion that the graphs tests don't need their own extra leak checks is probably correct if that's the case.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks @EwanC .

@smaslov-intel : i think that question would be for you:

Could you point me to where we are running the tests a second time with ZE_DEBUG=4 (or setting ur_l0_leaks_debug to pass to lit)? I think the suspicion that the graphs tests don't need their own extra leak checks is probably correct if that's the case.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To help unblock this discussion, if you know where this second run is happening @smaslov-intel, then create a GitHub Issue with that information and assign it to me, and I'll make this change to the graphs to remove the leak checking run from the graphs E2E tests.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It appears there is no "second run" currently in this CI.

Use a new environment variable, UR_L0_LEAKS_DEBUG, to check for leaks in
the UR L0 adapter, instead of relying on a specific value being set in
UR_L0_DEBUG.

Signed-off-by: Jaime Arteaga <[email protected]>
@kbenzie
Copy link
Contributor

kbenzie commented Nov 22, 2023

I've merged oneapi-src/unified-runtime#1053 now, due to timezones I'll update the UR tag in this PR myself so we can run testing earlier.

@kbenzie kbenzie marked this pull request as ready for review November 22, 2023 11:38
@kbenzie kbenzie requested review from a team as code owners November 22, 2023 11:38
@kbenzie kbenzie requested a review from againull November 22, 2023 11:38
Copy link
Contributor

@aarongreig aarongreig left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

UR LGTM

@kbenzie
Copy link
Contributor

kbenzie commented Nov 22, 2023

The graph test failures here were added recently so likely need updated in the same way the other tests have been in this PR. I'll try reproducing locally and applying the fix here if I'm able to.

@kbenzie
Copy link
Contributor

kbenzie commented Nov 22, 2023

@intel/llvm-reviewers-runtime please review

Copy link
Contributor Author

@jandres742 jandres742 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

@jandres742
Copy link
Contributor Author

The graph test failures here were added recently so likely need updated in the same way the other tests have been in this PR. I'll try reproducing locally and applying the fix here if I'm able to.

thanks @kbenzie !

Copy link
Contributor

@aelovikov-intel aelovikov-intel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think proper separation would be UR_L0_TRACE vs UR_L0_VERIFY so that validation layer could be enabled in the latter instead of UR_L0_DEBUG which would have the same issues as before this PR. I won't insist on that part though because I don't remember a single SYCL RT issue caught by such validation and I can live without it in our testing.

@@ -3,7 +3,7 @@
// account direct calls to L0 API.
// UNSUPPORTED: ze_debug
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This and same in other files needs to be removed (or rather changed to ur_l0_debug) as part of this PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can remove that.

Comment on lines +139 to +141
if lit_config.params.get('ur_l0_debug'):
config.ur_l0_debug = lit_config.params.get('ur_l0_debug')
lit_config.note("UR_L0_DEBUG: "+config.ur_l0_debug)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we need this, because it would result in the same issue we're trying to fix.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thansk @aelovikov-intel . I'm not familiar with this code, so I will remove it if wanted.

@smaslov-intel , @EwanC : what do you think?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is good to have the ability to request debug traces with an option

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't really have an opinion on this. Presumably a user could pass lit --param extra_environment="UR_L0_DEBUG=1" to request debug traces regardless, but I don't object to having a convenience option.

@jandres742
Copy link
Contributor Author

I think proper separation would be UR_L0_TRACE vs UR_L0_VERIFY so that validation layer could be enabled in the latter instead of UR_L0_DEBUG which would have the same issues as before this PR. I won't insist on that part though because I don't remember a single SYCL RT issue caught by such validation and I can live without it in our testing.

that's a good idea. I think we could do that in a follow-up PR to further separate the functionality of UR_L0_DEBUG.

@kbenzie
Copy link
Contributor

kbenzie commented Nov 23, 2023

Seems like some changes are needed on this PR. I'd really appriciate if we could get this in a ready to marge state ASAP though as this is currently a blocker for #11718 and then #11893 which are also both intended for the next release.

@jandres742
Copy link
Contributor Author

@againull, @intel/llvm-gatekeepers : could you help us with the approval and merge of this PR? We cannot bring any change from UR until this is merged.

@steffenlarsen
Copy link
Contributor

@aelovikov-intel has open topics on this PR. I will let him decide if his concerns have been addressed.

@kbenzie
Copy link
Contributor

kbenzie commented Nov 27, 2023

@aelovikov-intel has open topics on this PR. I will let him decide if his concerns have been addressed.

If there are unresolved things could we create and assign issues to be followed up on later so we can get UR PR's moving again?

Copy link
Contributor

@steffenlarsen steffenlarsen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes look fine to me. I don't understand the exact concern in #11820 (comment) but my reading is that it doesn't cause any new issues, so in the name of unblocking UR I am okay with a post-commit review and corresponding issues.

@steffenlarsen steffenlarsen merged commit 1e1801d into intel:sycl Nov 27, 2023
@jandres742
Copy link
Contributor Author

thanks @steffenlarsen ! Correct, those comments can be done on follow-ups, as @kbenzie indicated.

sergey-semenov pushed a commit that referenced this pull request Nov 28, 2023
intel-llvm CI run for adding Command Buffers to the OpenCL Adapter in
Unified Runtime - oneapi-src/unified-runtime#966

Also completes follow-on work identified in #11599 to add an OpenCL
section to the SYCL-Graphs docs and update the e2e Graph tests. Updating
the tests has since been completed in a separate PR -
#11877

Depends on #11820 merging first.

---------

Co-authored-by: Pablo Reble <[email protected]>
Co-authored-by: Ewan Crawford <[email protected]>
Co-authored-by: Kenneth Benzie (Benie) <[email protected]>
callumfare pushed a commit to kbenzie/intel-llvm that referenced this pull request Dec 18, 2023
Use a new environment variable, UR_L0_LEAKS_DEBUG, to check for leaks in
the UR L0 adapter, instead of relying on a specific value being set in
UR_L0_DEBUG.

---------

Signed-off-by: Jaime Arteaga <[email protected]>
Co-authored-by: Kenneth Benzie (Benie) <[email protected]>
callumfare pushed a commit to kbenzie/intel-llvm that referenced this pull request Dec 18, 2023
intel-llvm CI run for adding Command Buffers to the OpenCL Adapter in
Unified Runtime - oneapi-src/unified-runtime#966

Also completes follow-on work identified in intel#11599 to add an OpenCL
section to the SYCL-Graphs docs and update the e2e Graph tests. Updating
the tests has since been completed in a separate PR -
intel#11877

Depends on intel#11820 merging first.

---------

Co-authored-by: Pablo Reble <[email protected]>
Co-authored-by: Ewan Crawford <[email protected]>
Co-authored-by: Kenneth Benzie (Benie) <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants