Skip to content

[SYCL][Docs] Move sycl_ext_oneapi_profiling_tag to experimental #14165

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

steffenlarsen
Copy link
Contributor

This commit moves the sycl_ext_oneapi_profiling_tag extension to experimental. There are some known issues with this exentsion, specifically on OpenCL (fallback solution) and on CUDA and HIP backends.

This commit moves the sycl_ext_oneapi_profiling_tag extension to
experimental. There are some known issues with this exentsion,
specifically on OpenCL (fallback solution) and on CUDA and HIP backends.

Signed-off-by: Larsen, Steffen <[email protected]>
@gmlueck
Copy link
Contributor

gmlueck commented Jun 13, 2024

There are some known issues with this exentsion, specifically on OpenCL (fallback solution) and on CUDA and HIP backends.

I assume those devices just report "false" for the ext_oneapi_queue_profiling_tag aspect?

Copy link
Contributor

@dm-vodopyanov dm-vodopyanov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sycl/source/feature_test.hpp.in LGTM

@steffenlarsen
Copy link
Contributor Author

There are some known issues with this exentsion, specifically on OpenCL (fallback solution) and on CUDA and HIP backends.

I assume those devices just report "false" for the ext_oneapi_queue_profiling_tag aspect?

Not at the moment. They sporadically report invalid results for some of the queries. We can make them use the fallback solution, but given the problem seems to be with the general event timing implementation I do not have hope that it will work much better.

@gmlueck
Copy link
Contributor

gmlueck commented Jun 24, 2024

Not at the moment. They sporadically report invalid results for some of the queries. We can make them use the fallback solution, but given the problem seems to be with the general event timing implementation I do not have hope that it will work much better.

What is the fallback solution? I thought the reason we added the ext_oneapi_queue_profiling_tag aspect was specifically for OpenCL because we didn't think we could implement the profiling tags on that backend.

I'm surprised that we cannot implement this API on CUDA. I thought that any CUDA event can be created such that it records timing information. In fact, I thought this was the default unless the event was created with cudaEventDisableTiming.

@steffenlarsen
Copy link
Contributor Author

What is the fallback solution? I thought the reason we added the ext_oneapi_queue_profiling_tag aspect was specifically for OpenCL because we didn't think we could implement the profiling tags on that backend.

By fallback solution, I mean the solution it will take if profiling-tag isn't supported and profiling is enabled on the queue. We submit a barrier and use the timestamps on it as the tag.

I'm surprised that we cannot implement this API on CUDA. I thought that any CUDA event can be created such that it records timing information. In fact, I thought this was the default unless the event was created with cudaEventDisableTiming.

The problem is not that we cannot implement it, the issue is that something in the CUDA and HIP implementations of events in UR result in odd timestamp values. When these backends gather event timings, they are calculated relative to other events, so the implementations need to create some base events for this, which I suspect is at least the source of the HIP issue (see #12904.)

@steffenlarsen
Copy link
Contributor Author

@gmlueck - Are you alright with the answer above?

@gmlueck
Copy link
Contributor

gmlueck commented Jul 11, 2024

@gmlueck - Are you alright with the answer above?

I think the conversation started when I asked about the PR description:

This commit moves the sycl_ext_oneapi_profiling_tag extension to experimental. There are some known issues with this exentsion, specifically on OpenCL (fallback solution) and on CUDA and HIP backends.

From the discussion, my understanding is:

  • There is no problem with OpenCL. We expected all along that OpenCL devices would return false for the ext_oneapi_queue_profiling_tag aspect, and this is what we do.

  • There is a bug with the CUDA and HIP UR adaptors that affects the timestamp values from events. This is not directly related to this extension because it affects all events, not just the ones returned by submit_profiling_tag.

Is that a correct summary? If so, I approve.

@steffenlarsen
Copy link
Contributor Author

  • There is no problem with OpenCL. We expected all along that OpenCL devices would return false for the ext_oneapi_queue_profiling_tag aspect, and this is what we do.

Yes, but also the profiling information for some of the OpenCL backends are wrong when profiling is enabled, so it is not completely bug-free.

  • There is a bug with the CUDA and HIP UR adaptors that affects the timestamp values from events. This is not directly related to this extension because it affects all events, not just the ones returned by submit_profiling_tag.

Indeed. The bugs are unrelated, but unfortunate.

@gmlueck
Copy link
Contributor

gmlueck commented Jul 19, 2024

Yes, but also the profiling information for some of the OpenCL backends are wrong when profiling is enabled, so it is not completely bug-free.

Yes, this is unfortunate, but I think it affects all events, not just the ones returned from submit_profiling_tag.

Just curious, which OpenCL implementations are causing this problem? It sounds like these implementations are also not conforming to the OpenCL specification, right?

I'm approving this PR because I think these bugs are not directly related to this extension. The do sound like bugs we need to fix, though, and we should create an internal tracker to capture the work needed to fix them.

@steffenlarsen
Copy link
Contributor Author

Just curious, which OpenCL implementations are causing this problem? It sounds like these implementations are also not conforming to the OpenCL specification, right?

It was seen fail on both the FPGA emulator and on OpenCL GPU. The implementation uses a simple barrier for the case where profiling is enabled and the backend doesn't support profiling, so the faulty profiling information can also be reproduced by using the barrier commands.

@steffenlarsen steffenlarsen merged commit cf6f876 into intel:sycl Jul 19, 2024
15 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants