Skip to content

[SYCL][DOC] Add extension spec to time commands submitted to a queue #12194

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 5 commits into from
Jan 18, 2024

Conversation

gmlueck
Copy link
Contributor

@gmlueck gmlueck commented Dec 15, 2023

Add a new proposed extension specification that allows an application to time commands submitted to the queue without enabling profiling on the entire queue.

Add a new proposed extension specification that allows an application
to time commands submitted to the queue without enabling profiling on
the entire queue.
@gmlueck gmlueck requested a review from a team as a code owner December 15, 2023 18:59
gmlueck and others added 2 commits December 18, 2023 09:54
Do not require "command_start" to be the same as "command_end".
@@ -711,12 +711,13 @@ _{endnote}_]

== Issues

. What should `submit_with_event` be called?
. Is the name `submit_and_record` confusing?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would say yes, it is confusing. "Recording", in graphs and the profiling example below, implies an interval; you start and stop it. But here, it is singular. Also, here, record does not mean that the returned event has profiling information - so, even less similarity with the rest of this PR.

On a related note: if we're introducing a way to submit special (CUDA-like) events for timing without enabling profiling on the whole queue, it seems reasonable to also allow timing individual submissions. I.e., make this submit_and_record/submit_with_event/queue::submit call return an event with profiling enabled (perhaps when a special property is set, like we enable profiling on queues). Can be helpful for occasional runtime tuning / load-balancing. (yes, this can also be achieved by recording two special events around the kernel of interest, but that is at odds with how profiling queues behave).

Comment on lines 135 to 146
_Effects:_ Enqueues a command barrier to `q`.
Any commands submitted after this barrier cannot begin execution until all
previously submitted commands have completed.

_Returns:_ An event which represents the completion of the barrier.
The event's status becomes `info::event_command_status::complete` when all
commands submitted to the queue prior to the call to `record_event` have
completed.
The event's `info::event_profiling::command_submit` timestamp reflects the
time at which `record_event` is called.
The event's `info::event_profiling::command_end` timestamp reflects the time
at which the event enters the "complete" state.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unless ext_oneapi_submit_barrier extension is planned for retirement, it seems to me that this new event_record would better work as a property/separate function there: "a barrier, with timing", not "a new entity for timing regions, which also works as a barrier".

Otherwise, we get this nasty overlap between the two functionalities, which is never good.

Copy link
Contributor Author

@gmlueck gmlueck Dec 19, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unless ext_oneapi_submit_barrier extension is planned for retirement ...

I think you mean the sycl_ext_oneapi_enqueue_barrier extension here. Actually, we might be deemphasizing this extension in favor of the barrier functions in sycl_ext_oneapi_enqueue_functions. However, I think your overall comment about using properties still applies.

In fact, we did consider exposing this functionality as a property instead of adding a new API. In that alternative, we would allow the application to pass some property to either ext_oneapi_submit_barrier (sycl_ext_oneapi_enqueue_barrier) or submit_with_event (sycl_ext_oneapi_enqueue_functions), which enabled profiling timestamps in the returned event, even if the queue was not constructed with profiling enabled.

These were the arguments in favor of adding a new API like record_event:

  • It wasn't clear that anyone really needed a general way to request profiling timing for events. Rather, it seemed like the main request was for a way to insert timing markers before / after a sequence of commands.

  • A new API like this is less verbose if you just want to time a sequence of commands in a queue. If we added a property, you'd need to call either ext_oneapi_submit_barrier or submit_with_event and also pass the property value.

  • The function name record_event seems more intuitive if you want to time commands. It seems less intuitive to ask people to insert a barrier if their goal is to time commands.

  • People migrating from CUDA will find the new API more familiar because its name and functionality more closely matches cudaEventRecord.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you mean the sycl_ext_oneapi_enqueue_barrier extension here.

Right.

A new API like this is less verbose if you just want to time a sequence of commands in a queue. If we added a property, you'd need to call either ext_oneapi_submit_barrier or submit_with_event and also pass the property value.

Agree, but is this use case considered broad enough to warrant being less verbose? Making device.get_info<sycl::info::device::name>() or sycl::property::queue::in_order() less verbose will probably save much more keystrokes worldwide :)

We have such pattern in GROMACS, but that does not look like something used widely enough to warrant introducing a totally new concept (which takes mental space). And if someone wants to call it often, they are free to make a wrapper.

Also, if we want to be less verbose, perhaps "Throws: A synchronous exception with the errc::invalid error code if [...] the queue's device does not have the aspect ext_oneapi_queue_event_recording" should be reconsidered. Currently, if one wants to use this new functionality, every call to record_event will have to be wrapped in a conditional to check that the device supports it.

The function name record_event seems more intuitive if you want to time commands. It seems less intuitive to ask people to insert a barrier if their goal is to time commands.

If we don't look at CUDA, I wouldn't call it intuitive. In other places, events are tightly associated with operations, so using the same name here already break some intuition about events one gleaned from the rest of the standard. Also, what exactly is "record"? Do we need to pause the recording later? Why recorded events always have timing enabled, but events from operation submission don't?

Barrier is much more apt, IMO: we request a barrier denoting the completion of all the previous operations; and it might have timing enabled if we want.

If "barrier" is no-go, I think submit_marker of mark_event are better names. "Marker", unlike "event", denotes something lacking a content, just a dot on a timeline. As a verb, "mark" clearly shows that this is a one-off action: event is marked, and that's it. I still think less is more, and since this thing already acts as a barrier, we might as well make it a barrier.

People migrating from CUDA will find the new API more familiar because its name and functionality more closely matches cudaEventRecord.

The naming is already very different, and I find it strange to try to unify it here with CUDA while potentially making it less coherent with the rest of SYCL standard (graphs are recorded, barriers/operations are submitted, events are recorded, but in a different way than graphs).

As a person having migrated large codebase from CUDA, the indexing in multidimension sycl::(nd_)item was a much bigger pain then stream/queue, sub-group/warp, shift/shuffle etc naming.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There has been an ongoing offline debate on the name of this API. The name submit_profiling_tag has gotten some traction, so I updated the PR to show what this would look like (29bb178).

If "barrier" is no-go, I think submit_marker of mark_event are better names. "Marker", unlike "event", denotes something lacking a content, just a dot on a timeline.

We liked this at first, but then we realized that the word "marker" means something different in OpenCL. In order to avoid confusion, we substituted the name "profling_tag" for "marker".

Also, if we want to be less verbose, perhaps "Throws: A synchronous exception with the errc::invalid error code if [...] the queue's device does not have the aspect ext_oneapi_queue_event_recording" should be reconsidered.

The problem is that not all backends can support this new API. For example, OpenCL cannot return profiling information in an event unless the queue is constructed in a special way. The only way we could support this extension on OpenCL would be to construct all OpenCL queues with profiling enabled, and we are unwilling to make that a requirement. Therefore, we made this an optional feature, so that places the burden on applications to test whether the feature is supported.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The name submit_profiling_tag has gotten some traction

Agree, it's a good name.

The problem is that not all backends can support this new API. For example, OpenCL cannot return profiling information in an event unless the queue is constructed in a special way. The only way we could support this extension on OpenCL would be to construct all OpenCL queues with profiling enabled, and we are unwilling to make that a requirement. Therefore, we made this an optional feature, so that places the burden on applications to test whether the feature is supported.

Yes, I understand that. My point was, there is little reason to value brevity of the function call ("If we added a property, you'd need to call either ext_oneapi_submit_barrier or submit_with_event and also pass the property value") when it, by design, has to be surrounded with extra boilerplate checking whether it is supported (not to mention #ifdef) or to have a complex logic around queue creation.

Fix typo in example

Co-authored-by: Dmitry Vodopyanov <[email protected]>
We've been debating the name of this API, and this name seems to have
some traction.  Update to get wider feedback.
@gmlueck gmlueck changed the title [SYCL][DOC] Add extension spec for record_event [SYCL][DOC] Add extension spec to time commands submitted to a queue Dec 27, 2023
@gmlueck
Copy link
Contributor Author

gmlueck commented Jan 10, 2024

Attention reviewers: I do not plan any more changes to this PR and will ask for it to be merged soon. If you have more comments, please make them soon.

@gmlueck
Copy link
Contributor Author

gmlueck commented Jan 18, 2024

@intel/llvm-gatekeepers I think this PR is ready to merge. The open comments above are from @AerialMantis, who approved and from @al42and, who commented that they agree with the new naming.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants