-
Notifications
You must be signed in to change notification settings - Fork 787
[SYCL][DOC] Add extension spec to time commands submitted to a queue #12194
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Add a new proposed extension specification that allows an application to time commands submitted to the queue without enabling profiling on the entire queue.
sycl/doc/extensions/proposed/sycl_ext_oneapi_enqueue_functions.asciidoc
Outdated
Show resolved
Hide resolved
sycl/doc/extensions/proposed/sycl_ext_oneapi_record_event.asciidoc
Outdated
Show resolved
Hide resolved
….asciidoc Co-authored-by: John Pennycook <[email protected]>
Do not require "command_start" to be the same as "command_end".
@@ -711,12 +711,13 @@ _{endnote}_] | |||
|
|||
== Issues | |||
|
|||
. What should `submit_with_event` be called? | |||
. Is the name `submit_and_record` confusing? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would say yes, it is confusing. "Recording", in graphs and the profiling example below, implies an interval; you start and stop it. But here, it is singular. Also, here, record
does not mean that the returned event has profiling information - so, even less similarity with the rest of this PR.
On a related note: if we're introducing a way to submit special (CUDA-like) events for timing without enabling profiling on the whole queue, it seems reasonable to also allow timing individual submissions. I.e., make this submit_and_record
/submit_with_event
/queue::submit
call return an event with profiling enabled (perhaps when a special property is set, like we enable profiling on queues). Can be helpful for occasional runtime tuning / load-balancing. (yes, this can also be achieved by recording two special events around the kernel of interest, but that is at odds with how profiling queues behave).
_Effects:_ Enqueues a command barrier to `q`. | ||
Any commands submitted after this barrier cannot begin execution until all | ||
previously submitted commands have completed. | ||
|
||
_Returns:_ An event which represents the completion of the barrier. | ||
The event's status becomes `info::event_command_status::complete` when all | ||
commands submitted to the queue prior to the call to `record_event` have | ||
completed. | ||
The event's `info::event_profiling::command_submit` timestamp reflects the | ||
time at which `record_event` is called. | ||
The event's `info::event_profiling::command_end` timestamp reflects the time | ||
at which the event enters the "complete" state. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unless ext_oneapi_submit_barrier
extension is planned for retirement, it seems to me that this new event_record
would better work as a property/separate function there: "a barrier, with timing", not "a new entity for timing regions, which also works as a barrier".
Otherwise, we get this nasty overlap between the two functionalities, which is never good.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unless
ext_oneapi_submit_barrier
extension is planned for retirement ...
I think you mean the sycl_ext_oneapi_enqueue_barrier
extension here. Actually, we might be deemphasizing this extension in favor of the barrier functions in sycl_ext_oneapi_enqueue_functions
. However, I think your overall comment about using properties still applies.
In fact, we did consider exposing this functionality as a property instead of adding a new API. In that alternative, we would allow the application to pass some property to either ext_oneapi_submit_barrier
(sycl_ext_oneapi_enqueue_barrier) or submit_with_event
(sycl_ext_oneapi_enqueue_functions), which enabled profiling timestamps in the returned event, even if the queue was not constructed with profiling enabled.
These were the arguments in favor of adding a new API like record_event
:
-
It wasn't clear that anyone really needed a general way to request profiling timing for events. Rather, it seemed like the main request was for a way to insert timing markers before / after a sequence of commands.
-
A new API like this is less verbose if you just want to time a sequence of commands in a queue. If we added a property, you'd need to call either
ext_oneapi_submit_barrier
orsubmit_with_event
and also pass the property value. -
The function name
record_event
seems more intuitive if you want to time commands. It seems less intuitive to ask people to insert a barrier if their goal is to time commands. -
People migrating from CUDA will find the new API more familiar because its name and functionality more closely matches
cudaEventRecord
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you mean the sycl_ext_oneapi_enqueue_barrier extension here.
Right.
A new API like this is less verbose if you just want to time a sequence of commands in a queue. If we added a property, you'd need to call either ext_oneapi_submit_barrier or submit_with_event and also pass the property value.
Agree, but is this use case considered broad enough to warrant being less verbose? Making device.get_info<sycl::info::device::name>()
or sycl::property::queue::in_order()
less verbose will probably save much more keystrokes worldwide :)
We have such pattern in GROMACS, but that does not look like something used widely enough to warrant introducing a totally new concept (which takes mental space). And if someone wants to call it often, they are free to make a wrapper.
Also, if we want to be less verbose, perhaps "Throws: A synchronous exception
with the errc::invalid
error code if [...] the queue's device does not have the aspect ext_oneapi_queue_event_recording
" should be reconsidered. Currently, if one wants to use this new functionality, every call to record_event
will have to be wrapped in a conditional to check that the device supports it.
The function name record_event seems more intuitive if you want to time commands. It seems less intuitive to ask people to insert a barrier if their goal is to time commands.
If we don't look at CUDA, I wouldn't call it intuitive. In other places, events are tightly associated with operations, so using the same name here already break some intuition about events one gleaned from the rest of the standard. Also, what exactly is "record"? Do we need to pause the recording later? Why recorded events always have timing enabled, but events from operation submission don't?
Barrier is much more apt, IMO: we request a barrier denoting the completion of all the previous operations; and it might have timing enabled if we want.
If "barrier" is no-go, I think submit_marker
of mark_event
are better names. "Marker", unlike "event", denotes something lacking a content, just a dot on a timeline. As a verb, "mark" clearly shows that this is a one-off action: event is marked, and that's it. I still think less is more, and since this thing already acts as a barrier, we might as well make it a barrier.
People migrating from CUDA will find the new API more familiar because its name and functionality more closely matches cudaEventRecord.
The naming is already very different, and I find it strange to try to unify it here with CUDA while potentially making it less coherent with the rest of SYCL standard (graphs are recorded, barriers/operations are submitted, events are recorded, but in a different way than graphs).
As a person having migrated large codebase from CUDA, the indexing in multidimension sycl::(nd_)item
was a much bigger pain then stream/queue, sub-group/warp, shift/shuffle etc naming.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There has been an ongoing offline debate on the name of this API. The name submit_profiling_tag
has gotten some traction, so I updated the PR to show what this would look like (29bb178).
If "barrier" is no-go, I think
submit_marker
ofmark_event
are better names. "Marker", unlike "event", denotes something lacking a content, just a dot on a timeline.
We liked this at first, but then we realized that the word "marker" means something different in OpenCL. In order to avoid confusion, we substituted the name "profling_tag" for "marker".
Also, if we want to be less verbose, perhaps "Throws: A synchronous
exception
with theerrc::invalid error
code if [...] the queue's device does not have the aspectext_oneapi_queue_event_recording
" should be reconsidered.
The problem is that not all backends can support this new API. For example, OpenCL cannot return profiling information in an event unless the queue is constructed in a special way. The only way we could support this extension on OpenCL would be to construct all OpenCL queues with profiling enabled, and we are unwilling to make that a requirement. Therefore, we made this an optional feature, so that places the burden on applications to test whether the feature is supported.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The name submit_profiling_tag has gotten some traction
Agree, it's a good name.
The problem is that not all backends can support this new API. For example, OpenCL cannot return profiling information in an event unless the queue is constructed in a special way. The only way we could support this extension on OpenCL would be to construct all OpenCL queues with profiling enabled, and we are unwilling to make that a requirement. Therefore, we made this an optional feature, so that places the burden on applications to test whether the feature is supported.
Yes, I understand that. My point was, there is little reason to value brevity of the function call ("If we added a property, you'd need to call either ext_oneapi_submit_barrier or submit_with_event and also pass the property value") when it, by design, has to be surrounded with extra boilerplate checking whether it is supported (not to mention #ifdef
) or to have a complex logic around queue creation.
sycl/doc/extensions/proposed/sycl_ext_oneapi_record_event.asciidoc
Outdated
Show resolved
Hide resolved
sycl/doc/extensions/proposed/sycl_ext_oneapi_record_event.asciidoc
Outdated
Show resolved
Hide resolved
Fix typo in example Co-authored-by: Dmitry Vodopyanov <[email protected]>
sycl/doc/extensions/proposed/sycl_ext_oneapi_record_event.asciidoc
Outdated
Show resolved
Hide resolved
sycl/doc/extensions/proposed/sycl_ext_oneapi_record_event.asciidoc
Outdated
Show resolved
Hide resolved
sycl/doc/extensions/proposed/sycl_ext_oneapi_record_event.asciidoc
Outdated
Show resolved
Hide resolved
We've been debating the name of this API, and this name seems to have some traction. Update to get wider feedback.
Attention reviewers: I do not plan any more changes to this PR and will ask for it to be merged soon. If you have more comments, please make them soon. |
@intel/llvm-gatekeepers I think this PR is ready to merge. The open comments above are from @AerialMantis, who approved and from @al42and, who commented that they agree with the new naming. |
Add a new proposed extension specification that allows an application to time commands submitted to the queue without enabling profiling on the entire queue.