
Commit 7c54fd3

[SYCL][DOC] Add extension spec for record_event
Add a new proposed extension specification that allows an application to time commands submitted to the queue without enabling profiling on the entire queue.
1 parent 70f0835 commit 7c54fd3


2 files changed

+217
-6
lines changed

sycl/doc/extensions/proposed/sycl_ext_oneapi_enqueue_functions.asciidoc

Lines changed: 7 additions & 6 deletions
@@ -133,7 +133,7 @@ syclex::submit(q, [&](sycl::handler& h) {
133133
float* output = sycl::malloc_shared<float>(1, q);
134134
*output = 0;
135135
std::vector<sycl::event> depEvents = /* some dependencies */;
136-
sycl::event e = syclex::submit_with_event(q, [&](sycl::handler& h) {
136+
sycl::event e = syclex::submit_and_record(q, [&](sycl::handler& h) {
137137
h.depends_on(depEvents);
138138
syclex::nd_launch(h, sycl::nd_range<1>{1024, 16},
139139
[=](sycl::nd_item<1> it, auto& sum) {
@@ -247,7 +247,7 @@ a!
247247
namespace sycl::ext::oneapi::experimental {
248248
249249
template <typename CommandGroupFunc>
250-
sycl::event submit_with_event(sycl::queue q, CommandGroupFunc&& cgf);
250+
sycl::event submit_and_record(sycl::queue q, CommandGroupFunc&& cgf);
251251
252252
}
253253
----
@@ -711,12 +711,13 @@ _{endnote}_]
711711

712712
== Issues
713713

714-
. What should `submit_with_event` be called?
714+
. Is the name `submit_and_record` confusing?
715715
+
716716
--
717-
*UNRESOLVED*: `submit_with_event` is descriptive but verbose. Synonyms for
718-
`submit` like `enqueue` do not obviously mean "return an event". `record` may
719-
be confused with the recording functionality associated with SYCL graphs.
717+
*UNRESOLVED*: An advantage of the current name is that it sets up a naming
718+
pattern with `record_event`, where functions with the word "record" in their
719+
name return an event. However, the word "record" may be confused with the
720+
recording functionality associated with SYCL graphs.
720721
--
721722

722723
. What about `accessor` overloads and `update_host`?
Lines changed: 210 additions & 0 deletions
@@ -0,0 +1,210 @@
1+
= sycl_ext_oneapi_record_event
2+
3+
:source-highlighter: coderay
4+
:coderay-linenums-mode: table
5+
6+
// This section needs to be after the document title.
7+
:doctype: book
8+
:toc2:
9+
:toc: left
10+
:encoding: utf-8
11+
:lang: en
12+
:dpcpp: pass:[DPC++]
13+
:endnote: &#8212;{nbsp}end{nbsp}note
14+
15+
// Set the default source code type in this document to C++,
16+
// for syntax highlighting purposes. This is needed because
17+
// docbook uses c++ and html5 uses cpp.
18+
:language: {basebackend@docbook:c++:cpp}
19+
20+
21+
== Notice
22+
23+
[%hardbreaks]
24+
Copyright (C) 2023-2023 Intel Corporation. All rights reserved.
25+
26+
Khronos(R) is a registered trademark and SYCL(TM) and SPIR(TM) are trademarks
27+
of The Khronos Group Inc. OpenCL(TM) is a trademark of Apple Inc. used by
28+
permission by Khronos.
29+
30+
31+
== Contact
32+
33+
To report problems with this extension, please open a new issue at:
34+
35+
https://github.com/intel/llvm/issues
36+
37+
38+
== Dependencies
39+
40+
This extension is written against the SYCL 2020 revision 8 specification.
41+
All references below to the "core SYCL specification" or to section numbers in
42+
the SYCL specification refer to that revision.
43+
44+
45+
== Status
46+
47+
This is a proposed extension specification, intended to gather community
48+
feedback.
49+
Interfaces defined in this specification may not be implemented yet or may be
50+
in a preliminary state.
51+
The specification itself may also change in incompatible ways before it is
52+
finalized.
53+
*Shipping software products should not rely on APIs defined in this
54+
specification.*
55+
56+
57+
== Overview
58+
59+
This extension provides the ability to time the execution of commands in a
60+
queue without enabling profiling on the entire queue.
61+
This is more efficient on some platforms because only a subset of the events
62+
are required to contain timestamp information.
63+
It is also more convenient for use in libraries, where the library wants to
64+
get timing information for some commands, but the library does not control the
65+
construction of the queue (which is where the `enable_profiling` property is
66+
passed).
67+
68+
This extension is structured as a free function, rather than a member function
69+
on `queue`, in order to be consistent with the API design in
70+
link:../proposed/sycl_ext_oneapi_enqueue_functions.asciidoc[
71+
sycl_ext_oneapi_enqueue_functions].
72+
73+
74+
== Specification
75+
76+
=== Feature test macro
77+
78+
This extension provides a feature-test macro as described in the core SYCL
79+
specification.
80+
An implementation supporting this extension must predefine the macro
81+
`SYCL_EXT_ONEAPI_RECORD_EVENT` to one of the values defined in the table below.
82+
Applications can test for the existence of this macro to determine if the
83+
implementation supports this feature, or applications can test the macro's
84+
value to determine which of the extension's features the implementation
85+
supports.
86+
87+
[%header,cols="1,5"]
88+
|===
89+
|Value
90+
|Description
91+
92+
|1
93+
|The APIs of this experimental extension are not versioned, so the
94+
feature-test macro always has this value.
95+
|===
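
The feature test described above can be sketched as follows. This is a minimal illustration, not part of the extension: the names `kHasRecordEvent` and `kRecordEventVersion` are hypothetical, and the sketch deliberately omits `<sycl/sycl.hpp>` so that it also builds with a plain C++ compiler, in which case the fallback branch is taken. Whether the macro is visible before the SYCL header is included is an assumption about the implementation.

```cpp
// Sketch only: guard use of the extension behind its feature-test macro.
// kHasRecordEvent and kRecordEventVersion are hypothetical names.
#ifdef SYCL_EXT_ONEAPI_RECORD_EVENT
// The implementation advertises the extension; the macro's value is its version.
constexpr bool kHasRecordEvent = true;
constexpr int kRecordEventVersion = SYCL_EXT_ONEAPI_RECORD_EVENT;
#else
// The macro is absent: do not reference record_event at all.
constexpr bool kHasRecordEvent = false;
constexpr int kRecordEventVersion = 0;
#endif

// Code that calls record_event would be compiled only when the macro is
// defined, for example:
//
//   #ifdef SYCL_EXT_ONEAPI_RECORD_EVENT
//     sycl::event e = sycl::ext::oneapi::experimental::record_event(q);
//   #endif
```

Note that the macro only reports that the implementation provides the API. Whether a particular device supports recording without `enable_profiling` is a separate, runtime question answered by the device aspect.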
96+
97+
=== New device aspect
98+
99+
This extension adds the `ext_oneapi_queue_event_recording` enumerator to the
100+
`sycl::aspect` enumeration.
101+
102+
```
103+
namespace sycl {
104+
105+
enum class aspect : /*unspecified*/ {
106+
ext_oneapi_queue_event_recording
107+
};
108+
109+
} // namespace sycl
110+
```
111+
112+
When a device has this aspect, the `record_event` function may be called for a
113+
queue on this device even if the queue is not constructed with the property
114+
`property::queue::enable_profiling`.
115+
116+
=== New free function
117+
118+
This extension adds the following free function.
119+
120+
|====
121+
a|
122+
[frame=all,grid=none]
123+
!====
124+
a!
125+
[source]
126+
----
127+
namespace sycl::ext::oneapi::experimental {
128+
129+
event record_event(const queue& q);
130+
131+
} // namespace sycl::ext::oneapi::experimental
132+
----
133+
!====
134+
135+
_Effects:_ Enqueues a command barrier to `q`.
136+
Any commands submitted after this barrier cannot begin execution until all
137+
previously submitted commands have completed.
138+
139+
_Returns:_ An event which represents the completion of the barrier.
140+
The event's status becomes `info::event_command_status::complete` when all
141+
commands submitted to the queue prior to the call to `record_event` have
142+
completed.
143+
The event's `info::event_profiling::command_submit` timestamp reflects the
144+
time at which `record_event` is called.
145+
The event's `info::event_profiling::command_end` timestamp reflects the time
146+
at which the event enters the "complete" state.
147+
The event's `info::event_profiling::command_start` timestamp is the same as
148+
the `info::event_profiling::command_end` timestamp.
149+
150+
_Throws:_ A synchronous `exception` with the `errc::invalid` error code if the
151+
queue was not constructed with the `property::queue::enable_profiling` property
152+
and if the queue's device does not have the aspect
153+
`ext_oneapi_queue_event_recording`.
154+
155+
[_Note:_ In order to understand why the "start" and "end" timestamps are the
156+
same, think of the barrier as an empty kernel with an implicit set of
157+
dependencies on all previous commands in the same queue.
158+
This theoretical kernel starts executing when the dependencies are resolved.
159+
Since the kernel is empty, the end time is the same as the start time.
160+
_{endnote}_]
161+
|====
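
The rules above can be modelled on the host without calling any SYCL API. The following is a sketch under stated assumptions: `record_event_is_valid`, `ModelTimestamps`, and `model_record_event` are hypothetical names, times are plain nanosecond counts, and the model assumes that the barrier completes no earlier than the call to `record_event`.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical model of the _Throws_ rule: the call is valid when either
// source of timestamps is available, and throws errc::invalid only when the
// queue lacks enable_profiling AND the device lacks the aspect.
constexpr bool record_event_is_valid(bool queue_has_enable_profiling,
                                     bool device_has_recording_aspect) {
  return queue_has_enable_profiling || device_has_recording_aspect;
}

// Hypothetical container for the three profiling timestamps of the event.
struct ModelTimestamps {
  std::uint64_t command_submit;
  std::uint64_t command_start;
  std::uint64_t command_end;
};

// Models the _Returns_ paragraph: "submit" is the time of the call, and
// "start" equals "end", the time at which every previously submitted
// command has completed (assumed to be no earlier than the call itself).
inline ModelTimestamps model_record_event(
    std::uint64_t call_time_ns,
    const std::vector<std::uint64_t>& prior_end_times_ns) {
  std::uint64_t complete_ns = call_time_ns;
  for (std::uint64_t t : prior_end_times_ns) {
    complete_ns = std::max(complete_ns, t);
  }
  return ModelTimestamps{call_time_ns, complete_ns, complete_ns};
}
```

For instance, a call at time 100 with earlier commands ending at 250 and 180 yields a submit time of 100 and identical start and end times of 250 in this model.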
162+
163+
164+
== Example
165+
166+
The following example demonstrates how to time a sequence of kernels that are
167+
submitted to a queue.
168+
169+
```
170+
#include <iostream>
171+
#include <sycl/sycl.hpp>
172+
namespace syclex = sycl::ext::oneapi::experimental;
173+
174+
static constexpr size_t N = 1024;
175+
176+
int main() {
177+
sycl::queue q;
178+
179+
if (!q.get_device().has(sycl::aspect::ext_oneapi_queue_event_recording)) {
180+
std::cout << "Cannot time kernels without enabling profiling on queue\n";
181+
return 1;
182+
}
183+
184+
// commands submitted here are not timed
185+
186+
sycl::event start = syclex::record_event(q);
187+
q.parallel_for(sycl::range<1>{N}, [=](auto i) {/* first kernel */});
188+
q.parallel_for(sycl::range<1>{N}, [=](auto i) {/* second kernel */});
189+
sycl::event end = syclex::record_event(q);
190+
191+
q.wait();
192+
193+
uint64_t elapsed =
194+
end.get_profiling_info<sycl::info::event_profiling::command_start>() -
195+
start.get_profiling_info<sycl::info::event_profiling::command_end>();
196+
std::cout << "Execution time: " << elapsed << " (nanoseconds)\n";
197+
}
198+
```
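
The subtraction in the example yields a raw nanosecond count. One possible way to handle that value on the host is sketched below. The helper names `elapsed_between` and `to_milliseconds` are hypothetical, and the sketch assumes both timestamps come from the same queue and share a timebase.

```cpp
#include <chrono>
#include <cstdint>

// Hypothetical helper: difference between the "end" event's command_start
// timestamp and the "start" event's command_end timestamp, both given in
// nanoseconds. Returns zero rather than wrapping if the inputs are reversed.
inline std::chrono::nanoseconds elapsed_between(
    std::uint64_t start_command_end_ns, std::uint64_t end_command_start_ns) {
  if (end_command_start_ns < start_command_end_ns) {
    return std::chrono::nanoseconds(0);
  }
  return std::chrono::nanoseconds(
      static_cast<std::chrono::nanoseconds::rep>(end_command_start_ns -
                                                 start_command_end_ns));
}

// Hypothetical helper: converts a nanosecond duration to fractional
// milliseconds for reporting.
inline double to_milliseconds(std::chrono::nanoseconds d) {
  return std::chrono::duration<double, std::milli>(d).count();
}
```

The reversed-input guard matters because the timestamps are unsigned, so a naive subtraction would wrap around to a very large value instead of going negative.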
199+
200+
201+
== Issues
202+
203+
. Is the name `record_event` confusing?
204+
+
205+
--
206+
*UNRESOLVED*: The current name is similar to the CUDA API `cudaEventRecord`,
207+
which has similar functionality.
208+
However, the word "record" may be confused with the recording functionality
209+
associated with SYCL graphs.
210+
--
