Commit b4ade42

gmlueck, Pennycook, and dm-vodopyanov authored
[SYCL][DOC] Add extension spec to time commands submitted to a queue (#12194)
Add a new proposed extension specification that allows an application to time commands submitted to the queue without enabling profiling on the entire queue.

---------

Co-authored-by: John Pennycook <[email protected]>
Co-authored-by: Dmitry Vodopyanov <[email protected]>
1 parent 74467b2 commit b4ade42

1 file changed: 214 additions & 0 deletions
@@ -0,0 +1,214 @@
= sycl_ext_oneapi_profiling_tag

:source-highlighter: coderay
:coderay-linenums-mode: table

// This section needs to be after the document title.
:doctype: book
:toc2:
:toc: left
:encoding: utf-8
:lang: en
:dpcpp: pass:[DPC++]
:endnote: &#8212;{nbsp}end{nbsp}note

// Set the default source code type in this document to C++,
// for syntax highlighting purposes. This is needed because
// docbook uses c++ and html5 uses cpp.
:language: {basebackend@docbook:c++:cpp}


== Notice

[%hardbreaks]
Copyright (C) 2023-2023 Intel Corporation. All rights reserved.

Khronos(R) is a registered trademark and SYCL(TM) and SPIR(TM) are trademarks
of The Khronos Group Inc. OpenCL(TM) is a trademark of Apple Inc. used by
permission by Khronos.


== Contact

To report problems with this extension, please open a new issue at:

https://github.com/intel/llvm/issues


== Dependencies

This extension is written against the SYCL 2020 revision 8 specification.
All references below to the "core SYCL specification" or to section numbers in
the SYCL specification refer to that revision.


== Status

This is a proposed extension specification, intended to gather community
feedback.
Interfaces defined in this specification may not be implemented yet or may be
in a preliminary state.
The specification itself may also change in incompatible ways before it is
finalized.
*Shipping software products should not rely on APIs defined in this
specification.*


== Overview

This extension provides the ability to time the execution of commands in a
queue without enabling profiling on the entire queue.
This is more efficient on some platforms because only a subset of the events
are required to contain timestamp information.
It is also more convenient for use in libraries, where the library wants to
get timing information for some commands, but the library does not control the
construction of the queue (which is where the `enable_profiling` property is
passed).

This extension is structured as a free function, rather than a member function
on `queue`, in order to be consistent with the API design in
link:../proposed/sycl_ext_oneapi_enqueue_functions.asciidoc[
sycl_ext_oneapi_enqueue_functions].


== Specification

=== Feature test macro

This extension provides a feature-test macro as described in the core SYCL
specification.
An implementation supporting this extension must predefine the macro
`SYCL_EXT_ONEAPI_PROFILING_TAG` to one of the values defined in the table
below.
Applications can test for the existence of this macro to determine if the
implementation supports this feature, or applications can test the macro's
value to determine which of the extension's features the implementation
supports.

[%header,cols="1,5"]
|===
|Value
|Description

|1
|The APIs of this experimental extension are not versioned, so the
feature-test macro always has this value.
|===
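
As a hedged illustration of the macro test described above, the following
sketch reports the macro's value, or 0 when the macro is not defined.
The helper names `profiling_tag_macro_value` and `has_profiling_tag_extension`
are hypothetical and not part of this extension. The `__has_include` guard is
an assumption made only so that the snippet also builds without a SYCL
implementation.

```cpp
// Include the SYCL header only when it is available, so that this sketch
// also builds with a plain C++17 compiler.
#if defined(__has_include)
#  if __has_include(<sycl/sycl.hpp>)
#    include <sycl/sycl.hpp>
#  endif
#endif

// Hypothetical helper (not part of the extension): returns the value of the
// feature-test macro, or 0 when the implementation does not define it.
constexpr int profiling_tag_macro_value() {
#ifdef SYCL_EXT_ONEAPI_PROFILING_TAG
  return SYCL_EXT_ONEAPI_PROFILING_TAG;
#else
  return 0;
#endif
}

// Hypothetical helper: true when the implementation reports support for
// this extension through the feature-test macro.
constexpr bool has_profiling_tag_extension() {
  return profiling_tag_macro_value() != 0;
}
```

Code that calls `submit_profiling_tag` unconditionally could instead wrap the
call in `#ifdef SYCL_EXT_ONEAPI_PROFILING_TAG`, so that it still compiles on
implementations without the extension.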

=== New device aspect

This extension adds the `ext_oneapi_queue_profiling_tag` enumerator to the
`sycl::aspect` enumeration.

```
namespace sycl {

enum class aspect : /*unspecified*/ {
  ext_oneapi_queue_profiling_tag
};

} // namespace sycl
```

When a device has this aspect, the `submit_profiling_tag` function may be
called for a queue on this device even if the queue is not constructed with the
property `property::queue::enable_profiling`.
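
Taken together with the error condition specified for `submit_profiling_tag`,
this rule means the call is valid when either the queue has profiling enabled
or the device has the aspect.
The sketch below models that decision as a plain predicate.
The name `can_submit_profiling_tag` is hypothetical and not part of this
extension, and the two inputs are assumed to be queried by the caller as
shown in the comment.

```cpp
// Hypothetical helper (not part of the extension). In SYCL code the two
// inputs could be obtained as
//   q.has_property<sycl::property::queue::enable_profiling>()
//   q.get_device().has(sycl::aspect::ext_oneapi_queue_profiling_tag)
constexpr bool can_submit_profiling_tag(bool queue_has_enable_profiling,
                                        bool device_has_tag_aspect) {
  // The call is rejected with errc::invalid only when both are missing.
  return queue_has_enable_profiling || device_has_tag_aspect;
}
```

A library that does not control queue construction could evaluate such a
predicate once per queue and fall back to untimed execution when it is false.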

=== New free function

This extension adds the following free function.

|====
a|
[frame=all,grid=none]
!====
a!
[source]
----
namespace sycl::ext::oneapi::experimental {

event submit_profiling_tag(const queue& q);

} // namespace sycl::ext::oneapi::experimental
----
!====

_Effects:_ If the queue `q` is out-of-order (i.e. was not constructed with
`property::queue::in_order`), this function enqueues a command barrier to `q`.
Any commands submitted after this barrier cannot begin execution until all
previously submitted commands have completed.
If this queue is in-order, this function simply enqueues a lightweight "tag"
command that marks the current head of the queue.

_Returns:_ If the queue is out-of-order, returns an event which represents the
completion of the barrier.
If the queue is in-order, returns an event which represents the completion of
the "tag" command.
In either case, the event's status becomes
`info::event_command_status::complete` when all commands submitted to the queue
prior to the call to `submit_profiling_tag` have completed.
The event's `info::event_profiling::command_submit` timestamp reflects the
time at which `submit_profiling_tag` is called.
The event's `info::event_profiling::command_end` timestamp reflects the time
at which the event enters the "complete" state.

It is unspecified whether the event ever has the
`info::event_command_status::running` status, and the meaning of the
`info::event_profiling::command_start` timestamp is also unspecified.
Implementations are encouraged to transition the event directly from the
"submitted" status to the "complete" status and are encouraged to set the
"command_start" timestamp to the same value as the "command_end" timestamp.

_Throws:_ A synchronous `exception` with the `errc::invalid` error code if the
queue was not constructed with the `property::queue::enable_profiling` property
and the queue's device does not have the aspect
`ext_oneapi_queue_profiling_tag`.

[_Note:_ In order to understand why the "command_start" and "command_end"
timestamps are encouraged to be the same, think of the barrier as an empty
kernel with an implicit set of dependencies on all previous commands in the
same queue.
This theoretical kernel starts executing when the dependencies are resolved.
Since the kernel is empty, the end time is the same as the start time.
The "command_start" and "command_end" timestamps are not required to be the
same, though, in order to accommodate an implementation where the barrier is
implemented by submitting an actual kernel, which has non-zero execution time.
_{endnote}_]
|====
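
The timestamp rules above can be sketched as a small host-side model.
This is an illustration only, not how any implementation works: the
`TagTimestamps` struct and the `model_profiling_tag` function are hypothetical
names, all times are assumed to be nanoseconds on one clock, and the model
assumes the tag cannot complete before it is submitted.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical model of the three profiling timestamps (nanoseconds).
struct TagTimestamps {
  std::uint64_t command_submit;
  std::uint64_t command_start;
  std::uint64_t command_end;
};

// prior_end_times: completion times of the commands submitted to the queue
// before the tag. submit_time: when submit_profiling_tag was called.
inline TagTimestamps model_profiling_tag(
    const std::vector<std::uint64_t>& prior_end_times,
    std::uint64_t submit_time) {
  // The tag completes once every earlier command has completed. This model
  // additionally assumes it cannot complete before it was submitted.
  std::uint64_t end = submit_time;
  for (std::uint64_t t : prior_end_times) {
    end = std::max(end, t);
  }
  // Follow the encouraged behavior: command_start equals command_end.
  return TagTimestamps{submit_time, end, end};
}
```

In this model, a tag submitted at time 100 behind commands that finish at 150,
400, and 250 has a "command_end" of 400, which is why the difference between
two tags' "command_end" values brackets the work submitted between them.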


== Example

The following example demonstrates how to time a sequence of kernels that are
submitted to a queue.

```
#include <cstdint>
#include <iostream>
#include <sycl/sycl.hpp>
namespace syclex = sycl::ext::oneapi::experimental;

static constexpr size_t N = 1024;

int main() {
  sycl::queue q;

  if (!q.get_device().has(sycl::aspect::ext_oneapi_queue_profiling_tag)) {
    std::cout << "Cannot time kernels without enabling profiling on queue\n";
    return 1;
  }

  // commands submitted here are not timed

  // parallel_for is the free function from sycl_ext_oneapi_enqueue_functions
  sycl::event start = syclex::submit_profiling_tag(q);
  syclex::parallel_for(q, sycl::range{N}, [=](auto i) {/* first kernel */});
  syclex::parallel_for(q, sycl::range{N}, [=](auto i) {/* second kernel */});
  sycl::event end = syclex::submit_profiling_tag(q);

  q.wait();

  // Use command_end for both tags: the meaning of command_start is
  // unspecified for events returned by submit_profiling_tag.
  uint64_t elapsed =
    end.get_profiling_info<sycl::info::event_profiling::command_end>() -
    start.get_profiling_info<sycl::info::event_profiling::command_end>();
  std::cout << "Execution time: " << elapsed << " (nanoseconds)\n";
}
```
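
The subtraction in the example yields a nanosecond count.
As a minimal sketch, the helpers below show one way to compute that difference
defensively and convert it to milliseconds for reporting.
Both `elapsed_ns` and `to_milliseconds` are hypothetical names introduced for
illustration, and returning 0 when the end timestamp is not later than the
start timestamp is a design choice of this sketch, not something the extension
specifies.

```cpp
#include <chrono>
#include <cstdint>

// Hypothetical helper: difference between the command_end timestamps of two
// profiling tags. Returns 0 if the end tag is not later than the start tag,
// which avoids unsigned wrap-around.
constexpr std::uint64_t elapsed_ns(std::uint64_t start_tag_end_ns,
                                   std::uint64_t end_tag_end_ns) {
  return end_tag_end_ns > start_tag_end_ns
             ? end_tag_end_ns - start_tag_end_ns
             : 0;
}

// Hypothetical helper: converts a nanosecond count to milliseconds.
inline double to_milliseconds(std::uint64_t ns) {
  using namespace std::chrono;
  const nanoseconds d(static_cast<nanoseconds::rep>(ns));
  return duration<double, std::milli>(d).count();
}
```

For instance, `to_milliseconds(elapsed_ns(a, b))` could replace the raw
nanosecond value printed by the example, where `a` and `b` are the two
"command_end" timestamps.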
