Commit 478b7c0

jbrodman authored and bader committed
[SYCL][DOC] Queue Shortcuts (#1051)
Add queue simplification functions. Signed-off-by: James Brodman <[email protected]>
1 parent 1355aa6 commit 478b7c0

2 files changed: 71 additions and 0 deletions
Lines changed: 21 additions & 0 deletions
@@ -0,0 +1,21 @@
= SYCL Proposals: Queue Shortcuts
James Brodman <james[email protected]>
v0.1
:source-highlighter: pygments
:icons: font
== Introduction
This document presents an addition proposed for a future version of the SYCL Specification. The goal of this proposal is to reduce the complexity and verbosity of using SYCL for programmers.

== Queue Simplifications
Tasks are submitted to queues for execution in SYCL. This is normally done by invoking the `submit` method of a `queue` and passing a lambda that specifies the operation to perform and its dependences. A task's dependences have traditionally been specified through the creation of `accessor` objects that tell the SYCL runtime how data in a `buffer` or `image` is used. However, new proposals for data management in SYCL, such as Unified Shared Memory, provide alternatives to the buffer and accessor model. The USM proposal specifies dependences between kernels using `event` objects. The Queue Properties proposal specifies how to create a `queue` that has in-order semantics where each operation is performed after the previous operation has finished.

With those proposals in place, it makes sense to give programmers shortcuts that eliminate unnecessary code. When using in-order queues, for example, the lambda passed to `submit` does nothing except invoke `parallel_for` or `single_task`. This proposal adds the kernel specification methods directly to the `queue` class. These additional methods come in two flavors. The first handles the "empty lambda" case. The second handles the USM dependence case by adding methods that take an `event` or a vector of `event` objects specifying the dependences that must be satisfied before the kernel executes. Both flavors could be implemented in a header file that writes the `submit` lambda on the programmer's behalf.
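
The header-only implementation described in this paragraph can be sketched with toy stand-ins. The `event`, `handler`, and `queue` types below are hypothetical mock-ups written only for this illustration (they are not the SYCL classes, and the toy runs each kernel immediately on the host). The `depends_on` member is assumed to play the role of the USM proposal's mechanism for declaring event dependences. The point is only that each shortcut forwards to `submit` with a lambda the programmer no longer has to write.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Toy stand-ins for the SYCL types, written only for this sketch.
struct event { int id; };

class handler {
public:
  // Assumed dependence hook, modeled on the USM proposal.
  void depends_on(event e) { deps.push_back(e); }
  void depends_on(const std::vector<event> &es) {
    deps.insert(deps.end(), es.begin(), es.end());
  }

  template <typename KernelName, typename KernelType>
  void single_task(KernelType kernelFunc) { kernel = kernelFunc; }

  std::vector<event> deps;       // dependences recorded by the command group
  std::function<void()> kernel;  // the kernel selected by the command group
};

class queue {
public:
  // Long form: the caller writes the command-group lambda.
  template <typename CommandGroup>
  event submit(CommandGroup cgf) {
    handler h;
    cgf(h);
    depsOfLastSubmit = h.deps.size();
    if (h.kernel) h.kernel();  // the toy runs the kernel immediately
    return event{nextId++};
  }

  // Shortcut, "empty lambda" flavor: the header writes the lambda.
  template <typename KernelName, typename KernelType>
  event single_task(KernelType kernelFunc) {
    return submit([&](handler &h) {
      h.single_task<KernelName>(kernelFunc);
    });
  }

  // Shortcut, dependence flavor: the same lambda plus a depends_on call.
  template <typename KernelName, typename KernelType>
  event single_task(event depEvent, KernelType kernelFunc) {
    return submit([&](handler &h) {
      h.depends_on(depEvent);
      h.single_task<KernelName>(kernelFunc);
    });
  }

  template <typename KernelName, typename KernelType>
  event single_task(const std::vector<event> &depEvents,
                    KernelType kernelFunc) {
    return submit([&](handler &h) {
      h.depends_on(depEvents);
      h.single_task<KernelName>(kernelFunc);
    });
  }

  std::size_t depsOfLastSubmit = 0;  // exposed so the demo can inspect it

private:
  int nextId = 0;
};

// Usage: the long form and both shortcut flavors side by side.
int run_demo() {
  queue q;
  int counter = 0;

  // Long form with an explicit command-group lambda.
  event e0 = q.submit([&](handler &h) {
    h.single_task<class LongForm>([&] { counter += 1; });
  });

  // Shortcut without dependences.
  event e1 = q.single_task<class Short>([&] { counter += 10; });

  // Shortcut that declares two earlier events as dependences.
  q.single_task<class WithDeps>(std::vector<event>{e0, e1},
                                [&] { counter += 100; });
  assert(q.depsOfLastSubmit == 2);
  return counter;
}
```

In this sketch the shortcuts add no new runtime behavior: every call still goes through `submit`, which is why the proposal can describe them as implementable purely in a header.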

Note: These simplifications do not depend on queue order properties. They apply to both in-order and out-of-order queues.

.Queue Shortcuts
[source,cpp]
----
include::queue.hpp[]
----
Lines changed: 50 additions & 0 deletions
@@ -0,0 +1,50 @@
class queue {
public:
  ...
  template <typename KernelName, typename KernelType>
  event single_task(KernelType KernelFunc);

  template <typename KernelName, typename KernelType>
  event single_task(event DepEvent, KernelType KernelFunc);

  template <typename KernelName, typename KernelType>
  event single_task(const vector_class<event> &DepEvents,
                    KernelType KernelFunc);

  template <typename KernelName, typename KernelType, int Dims>
  event parallel_for(range<Dims> NumWorkItems, KernelType KernelFunc);

  template <typename KernelName, typename KernelType, int Dims>
  event parallel_for(range<Dims> NumWorkItems, event DepEvent,
                     KernelType KernelFunc);

  template <typename KernelName, typename KernelType, int Dims>
  event parallel_for(range<Dims> NumWorkItems,
                     const vector_class<event> &DepEvents,
                     KernelType KernelFunc);

  template <typename KernelName, typename KernelType, int Dims>
  event parallel_for(range<Dims> NumWorkItems, id<Dims> WorkItemOffset,
                     KernelType KernelFunc);

  template <typename KernelName, typename KernelType, int Dims>
  event parallel_for(range<Dims> NumWorkItems, id<Dims> WorkItemOffset,
                     event DepEvent, KernelType KernelFunc);

  template <typename KernelName, typename KernelType, int Dims>
  event parallel_for(range<Dims> NumWorkItems, id<Dims> WorkItemOffset,
                     const vector_class<event> &DepEvents,
                     KernelType KernelFunc);

  template <typename KernelName, typename KernelType, int Dims>
  event parallel_for(nd_range<Dims> ExecutionRange, KernelType KernelFunc);

  template <typename KernelName, typename KernelType, int Dims>
  event parallel_for(nd_range<Dims> ExecutionRange, event DepEvent,
                     KernelType KernelFunc);

  template <typename KernelName, typename KernelType, int Dims>
  event parallel_for(nd_range<Dims> ExecutionRange,
                     const vector_class<event> &DepEvents,
                     KernelType KernelFunc);
};
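
A note on how these declarations would look at a call site: because `KernelName` comes first in the template parameter list, a caller can name the kernel explicitly and let the compiler deduce `KernelType` and `Dims` from the arguments. The sketch below illustrates that deduction with hypothetical toy versions of `range`, `id`, `event`, and `queue` (they are not the SYCL classes, and the toy simply loops over the work-items sequentially on the host). The raw pointer stands in for a USM allocation.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Toy stand-ins for SYCL's range, id, and event; not the real classes.
template <int Dims> struct range { std::array<std::size_t, Dims> size; };
template <int Dims> struct id { std::array<std::size_t, Dims> index; };
struct event {};

class queue {
public:
  // Same template parameter order as the proposed declaration:
  // KernelName is given explicitly, KernelType and Dims are deduced.
  template <typename KernelName, typename KernelType, int Dims>
  event parallel_for(range<Dims> numWorkItems, KernelType kernelFunc) {
    std::size_t total = 1;
    for (std::size_t s : numWorkItems.size) total *= s;

    // The toy visits every work-item sequentially on the host.
    for (std::size_t linear = 0; linear < total; ++linear) {
      id<Dims> item{};
      std::size_t rest = linear;
      // Decompose the linear index, last dimension varying fastest.
      for (int d = Dims - 1; d >= 0; --d) {
        item.index[d] = rest % numWorkItems.size[d];
        rest /= numWorkItems.size[d];
      }
      kernelFunc(item);
    }
    return event{};
  }
};

// Usage: only the kernel name is spelled out at the call site.
std::size_t run_demo() {
  queue q;
  std::vector<std::size_t> data(4 * 3, 0);
  std::size_t *ptr = data.data();  // plays the role of a USM pointer

  q.parallel_for<class Fill>(range<2>{{4, 3}}, [=](id<2> i) {
    ptr[i.index[0] * 3 + i.index[1]] = i.index[0] + i.index[1];
  });
  return data[11];  // the element written for work-item (3, 2)
}
```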
