Commit d410644

Rework formatting and introducing USM shortcuts
1 parent 86436c3 commit d410644

File tree

1 file changed

+204
-23
lines changed

sycl/doc/extensions/proposed/sycl_ext_oneapi_graph.asciidoc

Lines changed: 204 additions & 23 deletions
@@ -60,6 +60,7 @@ Table 1. Values of the `SYCL_EXT_ONEAPI_GRAPH` macro.
6060
== SYCL Graph Terminology
6161

6262
Table 2. Terminology.
63+
[%header,cols="1,3"]
6364
|===
6465
|Concept|Description
6566
|graph| Class that stores structured work units and their dependencies
@@ -81,8 +82,6 @@ namespace sycl::ext::oneapi::experimental {
8182
}
8283
----
8384

84-
NOTE:
85-
8685
== Edge
8786

8887
A dependency between two nodes representing a happens-before relationship. `sender` and `receiver` may be associated with different graphs.
@@ -100,7 +99,7 @@ namespace sycl::ext::oneapi::experimental {
10099

101100
Graph is a class that represents a directed acyclic graph of nodes.
102101
A graph can have different states, can be nested, can have multiple root nodes that are scheduled for execution first and multiple leaf nodes that are scheduled for execution last. The execution of a graph has been completed when all leaf node tasks have been completed.
103-
Member functions as listed in Table 2 and 3 can be used to add nodes to a graph.
102+
Member functions as listed in Tables 3 to 6 can be used to add nodes to a graph.
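
To illustrate the root and leaf wording in the graph description, here is a minimal sketch in plain C++. `ToyGraph`, `roots`, `leaves`, and `schedule` are hypothetical names and are not part of the proposed extension; the sketch only models the dependency structure and does not involve a SYCL runtime.

```cpp
#include <cassert>
#include <cstddef>
#include <queue>
#include <vector>

// Hypothetical stand-in for a task graph; this is NOT the extension's API.
struct ToyGraph {
  // successors[i] holds the nodes that may only run after node i.
  std::vector<std::vector<std::size_t>> successors;

  // Adds a node depending on the already existing nodes in `deps`.
  // Because dependencies must already exist, the graph stays acyclic.
  std::size_t add_node(const std::vector<std::size_t>& deps = {}) {
    const std::size_t id = successors.size();
    successors.emplace_back();
    for (std::size_t d : deps) {
      assert(d < id && "dependency must be an existing node");
      successors[d].push_back(id);
    }
    return id;
  }

  // Number of incoming edges per node.
  std::vector<std::size_t> in_degrees() const {
    std::vector<std::size_t> deg(successors.size(), 0);
    for (const auto& out : successors)
      for (std::size_t s : out) ++deg[s];
    return deg;
  }

  // Root nodes have no predecessors and are eligible to run first.
  std::vector<std::size_t> roots() const {
    const auto deg = in_degrees();
    std::vector<std::size_t> result;
    for (std::size_t i = 0; i < deg.size(); ++i)
      if (deg[i] == 0) result.push_back(i);
    return result;
  }

  // Leaf nodes have no successors; the graph is done once all of them finish.
  std::vector<std::size_t> leaves() const {
    std::vector<std::size_t> result;
    for (std::size_t i = 0; i < successors.size(); ++i)
      if (successors[i].empty()) result.push_back(i);
    return result;
  }

  // One valid execution order (Kahn's algorithm).
  std::vector<std::size_t> schedule() const {
    auto deg = in_degrees();
    std::queue<std::size_t> ready;
    for (std::size_t i = 0; i < deg.size(); ++i)
      if (deg[i] == 0) ready.push(i);
    std::vector<std::size_t> order;
    while (!ready.empty()) {
      const std::size_t n = ready.front();
      ready.pop();
      order.push_back(n);
      for (std::size_t s : successors[n])
        if (--deg[s] == 0) ready.push(s);
    }
    return order;
  }
};
```

In this toy model, adding two nodes without dependencies and a third that depends on both gives two roots and one leaf, and the leaf is scheduled last.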
104103

105104
[source,c++]
106105
----
@@ -123,7 +122,18 @@ namespace sycl::ext::oneapi::experimental {
123122
124123
}
125124
126-
sycl::event sycl::queue(const graph<graph_state::executable> Graph);
125+
----
126+
127+
The following member functions are added to the queue class.
128+
129+
[source,c++]
130+
----
131+
132+
namespace sycl {
133+
134+
event submit(const ext::oneapi::experimental::graph<ext::oneapi::experimental::graph_state::executable>& my_graph);
135+
136+
} // namespace sycl
127137
128138
----
129139

@@ -133,46 +143,209 @@ A `graph` object in `graph_state::executable` represents a user generated device
133143
The structure of a `graph` object in this state is immutable, as are the tasks assigned to each node.
134144
Support for submitting a graph for execution before a previous execution has completed is backend-specific. The runtime may throw an error.
135145

146+
=== Graph member and helper functions
147+
136148
Table 3. Constructor of the `graph` class.
149+
[cols="2a,a"]
137150
|===
138151
|Constructor|Description
139152

140-
|`graph()`
141-
|Creates a `graph` object. It's default state is `graph_state::modifiable`.
153+
|
154+
[source,c++]
155+
----
156+
/* available only when graph_state == modifiable */
157+
graph();
158+
----
159+
|Creates a `graph` object.
142160

143161
|===
144162

145163
Table 4. Member functions of the `graph` class.
164+
[cols="2a,a"]
146165
|===
147166
|Member function|Description
148167

149-
|`node add_node(const std::vector<node>& dep = {});`
150-
|This creates an empty node which is associated to no task. It's intended use is either a connection point inside a graph between groups of nodes, and can significantly reduce the number of edges ( O(n) vs. O(n^2) ). Another use-case is building the structure of a graph first and adding tasks later.
168+
|
169+
[source,c++]
170+
----
171+
node add_node(const std::vector<node>& dep = {});
172+
----
173+
|This creates an empty node which is associated with no task. One intended use is as a connection point inside a graph between groups of nodes, which can significantly reduce the number of edges ( O(n) vs. O(n^2) ). Another use case is building the structure of a graph first and adding tasks later.
151174

152-
|`template<typename T>
153-
node add_node(T cgf, const std::vector<node>& dep = {});`
175+
|
176+
[source,c++]
177+
----
178+
template<typename T>
179+
node add_node(T cgf, const std::vector<node>& dep = {});
180+
----
154181
|This node captures a command group function object containing a host task, which is scheduled by the SYCL runtime, or a SYCL function for invoking kernels, with all restrictions that apply as described in the spec.
155182

156183
|===
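
The edge-count claim for empty nodes (O(n) vs. O(n^2)) can be checked with a small sketch in plain C++. The names `Edge`, `connect_directly`, and `connect_via_empty_node` are hypothetical and only model edge lists; they are not part of the proposed extension.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// (sender, receiver) pair of node indices; purely illustrative.
using Edge = std::pair<std::size_t, std::size_t>;

// Every consumer depends directly on every producer:
// producers * consumers edges in total.
std::vector<Edge> connect_directly(std::size_t producers,
                                   std::size_t consumers) {
  std::vector<Edge> edges;
  for (std::size_t p = 0; p < producers; ++p)
    for (std::size_t c = 0; c < consumers; ++c)
      edges.emplace_back(p, producers + c);
  return edges;
}

// All producers feed one empty junction node, and all consumers depend on
// that junction: producers + consumers edges in total.
std::vector<Edge> connect_via_empty_node(std::size_t producers,
                                         std::size_t consumers) {
  const std::size_t junction = producers + consumers;  // index of empty node
  std::vector<Edge> edges;
  for (std::size_t p = 0; p < producers; ++p)
    edges.emplace_back(p, junction);
  for (std::size_t c = 0; c < consumers; ++c)
    edges.emplace_back(junction, producers + c);
  return edges;
}
```

For four producers and four consumers, the direct variant needs 16 edges while the variant with one empty node needs 8, at the cost of one extra node in the graph.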
157184

185+
Memory that is allocated by the following functions is owned by the specific graph. When freed inside the graph, the memory is only accessible after the `malloc` node has been executed and before the `free` node is executed.
186+
158187
Table 5. Member functions of the `graph` class (memory operations).
188+
[cols="2a,a"]
159189
|===
160190
|Member function|Description
161191

162-
|`node add_memcpy_node(void* dest, const void* src, size_t numBytes, const std::vector<node>& dep = {});`
192+
|
193+
[source,c++]
194+
----
195+
node memcpy(void* dest, const void* src, size_t numBytes, const std::vector<node>& dep = {});
196+
----
163197
|Adding a node that encapsulates a `memcpy` operation.
164198

165-
|`node add_memset_node(void* ptr, int value, size_t numBytes, const std::vector<node>& dep = {});`
199+
|
200+
[source,c++]
201+
----
202+
template<typename T> node
203+
copy(const T* src, T* dest, size_t count, const std::vector<node>& dep = {});
204+
----
205+
|Adding a node that encapsulates a `copy` operation.
206+
207+
|
208+
[source,c++]
209+
----
210+
node memset(void* ptr, int value, size_t numBytes, const std::vector<node>& dep = {});
211+
----
166212
|Adding a node that encapsulates a `memset` operation.
167213

168-
|`node add_malloc_node(void *data, size_t numBytes, usm::alloc kind, const std::vector<node>& dep = {});`
214+
|
215+
[source,c++]
216+
----
217+
template<typename T>
218+
node fill(void* ptr, const T& pattern, size_t count, const std::vector<node>& dep = {});
219+
----
220+
|Adding a node that encapsulates a `fill` operation.
221+
222+
|
223+
[source,c++]
224+
----
225+
node malloc(void *data, size_t numBytes, usm::alloc kind, const std::vector<node>& dep = {});
226+
----
227+
|Adding a node that encapsulates a `malloc` operation.
228+
229+
|
230+
[source,c++]
231+
----
232+
node malloc_shared(void *data, size_t numBytes, const std::vector<node>& dep = {});
233+
----
234+
|Adding a node that encapsulates a `malloc_shared` operation.
235+
236+
|
237+
[source,c++]
238+
----
239+
node malloc_host(void *data, size_t numBytes, const std::vector<node>& dep = {});
240+
----
169241
|Adding a node that encapsulates a `malloc_host` operation.
170242

171-
|`node add_free_node(void *data, const std::vector<node>& dep = {});`
243+
|
244+
[source,c++]
245+
----
246+
node malloc_device(void *data, size_t numBytes, const std::vector<node>& dep = {});
247+
----
248+
|Adding a node that encapsulates a `malloc_device` operation.
249+
250+
|
251+
[source,c++]
252+
----
253+
node free(void *data, const std::vector<node>& dep = {});
254+
----
172255
|Adding a node that encapsulates a `free` operation.
173256

174257
|===
175258

259+
Table 6. Member functions of the `graph` class (convenience shortcuts).
260+
[cols="2a,a"]
261+
|===
262+
|Member function|Description
263+
264+
|
265+
[source,c++]
266+
----
267+
template <typename KernelName, typename KernelType>
268+
node single_task(const KernelType &kernelFunc, const std::vector<node>& dep = {});
269+
----
270+
|Adding a node that encapsulates a `single_task` operation.
271+
272+
|
273+
[source,c++]
274+
----
275+
template <typename KernelName, int Dims, typename... Rest>
276+
node parallel_for(range<Dims> numWorkItems, Rest&& rest, const std::vector<node>& dep = {});
277+
----
278+
|Adding a node that encapsulates a `parallel_for` operation.
279+
280+
|
281+
[source,c++]
282+
----
283+
template <typename KernelName, int Dims, typename... Rest>
284+
node parallel_for(nd_range<Dims> executionRange, Rest&& rest, const std::vector<node>& dep = {});
285+
----
286+
|Adding a node that encapsulates a `parallel_for` operation.
287+
288+
|===
289+
290+
Table 7. Helper functions of the `graph` class.
291+
[cols="a,a"]
292+
|===
293+
|Function name|Description
294+
295+
|
296+
[source,c++]
297+
----
298+
graph<graph_state::modifiable> make_graph();
299+
----
300+
|Creates a `graph` object. Its state is `graph_state::modifiable`.
301+
302+
|===
303+
304+
=== Node member functions
305+
306+
Table 8. Constructor of the `node` class.
307+
[cols="a,a"]
308+
|===
309+
|Constructor|Description
310+
311+
|
312+
[source,c++]
313+
----
314+
node();
315+
----
316+
|Creates an empty `node` object that encapsulates no tasks and is not assigned to a graph. Prior to execution, it has to be assigned to a graph.
317+
318+
|===
319+
320+
Table 9. Member functions of the `node` class.
321+
[cols="2a,a"]
322+
|===
323+
|Function name|Description
324+
325+
|
326+
[source,c++]
327+
----
328+
void set_graph(graph<graph_state::modifiable>& Graph);
329+
----
330+
|Assigns a `node` object to a `graph`.
331+
332+
|
333+
[source,c++]
334+
----
335+
template<typename T>
336+
void update(T cgf);
337+
----
338+
|Update a `node` object.
339+
340+
|
341+
[source,c++]
342+
----
343+
template<typename T>
344+
void update(T cgf, graph<graph_state::modifiable>& Graph);
345+
----
346+
|Update a `node` object with a task and assign it to a graph.
347+
348+
|===
176349

177350
== Examples
178351

@@ -196,31 +369,35 @@ int main() {
196369
197370
auto g = sycl::ext::oneapi::experimental::make_graph();
198371
199-
float *x = sycl::malloc_shared<float>(n, q);
200-
float *y = sycl::malloc_shared<float>(n, q);
201-
float *z = sycl::malloc_shared<float>(n, q);
372+
float *x , *y, *z;
373+
374+
auto n_x = g.malloc_shared<float>(x, n, q);
375+
auto n_y = g.malloc_shared<float>(y, n, q);
376+
auto n_z = g.malloc_shared<float>(z, n, q);
202377
203378
float *dotp = sycl::malloc_shared<float>(1, q);
204379
205-
for (int i = 0; i < n; i++) {
380+
/* init data by using usm shortcut */
381+
auto n_i = g.parallel_for(n, [=](sycl::id<1> it){
382+
const size_t i = it[0];
206383
x[i] = 1.0f;
207384
y[i] = 2.0f;
208385
z[i] = 3.0f;
209-
}
386+
}, {n_x, n_y, n_z});
210387
211388
auto node_a = g.add_node([&](sycl::handler &h) {
212389
h.parallel_for(sycl::range<1>{n}, [=](sycl::id<1> it) {
213390
const size_t i = it[0];
214391
x[i] = alpha * x[i] + beta * y[i];
215392
});
216-
});
393+
}, {n_i});
217394
218395
auto node_b = g.add_node([&](sycl::handler &h) {
219396
h.parallel_for(sycl::range<1>{n}, [=](sycl::id<1> it) {
220397
const size_t i = it[0];
221398
z[i] = gamma * z[i] + beta * y[i];
222399
});
223-
});
400+
}, {n_i});
224401
225402
auto node_c = g.add_node(
226403
[&](sycl::handler &h) {
@@ -232,13 +409,15 @@ int main() {
232409
});
233410
},
234411
{node_a, node_b});
412+
413+
auto node_f1 = g.free(x, {node_c});
414+
auto node_f2 = g.free(y, {node_b});
235415
236416
auto exec = compile(q);
237417
238418
q.submit(exec).wait();
239419
240-
sycl::free(x, q);
241-
sycl::free(y, q);
420+
// memory can be freed inside or outside the graph
242421
sycl::free(z, q);
243422
sycl::free(dotp, q);
244423
@@ -271,4 +450,6 @@ Please, note that the following features are not yet implemented:
271450
|Rev|Date|Author|Changes
272451
|1|2022-02-11|Pablo Reble|Initial public working draft
273452
|2|2022-03-11|Pablo Reble|Incorporate feedback from PR
453+
|3|2022-05-25|Pablo Reble|Extend API and Example
454+
|4|2022-08-10|Pablo Reble|Adding USM shortcuts
274455
|========================================
