@@ -53,12 +53,16 @@ specification.*
53
53
54
54
== Backend support status
55
55
56
- This extension is currently implemented in {dpcpp} only for GPU devices and
56
+ The base extension is currently implemented in {dpcpp} only for GPU devices and
57
57
only when using the CUDA or HIP backends. Attempting to use this extension in
58
58
kernels that run on other devices or backends may result in undefined
59
59
behavior. Be aware that the compiler is not able to issue a diagnostic to
60
60
warn you if this happens.
61
61
62
+ The semantics in the <<sycl-graph-interaction, SYCL-Graph Interaction>> section
63
+ however are implemented for all of Level-Zero, OpenCL, CUDA and HIP devices.
64
+ Where support is conditional on the device reporting the
65
+ `aspect::ext_oneapi_limited_graph` aspect.
62
66
63
67
== Overview
64
68
@@ -173,13 +177,118 @@ dependencies are satisfied.
173
177
The SYCL command described above completes once all of the native asynchronous
174
178
tasks it contains have completed.
175
179
176
- The call to `interopCallable` must not submit any synchronous tasks to the
177
- native backend object, and it must not block waiting for any tasks to complete.
178
- The call also must not add tasks to backend objects that underly any other
179
- queue, aside from the queue that is associated with this handler. If it does
180
- any of these things, the behavior is undefined.
180
+ The call to `interopCallable` must not add tasks to backend objects that underly
181
+ any other queue, aside from the queue that is associated with this handler,
182
+ otherwise, the behavior is undefined.
181
183
182
- == Example
184
+ [_Note:_ The function object `interopCallable` is invoked to enqueue commands to a
185
+ native queue or graph and therefore, APIs which block or synchronize could
186
+ prolong or interfere with other commands being enqueued to the backend.
187
+ _{endnote}_]
188
+
189
+ === SYCL-Graph Interaction
190
+
191
+ This section defines the interaction with the
192
+ link:../experimental/sycl_ext_oneapi_graph.asciidoc[sycl_ext_oneapi_graph]
193
+ extension.
194
+
195
+ The `interopCallable` object will be invoked during `command_graph::finalize()`
196
+ when the backend object for the graph is available to give to the user as a
197
+ handle. The user may then add nodes using native APIs to the backend graph
198
+ object queried with `interop_handle::ext_codeplay_get_native_graph()`. The
199
+ runtime will schedule the dependencies of the user added nodes such
200
+ that they respect the graph node edges.
201
+
202
+ ==== Interop Handle Class Modifications
203
+
204
+ ```c++
205
+ // Alias is for editorial brevity in the ext_codeplay_get_native_graph
206
+ // definition, and is non-normative.
207
+ using graph = ext::oneapi::experimental::command_graph<
208
+ ext::oneapi::experimental::graph_state::executable>;
209
+
210
+ class interop_handle {
211
+ bool ext_codeplay_has_graph() const;
212
+
213
+ template <backend Backend>
214
+ backend_return_t<Backend, graph> ext_codeplay_get_native_graph() const;
215
+ };
216
+ ```
217
+
218
+ ==== New Interop Handle Member Functions
219
+
220
+ Table {counter: tableNumber}. Additional member functions of the `sycl::interop_handle` class.
221
+ [cols="2a,a"]
222
+ |===
223
+ |Member function|Description
224
+
225
+ |
226
+ [source,c++]
227
+ ----
228
+ bool interop_handle::ext_codeplay_has_graph() const;
229
+ ----
230
+
231
+ |
232
+ _Returns_: True if the `interop_handle object` was constructed and passed to
233
+ an enqueue native command function object by `ext_codeplay_enqueue_native_command`,
234
+ that was invoked when adding a graph node, either explicitly or implicitly
235
+ via graph record.
236
+
237
+ [_Note:_ that host-task nodes in a `command_graph` will return `false` from this
238
+ query, as the host-task callable is invoked during graph execution rather than
239
+ graph finalization.
240
+ _{endnote}_]
241
+
242
+ |
243
+ [source,c++]
244
+ ----
245
+ template <backend Backend>
246
+ backend_return_t<Backend, graph>
247
+ interop_handle::ext_codeplay_get_native_graph() const;
248
+ ----
249
+
250
+ |
251
+ _Returns_: The native graph object associated with the `interop_handle`.
252
+
253
+ _Throws_: An exception with the `errc::invalid` error code if
254
+ `ext_codeplay_has_graph()` returns `false`.
255
+
256
+ |===
257
+
258
+ == Implementation Notes
259
+
260
+ When `interop_handle::get_native_queue()` is invoked in a native command
261
+ function object on graph finalize, the queue that is returned to the user is an
262
+ internal queue created by the SYCL runtime, as there is no user provided queue
263
+ at the point of graph finalization. This queue has the same device and context
264
+ as the graph was created with. The only valid usage of this queue is to perform
265
+ stream capture to a graph for backend APIs that provide this functionality.
266
+
267
+ Table {counter: tableNumber}. Native types for
268
+ `template <backend Backend, class T> backend_return_t<Backend, T>` where `T` is
269
+ instantiated as `command_graph<graph_state::executable>`.
270
+
271
+ [cols="2a,a"]
272
+ |===
273
+ |Backend|Native Graph Type
274
+
275
+ | `backend::opencl`
276
+ | `cl_command_buffer_khr`
277
+
278
+ | `backend::ext_oneapi_level_zero`
279
+ | `ze_command_list_handle_t`
280
+
281
+ | `backend::ext_oneapi_cuda`
282
+ | `CUGraph`
283
+
284
+ | `backend::ext_oneapi_hip`
285
+ | `hipGraph_t`
286
+
287
+ |===
288
+
289
+ == Examples
290
+
291
+ === HIP Native Task
183
292
184
293
This example demonstrates how to use this extension to enqueue asynchronous
185
294
native tasks on the HIP backend.
@@ -197,21 +306,135 @@ q.submit([&](sycl::handler &cgh) {
197
306
198
307
// Can enqueue arbitrary backend operations. This could also be a kernel
199
308
// launch or call to a library that enqueues operations on the stream etc
200
- //
201
- // Important: Enqueuing a *synchronous* backend operation results in
202
- // undefined behavior.
203
309
hipMemcpyAsync(target_ptr, native_mem, test_size * sizeof(int),
204
310
hipMemcpyDeviceToHost, stream);
205
311
});
206
312
});
207
313
q.wait();
208
314
```
209
315
210
- == Issues
316
+ === Level-Zero Add Native Graph Node
211
317
212
- === sycl_ext_oneapi_graph
318
+ This example demonstrates how to use this extension to add a native command
319
+ to a SYCL-Graph object on the Level-Zero backend. The command is doing a memory
320
+ copy between two USM pointers.
213
321
214
- `ext_codeplay_enqueue_native_command`
215
- cannot be used in graph nodes. A synchronous exception will be thrown with error
216
- code `invalid` if a user tries to add them to a graph.
322
+ ```c++
323
+ Graph.add([&](sycl::handler &CGH) {
324
+ CGH.ext_codeplay_enqueue_native_command([=](sycl::interop_handle IH) {
325
+ ze_command_list_handle_t NativeGraph =
326
+ IH.ext_codeplay_get_native_graph<sycl::backend::ext_oneapi_level_zero>();
217
327
328
+ zeCommandListAppendMemoryCopy(
329
+ NativeGraph, PtrY, PtrX, Size * sizeof(int), nullptr, 0, nullptr);
330
+ });
331
+ });
332
+ ```
333
+
334
+ === OpenCL Add Native Graph Node
335
+
336
+ This example demonstrates how to use this extension to add a native command to
337
+ a SYCL-Graph object on the OpenCL backend. The command is doing a copy between
338
+ two buffer objects.
339
+
340
+ ```c++
341
+ sycl::queue Queue;
342
+ auto Platform = get_native<sycl::backend::opencl>(Queue.get_context().get_platform());
343
+ clCommandCopyBufferKHR_fn clCommandCopyBufferKHR =
344
+ reinterpret_cast<clCommandCopyBufferKHR_fn>(
345
+ clGetExtensionFunctionAddressForPlatform(Platform, "clCommandCopyBufferKHR"));
346
+
347
+ Graph.add([&](sycl::handler &CGH) {
348
+ auto AccX = BufX.get_access(CGH);
349
+ auto AccY = BufY.get_access(CGH);
350
+ CGH.ext_codeplay_enqueue_native_command([=](sycl::interop_handle IH) {
351
+ cl_command_buffer_khr NativeGraph =
352
+ IH.ext_codeplay_get_native_graph<sycl::backend::opencl>();
353
+ auto SrcBuffer = IH.get_native_mem<sycl::backend::opencl>(AccX);
354
+ auto DstBuffer = IH.get_native_mem<sycl::backend::opencl>(AccY);
355
+
356
+ clCommandCopyBufferKHR(
357
+ NativeGraph, nullptr, nullptr, SrcBuffer[0], DstBuffer[0], 0, 0,
358
+ Size * sizeof(int), 0, nullptr, nullptr, nullptr);
359
+ });
360
+ });
361
+ ```
362
+
363
+ === CUDA Add Native Graph Node
364
+
365
+ This example demonstrates how to use this extension to add a native command to
366
+ a SYCL-Graph object on the CUDA backend. The command is doing a memory copy
367
+ between two device USM pointers.
368
+
369
+ ```c++
370
+ Graph.add([&](sycl::handler &CGH) {
371
+ CGH.ext_codeplay_enqueue_native_command([=](sycl::interop_handle IH) {
372
+ CUgraph NativeGraph =
373
+ IH.ext_codeplay_get_native_graph<sycl::backend::ext_oneapi_cuda>();
374
+
375
+ CUDA_MEMCPY3D Params;
376
+ std::memset(&Params, 0, sizeof(CUDA_MEMCPY3D));
377
+ Params.srcMemoryType = CU_MEMORYTYPE_DEVICE;
378
+ Params.srcDevice = (CUdeviceptr)PtrX;
379
+ Params.srcHost = nullptr;
380
+ Params.dstMemoryType = CU_MEMORYTYPE_DEVICE;
381
+ Params.dstDevice = (CUdeviceptr)PtrY;
382
+ Params.dstHost = nullptr;
383
+ Params.WidthInBytes = Size * sizeof(int);
384
+ Params.Height = 1;
385
+ Params.Depth = 1;
386
+
387
+ CUgraphNode Node;
388
+ CUcontext Context = IH.get_native_context<sycl::backend::ext_oneapi_cuda>();
389
+ cuGraphAddMemcpyNode(&Node, NativeGraph, nullptr, 0, &Params, Context);
390
+ });
391
+ });
392
+ ```
393
+
394
+ === HIP Add Native Graph Node
395
+
396
+ This example demonstrates how to use this extension to add a native command to
397
+ a SYCL-Graph object on the HIP backend. The command is doing a memory copy
398
+ between two device USM pointers.
399
+
400
+ ```c++
401
+ Graph.add([&](sycl::handler &CGH) {
402
+ CGH.ext_codeplay_enqueue_native_command([=](sycl::interop_handle IH) {
403
+ HIPGraph NativeGraph =
404
+ IH.ext_codeplay_get_native_graph<sycl::backend::ext_oneapi_hip>();
405
+
406
+ HIPGraphNode Node;
407
+ hipGraphAddMemcpyNode1D(&Node, NativeGraph, nullptr, 0, PtrY, PtrX,
408
+ Size * sizeof(int), hipMemcpyDefault);
409
+ });
410
+ });
411
+ ```
412
+
413
+ === CUDA Stream Record Native Graph Nodes
414
+
415
+ This example demonstrates how to use this extension to add stream recorded
416
+ native nodes to a SYCL-Graph object on the CUDA backend.
417
+
418
+ ```c++
419
+ q.submit([&](sycl::handler &CGH) {
420
+ CGH.ext_codeplay_enqueue_native_command([=](sycl::interop_handle IH) {
421
+ auto NativeStream = h.get_native_queue<cuda>();
422
+ if (IH.ext_codeplay_has_graph()) {
423
+ auto NativeGraph =
424
+ IH.ext_codeplay_get_native_graph<sycl::backend::ext_oneapi_cuda>();
425
+
426
+ // Start capture stream calls into graph
427
+ cuStreamBeginCaptureToGraph(NativeStream, NativeGraph, nullptr,
428
+ nullptr, 0,
429
+ CU_STREAM_CAPTURE_MODE_GLOBAL);
430
+
431
+ myNativeLibraryCall(NativeStream);
432
+
433
+ // Stop capturing stream calls into graph
434
+ cuStreamEndCapture(NativeStream, &NativeGraph);
435
+ } else {
436
+ myNativeLibraryCall(NativeStream);
437
+ }
438
+ });
439
+ });
440
+ ```
0 commit comments