Commit 8bb4115

fabiomestre and EwanC authored
[SYCL][Graph] Document new command-list enqueue path (#16096)
UR PR: oneapi-src/unified-runtime#1975

Co-authored-by: Ewan Crawford <[email protected]>
1 parent 1873789 commit 8bb4115

2 files changed: +59 -3 lines changed

sycl/doc/design/CommandGraph.md

Lines changed: 59 additions & 3 deletions
@@ -337,6 +337,62 @@ Backends which are implemented currently are: [Level Zero](#level-zero),

 ### Level Zero

+The command-buffer implementation for the level-zero adapter has 2 different
+implementation paths which are chosen depending on the device and level-zero
+version:
+
+- Immediate Append path - Relies on
+[zeCommandListImmediateAppendCommandListsExp](https://oneapi-src.github.io/level-zero-spec/level-zero/latest/core/api.html#zecommandlistimmediateappendcommandlistsexp)
+to submit the command-buffer. This function is an experimental extension to the level-zero API.
+- Wait event path - Relies on
+[zeCommandQueueExecuteCommandLists](https://oneapi-src.github.io/level-zero-spec/level-zero/latest/core/api.html#zecommandqueueexecutecommandlists)
+to submit the command-buffer work. However, this level-zero function has
+limitations and, as such, this path is used only when the immediate append
+path is unavailable.
+
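The selection rule described in this section can be sketched as a small C++ function. This is an illustration only: `DeviceCaps`, `EnqueuePath` and `selectEnqueuePath` are hypothetical names that do not come from the adapter, and the two boolean flags stand in for whatever device and driver queries the adapter actually performs.

```cpp
// Hypothetical capability flags. The real adapter derives these from the
// device and the Level Zero driver; this sketch does not model those queries.
struct DeviceCaps {
  bool SupportsImmediateCommandLists;
  bool HasImmediateAppendExt; // zeCommandListImmediateAppendCommandListsExp
};

enum class EnqueuePath { ImmediateAppend, WaitEvent };

// Prefer the immediate append path and fall back to the wait event path
// whenever either requirement is missing.
inline EnqueuePath selectEnqueuePath(const DeviceCaps &Caps) {
  if (Caps.SupportsImmediateCommandLists && Caps.HasImmediateAppendExt)
    return EnqueuePath::ImmediateAppend;
  return EnqueuePath::WaitEvent;
}
```

Keeping the decision in one pure function makes the fallback explicit: the wait event path is chosen only because the immediate append path is unavailable, not on its own merits.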
+#### Immediate Append Path Implementation Details
+
+This path is only available when the device supports immediate command-lists
+and the [zeCommandListImmediateAppendCommandListsExp](https://oneapi-src.github.io/level-zero-spec/level-zero/latest/core/api.html#zecommandlistimmediateappendcommandlistsexp)
+API. This API can wait on a list of event dependencies using the `phWaitEvents`
+parameter and can signal a return event when finished using the `hSignalEvent`
+parameter. This allows for a cleaner and more efficient implementation than
+what can be achieved when using the wait-event path
+(see [this section](#wait-event-path-implementation-details) for
+more details about the wait-event path).
+
+This path relies on 3 different command-lists in order to execute the
+command-buffer:
+
+- `ComputeCommandList` - Used to submit command-buffer work that requires
+the compute engine.
+- `CopyCommandList` - Used to submit command-buffer work that requires the
+[copy engine](#copy-engine). This command-list is not created when none of the
+nodes require the copy engine.
+- `EventResetCommandList` - Used to reset the level-zero events that are
+needed for every submission of the command-buffer. This is executed after
+the compute and copy command-lists have finished executing. For the first
+execution, this command-list is skipped since there is no need to reset events
+at this point. When counter-based events are enabled (i.e. the command-buffer
+is in-order), this command-list is not created since counter-based events do
+not need to be reset.
+
+The following diagram illustrates which commands are executed on
+each command-list when the command-buffer is enqueued:
+![L0 command-buffer diagram](images/diagram_immediate_append.png)
+
+Additionally,
+[zeCommandListImmediateAppendCommandListsExp](https://oneapi-src.github.io/level-zero-spec/level-zero/latest/core/api.html#zecommandlistimmediateappendcommandlistsexp)
+requires an extra command-list which is used to submit the other
+command-lists. This command-list has a specific engine type
+associated to it (i.e. compute or copy engine). Hence, for our implementation,
+we need 2 of these helper command-lists:
+- The `CommandListHelper` command-list is used to submit the
+`ComputeCommandList`, `CommandListResetEvents` and profiling queries.
+- The `ZeCopyEngineImmediateListHelper` command-list is used to submit the
+`CopyCommandList`
+
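The rules in this section about which command-lists exist, when the event reset is skipped, and which helper submits each one can be summarised in a short C++ sketch. It is a model of the text only and makes no Level Zero calls. `CommandBufferInfo`, `Submission` and `planEnqueue` are hypothetical names, profiling queries are left out, and the event-reset command-list is labelled `EventResetCommandList` here although the helper list above calls it `CommandListResetEvents`.

```cpp
#include <string>
#include <vector>

// Hypothetical summary of a finalized command-buffer.
struct CommandBufferInfo {
  bool UsesCopyEngine; // at least one node requires the copy engine
  bool IsInOrder;      // counter-based events are enabled
};

// One append: the helper command-list and the command-list it submits.
struct Submission {
  std::string Helper;
  std::string CommandList;
};

// Lists what a single enqueue submits, in the order described in the text.
// The order here is descriptive only; real ordering is enforced by events.
inline std::vector<Submission> planEnqueue(const CommandBufferInfo &CB,
                                           bool IsFirstExecution) {
  std::vector<Submission> Plan;
  // Compute work is always submitted through the compute helper.
  Plan.push_back({"CommandListHelper", "ComputeCommandList"});
  // The copy command-list only exists when some node needs the copy engine.
  if (CB.UsesCopyEngine)
    Plan.push_back({"ZeCopyEngineImmediateListHelper", "CopyCommandList"});
  // The event reset is never created for in-order command-buffers and is
  // skipped on the first execution, when there is nothing to reset yet.
  if (!CB.IsInOrder && !IsFirstExecution)
    Plan.push_back({"CommandListHelper", "EventResetCommandList"});
  return Plan;
}
```

For an out-of-order command-buffer that uses the copy engine, the first call yields two submissions and every later call yields three, with the event reset last.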
+#### Wait event Path Implementation Details
 The UR `urCommandBufferEnqueueExp` interface for submitting a command-buffer
 takes a list of events to wait on, and returns an event representing the
 completion of that specific submission of the command-buffer.
@@ -364,7 +420,7 @@ is made only once (during the command-buffer finalization stage). This allows
 the adapter to save time when submitting the command-buffer, by executing only
 this command-list (i.e. without enqueuing any commands of the graph workload).

-#### Prefix
+##### Prefix

 The prefix's commands aim to:
 1. Handle the list of events to wait on, which is passed by the runtime
@@ -409,7 +465,7 @@ and another reset command for resetting the signal we use to signal the
 completion of the graph workload. This signal is called *SignalEvent* and is
 defined in the `ur_exp_command_buffer_handle_t` class.

-#### Suffix
+##### Suffix

 The suffix's commands aim to:
 1) Handle the completion of the graph workload and signal a UR return event.
@@ -435,7 +491,7 @@ with extra commands associated with *CB*, and the other after *CB*. These new
 command-lists are retrieved from the UR queue, which will likely reuse existing
 command-lists and only create a new one in the worst case.

-#### Drawbacks
+##### Drawbacks

 There are three drawbacks of this approach to implementing UR command-buffers for
 Level Zero: