Commit 8850a97

mfrancepillois, EwanC, and kbenzie authored
[SYCL][Graph] Update doc for UR PR moving reset commands to a dedicated cmd-list (#12770)
Update the design doc. Update the UR tag.

Co-authored-by: Ewan Crawford <[email protected]>
Co-authored-by: Kenneth Benzie (Benie) <[email protected]>
1 parent b188783 commit 8850a97

4 files changed

+95
-49
lines changed

sycl/doc/design/CommandGraph.md

Lines changed: 89 additions & 41 deletions
@@ -250,59 +250,107 @@ there are no parameters to take a wait-list, and the only sync primitive
250250
returned is blocking on host.
251251

252252
In order to achieve the expected UR command-buffer enqueue semantics with Level
253-
Zero, the adapter implementation adds extra commands to the Level Zero
254-
command-list representing a UR command-buffer.
255-
256-
* Prefix - Commands added to the start of the L0 command-list by L0 adapter.
257-
* Suffix - Commands added to the end of the L0 command-list by L0 adapter.
258-
259-
These extra commands operate on L0 event synchronisation primitives, used by the
260-
command-list to interact with the external UR wait-list and UR return event
261-
required for the enqueue interface.
262-
263-
The `ur_exp_command_buffer_handle_t` class for this adapter contains a
264-
*SignalEvent* which signals the completion of the command-list in the suffix,
265-
and is reset in the prefix. This signal is detected by a new UR return event
266-
created on UR command-buffer enqueue.
267-
268-
There is also a *WaitEvent* used by the `ur_exp_command_buffer_handle_t` class
269-
in the prefix to wait on any dependencies passed in the enqueue wait-list.
270-
This WaitEvent is reset in the suffix.
271-
272-
A command-buffer is expected to be submitted multiple times. Consequently,
253+
Zero, the adapter implementation needs extra commands.
254+
255+
* Prefix - Commands added **before** the graph workload.
256+
* Suffix - Commands added **after** the graph workload.
257+
258+
These extra commands operate on L0 event synchronisation primitives,
259+
used by the command-list to interact with the external UR wait-list
260+
and UR return event required for the enqueue interface.
261+
Unlike the graph workload (i.e. commands needed to perform the graph workload)
262+
the external UR wait-list and UR return event are submission dependent,
263+
which means they can change from one submission to the next.
264+
265+
For performance reasons, the command-list that will execute the graph
266+
workload is made only once (during the command-buffer finalization stage).
267+
This allows the adapter to save time when submitting the command-buffer,
268+
by executing only this command-list (i.e. without enqueuing any commands
269+
of the graph workload).
270+
271+
#### Prefix
272+
273+
The prefix's commands aim to:
274+
1. Handle the list of events to wait on, which is passed by the runtime
275+
when the UR command-buffer enqueue function is called.
276+
As mentioned above, this list of events changes from one submission
277+
to the next.
278+
Consequently, managing this mutable dependency in the graph-workload
279+
command-list implies rebuilding the command-list for each submission
280+
(note that this can change with mutable command-list).
281+
To avoid the significant time penalty of rebuilding this potentially large
282+
command-list each time, we prefer to add an extra command handling the
283+
wait list into another command-list (*wait command-list*).
284+
This command-list consists of a single L0 command: a barrier that waits for
285+
dependencies passed by the wait-list and signals an event
286+
called *WaitEvent* when the barrier is complete.
287+
This *WaitEvent* is defined in the `ur_exp_command_buffer_handle_t` class.
288+
At the front of the graph-workload command-list, an extra barrier command
289+
waiting for this event is added (when the command-buffer is created).
290+
This ensures that the graph workload does not start running before
291+
the dependencies have completed.
292+
The *WaitEvent* event is reset in the suffix.
293+
294+
295+
2. Reset events associated with the command-buffer except the
296+
*WaitEvent* event.
297+
Indeed, L0 events need to be explicitly reset by an API call
298+
(L0 command in our case).
299+
Since a command-buffer is expected to be submitted multiple times,
273300
we need to ensure that L0 events associated with graph commands have not
274301
been signaled by a previous execution. These events are therefore reset to the
275-
non-signaled state before running the actual graph associated commands. Note
302+
non-signaled state before running the graph-workload command-list. Note
276303
that this reset is performed in the prefix and not in the suffix to avoid
277304
additional synchronization w.r.t profiling data extraction.
278-
279-
If a command-buffer is about to be submitted to a queue with the profiling
280-
property enabled, an extra command that copies timestamps of L0 events
281-
associated with graph commands into a dedicated memory which is attached to the
282-
returned UR event. This memory stores the profiling information that
283-
corresponds to the current submission of the command-buffer.
284-
285-
![L0 command-buffer diagram](images/L0_UR_command-buffer-v3.jpg)
305+
We use a new command-list (*reset command-list*) for performance reasons.
306+
Indeed:
307+
* This allows the *WaitEvent* to be signaled directly on the host if
308+
the wait-list is empty, thus avoiding the need to submit a command-list.
309+
* Enqueuing a reset L0 command for all events in the command-buffer is time
310+
consuming, especially for large graphs.
311+
However, this task is not needed for every submission, but only once, when the
312+
command-buffer is fixed, i.e. when the command-buffer is finalized. The
313+
decoupling of the reset command-list from the wait command-list allows us to
314+
create and enqueue the reset commands when finalizing the command-buffer,
315+
and only create the wait command-list at submission.
316+
317+
This command-list consists of a reset command for each of the graph commands
318+
and another reset command for the event we use to signal the completion
319+
of the graph workload. This event is called *SignalEvent* and is defined
320+
in the `ur_exp_command_buffer_handle_t` class.
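
The cost argument above can be illustrated with a small counting model. This is a sketch under stated assumptions, not adapter code: `AppendCost`, `sharedWaitAndResetList`, and `dedicatedResetList` are hypothetical names, and the model only counts how many L0 commands would have to be appended at finalization and at each submission. It assumes one reset per graph event plus one for *SignalEvent*, and one barrier for a non-empty wait-list.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical counting model, not adapter code: how many L0 commands must be
// appended when a command-buffer is finalized and on each later submission.
struct AppendCost {
  std::size_t AtFinalize;
  std::size_t PerSubmission;
};

// Alternative considered in the text: the resets share the wait command-list,
// which has to be rebuilt for every submission.
inline AppendCost sharedWaitAndResetList(std::size_t NumGraphEvents) {
  // One barrier on the wait-list, one reset per graph event, one reset for
  // SignalEvent, all appended again at each submission.
  return {0, 1 + NumGraphEvents + 1};
}

// Scheme described in the text: resets live in their own command-list that is
// built once at finalization; only the wait barrier is appended per submission.
inline AppendCost dedicatedResetList(std::size_t NumGraphEvents,
                                     bool WaitListEmpty) {
  // With an empty wait-list, WaitEvent is signaled from the host instead.
  const std::size_t WaitCommands = WaitListEmpty ? 0 : 1;
  return {NumGraphEvents + 1, WaitCommands};
}

// Small self-check for a graph with 1000 events.
inline void checkExample() {
  assert(sharedWaitAndResetList(1000).PerSubmission == 1002);
  assert(dedicatedResetList(1000, false).PerSubmission == 1);
  assert(dedicatedResetList(1000, true).PerSubmission == 0);
}
```

In this model the per-submission work no longer grows with the size of the graph, which is the property the dedicated reset command-list is meant to provide.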
321+
322+
#### Suffix
323+
324+
The suffix's commands aim to:
325+
1. Handle the completion of the graph workload and signal
326+
a UR return event.
327+
Thus, at the end of the graph workload command-list a command, which
328+
signals the *SignalEvent*, is added (when the command-buffer is finalized).
329+
In an additional command-list (*signal command-list*), a barrier waiting for
330+
this event is also added.
331+
This barrier signals, in turn, the UR return event that has been defined by
332+
the runtime layer when calling the `urCommandBufferEnqueueExp` function.
333+
334+
2. Manage the profiling. If a command-buffer is about to be submitted to
335+
a queue with the profiling property enabled, an extra command is added that copies
336+
timestamps of L0 events associated with graph commands into a dedicated
337+
memory which is attached to the returned UR event.
338+
This memory stores the profiling information that corresponds to
339+
the current submission of the command-buffer.
340+
341+
![L0 command-buffer diagram](images/L0_UR_command-buffer-v5.jpg)
286342

287343
For a call to `urCommandBufferEnqueueExp` with an `event_list` *EL*,
288-
command-buffer *CB*, and return event *RE* our implementation has to submit two
289-
new command-lists for the above approach to work. One before
344+
command-buffer *CB*, and return event *RE* our implementation has to submit
345+
three new command-lists for the above approach to work. Two before
290346
the command-list with extra commands associated with *CB*, and the other
291-
after *CB*. These two new command-lists are retrieved from the UR queue, which
347+
after *CB*. These new command-lists are retrieved from the UR queue, which
292348
will likely reuse existing command-lists and only create a new one in the worst
293349
case.
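
The event hand-off between the command-lists can be sketched with a toy model. It is only an illustration under stated assumptions: `ToyCommandBuffer` and its member functions are hypothetical, booleans stand in for L0 events, and no Level Zero call is made. The relative order of the reset and wait command-lists, and the exact point in the suffix where *WaitEvent* is reset, are choices of this sketch rather than facts taken from the adapter.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy model only: booleans stand in for L0 events and member functions stand
// in for executing L0 command-lists. Nothing here calls the Level Zero API.
struct ToyCommandBuffer {
  bool WaitEvent = false;          // signaled once the enqueue wait-list is done
  bool SignalEvent = false;        // signaled when the graph workload finishes
  std::vector<bool> NodeEvents;    // one event per graph command
  std::vector<std::string> Trace;  // order in which the lists "executed"

  explicit ToyCommandBuffer(std::size_t NumNodes)
      : NodeEvents(NumNodes, false) {}

  // Reset command-list: resets every event except WaitEvent.
  void runResetList() {
    NodeEvents.assign(NodeEvents.size(), false);
    SignalEvent = false;
    Trace.push_back("reset");
  }

  // Wait command-list: a single barrier on the wait-list, signals WaitEvent.
  void runWaitList(const std::vector<bool> &WaitList) {
    for (bool Done : WaitList)
      assert(Done && "a dependency has not completed");
    WaitEvent = true;
    Trace.push_back("wait");
  }

  // Graph-workload command-list: barrier on WaitEvent, the graph commands,
  // then a command signaling SignalEvent.
  void runWorkloadList() {
    assert(WaitEvent && "workload started before its dependencies");
    NodeEvents.assign(NodeEvents.size(), true);
    SignalEvent = true;
    Trace.push_back("workload");
  }

  // Signal command-list: barrier on SignalEvent that signals the return event.
  // Resetting WaitEvent here is an assumption; the text only says "suffix".
  bool runSignalList() {
    assert(SignalEvent);
    WaitEvent = false;
    Trace.push_back("signal");
    return true;  // stands in for the signaled UR return event
  }

  // One submission: two lists before the workload, one after.
  bool enqueue(const std::vector<bool> &WaitList) {
    runResetList();
    if (WaitList.empty()) {
      WaitEvent = true;  // signaled from the host, no wait command-list
      Trace.push_back("host-signal");
    } else {
      runWaitList(WaitList);
    }
    runWorkloadList();
    return runSignalList();
  }
};
```

Calling `enqueue` twice on the same `ToyCommandBuffer` shows why the resets are needed: without `runResetList`, the second submission would start with *SignalEvent* and the per-command events still signaled from the first.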
294350

295-
The L0 command-list created on `urCommandBufferEnqueueExp` to execute **before**
296-
*CB* contains a single command. This command is a barrier on *EL* that signals
297-
*CB*'s *WaitEvent* when completed.
298-
299-
The L0 command-list created on `urCommandBufferEnqueueExp` to execute **after**
300-
*CB* also contains a single command. This command is a barrier on *CB*'s
301-
*SignalEvent* that signals *RE* when completed.
302-
303351
#### Drawbacks
304352

305-
There are two drawbacks of this approach to implementing UR command-buffers for
353+
There are three drawbacks of this approach to implementing UR command-buffers for
306354
Level Zero:
307355

308356
1. 3x the command-list resources are used, if there are many UR command-buffers in
Binary file not shown.

sycl/plugins/unified_runtime/CMakeLists.txt

Lines changed: 6 additions & 8 deletions
@@ -57,15 +57,13 @@ if(SYCL_PI_UR_USE_FETCH_CONTENT)
5757
include(FetchContent)
5858

5959
set(UNIFIED_RUNTIME_REPO "https://github.com/oneapi-src/unified-runtime.git")
60-
# commit d99d5f742cea18d7204c59c4320b8ea0329b49eb (HEAD -> main)
61-
# Merge: f17c0e91 c3809c61
60+
# commit 418ad5354ca24a6dfbd01df803949855b7a6c3dd
61+
# Merge: d99d5f74 26682290
6262
# Author: Kenneth Benzie (Benie) <[email protected]>
63-
# Date: Wed Mar 13 19:47:39 2024 +0000
64-
#
65-
# Merge pull request #1431 from zhaomaosu/fix-ocl-adapter-tear-down
66-
#
67-
# [CL] Gracefully tear down adapter in case that some globals have been released
68-
set(UNIFIED_RUNTIME_TAG d99d5f742cea18d7204c59c4320b8ea0329b49eb)
63+
# Date: Thu Mar 14 10:19:56 2024 +0000
64+
# Merge pull request #1365 from Bensuo/maxime/improve-L0-cmd-buffer-enqueing
65+
# [EXP][CMDBUF] Move event reset commands to dedicated cmd-list
66+
set(UNIFIED_RUNTIME_TAG 418ad5354ca24a6dfbd01df803949855b7a6c3dd)
6967

7068
if(SYCL_PI_UR_OVERRIDE_FETCH_CONTENT_REPO)
7169
set(UNIFIED_RUNTIME_REPO "${SYCL_PI_UR_OVERRIDE_FETCH_CONTENT_REPO}")
