@@ -250,59 +250,107 @@ there are no parameters to take a wait-list, and the only sync primitive
returned is blocking on host.

In order to achieve the expected UR command-buffer enqueue semantics with Level
- Zero, the adapter implementation adds extra commands to the Level Zero
- command-list representing a UR command-buffer.
-
- * Prefix - Commands added to the start of the L0 command-list by L0 adapter.
- * Suffix - Commands added to the end of the L0 command-list by L0 adapter.
-
- These extra commands operate on L0 event synchronisation primitives, used by the
- command-list to interact with the external UR wait-list and UR return event
- required for the enqueue interface.
-
- The `ur_exp_command_buffer_handle_t` class for this adapter contains a
- *SignalEvent* which signals the completion of the command-list in the suffix,
- and is reset in the prefix. This signal is detected by a new UR return event
- created on UR command-buffer enqueue.
-
- There is also a *WaitEvent* used by the `ur_exp_command_buffer_handle_t` class
- in the prefix to wait on any dependencies passed in the enqueue wait-list.
- This WaitEvent is reset in the suffix.
-
- A command-buffer is expected to be submitted multiple times. Consequently,
+ Zero, the adapter implementation adds extra commands around the graph workload:
+
+ * Prefix - Commands added **before** the graph workload.
+ * Suffix - Commands added **after** the graph workload.
+
+ These extra commands operate on L0 event synchronisation primitives,
+ used by the command-list to interact with the external UR wait-list
+ and UR return event required for the enqueue interface.
+ Unlike the graph workload (i.e. the commands that perform the graph's
+ operations), the external UR wait-list and UR return event are
+ submission dependent, which means they can change from one submission
+ to the next.
+
+ For performance reasons, the command-list that executes the graph
+ workload is built only once (during the command-buffer finalization stage).
+ This saves time on every submission of the command-buffer: the adapter
+ only executes this pre-built command-list, without enqueuing any of the
+ graph-workload commands again.
+
+ #### Prefix
+
+ The prefix's commands aim to:
+ 1. Handle the list of events to wait on, which is passed by the runtime
+ when the UR command-buffer enqueue function is called.
+ As mentioned above, this list of events changes from one submission
+ to the next.
+ Consequently, managing this mutable dependency in the graph-workload
+ command-list would imply rebuilding that command-list for each submission
+ (note that this may change with mutable command-lists).
+ To avoid the significant time penalty of rebuilding this potentially large
+ command-list each time, we instead add an extra command handling the
+ wait-list to a separate command-list (*wait command-list*).
+ This command-list consists of a single L0 command: a barrier that waits for
+ the dependencies passed in the wait-list and signals an event
+ called *WaitEvent* when the barrier completes.
+ This *WaitEvent* is defined in the `ur_exp_command_buffer_handle_t` class.
+ An extra barrier command waiting on this event is added at the front of the
+ graph-workload command-list (when the command-buffer is created).
+ This ensures that the graph workload does not start running before
+ its dependencies have completed.
+ The *WaitEvent* event is reset in the suffix.
+
+ 2. Reset the events associated with the command-buffer, except the
+ *WaitEvent* event.
+ L0 events need to be reset explicitly by an API call
+ (an L0 command in our case).
+ Since a command-buffer is expected to be submitted multiple times,
we need to ensure that L0 events associated with graph commands have not
been signaled by a previous execution. These events are therefore reset to the
- non-signaled state before running the actual graph associated commands. Note
+ non-signaled state before running the graph-workload command-list. Note
that this reset is performed in the prefix and not in the suffix to avoid
additional synchronization w.r.t profiling data extraction.
-
- If a command-buffer is about to be submitted to a queue with the profiling
- property enabled, an extra command that copies timestamps of L0 events
- associated with graph commands into a dedicated memory which is attached to the
- returned UR event. This memory stores the profiling information that
- corresponds to the current submission of the command-buffer.
-
- ![L0 command-buffer diagram](images/L0_UR_command-buffer-v3.jpg)
+ We use a separate command-list (*reset command-list*) for performance reasons:
+ * Because the resets do not live in the wait command-list, the *WaitEvent* can
+ be signaled directly on the host when the wait-list is empty, avoiding the
+ need to submit a command-list.
+ * Appending an L0 reset command for every event in the command-buffer is time
+ consuming, especially for large graphs.
+ However, this does not need to be done for every submission, only once, when
+ the command-buffer is fixed, i.e. when the command-buffer is finalized.
+ Decoupling the reset command-list from the wait command-list allows us to
+ create the reset command-list and append its reset commands when finalizing
+ the command-buffer, and only create the wait command-list at submission.
+
+ This command-list consists of a reset command for the event of each graph
+ command, plus one reset command for the event used to signal the completion
+ of the graph workload. This event is called *SignalEvent* and is defined in
+ the `ur_exp_command_buffer_handle_t` class.
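The prefix logic can be illustrated with a small host-only C++ model. This is a sketch under explicit assumptions and not the adapter's implementation. L0 events are modeled as booleans, and command-lists are vectors of callables executed in order. All names (`Event`, `CommandList`, `barrier`, `resetEvent`, `CommandBufferPrefix`, `makeWaitCommandList`) are hypothetical. The comments point at the Level Zero calls that the modeled steps roughly correspond to. The model shows two things: the reset command-list is built once at finalization, and the wait command-list is built per submission, with the host-signal shortcut when the wait-list is empty.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical stand-in for an L0 event: true means "signaled".
struct Event {
  bool Signaled = false;
};

// Hypothetical stand-in for an L0 command-list: commands run in order.
using Command = std::function<void()>;
using CommandList = std::vector<Command>;

void execute(const CommandList &CL) {
  for (const Command &Cmd : CL)
    Cmd();
}

// Models a barrier (roughly zeCommandListAppendBarrier): once every event in
// WaitList is signaled, signal Out. This sequential model cannot block, so it
// asserts that the dependencies are already satisfied.
Command barrier(std::vector<Event *> WaitList, Event *Out) {
  return [WaitList, Out] {
    for (const Event *E : WaitList)
      assert(E->Signaled && "barrier executed before its dependencies");
    Out->Signaled = true;
  };
}

// Models an event reset command (roughly zeCommandListAppendEventReset).
Command resetEvent(Event *E) {
  return [E] { E->Signaled = false; };
}

// Hypothetical model of the prefix-related state of a command-buffer.
struct CommandBufferPrefix {
  Event WaitEvent;
  Event SignalEvent;
  std::vector<Event> GraphEvents; // one event per graph command
  CommandList ResetCommandList;   // built once, at finalization

  explicit CommandBufferPrefix(std::size_t NumGraphCommands)
      : GraphEvents(NumGraphCommands) {}
  // Commands hold pointers to members, so copying would leave them dangling.
  CommandBufferPrefix(const CommandBufferPrefix &) = delete;
  CommandBufferPrefix &operator=(const CommandBufferPrefix &) = delete;

  // Finalization: one reset per graph-command event plus one for SignalEvent.
  // WaitEvent is deliberately left out: it is reset in the suffix.
  void finalize() {
    for (Event &E : GraphEvents)
      ResetCommandList.push_back(resetEvent(&E));
    ResetCommandList.push_back(resetEvent(&SignalEvent));
  }

  // Submission: build the wait command-list for this submission's wait-list.
  // With an empty wait-list, WaitEvent is signaled on the host (roughly
  // zeEventHostSignal) and the returned command-list is empty, i.e. there is
  // nothing to submit.
  CommandList makeWaitCommandList(const std::vector<Event *> &WaitList) {
    CommandList WaitCL;
    if (WaitList.empty())
      WaitEvent.Signaled = true;
    else
      WaitCL.push_back(barrier(WaitList, &WaitEvent));
    return WaitCL;
  }
};
```

For example, a command-buffer with three graph commands gets a reset command-list of four commands: three graph events plus *SignalEvent*. Submitting it with one dependency produces a one-barrier wait command-list. An empty wait-list produces no wait command-list at all.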
+
+ #### Suffix
+
+ The suffix's commands aim to:
+ 1. Handle the completion of the graph workload and signal
+ a UR return event.
+ To do so, a command that signals the *SignalEvent* is added at the end of the
+ graph-workload command-list (when the command-buffer is finalized).
+ A barrier waiting for this event is also added to an additional command-list
+ (*signal command-list*).
+ This barrier, in turn, signals the UR return event that has been defined by
+ the runtime layer when calling the `urCommandBufferEnqueueExp` function.
+
+ 2. Manage profiling. If a command-buffer is about to be submitted to
+ a queue with the profiling property enabled, an extra command is added that
+ copies the timestamps of the L0 events associated with graph commands into
+ dedicated memory attached to the returned UR event.
+ This memory stores the profiling information that corresponds to
+ the current submission of the command-buffer.
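The suffix can be sketched the same way. Again, this is a hypothetical host-only model, not the adapter's code. `Event`, `ReturnEvent`, `CommandBufferSuffix`, `makeSignalCommandList` and the simulated clock are illustrative names. `signalEvent` stands in for an L0 signal command such as `zeCommandListAppendSignalEvent`. This sketch assumes that the profiling copy and the *WaitEvent* reset live in the signal command-list, and it assumes their order there. The design text only states that both belong to the suffix.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical stand-in for an L0 event: a flag plus the simulated time at
// which it was last signaled.
struct Event {
  bool Signaled = false;
  std::uint64_t Timestamp = 0;
};

using Command = std::function<void()>;
using CommandList = std::vector<Command>;

// Simulated device clock, advanced every time an event is signaled.
std::uint64_t &simulatedClock() {
  static std::uint64_t Clock = 0;
  return Clock;
}

void signalEvent(Event &E) {
  E.Signaled = true;
  E.Timestamp = ++simulatedClock();
}

void execute(const CommandList &CL) {
  for (const Command &Cmd : CL)
    Cmd();
}

// Hypothetical UR return event with its attached profiling memory.
struct ReturnEvent {
  Event E;
  std::vector<std::uint64_t> ProfilingData;
};

struct CommandBufferSuffix {
  Event WaitEvent;
  Event SignalEvent;
  std::vector<Event> GraphEvents; // one event per graph command
  CommandList GraphCommandList;   // built once, at finalization

  explicit CommandBufferSuffix(std::size_t NumGraphCommands)
      : GraphEvents(NumGraphCommands) {}
  CommandBufferSuffix(const CommandBufferSuffix &) = delete;
  CommandBufferSuffix &operator=(const CommandBufferSuffix &) = delete;

  void finalize() {
    // Leading barrier on WaitEvent (the prefix's dependency gate).
    GraphCommandList.push_back([this] { assert(WaitEvent.Signaled); });
    // Graph workload: in this model each command just signals its event.
    for (Event &E : GraphEvents)
      GraphCommandList.push_back([&E] { signalEvent(E); });
    // Trailing command signaling SignalEvent (roughly
    // zeCommandListAppendSignalEvent).
    GraphCommandList.push_back([this] { signalEvent(SignalEvent); });
  }

  // Built per submission, because RE and the queue's profiling property can
  // differ from one submission to the next.
  CommandList makeSignalCommandList(ReturnEvent &RE, bool Profiling) {
    CommandList SignalCL;
    if (Profiling) {
      // Copy the graph events' timestamps into the memory attached to RE.
      SignalCL.push_back([this, &RE] {
        assert(SignalEvent.Signaled);
        RE.ProfilingData.clear();
        for (const Event &E : GraphEvents)
          RE.ProfilingData.push_back(E.Timestamp);
      });
    }
    // Barrier on SignalEvent that signals the UR return event.
    SignalCL.push_back([this, &RE] {
      assert(SignalEvent.Signaled);
      signalEvent(RE.E);
    });
    // Reset WaitEvent for the next submission (placement assumed).
    SignalCL.push_back([this] { WaitEvent.Signaled = false; });
    return SignalCL;
  }
};
```

With profiling disabled, the signal command-list holds only the barrier and the *WaitEvent* reset. The large graph-workload command-list is identical in both cases, so only the small per-submission list depends on the return event.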
+
+ ![L0 command-buffer diagram](images/L0_UR_command-buffer-v5.jpg)

For a call to `urCommandBufferEnqueueExp` with an `event_list` *EL*,
- command-buffer *CB*, and return event *RE* our implementation has to submit two
- new command-lists for the above approach to work. One before
+ command-buffer *CB*, and return event *RE*, our implementation has to submit
+ three new command-lists for the above approach to work. Two before
the command-list with extra commands associated with *CB*, and the other
- after *CB*. These two new command-lists are retrieved from the UR queue, which
+ after *CB*. These new command-lists are retrieved from the UR queue, which
will likely reuse existing command-lists and only create a new one in the worst
case.

- The L0 command-list created on `urCommandBufferEnqueueExp` to execute **before**
- *CB* contains a single command. This command is a barrier on *EL* that signals
- *CB*'s *WaitEvent* when completed.
-
- The L0 command-list created on `urCommandBufferEnqueueExp` to execute **after**
- *CB* also contains a single command. This command is a barrier on *CB*'s
- *SignalEvent* that signals *RE* when completed.
-
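Putting prefix and suffix together, the submissions for one `urCommandBufferEnqueueExp` call can be modeled as below. This is a simplified, hypothetical simulation. It has no profiling, and sequential execution stands in for an in-order L0 queue. `InOrderQueue`, `CommandBuffer`, `signalFresh` and `enqueue` are illustrative names. Submitting the wait command-list before the reset command-list is an assumption, since the text only says that both run before *CB*. Each graph command asserts that its event was not left signaled by an earlier submission, so submitting the same command-buffer twice exercises the reset path.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical stand-ins: an L0 event is a flag, a command-list is a list of
// callables, and execution is sequential (modeling an in-order queue).
struct Event {
  bool Signaled = false;
};
using Command = std::function<void()>;
using CommandList = std::vector<Command>;

// Barrier: checks that every dependency is signaled, then signals Out (if any).
Command barrier(std::vector<Event *> WaitList, Event *Out) {
  return [WaitList, Out] {
    for (const Event *E : WaitList)
      assert(E->Signaled && "barrier executed before its dependencies");
    if (Out != nullptr)
      Out->Signaled = true;
  };
}

// Stand-in for a graph command (or the final SignalEvent signal): it must not
// find its event already signaled by a previous submission.
Command signalFresh(Event *E) {
  return [E] {
    assert(!E->Signaled && "stale event left over from a previous submission");
    E->Signaled = true;
  };
}

Command reset(Event *E) {
  return [E] { E->Signaled = false; };
}

struct InOrderQueue {
  std::size_t Submitted = 0; // number of command-lists submitted so far
  void submit(const CommandList &CL) {
    ++Submitted;
    for (const Command &Cmd : CL)
      Cmd();
  }
};

// Hypothetical command-buffer: both command-lists below are built once.
struct CommandBuffer {
  Event WaitEvent, SignalEvent;
  std::vector<Event> GraphEvents;
  CommandList ResetCL; // reset command-list
  CommandList GraphCL; // graph-workload command-list

  explicit CommandBuffer(std::size_t NumGraphCommands)
      : GraphEvents(NumGraphCommands) {
    for (Event &E : GraphEvents)
      ResetCL.push_back(reset(&E));
    ResetCL.push_back(reset(&SignalEvent));

    GraphCL.push_back(barrier({&WaitEvent}, nullptr)); // wait for the prefix
    for (Event &E : GraphEvents)
      GraphCL.push_back(signalFresh(&E)); // graph workload
    GraphCL.push_back(signalFresh(&SignalEvent)); // completion
  }
  CommandBuffer(const CommandBuffer &) = delete;
  CommandBuffer &operator=(const CommandBuffer &) = delete;
};

// Hypothetical model of urCommandBufferEnqueueExp(CB, EL, RE).
void enqueue(InOrderQueue &Q, CommandBuffer &CB,
             const std::vector<Event *> &EL, Event &RE) {
  if (EL.empty())
    CB.WaitEvent.Signaled = true; // host signal: no wait command-list needed
  else
    Q.submit(CommandList{barrier(EL, &CB.WaitEvent)}); // wait command-list
  Q.submit(CB.ResetCL);                                 // reset command-list
  Q.submit(CB.GraphCL);                                 // graph workload
  // Signal command-list: SignalEvent -> RE, then reset WaitEvent (assumed).
  Q.submit(CommandList{barrier({&CB.SignalEvent}, &RE), reset(&CB.WaitEvent)});
}
```

If `CB.ResetCL` were skipped on the second submission, the stale-event assertion in `signalFresh` would fire. This is the situation that the prefix reset prevents. With an empty wait-list, only three command-lists are submitted, because *WaitEvent* is signaled on the host.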
#### Drawbacks

- There are two drawbacks of this approach to implementing UR command-buffers for
+ There are three drawbacks of this approach to implementing UR command-buffers for
Level Zero:

1. 3x the command-list resources are used, if there are many UR command-buffers in