Commit 556102f

Add programmers guide section on immediate command lists
Related-To: NEO-6519 Signed-off-by: Aravind Gopalakrishnan <[email protected]>
1 parent b1f622d commit 556102f

2 files changed: +106 -1 lines changed

Lines changed: 104 additions & 0 deletions
@@ -0,0 +1,104 @@
<!---

Copyright (C) 2022 Intel Corporation

SPDX-License-Identifier: MIT

-->

# Level Zero Immediate Commandlist

* [Introduction](#Introduction)
* [Availability](#Availability)
* [Debug Keys](#Debug-Keys)
* [Limitations](#Limitations)
* [References](#References)

# Introduction

Immediate command lists are a feature of the Level Zero specification that enables very low latency submission models. In this scheme, commands appended to the command list, such as a kernel launch or a memory copy, are submitted to the device for execution immediately. This differs from a regular command list, where multiple commands are batched and submitted together for execution.

Distinctions between an immediate command list and a regular command list include (but are not limited to) the following:

* An immediate command list is also an implicit command queue and is therefore created using a command queue descriptor.
* Commands appended to an immediate command list are submitted for execution immediately on the device.
* Immediate command lists are not required to be closed or reset.
* Users cannot synchronize an immediate command list with `zeCommandQueueSynchronize`, because they do not have a command queue handle, or with a fence, because fences are associated with a command queue. The supported synchronization models are described further below.
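
The difference in submission behavior listed above can be sketched with a small toy model. `ToyRegularList`, `ToyImmediateList` and `Trace` are hypothetical names written for this illustration and are not Level Zero types. The model runs every command synchronously on the host, so it only captures when work is handed off, not how the device executes it.

```cpp
#include <cassert>
#include <functional>
#include <utility>
#include <vector>

// Toy model only: hypothetical types, NOT the Level Zero API.
using Command = std::function<void()>;

// Regular command list: append only records the command. Nothing runs
// until the whole list is handed to a queue for execution.
class ToyRegularList {
public:
    void append(Command cmd) { recorded_.push_back(std::move(cmd)); }

    // Stands in for closing the list and executing it on a command queue.
    void executeOnQueue() {
        for (auto &cmd : recorded_) cmd();
        recorded_.clear();
    }

private:
    std::vector<Command> recorded_;
};

// Immediate command list: append hands the command off right away.
class ToyImmediateList {
public:
    void append(Command cmd) { cmd(); }
};

struct Trace {
    int ranAfterAppend; // commands that ran once both appends returned
    int ranAtEnd;       // commands that ran by the end of the scenario
};

Trace traceRegular() {
    int executed = 0;
    ToyRegularList list;
    list.append([&] { ++executed; });
    list.append([&] { ++executed; });
    const int ranAfterAppend = executed;
    assert(ranAfterAppend == 0); // recorded, not yet submitted
    list.executeOnQueue();
    return {ranAfterAppend, executed};
}

Trace traceImmediate() {
    int executed = 0;
    ToyImmediateList list;
    list.append([&] { ++executed; });
    list.append([&] { ++executed; });
    // There is no separate execute step: both commands were already submitted.
    return {executed, executed};
}
```

In this model `traceRegular()` reports no work after the appends and two commands after the explicit execute step, while `traceImmediate()` reports two commands as soon as the appends return.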

Because immediate command lists are primarily intended to provide a razor-thin submission interface to the device, they are well suited to workloads that tend to launch small or short-running kernels and need to run multiple iterations of such kernels. Workloads with these characteristics are found in HPC environments and in ML/DL frameworks.

## Programming model

Pseudo-code for creating an immediate command list (async mode):
```
// Describe the implicit command queue that backs the immediate command list.
ze_command_queue_desc_t cmdQueueDesc = {ZE_STRUCTURE_TYPE_COMMAND_QUEUE_DESC};
cmdQueueDesc.pNext = nullptr;
cmdQueueDesc.flags = 0;
cmdQueueDesc.priority = ZE_COMMAND_QUEUE_PRIORITY_NORMAL;
cmdQueueDesc.ordinal = queueGroupOrdinal; // command queue group to submit to
cmdQueueDesc.index = 0;
// Asynchronous mode: append calls return without waiting for completion.
cmdQueueDesc.mode = ZE_COMMAND_QUEUE_MODE_ASYNCHRONOUS;

ze_command_list_handle_t cmdList = nullptr;
zeCommandListCreateImmediate(context, device, &cmdQueueDesc, &cmdList);
```

Submitting commands and synchronization:

Launching kernels:
```
// events[0] is signaled when the kernel completes.
zeCommandListAppendLaunchKernel(cmdList, kernel, &dispatchTraits,
                                events[0], 0, nullptr);
// In async mode, use the event to synchronize with the host.
zeEventHostSynchronize(events[0], std::numeric_limits<uint64_t>::max() - 1);
```
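
The timeout argument of `zeEventHostSynchronize` is expressed in nanoseconds. Per the Level Zero specification, `UINT64_MAX` waits until the event is signaled or the device is lost, and zero behaves like a status query. The value used above, `max() - 1`, is therefore a finite timeout of several hundred years, which is unbounded in practice. The sketch below shows one way to build shorter timeouts from `std::chrono` durations. `toTimeoutNs`, `kWaitForeverNs` and `kSnippetTimeoutNs` are hypothetical helper names, not part of Level Zero.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <limits>

// Hypothetical helper (not a Level Zero function): converts a chrono
// duration into the nanosecond count expected by the timeout argument.
// Negative durations are clamped to zero, which means "query only".
template <typename Rep, typename Period>
constexpr uint64_t toTimeoutNs(std::chrono::duration<Rep, Period> d) {
    const auto ns =
        std::chrono::duration_cast<std::chrono::nanoseconds>(d).count();
    return ns < 0 ? 0 : static_cast<uint64_t>(ns);
}

// UINT64_MAX: block until the event is signaled or the device is lost.
constexpr uint64_t kWaitForeverNs = std::numeric_limits<uint64_t>::max();

// The value used by the snippets in this guide: one nanosecond short of
// "forever", so it is formally a finite timeout.
constexpr uint64_t kSnippetTimeoutNs = kWaitForeverNs - 1;

static_assert(kSnippetTimeoutNs == 18446744073709551614ull,
              "max() - 1 for a 64-bit unsigned value");
```

With such a helper, a bounded wait could be written as `zeEventHostSynchronize(events[0], toTimeoutNs(std::chrono::milliseconds(100)))`. The call returns `ZE_RESULT_NOT_READY` if the event has not been signaled within the timeout, so the return value needs to be checked.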

Performing copies:
```
// Copy hostBuffer into deviceBuffer; events[0] is signaled on completion.
zeCommandListAppendMemoryCopy(cmdList, deviceBuffer, hostBuffer, allocSize,
                              events[0],
                              0, nullptr);

// Copy deviceBuffer into stackBuffer; waits on events[0], signals events[1].
zeCommandListAppendMemoryCopy(cmdList, stackBuffer, deviceBuffer, allocSize,
                              events[1],
                              1,
                              &events[0]);

// Block the host until the second copy has completed.
zeEventHostSynchronize(events[1], std::numeric_limits<uint64_t>::max() - 1);
```
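
The second copy above passes `events[0]` in its wait list, so ordering between the two copies is expressed through events rather than assumed from append order. The toy scheduler below sketches that idea under stated assumptions. `ToyEvent`, `ToyCommand`, `ToyDevice` and `roundTrip` are hypothetical names for illustration, not Level Zero types, and the "device" is a single-threaded loop on the host.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Toy model only: hypothetical types, NOT the Level Zero API.
struct ToyEvent {
    bool signaled = false;
};

struct ToyCommand {
    std::function<void()> work;          // e.g. a memory copy
    ToyEvent *signalEvent;               // signaled once the work has run
    std::vector<ToyEvent *> waitEvents;  // all must be signaled before running
};

// A "device" that may pick pending commands in any order, but only runs a
// command once every event in its wait list has been signaled.
class ToyDevice {
public:
    void submit(ToyCommand cmd) { pending_.push_back(std::move(cmd)); }

    // Keeps running ready commands until nothing else can make progress.
    void drain() {
        bool progressed = true;
        while (progressed) {
            progressed = false;
            // Scan newest-first so that submission order alone is not relied on.
            for (std::size_t i = pending_.size(); i-- > 0;) {
                if (!ready(pending_[i])) continue;
                ToyCommand cmd = std::move(pending_[i]);
                pending_.erase(pending_.begin() + static_cast<std::ptrdiff_t>(i));
                cmd.work();
                if (cmd.signalEvent) cmd.signalEvent->signaled = true;
                progressed = true;
                break; // restart the scan: the set of ready commands changed
            }
        }
    }

private:
    static bool ready(const ToyCommand &cmd) {
        for (const ToyEvent *e : cmd.waitEvents)
            if (!e->signaled) return false;
        return true;
    }

    std::vector<ToyCommand> pending_;
};

// Mirrors the two-copy snippet: host -> device, then device -> stack.
std::vector<int> roundTrip(const std::vector<int> &hostBuffer) {
    std::vector<int> deviceBuffer(hostBuffer.size(), 0);
    std::vector<int> stackBuffer(hostBuffer.size(), 0);
    ToyEvent events[2];
    ToyDevice device;

    device.submit({[&] { deviceBuffer = hostBuffer; }, &events[0], {}});
    device.submit({[&] { stackBuffer = deviceBuffer; }, &events[1], {&events[0]}});
    device.drain();

    assert(events[1].signaled); // stands in for the final host synchronize
    return stackBuffer;
}
```

Even though the toy device looks at the second copy first, it holds that copy back until `events[0]` is signaled, so `stackBuffer` ends up with the contents of `hostBuffer`.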

Pseudo-code for creating an immediate command list (sync mode):
```
// Describe the implicit command queue that backs the immediate command list.
ze_command_queue_desc_t cmdQueueDesc = {ZE_STRUCTURE_TYPE_COMMAND_QUEUE_DESC};
cmdQueueDesc.pNext = nullptr;
cmdQueueDesc.flags = 0;
cmdQueueDesc.priority = ZE_COMMAND_QUEUE_PRIORITY_NORMAL;
cmdQueueDesc.ordinal = queueGroupOrdinal; // command queue group to submit to
cmdQueueDesc.index = 0;
// Synchronous mode: each append call blocks until the command has completed.
cmdQueueDesc.mode = ZE_COMMAND_QUEUE_MODE_SYNCHRONOUS;

ze_command_list_handle_t cmdList = nullptr;
zeCommandListCreateImmediate(context, device, &cmdQueueDesc, &cmdList);
```

Launching a kernel:
```
// No signal event is passed: in sync mode the call returns after completion.
zeCommandListAppendLaunchKernel(cmdList, kernel, &dispatchTraits,
                                nullptr, 0, nullptr);
```

For sync mode immediate command lists, synchronization is performed implicitly, so events are not required for synchronization.

For more code samples, refer to the [compute-benchmarks](https://github.com/intel/compute-benchmarks/) repo. Scenarios such as `create_command_list_immediate_l0.cpp` and `execute_command_list_immediate_l0.cpp` are good starting points.

# Availability

* Level Zero support for immediate command lists is available on all platforms.
* Optimized support for immediate command lists is currently enabled by default on PVC only.
* Immediate command lists support both Compute and Copy engines.

# Debug Keys

* Users can force optimized immediate command lists by setting `EnableFlushTaskSubmission=1`.
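
As a rough sketch, on Linux a debug key such as the one above can be supplied through the environment, either from the launching shell or from the application itself before the driver is loaded. The helper names below are hypothetical. The companion `NEOReadDebugKeys=1` variable, which release builds of the driver need before they honor debug keys, and the requirement to set both before `zeInit` are assumptions not stated in this guide and should be verified against the driver documentation.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>

// Hypothetical helper: sets the debug key in the current process environment.
// Assumption: this has to run before the driver is loaded (for example,
// before the first zeInit call) to have any effect.
bool requestFlushTaskSubmission() {
    // Assumption: release builds only read debug keys from the environment
    // when NEOReadDebugKeys=1 is also set.
    const bool readKeys =
        setenv("NEOReadDebugKeys", "1", /*overwrite=*/1) == 0;
    const bool flushTask =
        setenv("EnableFlushTaskSubmission", "1", /*overwrite=*/1) == 0;
    return readKeys && flushTask;
}

// Returns the value currently visible in the environment, or an empty
// string if the variable is not set.
std::string debugKeyValue(const char *name) {
    const char *value = std::getenv(name);
    return value ? std::string(value) : std::string();
}
```

`setenv` is a POSIX call, so this sketch does not apply to Windows. Setting the variables in the shell that launches the application avoids any dependence on load order.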

# Limitations
Usage of `EnableFlushTaskSubmission=1` has been verified to work on XeHP_SDV, PVC and DG2 only, so it may or may not work on other platforms. Support for optimized immediate command lists on more platforms is a work in progress.

# References

* https://one-api.gitlab-pages.devtools.intel.com/level_zero/core/PROG.html#low-latency-immediate-command-lists

programmers-guide/PROGRAMMERS_GUIDE.md

Lines changed: 2 additions & 1 deletion
@@ -12,4 +12,5 @@ SPDX-License-Identifier: MIT

This document provides the architectural design followed in the Intel(R) Graphics Compute Runtime for oneAPI Level Zero and OpenCL(TM) Driver. Implementation details and optimization guidelines are explained, as well as a description of the different features available for the different supported platforms.

-### [Implicit scaling](IMPLICIT_SCALING.md)
+### [Implicit scaling](IMPLICIT_SCALING.md)
+### [Immediate Commandlist](IMMEDIATE_COMMANDLIST.md)
