Skip to content

Commit 8ad50b9

Browse files
EwanCXewar313
authored andcommitted
Require OpenCL simultaneous use (#17658)
To support the SYCL-Graph extension on an OpenCL backend, we currently only require the presence of the `cl_khr_command_buffer` extension. This PR introduces an extra requirement on the [CL_COMMAND_BUFFER_SIMULTANEOUS_USE_KHR](https://registry.khronos.org/OpenCL/specs/3.0-unified/html/OpenCL_API.html#CL_COMMAND_BUFFER_SIMULTANEOUS_USE_KHR) capability being present. This is based on the [graph execution wording](https://github.com/intel/llvm/blob/sycl/sycl/doc/extensions/experimental/sycl_ext_oneapi_graph.asciidoc#765-new-handler-member-functions) on the definition of `handler::ext_oneapi_graph()` that: > Only one instance of graph will execute at any time. If graph is submitted multiple times, dependencies are automatically added by the runtime to prevent concurrent executions of an identical graph. Such usage results in multiple calls by the SYCL runtime to `urEnqueueCommandBufferExp` with the same UR command-buffer and event dependencies to prevent concurrent execution. Without support for simultaneous-use the OpenCL adapter code cannot guarantee that the first command-buffer submission has finished execution before it makes following `clEnqueueCommandBufferKHR` calls with the `cl_event` decencies. If the first submission is still executing, then an error will be reported. Workarounds like adding blocking host waits to the OpenCL UR adapter are possible, but requiring simultaneous use reflects the vendor requirements as they are for the currently implementation. I've tried to document this all in the UR spec and SYCL-Graph design docs, which also includes a couple of cleanups I found along the way. Note that the new CTS test fails for Level-Zero adapter, which I've created intel/llvm#17734 to resolve. --------- Co-authored-by: Mikołaj Komar <[email protected]>
1 parent 15d1e09 commit 8ad50b9

File tree

3 files changed

+59
-9
lines changed

3 files changed

+59
-9
lines changed

scripts/core/EXP-COMMAND-BUFFER.rst

Lines changed: 21 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -58,11 +58,11 @@ to provide additional properties for how the command-buffer should be
5858
constructed. The members defined in ${x}_exp_command_buffer_desc_t are:
5959

6060
* ``isUpdatable``, which should be set to ``true`` to support :ref:`updating
61-
command-buffer commands`.
61+
command-buffer commands`.
6262
* ``isInOrder``, which should be set to ``true`` to enable commands enqueued to
63-
a command-buffer to be executed in an in-order fashion where possible.
63+
a command-buffer to be executed in an in-order fashion where possible.
6464
* ``enableProfiling``, which should be set to ``true`` to enable profiling of
65-
the command-buffer.
65+
the command-buffer.
6666

6767
Command-buffers are reference counted and can be retained and released by
6868
calling ${x}CommandBufferRetainExp and ${x}CommandBufferReleaseExp respectively.
@@ -226,15 +226,30 @@ Enqueueing Command-Buffers
226226
Command-buffers are submitted for execution on a ${x}_queue_handle_t with an
227227
optional list of dependent events. An event is returned which tracks the
228228
execution of the command-buffer, and will be complete when all appended commands
229-
have finished executing. It is adapter specific whether command-buffers can be
230-
enqueued or executed simultaneously, and submissions may be serialized.
229+
have finished executing.
231230

232231
.. parsed-literal::
233232
${x}_event_handle_t executionEvent;
234-
235233
${x}EnqueueCommandBufferExp(hQueue, hCommandBuffer, 0, nullptr,
236234
&executionEvent);
237235
236+
A command-buffer can be submitted for execution while a previous submission
237+
of the same command-buffer is still awaiting completion. That is, the user is not
238+
required to do a blocking wait on the completion of the first command-buffer
239+
submission before making a second submission of the command-buffer.
240+
241+
Submissions of the same command-buffer should be synchronized to prevent
242+
concurrent execution. For example, by using events, barriers, or in-order queue
243+
dependencies. The behavior of multiple submissions of the same command-buffer
244+
that can execute concurrently is undefined.
245+
246+
.. parsed-literal::
247+
// Valid usage if hQueue is in-order but undefined behavior is out-of-order
248+
${x}EnqueueCommandBufferExp(hQueue, hCommandBuffer, 0, nullptr,
249+
nullptr);
250+
${x}EnqueueCommandBufferExp(hQueue, hCommandBuffer, 0, nullptr,
251+
nullptr);
252+
238253
239254
Updating Command-Buffer Commands
240255
--------------------------------------------------------------------------------

source/adapters/opencl/device.cpp

Lines changed: 13 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -1531,9 +1531,19 @@ UR_APIEXPORT ur_result_t UR_APICALL urDeviceGetInfo(ur_device_handle_t hDevice,
15311531
CL_RETURN_ON_FAILURE(clGetDeviceInfo(Dev, CL_DEVICE_EXTENSIONS, ExtSize,
15321532
ExtStr.data(), nullptr));
15331533

1534-
std::string SupportedExtensions(ExtStr.c_str());
1535-
return ReturnValue(ExtStr.find("cl_khr_command_buffer") !=
1536-
std::string::npos);
1534+
// cl_khr_command_buffer is required for UR command-buffer support
1535+
cl_device_command_buffer_capabilities_khr Caps = 0;
1536+
if (ExtStr.find("cl_khr_command_buffer") != std::string::npos) {
1537+
// A UR command-buffer user needs to be able to enqueue another
1538+
// submission of the same UR command-buffer without having to manually
1539+
// check if the first submission has completed.
1540+
CL_RETURN_ON_FAILURE(
1541+
clGetDeviceInfo(Dev, CL_DEVICE_COMMAND_BUFFER_CAPABILITIES_KHR,
1542+
sizeof(Caps), &Caps, nullptr));
1543+
}
1544+
1545+
return ReturnValue(
1546+
0 != (Caps & CL_COMMAND_BUFFER_CAPABILITY_SIMULTANEOUS_USE_KHR));
15371547
}
15381548
case UR_DEVICE_INFO_COMMAND_BUFFER_UPDATE_CAPABILITIES_EXP: {
15391549
cl_device_id Dev = cl_adapter::cast<cl_device_id>(hDevice);

test/conformance/exp_command_buffer/fill.cpp

Lines changed: 25 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -122,6 +122,31 @@ TEST_P(urCommandBufferFillCommandsTest, Buffer) {
122122
verifyData(output, size);
123123
}
124124

125+
TEST_P(urCommandBufferFillCommandsTest, ExecuteTwice) {
126+
// TODO https://github.com/intel/llvm/issues/17734
127+
// Fail on Level-Zero due to blocking wait code in graph_impl.cpp specific
128+
// to the level-zero backend that needs moved into the Level-Zero v1 adapter.
129+
UUR_KNOWN_FAILURE_ON(uur::LevelZero{});
130+
ASSERT_SUCCESS(urCommandBufferAppendMemBufferFillExp(
131+
cmd_buf_handle, buffer, pattern.data(), pattern_size, 0, size, 0, nullptr,
132+
0, nullptr, &sync_point, nullptr, nullptr));
133+
134+
std::vector<uint8_t> output(size, 1);
135+
ASSERT_SUCCESS(urCommandBufferAppendMemBufferReadExp(
136+
cmd_buf_handle, buffer, 0, size, output.data(), 1, &sync_point, 0,
137+
nullptr, nullptr, nullptr, nullptr));
138+
139+
ASSERT_SUCCESS(urCommandBufferFinalizeExp(cmd_buf_handle));
140+
141+
ASSERT_SUCCESS(
142+
urEnqueueCommandBufferExp(queue, cmd_buf_handle, 0, nullptr, nullptr));
143+
ASSERT_SUCCESS(
144+
urEnqueueCommandBufferExp(queue, cmd_buf_handle, 0, nullptr, nullptr));
145+
ASSERT_SUCCESS(urQueueFinish(queue));
146+
147+
verifyData(output, size);
148+
}
149+
125150
TEST_P(urCommandBufferFillCommandsTest, USM) {
126151
ASSERT_SUCCESS(urCommandBufferAppendUSMFillExp(
127152
cmd_buf_handle, device_ptr, pattern.data(), pattern_size, size, 0,

0 commit comments

Comments
 (0)