Skip to content

Commit 7189586

Browse files
martygrantreble
andcommitted
[SYCL][OpenCL] Update E2E Graph tests to run using OpenCL, and new helper function to return early if Graphs are not supported by the device. Added OpenCL section to CommandGraph docs.
Co-authored-by: Pablo Reble <[email protected]>
1 parent 974cc70 commit 7189586

File tree

9 files changed

+83
-15
lines changed

9 files changed

+83
-15
lines changed

sycl/doc/design/CommandGraph.md

Lines changed: 60 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -149,7 +149,7 @@ yet been implemented.
149149
Implementation of UR command-buffers
150150
for each of the supported SYCL 2020 backends.
151151

152-
Currently Level Zero and CUDA backends are implemented.
152+
Backends which are implemented currently are: Level Zero, CUDA and OpenCL.
153153
More sub-sections will be added here as other backends are supported.
154154

155155
### Level Zero
@@ -246,3 +246,62 @@ the executable CUDA Graph that represent this series of operations.
246246
An executable CUDA Graph, which contains all commands and synchronization
247247
information, is saved in the UR command-buffer to allow for efficient
248248
graph resubmission.
249+
250+
### OpenCL
251+
252+
Command Buffers are defined in the OpenCL spec in the [cl_khr_command_buffer](https://registry.khronos.org/OpenCL/specs/3.0-unified/html/OpenCL_Ext.html#cl_khr_command_buffer) extension.
253+
254+
There are some gaps in both the OpenCL and UR specifications for Command
255+
Buffers shown in the list below. There are implementations in the UR OpenCL
256+
adapter where there is matching support for each function in the list.
257+
258+
| UR | OpenCL | Supported |
259+
| --- | --- | --- |
260+
| urCommandBufferCreateExp | clCreateCommandBufferKHR | Yes |
261+
| urCommandBufferRetainExp | clRetainCommandBufferKHR | Yes |
262+
| urCommandBufferReleaseExp | clReleaseCommandBufferKHR | Yes |
263+
| urCommandBufferFinalizeExp | clFinalizeCommandBufferKHR | Yes |
264+
| urCommandBufferAppendKernelLaunchExp | clCommandNDRangeKernelKHR | Yes |
265+
| urCommandBufferAppendUSMMemcpyExp | | No |
266+
| urCommandBufferAppendUSMFillExp | | No |
267+
| urCommandBufferAppendMembufferCopyExp | clCommandCopyBufferKHR | Yes |
268+
| urCommandBufferAppendMemBufferWriteExp | | No |
269+
| urCommandBufferAppendMemBufferReadExp | | No |
270+
| urCommandBufferAppendMembufferCopyRectExp | clCommandCopyBufferRectKHR | Yes |
271+
| urCommandBufferAppendMemBufferWriteRectExp | | No |
272+
| urCommandBufferAppendMemBufferReadRectExp | | No |
273+
| urCommandBufferAppendMemBufferFillExp | clCommandFillBufferKHR | Yes |
274+
| urCommandBufferEnqueueExp | clEnqueueCommandBufferKHR | Yes |
275+
| | clCommandBarrierWithWaitListKHR | No |
276+
| | clCommandCopyImageKHR | No |
277+
| | clCommandCopyImageToBufferKHR | No |
278+
| | clCommandFillImageKHR | No |
279+
| | clGetCommandBufferInfoKHR | No |
280+
| | clCommandSVMMemcpyKHR | No |
281+
| | clCommandSVMMemFillKHR | No |
282+
283+
Many of the OpenCL functions take a `cl_command_queue` parameter which is not
284+
present in most of the UR functions. Instead, when a new command buffer is
285+
created in `urCommandBufferCreateExp` we also create and maintain a new
286+
internal `ur_queue_handle_t` with a reference stored inside of the
287+
`ur_exp_command_buffer_handle_t_` struct. This internal queue is then used with
288+
the various append functions. The internal queue is retained and released
289+
whenever the owning command buffer is retained or released.
290+
291+
With command buffers being an OpenCL extension, each function is accessed by
292+
loading a function pointer to its implementation. These are defined in a common
293+
header file in the UR OpenCL adapter. The symbols for the functions are however
294+
defined in [OpenCL-Headers](https://github.com/KhronosGroup/OpenCL-Headers/blob/main/CL/cl_ext.h) but it is not known at this time what version of the headers will be used in the UR GitHub CI configuration, so loading the function
295+
pointers will be used until this can be verified. A future piece of work would
296+
be replacing the custom defined symbols with the ones from OpenCL-Headers.
297+
298+
The `UR_DEVICE_INFO_EXTENSIONS` enum can be used with `urDeviceGetInfo` to
299+
query if a specified device supports OpenCL command buffers. This will append
300+
`ur_exp_command_buffer` to a string pointer passed to the function if the
301+
extension is supported.
302+
303+
Known implementations of cl_khr_command_buffer:
304+
- [OneAPI-Construction-Kit](https://github.com/codeplaysoftware/oneapi-construction-kit) (must enable `OCL_EXTENSION_cl_khr_command_buffer` when building)
305+
- [PoCL](http://portablecl.org/)
306+
- [Command-Buffer Emulation Layer](https://github.com/bashbaug/SimpleOpenCLSamples/tree/main/layers/10_cmdbufemu)
307+

sycl/plugins/unified_runtime/CMakeLists.txt

Lines changed: 7 additions & 7 deletions
Original file line numberDiff line numberDiff line change
@@ -56,14 +56,14 @@ endif()
5656
if(SYCL_PI_UR_USE_FETCH_CONTENT)
5757
include(FetchContent)
5858

59-
set(UNIFIED_RUNTIME_REPO "https://github.com/oneapi-src/unified-runtime.git")
60-
# commit 31b654f981f6098936e7f04c65803395a2ea343a
61-
# Merge: 71957e84 3da21336
59+
set(UNIFIED_RUNTIME_REPO "https://github.com/martygrant/unified-runtime.git")
60+
# commit ec7982bac6cb3a6b9ed610cd6b7cb41fcbc780dc
61+
# Merge: 62e6d2f9 5fb82924
6262
# Author: Kenneth Benzie (Benie) <[email protected]>
63-
# Date: Wed Nov 22 11:27:33 2023 +0000
64-
# Merge pull request #1053 from jandres742/url0leakkey
65-
# [UR][L0] Add UR_L0_LEAKS_DEBUG key
66-
set(UNIFIED_RUNTIME_TAG 31b654f981f6098936e7f04c65803395a2ea343a)
63+
# Date: Wed Nov 8 13:32:46 2023 +0000
64+
# Merge pull request #1022 from 0x12CC/l0_usm_error_checking_2
65+
# [UR][L0] Propagate OOM errors from `USMAllocationMakeResident`
66+
set(UNIFIED_RUNTIME_TAG martin/openclCommandBuffers)
6767

6868
if(SYCL_PI_UR_OVERRIDE_FETCH_CONTENT_REPO)
6969
set(UNIFIED_RUNTIME_REPO "${SYCL_PI_UR_OVERRIDE_FETCH_CONTENT_REPO}")

sycl/plugins/unified_runtime/pi2ur.hpp

Lines changed: 4 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -127,7 +127,6 @@ static pi_result ur2piResult(ur_result_t urResult) {
127127
return PI_ERROR_INVALID_WORK_DIMENSION;
128128
case UR_RESULT_ERROR_INVALID_GLOBAL_WIDTH_DIMENSION:
129129
return PI_ERROR_INVALID_VALUE;
130-
131130
case UR_RESULT_ERROR_PROGRAM_UNLINKED:
132131
return PI_ERROR_INVALID_PROGRAM_EXECUTABLE;
133132
case UR_RESULT_ERROR_OVERLAPPING_REGIONS:
@@ -140,6 +139,10 @@ static pi_result ur2piResult(ur_result_t urResult) {
140139
return PI_ERROR_OUT_OF_RESOURCES;
141140
case UR_RESULT_ERROR_ADAPTER_SPECIFIC:
142141
return PI_ERROR_PLUGIN_SPECIFIC_ERROR;
142+
case UR_RESULT_ERROR_INVALID_COMMAND_BUFFER_EXP:
143+
return PI_ERROR_INVALID_COMMAND_BUFFER_KHR;
144+
case UR_RESULT_ERROR_INVALID_COMMAND_BUFFER_SYNC_POINT_WAIT_LIST_EXP:
145+
return PI_ERROR_INVALID_SYNC_POINT_WAIT_LIST_KHR;
143146
case UR_RESULT_ERROR_UNKNOWN:
144147
default:
145148
return PI_ERROR_UNKNOWN;

sycl/test-e2e/Graph/Explicit/cycle_error.cpp

Lines changed: 8 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -80,6 +80,14 @@ void CreateGraphWithCyclesTest(bool DisableCycleChecks) {
8080
}
8181

8282
int main() {
83+
{
84+
queue Queue;
85+
86+
if (!are_graphs_supported(Queue)) {
87+
return 0;
88+
}
89+
}
90+
8391
// Test with cycle checks
8492
CreateGraphWithCyclesTest(false);
8593
// Test without cycle checks

sycl/test-e2e/Graph/Explicit/executable_graph_update_ordering.cpp

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -11,4 +11,4 @@
1111

1212
#define GRAPH_E2E_EXPLICIT
1313

14-
#include "../Inputs/executable_graph_update_ordering"
14+
#include "../Inputs/executable_graph_update_ordering.cpp"

sycl/test-e2e/Graph/Inputs/buffer_ordering.cpp

Lines changed: 0 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -12,7 +12,6 @@
1212
#include "../graph_common.hpp"
1313

1414
int main() {
15-
1615
queue Queue{{sycl::ext::intel::property::queue::no_immediate_command_list{}}};
1716

1817
if (!are_graphs_supported(Queue)) {

sycl/test-e2e/Graph/RecordReplay/dotp_multiple_queues.cpp

Lines changed: 0 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -10,7 +10,6 @@
1010
#include "../graph_common.hpp"
1111

1212
int main() {
13-
1413
property_list Properties{
1514
property::queue::in_order{},
1615
sycl::ext::intel::property::queue::no_immediate_command_list{}};

sycl/test-e2e/Graph/RecordReplay/executable_graph_update_ordering.cpp

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -11,4 +11,4 @@
1111

1212
#define GRAPH_E2E_RECORD_REPLAY
1313

14-
#include "../Inputs/executable_graph_update_ordering"
14+
#include "../Inputs/executable_graph_update_ordering.cpp"

sycl/test-e2e/Graph/device_query.cpp

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -1,4 +1,4 @@
1-
// REQUIRES: cuda || level_zero, gpu
1+
// REQUIRES: opencl || cuda || level_zero
22
// RUN: %{build} -o %t.out
33
// RUN: %{run} %t.out
44

@@ -21,7 +21,7 @@ int main() {
2121
auto Backend = Device.get_backend();
2222

2323
if ((Backend == backend::ext_oneapi_level_zero) ||
24-
(Backend == backend::ext_oneapi_cuda)) {
24+
(Backend == backend::ext_oneapi_cuda) || (Backend == backend::opencl)) {
2525
assert(SupportsGraphs == exp_ext::graph_support_level::native);
2626
} else {
2727
assert(SupportsGraphs == exp_ext::graph_support_level::unsupported);

0 commit comments

Comments
 (0)