Skip to content

Commit 795871c

Browse files
martygrantrebleEwanCkbenzie
authored andcommitted
[SYCL][OpenCL] Enable graph extension on OpenCL backend (intel#11718)
intel-llvm CI run for adding Command Buffers to the OpenCL Adapter in Unified Runtime - oneapi-src/unified-runtime#966 Also completes follow-on work identified in intel#11599 to add an OpenCL section to the SYCL-Graphs docs and update the e2e Graph tests. Updating the tests has since been completed in a separate PR - intel#11877 Depends on intel#11820 merging first. --------- Co-authored-by: Pablo Reble <[email protected]> Co-authored-by: Ewan Crawford <[email protected]> Co-authored-by: Kenneth Benzie (Benie) <[email protected]>
1 parent 96ff073 commit 795871c

26 files changed

+243
-36
lines changed

sycl/doc/design/CommandGraph.md

Lines changed: 103 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -149,8 +149,8 @@ yet been implemented.
149149
Implementation of UR command-buffers
150150
for each of the supported SYCL 2020 backends.
151151

152-
Currently Level Zero and CUDA backends are implemented.
153-
More sub-sections will be added here as other backends are supported.
152+
Backends which are implemented currently are: [Level Zero](#level-zero),
153+
[CUDA](#cuda), and partial support for [OpenCL](#opencl).
154154

155155
### Level Zero
156156

@@ -246,3 +246,104 @@ the executable CUDA Graph that represent this series of operations.
246246
An executable CUDA Graph, which contains all commands and synchronization
247247
information, is saved in the UR command-buffer to allow for efficient
248248
graph resubmission.
249+
250+
### OpenCL
251+
252+
SYCL-Graph is only enabled for an OpenCL backend when the
253+
[cl_khr_command_buffer](https://registry.khronos.org/OpenCL/specs/3.0-unified/html/OpenCL_Ext.html#cl_khr_command_buffer)
254+
extension is available, however this information isn't available until runtime
255+
due to OpenCL implementations being loaded through an ICD.
256+
257+
The `ur_exp_command_buffer` string is conditionally returned from the OpenCL
258+
command-buffer UR backend at runtime based on `cl_khr_command_buffer` support
259+
to indicate that the graph extension should be enabled. This is information
260+
is propagated to the SYCL user via the
261+
`device.get_info<info::device::graph_support>()` query for graph extension
262+
support.
263+
264+
#### Limitations
265+
266+
Due to the API mapping gaps documented in the following section, OpenCL as a
267+
SYCL backend cannot fully support the graph API. Instead, there are
268+
limitations in the types of nodes which a user can add to a graph, using
269+
an unsupported node type will cause a sycl exception to be thrown in graph
270+
finalization with error code `sycl::errc::feature_not_supported` and a message
271+
mentioning the unsupported command. For example,
272+
273+
```
274+
terminate called after throwing an instance of 'sycl::_V1::exception'
275+
what(): USM copy command not supported by graph backend
276+
```
277+
278+
The types of commands which are unsupported, and lead to this exception are:
279+
* `handler::copy(src, dest)` - Where `src` is an accessor and `dest` is a pointer.
280+
This corresponds to a memory buffer read command.
281+
* `handler::copy(src, dest)` - Where `src` is an pointer and `dest` is an accessor.
282+
This corresponds to a memory buffer write command.
283+
* `handler::copy(src, dest)` or `handler::memcpy(dest, src)` - Where both `src` and
284+
`dest` are USM pointers. This corresponds to a USM copy command.
285+
286+
Note that `handler::copy(src, dest)` where both `src` and `dest` are an accessor
287+
is supported, as a memory buffer copy command exists in the OpenCL extension.
288+
289+
#### UR API Mapping
290+
291+
There are some gaps in both the OpenCL and UR specifications for Command
292+
Buffers shown in the list below. There are implementations in the UR OpenCL
293+
adapter where there is matching support for each function in the list.
294+
295+
| UR | OpenCL | Supported |
296+
| --- | --- | --- |
297+
| urCommandBufferCreateExp | clCreateCommandBufferKHR | Yes |
298+
| urCommandBufferRetainExp | clRetainCommandBufferKHR | Yes |
299+
| urCommandBufferReleaseExp | clReleaseCommandBufferKHR | Yes |
300+
| urCommandBufferFinalizeExp | clFinalizeCommandBufferKHR | Yes |
301+
| urCommandBufferAppendKernelLaunchExp | clCommandNDRangeKernelKHR | Yes |
302+
| urCommandBufferAppendUSMMemcpyExp | | No |
303+
| urCommandBufferAppendUSMFillExp | | No |
304+
| urCommandBufferAppendMembufferCopyExp | clCommandCopyBufferKHR | Yes |
305+
| urCommandBufferAppendMemBufferWriteExp | | No |
306+
| urCommandBufferAppendMemBufferReadExp | | No |
307+
| urCommandBufferAppendMembufferCopyRectExp | clCommandCopyBufferRectKHR | Yes |
308+
| urCommandBufferAppendMemBufferWriteRectExp | | No |
309+
| urCommandBufferAppendMemBufferReadRectExp | | No |
310+
| urCommandBufferAppendMemBufferFillExp | clCommandFillBufferKHR | Yes |
311+
| urCommandBufferEnqueueExp | clEnqueueCommandBufferKHR | Yes |
312+
| | clCommandBarrierWithWaitListKHR | No |
313+
| | clCommandCopyImageKHR | No |
314+
| | clCommandCopyImageToBufferKHR | No |
315+
| | clCommandFillImageKHR | No |
316+
| | clGetCommandBufferInfoKHR | No |
317+
| | clCommandSVMMemcpyKHR | No |
318+
| | clCommandSVMMemFillKHR | No |
319+
320+
We are looking to address these gaps in the future so that SYCL-Graph can be
321+
fully supported on a `cl_khr_command_buffer` backend.
322+
323+
#### UR Command-Buffer Implementation
324+
325+
Many of the OpenCL functions take a `cl_command_queue` parameter which is not
326+
present in most of the UR functions. Instead, when a new command buffer is
327+
created in `urCommandBufferCreateExp` we also create and maintain a new
328+
internal `ur_queue_handle_t` with a reference stored inside of the
329+
`ur_exp_command_buffer_handle_t_` struct. The internal queue is retained and
330+
released whenever the owning command buffer is retained or released.
331+
332+
With command buffers being an OpenCL extension, each function is accessed by
333+
loading a function pointer to its implementation. These are defined in a common
334+
header file in the UR OpenCL adapter. The symbols for the functions are however
335+
defined in [OpenCL-Headers](https://github.com/KhronosGroup/OpenCL-Headers/blob/main/CL/cl_ext.h)
336+
but it is not known at this time what version of the headers will be used in
337+
the UR GitHub CI configuration, so loading the function pointers will be used
338+
until this can be verified. A future piece of work would be replacing the
339+
custom defined symbols with the ones from OpenCL-Headers.
340+
341+
#### Available OpenCL Command-Buffer Implementations
342+
343+
Publicly available implementations of `cl_khr_command_buffer` that can be used
344+
to enable the graph extension in OpenCL:
345+
346+
- [OneAPI Construction Kit](https://github.com/codeplaysoftware/oneapi-construction-kit) (must enable `OCL_EXTENSION_cl_khr_command_buffer` when building)
347+
- [PoCL](http://portablecl.org/)
348+
- [Command-Buffer Emulation Layer](https://github.com/bashbaug/SimpleOpenCLSamples/tree/efeae73139ddf064fafce565cc39640af10d900f/layers/10_cmdbufemu)
349+

sycl/doc/design/images/SYCL-Graph-Architecture.svg

Lines changed: 1 addition & 1 deletion
Loading

sycl/plugins/unified_runtime/CMakeLists.txt

Lines changed: 6 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -57,13 +57,13 @@ if(SYCL_PI_UR_USE_FETCH_CONTENT)
5757
include(FetchContent)
5858

5959
set(UNIFIED_RUNTIME_REPO "https://github.com/oneapi-src/unified-runtime.git")
60-
# commit 31b654f981f6098936e7f04c65803395a2ea343a
61-
# Merge: 71957e84 3da21336
60+
# commit 109ed46ee55f41317d35b2a9a20fa7a2029e9e64
61+
# Merge: 31b654f9 23005313
6262
# Author: Kenneth Benzie (Benie) <[email protected]>
63-
# Date: Wed Nov 22 11:27:33 2023 +0000
64-
# Merge pull request #1053 from jandres742/url0leakkey
65-
# [UR][L0] Add UR_L0_LEAKS_DEBUG key
66-
set(UNIFIED_RUNTIME_TAG 31b654f981f6098936e7f04c65803395a2ea343a)
63+
# Date: Wed Nov 22 16:04:52 2023 +0000
64+
# Merge pull request #966 from martygrant/martin/openclCommandBuffers
65+
# [OpenCL] Add Command Buffer extension to OpenCL adapter.
66+
set(UNIFIED_RUNTIME_TAG 109ed46ee55f41317d35b2a9a20fa7a2029e9e64)
6767

6868
if(SYCL_PI_UR_OVERRIDE_FETCH_CONTENT_REPO)
6969
set(UNIFIED_RUNTIME_REPO "${SYCL_PI_UR_OVERRIDE_FETCH_CONTENT_REPO}")

sycl/plugins/unified_runtime/pi2ur.hpp

Lines changed: 5 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -101,6 +101,7 @@ static pi_result ur2piResult(ur_result_t urResult) {
101101
return PI_ERROR_LINK_PROGRAM_FAILURE;
102102
case UR_RESULT_ERROR_UNSUPPORTED_VERSION:
103103
case UR_RESULT_ERROR_UNSUPPORTED_FEATURE:
104+
return PI_ERROR_INVALID_OPERATION;
104105
case UR_RESULT_ERROR_INVALID_ARGUMENT:
105106
case UR_RESULT_ERROR_INVALID_NULL_HANDLE:
106107
case UR_RESULT_ERROR_HANDLE_OBJECT_IN_USE:
@@ -127,7 +128,6 @@ static pi_result ur2piResult(ur_result_t urResult) {
127128
return PI_ERROR_INVALID_WORK_DIMENSION;
128129
case UR_RESULT_ERROR_INVALID_GLOBAL_WIDTH_DIMENSION:
129130
return PI_ERROR_INVALID_VALUE;
130-
131131
case UR_RESULT_ERROR_PROGRAM_UNLINKED:
132132
return PI_ERROR_INVALID_PROGRAM_EXECUTABLE;
133133
case UR_RESULT_ERROR_OVERLAPPING_REGIONS:
@@ -140,6 +140,10 @@ static pi_result ur2piResult(ur_result_t urResult) {
140140
return PI_ERROR_OUT_OF_RESOURCES;
141141
case UR_RESULT_ERROR_ADAPTER_SPECIFIC:
142142
return PI_ERROR_PLUGIN_SPECIFIC_ERROR;
143+
case UR_RESULT_ERROR_INVALID_COMMAND_BUFFER_EXP:
144+
return PI_ERROR_INVALID_COMMAND_BUFFER_KHR;
145+
case UR_RESULT_ERROR_INVALID_COMMAND_BUFFER_SYNC_POINT_WAIT_LIST_EXP:
146+
return PI_ERROR_INVALID_SYNC_POINT_WAIT_LIST_KHR;
143147
case UR_RESULT_ERROR_UNKNOWN:
144148
default:
145149
return PI_ERROR_UNKNOWN;

sycl/source/detail/memory_manager.cpp

Lines changed: 68 additions & 21 deletions
Original file line numberDiff line numberDiff line change
@@ -1518,10 +1518,20 @@ void MemoryManager::ext_oneapi_copyD2H_cmd_buffer(
15181518
}
15191519

15201520
if (1 == DimDst && 1 == DimSrc) {
1521-
Plugin->call<PiApiKind::piextCommandBufferMemBufferRead>(
1522-
CommandBuffer, sycl::detail::pi::cast<sycl::detail::pi::PiMem>(SrcMem),
1523-
SrcXOffBytes, SrcAccessRangeWidthBytes, DstMem + DstXOffBytes,
1524-
Deps.size(), Deps.data(), OutSyncPoint);
1521+
pi_result Result =
1522+
Plugin->call_nocheck<PiApiKind::piextCommandBufferMemBufferRead>(
1523+
CommandBuffer,
1524+
sycl::detail::pi::cast<sycl::detail::pi::PiMem>(SrcMem),
1525+
SrcXOffBytes, SrcAccessRangeWidthBytes, DstMem + DstXOffBytes,
1526+
Deps.size(), Deps.data(), OutSyncPoint);
1527+
1528+
if (Result == PI_ERROR_INVALID_OPERATION) {
1529+
throw sycl::exception(
1530+
sycl::make_error_code(sycl::errc::feature_not_supported),
1531+
"Device-to-host buffer copy command not supported by graph backend");
1532+
} else {
1533+
Plugin->checkPiResult(Result);
1534+
}
15251535
} else {
15261536
size_t BufferRowPitch = (1 == DimSrc) ? 0 : SrcSzWidthBytes;
15271537
size_t BufferSlicePitch =
@@ -1538,11 +1548,20 @@ void MemoryManager::ext_oneapi_copyD2H_cmd_buffer(
15381548
SrcAccessRange[SrcPos.YTerm],
15391549
SrcAccessRange[SrcPos.ZTerm]};
15401550

1541-
Plugin->call<PiApiKind::piextCommandBufferMemBufferReadRect>(
1542-
CommandBuffer, sycl::detail::pi::cast<sycl::detail::pi::PiMem>(SrcMem),
1543-
&BufferOffset, &HostOffset, &RectRegion, BufferRowPitch,
1544-
BufferSlicePitch, HostRowPitch, HostSlicePitch, DstMem, Deps.size(),
1545-
Deps.data(), OutSyncPoint);
1551+
pi_result Result =
1552+
Plugin->call_nocheck<PiApiKind::piextCommandBufferMemBufferReadRect>(
1553+
CommandBuffer,
1554+
sycl::detail::pi::cast<sycl::detail::pi::PiMem>(SrcMem),
1555+
&BufferOffset, &HostOffset, &RectRegion, BufferRowPitch,
1556+
BufferSlicePitch, HostRowPitch, HostSlicePitch, DstMem, Deps.size(),
1557+
Deps.data(), OutSyncPoint);
1558+
if (Result == PI_ERROR_INVALID_OPERATION) {
1559+
throw sycl::exception(
1560+
sycl::make_error_code(sycl::errc::feature_not_supported),
1561+
"Device-to-host buffer copy command not supported by graph backend");
1562+
} else {
1563+
Plugin->checkPiResult(Result);
1564+
}
15461565
}
15471566
}
15481567

@@ -1576,10 +1595,20 @@ void MemoryManager::ext_oneapi_copyH2D_cmd_buffer(
15761595
}
15771596

15781597
if (1 == DimDst && 1 == DimSrc) {
1579-
Plugin->call<PiApiKind::piextCommandBufferMemBufferWrite>(
1580-
CommandBuffer, sycl::detail::pi::cast<sycl::detail::pi::PiMem>(DstMem),
1581-
DstXOffBytes, DstAccessRangeWidthBytes, SrcMem + SrcXOffBytes,
1582-
Deps.size(), Deps.data(), OutSyncPoint);
1598+
pi_result Result =
1599+
Plugin->call_nocheck<PiApiKind::piextCommandBufferMemBufferWrite>(
1600+
CommandBuffer,
1601+
sycl::detail::pi::cast<sycl::detail::pi::PiMem>(DstMem),
1602+
DstXOffBytes, DstAccessRangeWidthBytes, SrcMem + SrcXOffBytes,
1603+
Deps.size(), Deps.data(), OutSyncPoint);
1604+
1605+
if (Result == PI_ERROR_INVALID_OPERATION) {
1606+
throw sycl::exception(
1607+
sycl::make_error_code(sycl::errc::feature_not_supported),
1608+
"Host-to-device buffer copy command not supported by graph backend");
1609+
} else {
1610+
Plugin->checkPiResult(Result);
1611+
}
15831612
} else {
15841613
size_t BufferRowPitch = (1 == DimDst) ? 0 : DstSzWidthBytes;
15851614
size_t BufferSlicePitch =
@@ -1596,11 +1625,21 @@ void MemoryManager::ext_oneapi_copyH2D_cmd_buffer(
15961625
DstAccessRange[DstPos.YTerm],
15971626
DstAccessRange[DstPos.ZTerm]};
15981627

1599-
Plugin->call<PiApiKind::piextCommandBufferMemBufferWriteRect>(
1600-
CommandBuffer, sycl::detail::pi::cast<sycl::detail::pi::PiMem>(DstMem),
1601-
&BufferOffset, &HostOffset, &RectRegion, BufferRowPitch,
1602-
BufferSlicePitch, HostRowPitch, HostSlicePitch, SrcMem, Deps.size(),
1603-
Deps.data(), OutSyncPoint);
1628+
pi_result Result =
1629+
Plugin->call_nocheck<PiApiKind::piextCommandBufferMemBufferWriteRect>(
1630+
CommandBuffer,
1631+
sycl::detail::pi::cast<sycl::detail::pi::PiMem>(DstMem),
1632+
&BufferOffset, &HostOffset, &RectRegion, BufferRowPitch,
1633+
BufferSlicePitch, HostRowPitch, HostSlicePitch, SrcMem, Deps.size(),
1634+
Deps.data(), OutSyncPoint);
1635+
1636+
if (Result == PI_ERROR_INVALID_OPERATION) {
1637+
throw sycl::exception(
1638+
sycl::make_error_code(sycl::errc::feature_not_supported),
1639+
"Host-to-device buffer copy command not supported by graph backend");
1640+
} else {
1641+
Plugin->checkPiResult(Result);
1642+
}
16041643
}
16051644
}
16061645

@@ -1614,9 +1653,17 @@ void MemoryManager::ext_oneapi_copy_usm_cmd_buffer(
16141653
PI_ERROR_INVALID_VALUE);
16151654

16161655
const PluginPtr &Plugin = Context->getPlugin();
1617-
Plugin->call<PiApiKind::piextCommandBufferMemcpyUSM>(
1618-
CommandBuffer, DstMem, SrcMem, Len, Deps.size(), Deps.data(),
1619-
OutSyncPoint);
1656+
pi_result Result =
1657+
Plugin->call_nocheck<PiApiKind::piextCommandBufferMemcpyUSM>(
1658+
CommandBuffer, DstMem, SrcMem, Len, Deps.size(), Deps.data(),
1659+
OutSyncPoint);
1660+
if (Result == PI_ERROR_INVALID_OPERATION) {
1661+
throw sycl::exception(
1662+
sycl::make_error_code(sycl::errc::feature_not_supported),
1663+
"USM copy command not supported by graph backend");
1664+
} else {
1665+
Plugin->checkPiResult(Result);
1666+
}
16201667
}
16211668

16221669
void MemoryManager::copy_image_bindless(

sycl/test-e2e/Graph/Explicit/buffer_copy_host2target.cpp

Lines changed: 3 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -8,6 +8,9 @@
88
//
99
// TODO enable cuda once buffer issue investigated and fixed
1010
// UNSUPPORTED: cuda
11+
//
12+
// Host to device copy command not supported for OpenCL
13+
// UNSUPPORTED: opencl
1114

1215
#define GRAPH_E2E_EXPLICIT
1316

sycl/test-e2e/Graph/Explicit/buffer_copy_host2target_2d.cpp

Lines changed: 3 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -8,6 +8,9 @@
88
//
99
// TODO enable cuda once buffer issue investigated and fixed
1010
// UNSUPPORTED: cuda
11+
//
12+
// Host to device copy command not supported for OpenCL
13+
// UNSUPPORTED: opencl
1114

1215
#define GRAPH_E2E_EXPLICIT
1316

sycl/test-e2e/Graph/Explicit/buffer_copy_host2target_offset.cpp

Lines changed: 3 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -8,6 +8,9 @@
88
//
99
// TODO enable cuda once buffer issue investigated and fixed
1010
// UNSUPPORTED: cuda
11+
//
12+
// Host to device copy command not supported for OpenCL
13+
// UNSUPPORTED: opencl
1114

1215
#define GRAPH_E2E_EXPLICIT
1316

sycl/test-e2e/Graph/Explicit/buffer_copy_target2host.cpp

Lines changed: 3 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -8,6 +8,9 @@
88
//
99
// TODO enable cuda once buffer issue investigated and fixed
1010
// UNSUPPORTED: cuda
11+
//
12+
// Device to host copy command not supported for OpenCL
13+
// UNSUPPORTED: opencl
1114

1215
#define GRAPH_E2E_EXPLICIT
1316

sycl/test-e2e/Graph/Explicit/buffer_copy_target2host_2d.cpp

Lines changed: 3 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -8,6 +8,9 @@
88
//
99
// TODO enable cuda once buffer issue investigated and fixed
1010
// UNSUPPORTED: cuda
11+
//
12+
// Device to host copy command not supported for OpenCL
13+
// UNSUPPORTED: opencl
1114

1215
#define GRAPH_E2E_EXPLICIT
1316

sycl/test-e2e/Graph/Explicit/buffer_copy_target2host_offset.cpp

Lines changed: 3 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -8,6 +8,9 @@
88
//
99
// TODO enable cuda once buffer issue investigated and fixed
1010
// UNSUPPORTED: cuda
11+
//
12+
// Device to host copy command not supported for OpenCL
13+
// UNSUPPORTED: opencl
1114

1215
#define GRAPH_E2E_EXPLICIT
1316

sycl/test-e2e/Graph/Explicit/cycle_error.cpp

Lines changed: 8 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -77,6 +77,14 @@ void CreateGraphWithCyclesTest(bool DisableCycleChecks) {
7777
}
7878

7979
int main() {
80+
{
81+
queue Queue;
82+
83+
if (!are_graphs_supported(Queue)) {
84+
return 0;
85+
}
86+
}
87+
8088
// Test with cycle checks
8189
CreateGraphWithCyclesTest(false);
8290
// Test without cycle checks

sycl/test-e2e/Graph/Explicit/executable_graph_update_ordering.cpp

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -12,4 +12,4 @@
1212

1313
#define GRAPH_E2E_EXPLICIT
1414

15-
#include "../Inputs/executable_graph_update_ordering"
15+
#include "../Inputs/executable_graph_update_ordering.cpp"

sycl/test-e2e/Graph/Explicit/usm_copy.cpp

Lines changed: 3 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -5,6 +5,9 @@
55
// RUN: %if ext_oneapi_level_zero %{env UR_L0_LEAKS_DEBUG=1 %{run} %t.out 2>&1 | FileCheck %s %}
66
//
77
// CHECK-NOT: LEAK
8+
//
9+
// USM copy command not supported for OpenCL
10+
// UNSUPPORTED: opencl
811

912
#define GRAPH_E2E_EXPLICIT
1013

sycl/test-e2e/Graph/Inputs/buffer_ordering.cpp

Lines changed: 0 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -12,7 +12,6 @@
1212
#include "../graph_common.hpp"
1313

1414
int main() {
15-
1615
queue Queue{{sycl::ext::intel::property::queue::no_immediate_command_list{}}};
1716

1817
const size_t N = 10;

sycl/test-e2e/Graph/RecordReplay/buffer_copy_host2target.cpp

Lines changed: 3 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -8,6 +8,9 @@
88
//
99
// TODO enable cuda once buffer issue investigated and fixed
1010
// UNSUPPORTED: cuda
11+
//
12+
// Host to device copy command not supported for OpenCL
13+
// UNSUPPORTED: opencl
1114

1215
#define GRAPH_E2E_RECORD_REPLAY
1316

0 commit comments

Comments
 (0)