Skip to content

Commit 65cc0cf

Browse files
authored
[SYCL] [L0] Add user APIs to import / release host memory from USM. (#9294)
This change adds a SYCL interface to the Level Zero APIs zexDriverImportExternalPointer and zexDriverReleaseImportedPointer. These functions are used for importing host memory into USM for the duration of data transfer to increase bandwidth.
1 parent 8fe3101 commit 65cc0cf

File tree

24 files changed

+507
-12
lines changed

24 files changed

+507
-12
lines changed
Lines changed: 159 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,159 @@
1+
= sycl_ext_oneapi_copy_optimize
2+
3+
:source-highlighter: coderay
4+
:coderay-linenums-mode: table
5+
6+
// This section needs to be after the document title.
7+
:doctype: book
8+
:toc2:
9+
:toc: left
10+
:encoding: utf-8
11+
:lang: en
12+
:dpcpp: pass:[DPC++]
13+
14+
// Set the default source code type in this document to C++,
15+
// for syntax highlighting purposes. This is needed because
16+
// docbook uses c++ and html5 uses cpp.
17+
:language: {basebackend@docbook:c++:cpp}
18+
19+
20+
== Notice
21+
22+
[%hardbreaks]
23+
Copyright (C) 2023-2023 Intel Corporation. All rights reserved.
24+
25+
Khronos(R) is a registered trademark and SYCL(TM) and SPIR(TM) are trademarks
26+
of The Khronos Group Inc. OpenCL(TM) is a trademark of Apple Inc. used by
27+
permission by Khronos.
28+
29+
30+
== Contact
31+
32+
To report problems with this extension, please open a new issue at:
33+
34+
https://github.com/intel/llvm/issues
35+
36+
37+
== Dependencies
38+
39+
This extension is written against the SYCL 2020 revision 7 specification. All
40+
references below to the "core SYCL specification" or to section numbers in the
41+
SYCL specification refer to that revision.
42+
43+
44+
== Status
45+
46+
This is an experimental extension specification, intended to provide early
47+
access to features and gather community feedback. Interfaces defined in this
48+
specification are implemented in DPC\++ but they are not finalized and may
49+
change incompatibly in future versions of DPC++ without prior notice.
50+
Shipping software products should not rely on APIs defined in this
51+
specification.
52+
53+
54+
== Overview
55+
56+
SYCL provides explicit copy APIs that copy memory between a device and the host.
57+
Some backends can optimize these copy operations if the backend knows in
58+
advance that the host memory will be used for such a copy operation.
59+
If a SYCL application knows that it will frequently copy to or from the same
60+
host memory, it can use the APIs in this extension to inform the backend
61+
about this, which can result in a performance benefit.
62+
63+
64+
== Specification
65+
66+
=== Feature test macro
67+
68+
This extension provides a feature-test macro as described in the core SYCL
69+
specification. An implementation supporting this extension must predefine
70+
the macro `SYCL_EXT_ONEAPI_COPY_OPTIMIZE` to one of the values defined
71+
in the table below. Applications can test for the existence of this macro
72+
to determine if the implementation supports this feature, or applications
73+
can test the macro's value to determine which of the extension's features
74+
the implementation supports.
75+
76+
[%header,cols="1,5"]
77+
|===
78+
|Value
79+
|Description
80+
81+
|1
82+
|Initial version of this extension.
83+
|===
84+
85+
=== API of the extension
86+
87+
This extension adds the following free functions:
88+
89+
```c++
90+
namespace sycl::ext::oneapi::experimental {
91+
92+
void prepare_for_device_copy(const void *Ptr, size_t NumBytes,
93+
const context &Context);
94+
void prepare_for_device_copy(const void *Ptr, size_t NumBytes,
95+
const queue &Queue);
96+
97+
void release_from_device_copy(const void *Ptr, const context &Context);
98+
void release_from_device_copy(const void *Ptr, const queue &Queue);
99+
100+
} // namespace sycl::ext::oneapi::experimental
101+
```
102+
103+
Table 1. Functions added by this extension.
104+
|====
105+
| Member Function | Description
106+
a|
107+
```
108+
void prepare_for_device_copy(
109+
const void *Ptr,
110+
size_t NumBytes,
111+
const context &Context);
112+
```
113+
114+
| Informs the implementation that the host memory range starting at `Ptr` and
115+
extending for `NumBytes` bytes may be used as either the source or destination
116+
of an explicit copy operation to a queue using context Context. Calling this
117+
function may allow the implementation to accelerate these copy operations.
118+
Generally, it is only useful to call this function if the host memory range will
119+
be used for several subsequent copy operations; it is not generally useful to
120+
call it before each individual copy operation.
121+
122+
The behavior is undefined when multiple calls to `prepare_for_device_copy`
123+
specify memory ranges that overlap, even when using different
124+
SYCL contexts. No error message is issued.
125+
126+
a|
127+
```
128+
void prepare_for_device_copy(
129+
const void *Ptr,
130+
size_t NumBytes,
131+
const queue &Queue)
132+
```
133+
| Has the same effect as
134+
`prepare_for_device_copy(Ptr, NumBytes, Queue.get_context())`.
135+
136+
a|
137+
```
138+
void release_from_device_copy(
139+
const void *Ptr,
140+
const context &Context)
141+
```
142+
| Undoes the effect of a previous call to prepare_for_device_copy on `Ptr`.
143+
It is still valid to copy to or from `Ptr`, but these copies might not be
144+
optimized.
145+
146+
The `Ptr` and `Context` must match the values passed to a previous call to
147+
`prepare_for_device_copy` for a memory range that has not yet been
148+
released, otherwise the behavior is undefined.
149+
150+
a|
151+
```
152+
void release_from_device_copy(
153+
const void *Ptr,
154+
const queue &Queue)
155+
```
156+
| Has the same effect as
157+
`release_from_device_copy(Ptr, Queue.get_context())`.
158+
159+
|====

sycl/include/sycl/detail/pi.def

Lines changed: 4 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -159,6 +159,10 @@ _PI_API(piextEnablePeerAccess)
159159
_PI_API(piextDisablePeerAccess)
160160
_PI_API(piextPeerAccessGetInfo)
161161

162+
// USM import/release APIs
163+
_PI_API(piextUSMImport)
164+
_PI_API(piextUSMRelease)
165+
162166
// command-buffer Extension
163167
_PI_API(piextCommandBufferCreate)
164168
_PI_API(piextCommandBufferRetain)

sycl/include/sycl/detail/pi.h

Lines changed: 16 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -143,9 +143,10 @@
143143
// - piextDestroyExternalSemaphore
144144
// - piextWaitExternalSemaphore
145145
// - piextSignalExternalSemaphore
146+
// 14.37 Added piextUSMImportExternalPointer and piextUSMReleaseImportedPointer.
146147

147148
#define _PI_H_VERSION_MAJOR 14
148-
#define _PI_H_VERSION_MINOR 36
149+
#define _PI_H_VERSION_MINOR 37
149150

150151
#define _PI_STRING_HELPER(a) #a
151152
#define _PI_CONCAT(a, b) _PI_STRING_HELPER(a.b)
@@ -2087,6 +2088,20 @@ __SYCL_EXPORT pi_result piextUSMEnqueueMemcpy2D(
20872088
pi_uint32 num_events_in_waitlist, const pi_event *events_waitlist,
20882089
pi_event *event);
20892090

2091+
/// Import host system memory into USM.
2092+
///
2093+
/// \param ptr start address of memory range to import
2094+
/// \param size is the number of bytes to import
2095+
/// \param context is the pi_context
2096+
__SYCL_EXPORT pi_result piextUSMImport(const void *ptr, size_t size,
2097+
pi_context context);
2098+
2099+
/// Release host system memory from USM.
2100+
///
2101+
/// \param ptr start address of imported memory range
2102+
/// \param context is the pi_context
2103+
__SYCL_EXPORT pi_result piextUSMRelease(const void *ptr, pi_context context);
2104+
20902105
///
20912106
/// Device global variable
20922107
///

sycl/include/sycl/usm.hpp

Lines changed: 18 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -296,5 +296,23 @@ T *aligned_alloc(
296296
Kind, PropList, CodeLoc);
297297
}
298298

299+
// Device copy enhancement APIs, prepare_for and release_from USM.
300+
301+
namespace ext::oneapi::experimental {
302+
303+
__SYCL_EXPORT void prepare_for_device_copy(const void *Ptr, size_t Size,
304+
const context &Context);
305+
306+
__SYCL_EXPORT void prepare_for_device_copy(const void *Ptr, size_t Size,
307+
const queue &Queue);
308+
309+
__SYCL_EXPORT void release_from_device_copy(const void *Ptr,
310+
const context &Context);
311+
312+
__SYCL_EXPORT void release_from_device_copy(const void *Ptr,
313+
const queue &Queue);
314+
315+
} // namespace ext::oneapi::experimental
316+
299317
} // __SYCL_INLINE_VER_NAMESPACE(_V1)
300318
} // namespace sycl

sycl/plugins/cuda/pi_cuda.cpp

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -168,6 +168,8 @@ pi_result piPluginInit(pi_plugin *PluginInit) {
168168
_PI_CL(piextUSMEnqueueMemset2D, pi2ur::piextUSMEnqueueMemset2D)
169169
_PI_CL(piextUSMEnqueueMemcpy2D, pi2ur::piextUSMEnqueueMemcpy2D)
170170
_PI_CL(piextUSMGetMemAllocInfo, pi2ur::piextUSMGetMemAllocInfo)
171+
_PI_CL(piextUSMImport, pi2ur::piextUSMImport)
172+
_PI_CL(piextUSMRelease, pi2ur::piextUSMRelease)
171173
// Device global variable
172174
_PI_CL(piextEnqueueDeviceGlobalVariableWrite,
173175
pi2ur::piextEnqueueDeviceGlobalVariableWrite)

sycl/plugins/esimd_emulator/pi_esimd_emulator.cpp

Lines changed: 8 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -2045,6 +2045,14 @@ pi_result piextUSMGetMemAllocInfo(pi_context, const void *, pi_mem_alloc_info,
20452045
DIE_NO_IMPLEMENTATION;
20462046
}
20472047

2048+
pi_result piextUSMImport(const void *ptr, size_t size, pi_context context) {
2049+
return PI_SUCCESS;
2050+
}
2051+
2052+
pi_result piextUSMRelease(const void *ptr, pi_context context) {
2053+
return PI_SUCCESS;
2054+
}
2055+
20482056
/// Host Pipes
20492057
pi_result piextEnqueueReadHostPipe(pi_queue, pi_program, const char *, pi_bool,
20502058
void *, size_t, pi_uint32, const pi_event *,

sycl/plugins/hip/pi_hip.cpp

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -167,6 +167,8 @@ pi_result piPluginInit(pi_plugin *PluginInit) {
167167
_PI_CL(piextUSMEnqueueFill2D, pi2ur::piextUSMEnqueueFill2D)
168168
_PI_CL(piextUSMEnqueueMemset2D, pi2ur::piextUSMEnqueueMemset2D)
169169
_PI_CL(piextUSMGetMemAllocInfo, pi2ur::piextUSMGetMemAllocInfo)
170+
_PI_CL(piextUSMImport, pi2ur::piextUSMImport)
171+
_PI_CL(piextUSMRelease, pi2ur::piextUSMRelease)
170172
// Device global variable
171173
_PI_CL(piextEnqueueDeviceGlobalVariableWrite,
172174
pi2ur::piextEnqueueDeviceGlobalVariableWrite)

sycl/plugins/level_zero/pi_level_zero.cpp

Lines changed: 8 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -1065,6 +1065,14 @@ pi_result piextUSMGetMemAllocInfo(pi_context Context, const void *Ptr,
10651065
ParamValue, ParamValueSizeRet);
10661066
}
10671067

1068+
pi_result piextUSMImport(const void *HostPtr, size_t Size, pi_context Context) {
1069+
return pi2ur::piextUSMImport(HostPtr, Size, Context);
1070+
}
1071+
1072+
pi_result piextUSMRelease(const void *HostPtr, pi_context Context) {
1073+
return pi2ur::piextUSMRelease(HostPtr, Context);
1074+
}
1075+
10681076
/// API for writing data from host to a device global variable.
10691077
///
10701078
/// \param Queue is the queue

sycl/plugins/opencl/pi_opencl.cpp

Lines changed: 10 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -2041,6 +2041,14 @@ pi_result piextUSMGetMemAllocInfo(pi_context context, const void *ptr,
20412041
return RetVal;
20422042
}
20432043

2044+
pi_result piextUSMImport(const void *ptr, size_t size, pi_context context) {
2045+
return PI_SUCCESS;
2046+
}
2047+
2048+
pi_result piextUSMRelease(const void *ptr, pi_context context) {
2049+
return PI_SUCCESS;
2050+
}
2051+
20442052
/// API for writing data from host to a device global variable.
20452053
///
20462054
/// \param queue is the queue
@@ -2695,6 +2703,8 @@ pi_result piPluginInit(pi_plugin *PluginInit) {
26952703
_PI_CL(piextUSMEnqueueMemset2D, piextUSMEnqueueMemset2D)
26962704
_PI_CL(piextUSMEnqueueMemcpy2D, piextUSMEnqueueMemcpy2D)
26972705
_PI_CL(piextUSMGetMemAllocInfo, piextUSMGetMemAllocInfo)
2706+
_PI_CL(piextUSMImport, piextUSMImport)
2707+
_PI_CL(piextUSMRelease, piextUSMRelease)
26982708
// Device global variable
26992709
_PI_CL(piextEnqueueDeviceGlobalVariableWrite,
27002710
piextEnqueueDeviceGlobalVariableWrite)

sycl/plugins/unified_runtime/pi2ur.hpp

Lines changed: 23 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -3280,6 +3280,29 @@ inline pi_result piextUSMGetMemAllocInfo(pi_context Context, const void *Ptr,
32803280
return PI_SUCCESS;
32813281
}
32823282

3283+
inline pi_result piextUSMImport(const void *HostPtr, size_t Size,
3284+
pi_context Context) {
3285+
3286+
PI_ASSERT(Context, PI_ERROR_INVALID_CONTEXT);
3287+
3288+
ur_context_handle_t UrContext =
3289+
reinterpret_cast<ur_context_handle_t>(Context);
3290+
3291+
HANDLE_ERRORS(urUSMImportExp(UrContext, const_cast<void *>(HostPtr), Size));
3292+
return PI_SUCCESS;
3293+
}
3294+
3295+
inline pi_result piextUSMRelease(const void *HostPtr, pi_context Context) {
3296+
3297+
PI_ASSERT(Context, PI_ERROR_INVALID_CONTEXT);
3298+
3299+
ur_context_handle_t UrContext =
3300+
reinterpret_cast<ur_context_handle_t>(Context);
3301+
3302+
HANDLE_ERRORS(urUSMReleaseExp(UrContext, const_cast<void *>(HostPtr)));
3303+
return PI_SUCCESS;
3304+
}
3305+
32833306
inline pi_result piMemImageGetInfo(pi_mem Image, pi_image_info ParamName,
32843307
size_t ParamValueSize, void *ParamValue,
32853308
size_t *ParamValueSizeRet) {

sycl/plugins/unified_runtime/pi_unified_runtime.cpp

Lines changed: 13 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -368,6 +368,16 @@ __SYCL_EXPORT pi_result piextUSMFree(pi_context Context, void *Ptr) {
368368
return pi2ur::piextUSMFree(Context, Ptr);
369369
}
370370

371+
__SYCL_EXPORT pi_result piextUSMImport(const void *HostPtr, size_t Size,
372+
pi_context Context) {
373+
return pi2ur::piextUSMImport(HostPtr, Size, Context);
374+
}
375+
376+
__SYCL_EXPORT pi_result piextUSMRelease(const void *HostPtr,
377+
pi_context Context) {
378+
return pi2ur::piextUSMRelease(HostPtr, Context);
379+
}
380+
371381
__SYCL_EXPORT pi_result piContextRetain(pi_context Context) {
372382
return pi2ur::piContextRetain(Context);
373383
}
@@ -1360,6 +1370,9 @@ __SYCL_EXPORT pi_result piPluginInit(pi_plugin *PluginInit) {
13601370
_PI_API(piextUSMSharedAlloc)
13611371
_PI_API(piextUSMFree)
13621372

1373+
_PI_API(piextUSMImport)
1374+
_PI_API(piextUSMRelease)
1375+
13631376
_PI_API(piEnqueueKernelLaunch)
13641377
_PI_API(piEnqueueMemImageWrite)
13651378
_PI_API(piEnqueueMemImageRead)

sycl/plugins/unified_runtime/ur/adapters/cuda/usm.cpp

Lines changed: 15 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -218,3 +218,18 @@ urUSMGetMemAllocInfo(ur_context_handle_t hContext, const void *pMem,
218218
}
219219
return Result;
220220
}
221+
222+
UR_APIEXPORT ur_result_t UR_APICALL urUSMImportExp(ur_context_handle_t Context,
223+
void *HostPtr, size_t Size) {
224+
UR_ASSERT(Context, UR_RESULT_ERROR_INVALID_CONTEXT);
225+
UR_ASSERT(!HostPtr, UR_RESULT_ERROR_INVALID_VALUE);
226+
UR_ASSERT(Size > 0, UR_RESULT_ERROR_INVALID_VALUE);
227+
return UR_RESULT_SUCCESS;
228+
}
229+
230+
UR_APIEXPORT ur_result_t UR_APICALL urUSMReleaseExp(ur_context_handle_t Context,
231+
void *HostPtr) {
232+
UR_ASSERT(Context, UR_RESULT_ERROR_INVALID_CONTEXT);
233+
UR_ASSERT(!HostPtr, UR_RESULT_ERROR_INVALID_VALUE);
234+
return UR_RESULT_SUCCESS;
235+
}

sycl/plugins/unified_runtime/ur/adapters/hip/usm.cpp

Lines changed: 15 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -188,3 +188,18 @@ urUSMGetMemAllocInfo(ur_context_handle_t hContext, const void *pMem,
188188
}
189189
return Result;
190190
}
191+
192+
UR_APIEXPORT ur_result_t UR_APICALL urUSMImportExp(ur_context_handle_t Context,
193+
void *HostPtr, size_t Size) {
194+
UR_ASSERT(Context, UR_RESULT_ERROR_INVALID_CONTEXT);
195+
UR_ASSERT(!HostPtr, UR_RESULT_ERROR_INVALID_VALUE);
196+
UR_ASSERT(Size > 0, UR_RESULT_ERROR_INVALID_VALUE);
197+
return UR_RESULT_SUCCESS;
198+
}
199+
200+
UR_APIEXPORT ur_result_t UR_APICALL urUSMReleaseExp(ur_context_handle_t Context,
201+
void *HostPtr) {
202+
UR_ASSERT(Context, UR_RESULT_ERROR_INVALID_CONTEXT);
203+
UR_ASSERT(!HostPtr, UR_RESULT_ERROR_INVALID_VALUE);
204+
return UR_RESULT_SUCCESS;
205+
}

0 commit comments

Comments
 (0)