Skip to content

Commit 9cab559

Browse files
authored
[SYCL][DOC] Add sycl_ext_oneapi_address_cast (#9812)
Splits sycl::address_space_cast into static and dynamic casts, allowing developers to avoid the overhead of address space checks at runtime. --------- Signed-off-by: John Pennycook <[email protected]>
1 parent 23fbba1 commit 9cab559

File tree

1 file changed

+204
-0
lines changed

1 file changed

+204
-0
lines changed
Lines changed: 204 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,204 @@
1+
= sycl_ext_oneapi_address_cast
2+
3+
:source-highlighter: coderay
4+
:coderay-linenums-mode: table
5+
6+
// This section needs to be after the document title.
7+
:doctype: book
8+
:toc2:
9+
:toc: left
10+
:encoding: utf-8
11+
:lang: en
12+
:dpcpp: pass:[DPC++]
13+
14+
// Set the default source code type in this document to C++,
15+
// for syntax highlighting purposes. This is needed because
16+
// docbook uses c++ and html5 uses cpp.
17+
:language: {basebackend@docbook:c++:cpp}
18+
19+
20+
== Notice
21+
22+
[%hardbreaks]
23+
Copyright (C) 2023-2023 Intel Corporation. All rights reserved.
24+
25+
Khronos(R) is a registered trademark and SYCL(TM) and SPIR(TM) are trademarks
26+
of The Khronos Group Inc. OpenCL(TM) is a trademark of Apple Inc. used by
27+
permission by Khronos.
28+
29+
30+
== Contact
31+
32+
To report problems with this extension, please open a new issue at:
33+
34+
https://github.com/intel/llvm/issues
35+
36+
37+
== Dependencies
38+
39+
This extension is written against the SYCL 2020 revision 7 specification. All
40+
references below to the "core SYCL specification" or to section numbers in the
41+
SYCL specification refer to that revision.
42+
43+
44+
== Status
45+
46+
This is a proposed extension specification, intended to gather community
47+
feedback. Interfaces defined in this specification may not be implemented yet
48+
or may be in a preliminary state. The specification itself may also change in
49+
incompatible ways before it is finalized. *Shipping software products should
50+
not rely on APIs defined in this specification.*
51+
52+
53+
== Overview
54+
55+
The `sycl::address_space_cast` function in SYCL 2020 does two things: 1) checks
56+
whether a given raw pointer can be cast to a specific address space; and 2)
57+
performs the casting operation. In cases where the developer is attempting to
58+
assert that a raw pointer points to an object in a specific address space, the
59+
checks from 1) are not required and may have undesirable performance impact.
60+
61+
This extension separates `sycl::address_space_cast` into two functions:
62+
63+
- `static_address_cast`, which casts with no run-time checks.
64+
- `dynamic_address_cast`, which casts with run-time checks.
65+
66+
67+
== Specification
68+
69+
=== Feature test macro
70+
71+
This extension provides a feature-test macro as described in the core SYCL
72+
specification. An implementation supporting this extension must predefine the
73+
macro `SYCL_EXT_ONEAPI_ADDRESS_CAST` to one of the values defined in the
74+
table below. Applications can test for the existence of this macro to
75+
determine if the implementation supports this feature, or applications can test
76+
the macro's value to determine which of the extension's features the
77+
implementation supports.
78+
79+
[%header,cols="1,5"]
80+
|===
81+
|Value
82+
|Description
83+
84+
|1
85+
|The APIs of this experimental extension are not versioned, so the
86+
feature-test macro always has this value.
87+
|===
88+
89+
90+
=== Address space cast functions
91+
92+
[source,c++]
93+
----
94+
namespace sycl::ext::oneapi::experimental {
95+
96+
template <access::address_space Space, access::decorated DecorateAddress,
97+
typename ElementType>
98+
multi_ptr<ElementType, Space, DecorateAddress>
99+
static_address_cast(ElementType* ptr);
100+
101+
template <access::address_space Space, access::decorated DecorateAddress,
102+
typename ElementType>
103+
multi_ptr<ElementType, Space, DecorateAddress>
104+
dynamic_address_cast(ElementType* ptr);
105+
106+
} // namespace sycl::ext::oneapi::experimental
107+
----
108+
109+
[source,c++]
110+
----
111+
template <access::address_space Space, access::decorated DecorateAddress,
112+
typename ElementType>
113+
multi_ptr<ElementType, Space, DecorateAddress>
114+
static_address_cast(ElementType* ptr);
115+
----
116+
_Preconditions_: `ptr` points to an object allocated in the address space
117+
designated by `Space`.
118+
119+
_Returns_: A `multi_ptr` with the specified address space and decoration that
120+
points to the same object as `ptr`.
121+
122+
[NOTE]
123+
====
124+
Implementations may choose to issue a diagnostic if they can prove that `ptr`
125+
does not point to an object allocated in the address space designated by
126+
`Space`.
127+
====
128+
129+
130+
[source,c++]
131+
----
132+
template <access::address_space Space, access::decorated DecorateAddress,
133+
typename ElementType>
134+
multi_ptr<ElementType, Space, DecorateAddress>
135+
dynamic_address_cast(ElementType* ptr);
136+
----
137+
_Preconditions_: The memory at `ptr` is accessible to the calling work-item.
138+
139+
_Returns_: A `multi_ptr` with the specified address space and decoration that
140+
points to the same object as `ptr` if `ptr` points to an object allocated in
141+
the address space designated by `Space`, and `nullptr` otherwise.
142+
143+
[NOTE]
144+
====
145+
The precondition prevents `dynamic_address_cast` from being used to
146+
reason about the address space of pointers originating from another work-item
147+
(in the case of `private` pointers) or another work-group (in the case of
148+
`local` pointers). Such pointers could not be dereferenced by the calling
149+
work-item, and it is thus unclear that being able to reason about the address
150+
space would be useful. Limiting the use of `dynamic_address_cast` to
151+
accessible pointers is expected to result in simpler and faster
152+
implementations.
153+
====
154+
155+
156+
== Implementation notes
157+
158+
For SPIR-V backends, `static_address_cast` corresponds to
159+
`OpGenericCastToPtr`. `dynamic_address_cast` _may_ correspond to
160+
`OpGenericCastToPtrExplicit` -- there is currently some ambiguity regarding
161+
exactly how `OpGenericCastToPtrExplicit` is expected to behave, because the
162+
SPIR-V specification does not explain what it means for a cast to "fail".
163+
Since this extension is only experimental, we can likely implement
164+
`dynamic_address_cast` using `OpGenericCastToPtrExplicit` while we
165+
seek to clarify the SPIR-V specification.
166+
167+
Generally speaking, it is expected that a `static_address_cast` can
168+
simply attach new decoration(s) to the raw pointer (or do nothing), while
169+
a `dynamic_address_cast` will have to inspect the address of the
170+
raw pointer to determine which region of memory it points to.
171+
172+
An implementation for a CPU target could be implemented by keeping track of
173+
three pieces of information in thread-local storage:
174+
175+
- The base (highest address) of the calling thread's stack.
176+
- The low bound of the calling work-item's local memory area.
177+
- The high bound of the calling work-item's local memory area.
178+
179+
A cast to `private_space` succeeds as long as the pointer is within the calling
180+
thread's stack. A cast to `local_space` succeeds as long as the pointer is
181+
within the calling work-item's local memory area. A cast to `global_space`
182+
succeeds as long as the pointer is not within either of the above two address
183+
ranges.
184+
185+
Implementations for GPU targets may be able to leverage dedicated instructions
186+
for checking the address space.
187+
188+
189+
== Issues
190+
191+
. Some developers may expect a `dynamic_address_cast` to succeed if the
192+
pointer continues to work, irrespective of where the object the pointer points
193+
to was allocated. For example, some CPU implementations may treat global and
194+
local pointers equivalently in many situations.
195+
+
196+
--
197+
*UNRESOLVED*:
198+
The current description of `dynamic_address_cast` requires
199+
implementations to track precisely which address space a pointer is associated
200+
with, in order to ensure that using the result of a dynamic cast is always
201+
safe. If we can identify use-cases for the more relaxed behavior, it would
202+
make sense to introduce either a third type of cast or some global check that
203+
two address spaces use the same representation and are thus "compatible".
204+
--

0 commit comments

Comments
 (0)