Commit 8efe97f

[SYCL][Doc] Rename scope-specific variables (#11166)
The semantics of work-group-local variables are aligned with C++ thread-local variables but use a different suffix. This has led to confusion, as some readers assume the "local" in work-group-local refers to the local address space.

After considering many alternatives, we settled on the suffix "specific" to describe this concept: work-group-specific variables are associated with a specific work-group.

In future, we expect device-global variables to be renamed to device-specific variables for consistency.

---------

Signed-off-by: John Pennycook <[email protected]>
1 parent 66a741b commit 8efe97f

1 file changed, 62 additions and 59 deletions

sycl/doc/extensions/proposed/sycl_ext_oneapi_work_group_local.asciidoc renamed to sycl/doc/extensions/proposed/sycl_ext_oneapi_work_group_specific.asciidoc

Lines changed: 62 additions & 59 deletions
@@ -1,4 +1,4 @@
-= sycl_ext_oneapi_work_group_local
+= sycl_ext_oneapi_work_group_specific

 :source-highlighter: coderay
 :coderay-linenums-mode: table
@@ -58,14 +58,17 @@ not rely on APIs defined in this specification.*

 == Overview

-This extension defines a `sycl::ext::oneapi::experimental::work_group_local`
+This extension defines a `sycl::ext::oneapi::experimental::work_group_specific`
 class template with behavior inspired by the {cpp} `thread_local` keyword
-and the CUDA `+__shared__+` keyword.
+and the CUDA `+__shared__+` keyword. The "specific" suffix is inspired by
+`tbb::enumerable_thread_specific`, and has been chosen to avoid potential
+confusion between the concepts of "local variables" and the "local address
+space".

-`work_group_local` variables can be allocated at global or function scope,
+`work_group_specific` variables can be allocated at global or function scope,
 lifting many of the restrictions in the existing
 link:../supported/sycl_ext_oneapi_local_memory.asciidoc[sycl_ext_oneapi_local_memory]
-extension. Note, however, that `work_group_local` variables currently place
+extension. Note, however, that `work_group_specific` variables currently place
 additional limits on the types that can be allocated, owing to differences in
 constructor behavior.

@@ -76,7 +79,7 @@ constructor behavior.

 This extension provides a feature-test macro as described in the core SYCL
 specification. An implementation supporting this extension must predefine the
-macro `SYCL_EXT_ONEAPI_WORK_GROUP_LOCAL` to one of the values defined in the
+macro `SYCL_EXT_ONEAPI_WORK_GROUP_SPECIFIC` to one of the values defined in the
 table below. Applications can test for the existence of this macro to
 determine if the implementation supports this feature, or applications can test
 the macro's value to determine which of the extension's features the
@@ -93,27 +96,27 @@ implementation supports.
 |===


-=== `work_group_local` class template
+=== `work_group_specific` class template

-The `work_group_local` class template acts as a view of an
-implementation-managed pointer to work-group local memory.
+The `work_group_specific` class template acts as a view of an
+implementation-managed pointer to work-group-specific memory.

 [source,c++]
 ----
 namespace sycl::ext::oneapi::experimental {

 template <typename T>
-class work_group_local {
+class work_group_specific {
 public:

-  work_group_local() = default;
-  work_group_local(const work_group_local&) = delete;
-  work_group_local& operator=(const work_group_local&) = delete;
+  work_group_specific() = default;
+  work_group_specific(const work_group_specific&) = delete;
+  work_group_specific& operator=(const work_group_specific&) = delete;

   operator T&() const noexcept;

   // Available only if: std::is_array_v<T> == false
-  const work_group_local& operator=(const T& value) const noexcept;
+  const work_group_specific& operator=(const T& value) const noexcept;

   T* operator&() const noexcept;

@@ -127,52 +130,52 @@ private:

 `T` must be trivially constructible and trivially destructible.

-The storage for the object is allocated in work-group local memory before
+The storage for the object is allocated in work-group-specific memory before
 calling the user's kernel lambda, and deallocated when all work-items
 in the group have completed execution of the kernel.

 SYCL implementations conforming to the full feature set treat
-`work_group_local` similarly to the `thread_local` keyword, and when
-a `work_group_local` object is declared at block scope it behaves
+`work_group_specific` similarly to the `thread_local` keyword, and when
+a `work_group_specific` object is declared at block scope it behaves
 as if the `static` keyword was specified implicitly. SYCL implementations
 conforming to the reduced feature set require the `static` keyword to be
 specified explicitly.

 [NOTE]
 ====
-If a `work_group_local` object is declared at function scope, the work-group
-local memory associated with the object will be identical for all usages of
-that function within the kernel. In cases where a function is called multiple
-times, developers must take care to avoid race conditions (e.g., by calling
-`group_barrier` before and after using the memory).
+If a `work_group_specific` object is declared at function scope, the
+work-group-specific memory associated with the object will be identical for all
+usages of that function within the kernel. In cases where a function is called
+multiple times, developers must take care to avoid race conditions (e.g., by
+calling `group_barrier` before and after using the memory).
 ====

 SYCL 2020 requires that all global variables accessed by a device function are
 `const` or `constexpr`. This extension lifts that restriction for
-`work_group_local` variables.
+`work_group_specific` variables.

 [NOTE]
 ====
-Since `work_group_local` acts as a view, wrapping an underlying pointer, a
+Since `work_group_specific` acts as a view, wrapping an underlying pointer, a
 developer may still choose to declare variables as `const`.
 ====

 When `T` is a class type or bounded array, the size of the allocation is known
 at compile-time, and a SYCL implementation may embed the size of the allocation
-directly within a kernel. Each instance of `work_group_local<T>` is associated
-with a unique allocation in work-group local memory.
+directly within a kernel. Each instance of `work_group_specific<T>` is associated
+with a unique allocation in work-group-specific memory.

 When `T` is an unbounded array, the size of the allocation is unknown at
 compile-time, and must be communicated to the SYCL implementation via the
-`work_group_local_memory_size` property. Every instance of `work_group_local`
+`work_group_specific_memory_size` property. Every instance of `work_group_specific`
 for which `T` is an unbounded array is associated with a single, shared,
-allocation in work-group local memory. For example, two instances declared as
-`work_group_local<int[]>` and `work_group_local<float[]>` will be associated
-with the same shared allocation.
+allocation in work-group-specific memory. For example, two instances declared
+as `work_group_specific<int[]>` and `work_group_specific<float[]>` will be
+associated with the same shared allocation.

-If the total amount of local memory requested (i.e., the sum of all memory
-requested by `local_accessor`, `group_local_memory`,
-`group_local_memory_for_overwrite` and `work_group_local`) exceeds a device's
+If the total amount of work-group-specific memory requested (i.e., the sum of
+all memory requested by `local_accessor`, `group_local_memory`,
+`group_local_memory_for_overwrite` and `work_group_specific`) exceeds a device's
 local memory capacity (as reported by `local_mem_size`) then the implementation
 must throw a synchronous `exception` with the `errc::memory_allocation` error
 code from the kernel invocation command (e.g. `parallel_for`).
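
The capacity rule at the end of the hunk above can be illustrated with a small host-only sketch. It does not use SYCL: `local_memory_exceeded` and `check_local_memory_budget` are hypothetical names, and the exception type is only a stand-in for a SYCL `exception` carrying `errc::memory_allocation`. The sketch assumes each request is already expressed in bytes.

```cpp
#include <cstddef>
#include <numeric>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for a SYCL exception with errc::memory_allocation.
struct local_memory_exceeded : std::runtime_error {
  using std::runtime_error::runtime_error;
};

// Sums every request made by one kernel launch (in bytes) and rejects the
// launch when the total is larger than the device capacity.
inline std::size_t check_local_memory_budget(
    const std::vector<std::size_t>& requests_in_bytes,
    std::size_t local_mem_size) {
  const std::size_t total = std::accumulate(
      requests_in_bytes.begin(), requests_in_bytes.end(), std::size_t{0});
  if (total > local_mem_size) {
    throw local_memory_exceeded(
        "requested local memory exceeds device capacity");
  }
  return total;
}
```

In this model the check runs before the kernel starts, which mirrors the requirement that the error is reported synchronously from the kernel invocation command.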
@@ -181,55 +184,55 @@ code from the kernel invocation command (e.g. `parallel_for`).
 ----
 operator T&() const noexcept;
 ----
-_Returns_: A reference to the object stored in the work-group local memory
-associated with this instance of `work_group_local`.
+_Returns_: A reference to the object stored in the work-group-specific memory
+associated with this instance of `work_group_specific`.

 [source,c++]
 ----
-const work_group_local<T>& operator=(const T& value) const noexcept;
+const work_group_specific<T>& operator=(const T& value) const noexcept;
 ----
 _Constraints_: Available only if `std::is_array_v<T>>` is false.

 _Effects_: Replaces the value referenced by `*ptr` with `value`.

-_Returns_: A reference to this instance of `work_group_local`.
+_Returns_: A reference to this instance of `work_group_specific`.

 [source,c++]
 ----
 T* operator&() const noexcept;
 ----
-_Returns_: A pointer to the work-group local memory associated with this
-instance of `work_group_local` (i.e., `ptr`).
+_Returns_: A pointer to the work-group-specific memory associated with this
+instance of `work_group_specific` (i.e., `ptr`).


 ==== Kernel properties

-The `work_group_local_size` property must be passed to a kernel to determine
-the run-time size of the work-group local memory allocation associated with
-all `work_group_local` variables of unbounded array type.
+The `work_group_specific_size` property must be passed to a kernel to determine
+the run-time size of the work-group-specific memory allocation associated with
+all `work_group_specific` variables of unbounded array type.

 [source,c++]
 ----
 namespace sycl::ext::oneapi::experimental {

-struct work_group_local_size {
-  constexpr work_group_local_size(size_t bytes) : value(bytes) {}
+struct work_group_specific_size {
+  constexpr work_group_specific_size(size_t bytes) : value(bytes) {}
   size_t value;
-}; // work_group_local_size
+}; // work_group_specific_size

-using work_group_local_size_key = work_group_local_size;
+using work_group_specific_size_key = work_group_specific_size;

-template <>struct is_property_key<work_group_local_size_key> : std::true_type {};
+template <>struct is_property_key<work_group_specific_size_key> : std::true_type {};

 } // namespace sycl::ext::oneapi::experimental
 ----

 |===
 |Property|Description

-|`work_group_local_size`
-|The `work_group_local_size` property describes the amount of dynamic
-work-group local memory required by the kernel in bytes.
+|`work_group_specific_size`
+|The `work_group_specific_size` property describes the amount of dynamic
+work-group-specific memory required by the kernel in bytes.

 |===

@@ -242,18 +245,18 @@ work-group local memory required by the kernel in bytes.
 ----
 using namespace syclex = sycl::ext::oneapi::experimental;

-/* optional: static const */ syclex::work_group_local<int> program_scope_scalar;
-/* optional: static const */ syclex::work_group_local<int[16]> program_scope_array;
+/* optional: static const */ syclex::work_group_specific<int> program_scope_scalar;
+/* optional: static const */ syclex::work_group_specific<int[16]> program_scope_array;

 void foo() {
-  /* optional: static const */ syclex::work_group_local<int> function_scope_scalar;
+  /* optional: static const */ syclex::work_group_specific<int> function_scope_scalar;
   function_scope_scalar = 1; // assignment via overloaded = operator
   function_scope_scalar += 2; // += operator via implicit conversion to int&
   int* ptr = &function_scope_scalar; // conversion to pointer via overloaded & operator
 }

 void bar() {
-  /* optional: static const */ sylex::work_group_local<int[64]> function_scope_array;
+  /* optional: static const */ sylex::work_group_specific<int[64]> function_scope_array;
   function_scope_array[0] = 1; // [] operator via implicit conversion to int(&)[64]
   int* ptr = function_scope_array; // conversion to pointer via implicit conversion to int(&)[64]
 }
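
The view semantics used in the example above (assignment through the wrapper, conversion to `T&`, and an overloaded address-of operator) can be tried on a host without a SYCL toolchain. The following is a minimal sketch under stated assumptions, not the extension's implementation: `mock_group_specific` and `demo_scalar` are hypothetical names, and the mock takes its storage pointer explicitly, whereas the real class is default-constructed and the implementation supplies the pointer.

```cpp
#include <type_traits>

// Hypothetical host-only mock of the view; not the SYCL class.
template <typename T>
class mock_group_specific {
public:
  explicit mock_group_specific(T* storage) noexcept : ptr(storage) {}
  mock_group_specific(const mock_group_specific&) = delete;
  mock_group_specific& operator=(const mock_group_specific&) = delete;

  // Implicit conversion yields a reference to the wrapped object.
  operator T&() const noexcept { return *ptr; }

  // Assignment writes through the pointer; only meaningful for non-array T.
  const mock_group_specific& operator=(const T& value) const noexcept {
    static_assert(!std::is_array<T>::value,
                  "assignment is not available for array types");
    *ptr = value;
    return *this;
  }

  // Overloaded address-of returns the wrapped pointer, not the view's address.
  T* operator&() const noexcept { return ptr; }

private:
  T* ptr;
};

// Mirrors the scalar part of the usage example, with explicit storage.
inline int demo_scalar() {
  static int storage = 0;
  const mock_group_specific<int> scalar(&storage);
  scalar = 1;                      // overloaded operator=
  static_cast<int&>(scalar) += 2;  // explicit conversion to int&, then +=
  int* p = &scalar;                // overloaded operator&
  return (p == &storage) ? *p : -1;
}
```

Because every member function is `const` and writes go through the pointer, the view can be declared `const` and still modify the underlying object, which is the point made in the specification's note about `const` declarations.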
@@ -265,12 +268,12 @@ void bar() {
 ----
 using namespace syclex = sycl::ext::oneapi::experimental;

-/* optional: static const */ syclex::work_group_local<int[]> dynamic_program_scope_array;
+/* optional: static const */ syclex::work_group_specific<int[]> dynamic_program_scope_array;

 ...

 q.parallel_for(sycl::nd_range<1>{N, M},
-  syclex::properties{syclex::work_group_local_size(M * sizeof(int))},
+  syclex::properties{syclex::work_group_specific_size(M * sizeof(int))},
   [=](sycl::nd_item<1> it) {
     ...
 });
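
The byte count passed to the size property above, and the rule that all unbounded-array variables share one allocation, can be modeled on the host. This is only a sketch: `mock_dynamic_allocation` and `bytes_for_work_group` are hypothetical names, and a `std::vector` stands in for the device's dynamic local memory.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical host-side model of the single dynamic allocation shared by
// every unbounded-array variable in one kernel launch.
class mock_dynamic_allocation {
public:
  explicit mock_dynamic_allocation(std::size_t bytes) : storage(bytes) {}

  // Every view starts at the same base address, whatever its element type.
  template <typename Element>
  Element* view() noexcept {
    return reinterpret_cast<Element*>(storage.data());
  }

  std::size_t size_in_bytes() const noexcept { return storage.size(); }

  // Number of whole elements of the given type that fit in the allocation.
  template <typename Element>
  std::size_t capacity() const noexcept {
    return storage.size() / sizeof(Element);
  }

private:
  std::vector<unsigned char> storage;
};

// Bytes needed for one element per work-item in a work-group.
template <typename Element>
constexpr std::size_t bytes_for_work_group(std::size_t work_group_size) {
  return work_group_size * sizeof(Element);
}
```

Since the `int` and `float` views alias the same bytes, code that uses both has to treat them as alternative interpretations of one buffer and size the property for the larger requirement.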
@@ -297,16 +300,16 @@ the existing `__sycl_allocateLocalMemory` intrinsic:
 Note, however, that implementing the correct semantics may require some
 adjustment to the handling of this intrinsic. A simple class as written above
 would create a separate allocation for every call to an inlined function.
-Creating work-group local allocations should be handled before inlining to
+Creating work-group-specific allocations should be handled before inlining to
 prevent this.

 For unbounded arrays, a separate specialization of the class will be required,
 and the implementation may need to generate some additional code to
-appropriately initialize the pointer(s) wrapped by `work_group_local` objects.
+appropriately initialize the pointer(s) wrapped by `work_group_specific` objects.
 Alternatively, it may be possible to initialize the pointer to the beginning
 of the device's local memory region (if that value is known). Either way, the
 implementation must account for the existence of one or more `local_accessor`
-objects (which themselves may allocate a dynamic amount of work-group local
+objects (which themselves may allocate a dynamic amount of work-group-specific
 memory).

