1
- = sycl_ext_oneapi_work_group_local
1
+ = sycl_ext_oneapi_work_group_specific
2
2
3
3
:source-highlighter: coderay
4
4
:coderay-linenums-mode: table
@@ -58,14 +58,17 @@ not rely on APIs defined in this specification.*
58
58
59
59
== Overview
60
60
61
- This extension defines a `sycl::ext::oneapi::experimental::work_group_local `
61
+ This extension defines a `sycl::ext::oneapi::experimental::work_group_specific `
62
62
class template with behavior inspired by the {cpp} `thread_local` keyword
63
- and the CUDA `+__shared__+` keyword.
63
+ and the CUDA `+__shared__+` keyword. The "specific" suffix is inspired by
64
+ `tbb::enumerable_thread_specific`, and has been chosen to avoid potential
65
+ confusion between the concepts of "local variables" and the "local address
66
+ space".
64
67
65
- `work_group_local ` variables can be allocated at global or function scope,
68
+ `work_group_specific ` variables can be allocated at global or function scope,
66
69
lifting many of the restrictions in the existing
67
70
link:../supported/sycl_ext_oneapi_local_memory.asciidoc[sycl_ext_oneapi_local_memory]
68
- extension. Note, however, that `work_group_local ` variables currently place
71
+ extension. Note, however, that `work_group_specific ` variables currently place
69
72
additional limits on the types that can be allocated, owing to differences in
70
73
constructor behavior.
71
74
@@ -76,7 +79,7 @@ constructor behavior.
76
79
77
80
This extension provides a feature-test macro as described in the core SYCL
78
81
specification. An implementation supporting this extension must predefine the
79
- macro `SYCL_EXT_ONEAPI_WORK_GROUP_LOCAL ` to one of the values defined in the
82
+ macro `SYCL_EXT_ONEAPI_WORK_GROUP_SPECIFIC ` to one of the values defined in the
80
83
table below. Applications can test for the existence of this macro to
81
84
determine if the implementation supports this feature, or applications can test
82
85
the macro's value to determine which of the extension's features the
@@ -93,27 +96,27 @@ implementation supports.
93
96
|===
94
97
95
98
96
- === `work_group_local ` class template
99
+ === `work_group_specific ` class template
97
100
98
- The `work_group_local ` class template acts as a view of an
99
- implementation-managed pointer to work-group local memory.
101
+ The `work_group_specific ` class template acts as a view of an
102
+ implementation-managed pointer to work-group-specific memory.
100
103
101
104
[source,c++]
102
105
----
103
106
namespace sycl::ext::oneapi::experimental {
104
107
105
108
template <typename T>
106
- class work_group_local {
109
+ class work_group_specific {
107
110
public:
108
111
109
- work_group_local () = default;
110
- work_group_local (const work_group_local &) = delete;
111
- work_group_local & operator=(const work_group_local &) = delete;
112
+ work_group_specific () = default;
113
+ work_group_specific (const work_group_specific &) = delete;
114
+ work_group_specific & operator=(const work_group_specific &) = delete;
112
115
113
116
operator T&() const noexcept;
114
117
115
118
// Available only if: std::is_array_v<T> == false
116
- const work_group_local & operator=(const T& value) const noexcept;
119
+ const work_group_specific & operator=(const T& value) const noexcept;
117
120
118
121
T* operator&() const noexcept;
119
122
@@ -127,52 +130,52 @@ private:
127
130
128
131
`T` must be trivially constructible and trivially destructible.
129
132
130
- The storage for the object is allocated in work-group local memory before
133
+ The storage for the object is allocated in work-group-specific memory before
131
134
calling the user's kernel lambda, and deallocated when all work-items
132
135
in the group have completed execution of the kernel.
133
136
134
137
SYCL implementations conforming to the full feature set treat
135
- `work_group_local ` similarly to the `thread_local` keyword, and when
136
- a `work_group_local ` object is declared at block scope it behaves
138
+ `work_group_specific ` similarly to the `thread_local` keyword, and when
139
+ a `work_group_specific ` object is declared at block scope it behaves
137
140
as if the `static` keyword was specified implicitly. SYCL implementations
138
141
conforming to the reduced feature set require the `static` keyword to be
139
142
specified explicitly.
140
143
141
144
[NOTE]
142
145
====
143
- If a `work_group_local ` object is declared at function scope, the work-group
144
- local memory associated with the object will be identical for all usages of
145
- that function within the kernel. In cases where a function is called multiple
146
- times, developers must take care to avoid race conditions (e.g., by calling
147
- `group_barrier` before and after using the memory).
146
+ If a `work_group_specific ` object is declared at function scope, the
147
+ work-group-specific memory associated with the object will be identical for all
148
+ usages of that function within the kernel. In cases where a function is called
149
+ multiple times, developers must take care to avoid race conditions (e.g., by
150
+ calling `group_barrier` before and after using the memory).
148
151
====
149
152
150
153
SYCL 2020 requires that all global variables accessed by a device function are
151
154
`const` or `constexpr`. This extension lifts that restriction for
152
- `work_group_local ` variables.
155
+ `work_group_specific ` variables.
153
156
154
157
[NOTE]
155
158
====
156
- Since `work_group_local ` acts as a view, wrapping an underlying pointer, a
159
+ Since `work_group_specific ` acts as a view, wrapping an underlying pointer, a
157
160
developer may still choose to declare variables as `const`.
158
161
====
159
162
160
163
When `T` is a class type or bounded array, the size of the allocation is known
161
164
at compile-time, and a SYCL implementation may embed the size of the allocation
162
- directly within a kernel. Each instance of `work_group_local <T>` is associated
163
- with a unique allocation in work-group local memory.
165
+ directly within a kernel. Each instance of `work_group_specific <T>` is associated
166
+ with a unique allocation in work-group-specific memory.
164
167
165
168
When `T` is an unbounded array, the size of the allocation is unknown at
166
169
compile-time, and must be communicated to the SYCL implementation via the
167
- `work_group_local_memory_size ` property. Every instance of `work_group_local `
170
+ `work_group_specific_memory_size ` property. Every instance of `work_group_specific `
168
171
for which `T` is an unbounded array is associated with a single, shared,
169
- allocation in work-group local memory. For example, two instances declared as
170
- `work_group_local <int[]>` and `work_group_local <float[]>` will be associated
171
- with the same shared allocation.
172
+ allocation in work-group-specific memory. For example, two instances declared
173
+ as `work_group_specific <int[]>` and `work_group_specific <float[]>` will be
174
+ associated with the same shared allocation.
172
175
173
- If the total amount of local memory requested (i.e., the sum of all memory
174
- requested by `local_accessor`, `group_local_memory`,
175
- `group_local_memory_for_overwrite` and `work_group_local `) exceeds a device's
176
+ If the total amount of work-group-specific memory requested (i.e., the sum of
177
+ all memory requested by `local_accessor`, `group_local_memory`,
178
+ `group_local_memory_for_overwrite` and `work_group_specific `) exceeds a device's
176
179
local memory capacity (as reported by `local_mem_size`) then the implementation
177
180
must throw a synchronous `exception` with the `errc::memory_allocation` error
178
181
code from the kernel invocation command (e.g. `parallel_for`).
@@ -181,55 +184,55 @@ code from the kernel invocation command (e.g. `parallel_for`).
181
184
----
182
185
operator T&() const noexcept;
183
186
----
184
- _Returns_: A reference to the object stored in the work-group local memory
185
- associated with this instance of `work_group_local `.
187
+ _Returns_: A reference to the object stored in the work-group-specific memory
188
+ associated with this instance of `work_group_specific `.
186
189
187
190
[source,c++]
188
191
----
189
- const work_group_local <T>& operator=(const T& value) const noexcept;
192
+ const work_group_specific <T>& operator=(const T& value) const noexcept;
190
193
----
191
194
_Constraints_: Available only if `std::is_array_v<T>>` is false.
192
195
193
196
_Effects_: Replaces the value referenced by `*ptr` with `value`.
194
197
195
- _Returns_: A reference to this instance of `work_group_local `.
198
+ _Returns_: A reference to this instance of `work_group_specific `.
196
199
197
200
[source,c++]
198
201
----
199
202
T* operator&() const noexcept;
200
203
----
201
- _Returns_: A pointer to the work-group local memory associated with this
202
- instance of `work_group_local ` (i.e., `ptr`).
204
+ _Returns_: A pointer to the work-group-specific memory associated with this
205
+ instance of `work_group_specific ` (i.e., `ptr`).
203
206
204
207
205
208
==== Kernel properties
206
209
207
- The `work_group_local_size ` property must be passed to a kernel to determine
208
- the run-time size of the work-group local memory allocation associated with
209
- all `work_group_local ` variables of unbounded array type.
210
+ The `work_group_specific_size ` property must be passed to a kernel to determine
211
+ the run-time size of the work-group-specific memory allocation associated with
212
+ all `work_group_specific ` variables of unbounded array type.
210
213
211
214
[source,c++]
212
215
----
213
216
namespace sycl::ext::oneapi::experimental {
214
217
215
- struct work_group_local_size {
216
- constexpr work_group_local_size (size_t bytes) : value(bytes) {}
218
+ struct work_group_specific_size {
219
+ constexpr work_group_specific_size (size_t bytes) : value(bytes) {}
217
220
size_t value;
218
- }; // work_group_local_size
221
+ }; // work_group_specific_size
219
222
220
- using work_group_local_size_key = work_group_local_size ;
223
+ using work_group_specific_size_key = work_group_specific_size ;
221
224
222
- template <>struct is_property_key<work_group_local_size_key > : std::true_type {};
225
+ template <>struct is_property_key<work_group_specific_size_key > : std::true_type {};
223
226
224
227
} // namespace sycl::ext::oneapi::experimental
225
228
----
226
229
227
230
|===
228
231
|Property|Description
229
232
230
- |`work_group_local_size `
231
- |The `work_group_local_size ` property describes the amount of dynamic
232
- work-group local memory required by the kernel in bytes.
233
+ |`work_group_specific_size `
234
+ |The `work_group_specific_size ` property describes the amount of dynamic
235
+ work-group-specific memory required by the kernel in bytes.
233
236
234
237
|===
235
238
@@ -242,18 +245,18 @@ work-group local memory required by the kernel in bytes.
242
245
----
243
246
using namespace syclex = sycl::ext::oneapi::experimental;
244
247
245
- /* optional: static const */ syclex::work_group_local <int> program_scope_scalar;
246
- /* optional: static const */ syclex::work_group_local <int[16]> program_scope_array;
248
+ /* optional: static const */ syclex::work_group_specific <int> program_scope_scalar;
249
+ /* optional: static const */ syclex::work_group_specific <int[16]> program_scope_array;
247
250
248
251
void foo() {
249
- /* optional: static const */ syclex::work_group_local <int> function_scope_scalar;
252
+ /* optional: static const */ syclex::work_group_specific <int> function_scope_scalar;
250
253
function_scope_scalar = 1; // assignment via overloaded = operator
251
254
function_scope_scalar += 2; // += operator via implicit conversion to int&
252
255
int* ptr = &function_scope_scalar; // conversion to pointer via overloaded & operator
253
256
}
254
257
255
258
void bar() {
256
- /* optional: static const */ sylex::work_group_local <int[64]> function_scope_array;
259
+ /* optional: static const */ sylex::work_group_specific <int[64]> function_scope_array;
257
260
function_scope_array[0] = 1; // [] operator via implicit conversion to int(&)[64]
258
261
int* ptr = function_scope_array; // conversion to pointer via implicit conversion to int(&)[64]
259
262
}
@@ -265,12 +268,12 @@ void bar() {
265
268
----
266
269
using namespace syclex = sycl::ext::oneapi::experimental;
267
270
268
- /* optional: static const */ syclex::work_group_local <int[]> dynamic_program_scope_array;
271
+ /* optional: static const */ syclex::work_group_specific <int[]> dynamic_program_scope_array;
269
272
270
273
...
271
274
272
275
q.parallel_for(sycl::nd_range<1>{N, M},
273
- syclex::properties{syclex::work_group_local_size (M * sizeof(int))},
276
+ syclex::properties{syclex::work_group_specific_size (M * sizeof(int))},
274
277
[=](sycl::nd_item<1> it) {
275
278
...
276
279
});
@@ -297,16 +300,16 @@ the existing `__sycl_allocateLocalMemory` intrinsic:
297
300
Note, however, that implementing the correct semantics may require some
298
301
adjustment to the handling of this intrinsic. A simple class as written above
299
302
would create a separate allocation for every call to an inlined function.
300
- Creating work-group local allocations should be handled before inlining to
303
+ Creating work-group-specific allocations should be handled before inlining to
301
304
prevent this.
302
305
303
306
For unbounded arrays, a separate specialization of the class will be required,
304
307
and the implementation may need to generate some additional code to
305
- appropriately initialize the pointer(s) wrapped by `work_group_local ` objects.
308
+ appropriately initialize the pointer(s) wrapped by `work_group_specific ` objects.
306
309
Alternatively, it may be possible to initialize the pointer to the beginning
307
310
of the device's local memory region (if that value is known). Either way, the
308
311
implementation must account for the existence of one or more `local_accessor`
309
- objects (which themselves may allocate a dynamic amount of work-group local
312
+ objects (which themselves may allocate a dynamic amount of work-group-specific
310
313
memory).
311
314
312
315
0 commit comments