Skip to content

Commit 4b5308a

Browse files
authored
[SYCL][Doc] Add draft LocalMemory extension (#2338)
Enables the declaration of local memory objects at kernel function scope. Signed-off-by: John Pennycook <[email protected]>
1 parent ec2ec99 commit 4b5308a

File tree

3 files changed

+232
-0
lines changed

3 files changed

+232
-0
lines changed
Lines changed: 3 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,3 @@
1+
# SYCL_INTEL_local_memory
2+
3+
A free function enabling the declaration of local memory objects at kernel function scope.
Lines changed: 228 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,228 @@
1+
= SYCL_INTEL_local_memory
2+
3+
:source-highlighter: coderay
4+
:coderay-linenums-mode: table
5+
6+
// This section needs to be after the document title.
7+
:doctype: book
8+
:toc2:
9+
:toc: left
10+
:encoding: utf-8
11+
:lang: en
12+
13+
:blank: pass:[ +]
14+
15+
// Set the default source code type in this document to C++,
16+
// for syntax highlighting purposes. This is needed because
17+
// docbook uses c++ and html5 uses cpp.
18+
:language: {basebackend@docbook:c++:cpp}
19+
20+
// This is necessary for asciidoc, but not for asciidoctor
21+
:cpp: C++
22+
23+
== Introduction
24+
IMPORTANT: This specification is a draft.
25+
26+
NOTE: Khronos(R) is a registered trademark and SYCL(TM) and SPIR(TM) are
27+
trademarks of The Khronos Group Inc. OpenCL(TM) is a trademark of Apple Inc.
28+
used by permission by Khronos.
29+
30+
NOTE: This document is better viewed when rendered as html with asciidoctor.
31+
GitHub does not render image icons.
32+
33+
This document describes an extension enabling the declaration of local memory
34+
objects at the kernel functor scope.
35+
36+
== Name Strings
37+
38+
+SYCL_INTEL_local_memory+
39+
40+
== Notice
41+
42+
Copyright (c) 2020 Intel Corporation. All rights reserved.
43+
44+
== Status
45+
46+
Working Draft
47+
48+
This is a preview extension specification, intended to provide early access to
49+
a feature for review and community feedback. When the feature matures, this
50+
specification may be released as a formal extension.
51+
52+
Because the interfaces defined by this specification are not final and are
53+
subject to change they are not intended to be used by shipping software
54+
products.
55+
56+
== Version
57+
58+
Built On: {docdate} +
59+
Revision: 1
60+
61+
== Contact
62+
63+
John Pennycook, Intel (john 'dot' pennycook 'at' intel 'dot' com)
64+
Roland Schulz, Intel (roland 'dot' schulz 'at' intel 'dot' com)
65+
66+
== Contributors
67+
68+
Felipe de Azevedo Piovezan, Intel
69+
Michael Kinsner, Intel
70+
71+
== Dependencies
72+
73+
This extension is written against the SYCL 1.2.1 specification, revision 6.
74+
75+
== Overview
76+
77+
OpenCL provides two ways for local memory to be used in a kernel:
78+
79+
* The kernel accepts a pointer in the `local` address space as an argument,
80+
and the host passes the size of the allocation to the OpenCL runtime when
81+
the kernel is launched.
82+
* The kernel declares `local` variables in the kernel function
83+
scope.
84+
85+
In SYCL, programmers have two choices:
86+
87+
* Local accessors created by the host, analogous to the OpenCL kernel argument
88+
mechanism.
89+
* Variables declared at the kernel functor scope, in hierarchical parallelism
90+
kernels.
91+
92+
Note that SYCL currently lags behind OpenCL when it comes to local memory
93+
allocations; in particular, work-group data parallel SYCL kernels are limited
94+
to the accessor method. This is undesirable for some architectures, where
95+
allocating local memory with a compile-time known size is required for
96+
performance.
97+
98+
This limitation is also undesirable from a usability point of view, since
99+
programmers have to declare an accessor *outside* a kernel and capture it
100+
inside the kernel functor.
101+
102+
This extension introduces a concept of group-local memory, with semantics
103+
similar to OpenCL kernel-scope `local` variables and C++ `thread_local`
104+
variables.
105+
106+
== Modifications of SYCL 1.2.1 Specification
107+
108+
=== Modify sentence in Section 3.5.2.1 Access to memory
109+
110+
==== From:
111+
112+
To allocate local memory within a kernel, the user can either pass a
113+
`cl::sycl::local_accessor` object to the kernel as a parameter, or can define a
114+
variable in work-group scope inside `cl::sycl::parallel_for_work_group`.
115+
116+
==== To:
117+
118+
To allocate local memory within a kernel, the user can:
119+
120+
* Pass a `cl::sycl::local_accessor` object to the kernel as a parameter.
121+
* Define a variable in work-group scope inside `cl::sycl::parallel_for_work_group`.
122+
* Define a group-local variable at the kernel functor scope of a work-group
123+
data parallel kernel using the `group_local_memory` or
124+
`group_local_memory_for_overwrite` functions.
125+
126+
[_Note_ - The restriction that group-local variables must be defined at kernel
127+
functor scope may be lifted in a future version of this extension.]
128+
129+
==== Extend Section 4.8.5.2
130+
131+
==== Include paragraphs:
132+
133+
The `nd_range` variant of `parallel_for` also enables the declaration of
134+
group-local variables; those variables are allocated in the an address space
135+
accessible by all work-items in the group and are shared by all work-items of a
136+
work-group.
137+
138+
[source,c++]
139+
----
140+
myQueue.submit([&](handler &cgh) {
141+
cgh.parallel_for<class example_kernel>(
142+
nd_range<1>(range<1>(128), range<1>(32)), [=](nd_item<1> item) {
143+
multi_ptr<int[64], access::address_space::local_space> ptr = group_local_memory<int[64]>(item.get_group());
144+
auto& ref = *ptr;
145+
ref[2 * item.get_local_linear_id()] = 42;
146+
});
147+
});
148+
----
149+
150+
The example above creates a kernel with four work-groups, each containing 32
151+
work-items. An `int[64]` object is defined as a group-local variable, and
152+
each work-item in the work-group obtains a `multi_ptr` to the same variable.
153+
154+
There are two interfaces for defining group-local variables:
155+
156+
[source,c++]
157+
----
158+
namespace sycl {
159+
160+
template <typename T, typename Group, typename... Args>
161+
multi_ptr<T, Group::address_space> group_local_memory(Group g, Args&&... args);
162+
163+
template <typename T, typename Group>
164+
multi_ptr<T, Group::address_space> group_local_memory_for_overwrite(Group g);
165+
166+
} // namespace sycl
167+
----
168+
169+
==== Add table: Functions for defining group-local variables
170+
171+
[frame="topbot",options="header,footer"]
172+
|======================
173+
|Functions |Description
174+
175+
|`template <typename T, typename Group, typename ... Args>
176+
multi_ptr<T, Group::address_space> group_local_memory(Group g, Args&&... args)` |
177+
Constructs an object of type `T` in an address space accessible by all
178+
work-items in group _g_, forwarding _args_ to the constructor's parameter list.
179+
The constructor is called once per group, upon or before the first call to
180+
`group_local_memory`. The storage for the object is allocated upon or before
181+
the first call to `group_local_memory`, and deallocated when all work-items in
182+
the group have completed execution of the kernel.
183+
184+
All arguments in _args_ must be the same for all work-items in the group.
185+
186+
`Group` must be `sycl::group`, and `T` must be trivially destructible.
187+
[_Note_ - These restrictions may be lifted in a future version of this
188+
extension.]
189+
190+
|`template <typename T, typename Group>
191+
multi_ptr<T, Group::address_space> group_local_memory_for_overwrite(Group g)` |
192+
Constructs an object of type `T` in an address space accessible by all
193+
work-items in group _g_, using default initialization. The object is
194+
initialized pon or before the first call to `group_local_memory`. The storage
195+
for the object is allocated upon or before the first call to
196+
`group_local_memory`, and deallocated when all work-items in the group have
197+
completed execution of the kernel.
198+
199+
All arguments in _args_ must be the same for all work-items in the group.
200+
201+
`Group` must be `sycl::group`, and `T` must be trivially destructible.
202+
[_Note_ - These restrictions may be lifted in a future version of this
203+
extension.]
204+
205+
|======================
206+
207+
== Issues
208+
209+
None.
210+
211+
== Revision History
212+
213+
[cols="5,15,15,70"]
214+
[grid="rows"]
215+
[options="header"]
216+
|========================================
217+
|Rev|Date|Author|Changes
218+
|1|2020-08-18|John Pennycook|*Initial public working draft*
219+
|========================================
220+
221+
//************************************************************************
222+
//Other formatting suggestions:
223+
//
224+
//* Use *bold* text for host APIs, or [source] syntax highlighting.
225+
//* Use +mono+ text for device APIs, or [source] syntax highlighting.
226+
//* Use +mono+ text for extension names, types, or enum values.
227+
//* Use _italics_ for parameters.
228+
//************************************************************************

sycl/doc/extensions/README.md

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -27,6 +27,7 @@ DPC++ extensions status:
2727
| [SYCL_INTEL_reqd_work_group_size](ReqdWorkGroupSize/SYCL_INTEL_reqd_work_group_size.asciidoc) | Supported(OpenCL: CPU, GPU) | |
2828
| [SPV_INTEL_function_pointers](SPIRV/SPV_INTEL_function_pointers.asciidoc) | Supported(OpenCL: CPU, GPU; HOST) | |
2929
| [SPV_INTEL_inline_assembly](SPIRV/SPV_INTEL_inline_assembly.asciidoc) | Supported(OpenCL: GPU) | |
30+
| [SYCL_INTEL_local_memory](LocalMemory/SYCL_INTEL_local_memory.asciidoc) | Proposal | |
3031
| [SYCL_INTEL_static_local_memory_query](StaticLocalMemoryQuery/SYCL_INTEL_static_local_memory_query.asciidoc) | Proposal | |
3132
| [SYCL_INTEL_sub_group_algorithms](SubGroupAlgorithms/SYCL_INTEL_sub_group_algorithms.asciidoc) | Partially supported(OpenCL: CPU, GPU) | Features from SYCL_INTEL_group_algorithms extended to sub-groups |
3233
| [Sub-groups for NDRange Parallelism](SubGroupNDRange/SubGroupNDRange.md) | Deprecated(OpenCL: CPU, GPU) | |

0 commit comments

Comments
 (0)