Skip to content

Commit 15edb1b

Browse files
[SYCL][Docs] Clarify default device_global allocation behavior (#7795)
Currently the device_global extension specifies that the same allocation used by a device_global is accessible by other kernels on the same device, however memory allocations for devices are not guaranteed to be accessible across contexts. As such, this commit clarifies that the consistency between kernels is only true for kernels running on the same device and context. --------- Signed-off-by: Larsen, Steffen <[email protected]> Co-authored-by: Greg Lueck <[email protected]>
1 parent 71d7797 commit 15edb1b

File tree

1 file changed

+50
-26
lines changed

1 file changed

+50
-26
lines changed

sycl/doc/extensions/proposed/sycl_ext_oneapi_device_global.asciidoc

Lines changed: 50 additions & 26 deletions
Original file line numberDiff line numberDiff line change
@@ -24,15 +24,17 @@
2424
== Introduction
2525
In OpenCL 2.0 and later, a user is able to allocate program
2626
scope memory which can be accessed like a {cpp} global variable by any kernel in
27-
an OpenCL program (`cl_program`). When a program is shared between multiple devices,
28-
each device receives its own unique instance of the program scope memory allocation.
29-
30-
This extension introduces device scoped memory allocations into SYCL that can be accessed
31-
within a kernel using syntax similar to {cpp} global variables, but that have unique
32-
instances per `sycl::device`. Mechanisms are provided for the host program to enqueue
33-
copies to or from the allocations on a specific device. Restrictions are
34-
placed on the types of data that can be stored within `device_global` allocations, particularly
35-
around copyability and constructors/destructors.
27+
an OpenCL program (`cl_program`). When a program is shared between multiple
28+
devices, each device receives its own unique instance of the program scope
29+
memory allocation.
30+
31+
This extension introduces device scoped memory allocations into SYCL that can be
32+
accessed within a kernel using syntax similar to {cpp} global variables, but
33+
that have unique instances per `sycl::device` and `sycl::context`. Mechanisms
34+
are provided for the host program to enqueue copies to or from the allocations
35+
on a specific device. Restrictions are placed on the types of data that can be
36+
stored within `device_global` allocations, particularly around copyability and
37+
constructors/destructors.
3638

3739
== Notice
3840

@@ -93,10 +95,11 @@ concepts, types, and mechanisms, and to give examples and context for their usag
9395

9496
=== Motivation
9597

96-
Device scope memory allocations can provide an efficient mechanism for communication
97-
between multiple invocations of a kernel, or between kernels executing on a device.
98-
There are additional benefits and optimization opportunities when a device compiler
99-
has visibility into the allocation size (static sizing) and uses of the allocation.
98+
Device scope memory allocations can provide an efficient mechanism for
99+
communication between multiple invocations of a kernel, or between kernels
100+
executing on the same device and context. There are additional benefits and
101+
optimization opportunities when a device compiler has visibility into the
102+
allocation size (static sizing) and uses of the allocation.
100103

101104
Syntax allowing direct use of an allocation (without passing pointers or parameters
102105
through function call boundaries) can also lead to syntax simplification in some
@@ -140,7 +143,9 @@ Q.submit([&](sycl::handler& h) {
140143
});
141144
----
142145

143-
For both `dm1` and `dm2`, the `MyClass` and `int[4]` allocations on each device are zero-initialized before any non-initialization accesses occur.
146+
For both `dm1` and `dm2`, the `MyClass` and `int[4]` allocations on each device
147+
in the context associated with `Q` are zero-initialized before any
148+
non-initialization accesses occur.
144149

145150
== Proposal
146151

@@ -163,11 +168,30 @@ that the implementation supports.
163168

164169
=== Representation of device globals
165170

166-
`device_global` provides a mechanism to allocate device scope memory - memory which has unique underlying storage (of type _T_) for each `sycl::device` object. If multiple device objects are present then each device object receives its own unique underlying allocation. All kernels that reference the same `device_global` entity (either directly or via a pointer to its underlying object of type _T_) share the same allocation of that object when those kernels run on the same device.
167-
168-
`device_global` allocations are in the global address space, as are any underlying allocations of type `T` which are implicitly allocated on each device as a result of a `device_global` object. It is undefined behavior if the host program directly accesses a `device_global` or any address obtained from a `device_global` member function, and similarly it is undefined behavior if a `device_global` or address obtained on one device from a `device_global` member function is accessed on a different device. There is no mechanism to obtain addresses of or directly access a device's `device_global` allocation within the host program.
169-
170-
A `device_global` on a given device maintains its state (address of the allocation and data within the allocation) even after the application changes the value of a specialization constant via `handler::set_specialization_constant()`. Additionally, a `device_global` maintains its state even when it is referenced from a kernel in a different `kernel_bundle`.
171+
`device_global` provides a mechanism to allocate device scope memory - memory
172+
which has unique underlying storage (of type _T_) for each `sycl::device` and
173+
`sycl::context` combination. If multiple valid device and context combinations
174+
are present then each receives its own unique underlying allocation. All kernels
175+
that reference the same `device_global` entity (either directly or via a pointer
176+
to its underlying object of type _T_) share the same allocation of that object
177+
when those kernels run on the same device and context.
178+
179+
`device_global` allocations are in the global address space, as are any
180+
underlying allocations of type `T` which are implicitly allocated on each device
181+
as a result of a `device_global` object. It is undefined behavior if the host
182+
program directly accesses a `device_global` or any address obtained from a
183+
`device_global` member function, and similarly it is undefined behavior if a
184+
`device_global` or address obtained on one device from a `device_global` member
185+
function is accessed on a different device or context. There is no mechanism to
186+
obtain addresses of or directly access a device's `device_global` allocation
187+
within the host program.
188+
189+
A `device_global` on a given device and context maintains its state (address of
190+
the allocation and data within the allocation) even after the application
191+
changes the value of a specialization constant via
192+
`handler::set_specialization_constant()`. Additionally, a `device_global`
193+
maintains its state even when it is referenced from a kernel in a different
194+
`kernel_bundle`.
171195

172196
[source,c++]
173197
----
@@ -596,13 +620,13 @@ a|
596620
This property is most useful for kernels that are submitted to an FPGA device,
597621
but it may be used with any kernel. Normally, a single instance of a device
598622
global variable is allocated for each device, and that instance is shared by
599-
all kernels that are submitted to the device, regardless of which _device
600-
image_ contains the kernel. When this property is specified, it is an
601-
assertion by the user that the device global is referenced only from kernels
602-
that are contained by the same _device image_. An implementation may be able
603-
to optimize accesses to the device global when this property is specified
604-
(especially on an FPGA device), but the user must be aware of which _device
605-
image_ contains the kernels that use the variable.
623+
all kernels that belong to the same context and are submitted to the same device,
624+
regardless of which _device image_ contains the kernel. When this property is
625+
specified, it is an assertion by the user that the device global is referenced
626+
only from kernels that are contained by the same _device image_. An
627+
implementation may be able to optimize accesses to the device global when this
628+
property is specified (especially on an FPGA device), but the user must be aware
629+
of which _device image_ contains the kernels that use the variable.
606630

607631
A device global that is decorated with this property may not be accessed from
608632
kernels that reside in different _device images_, either by direct reference

0 commit comments

Comments
 (0)