.. _basic_concepts:

Heterogeneous Systems and Programming Concepts
==============================================

This section introduces the basic concepts defined by the SYCL standard
for programming heterogeneous systems, and used by :py:mod:`dpctl`.

.. note::
    For SYCL-level details, refer to a more topical SYCL reference,
    such as the :sycl_spec_2020:`SYCL 2020 spec <>`.

Definitions
-----------

* **Heterogeneous computing**
    Refers to computing on multiple devices in a program.

* **Host**
    Every program starts by running on a host, and most of the lines of code in
    a program, in particular lines of code implementing the Python interpreter
    itself, are usually for the host. Hosts are customarily CPUs.

* **Device**
    A device is a processing unit connected to a host that is programmable
    with a specific device driver. Different types of devices can have
    different architectures (CPUs, GPUs, FPGAs, ASICs, DSPs) but are
    programmable using the same :oneapi:`oneAPI <>` programming model.

* **Platform**
    An abstraction representing a collection of devices addressable by the
    same lower-level framework. As multiple devices of the same type can be
    programmed by the same framework, a platform may contain multiple devices.
    The same physical hardware (for example, a GPU) may be programmable by
    different lower-level frameworks, and hence be enumerated as part of
    different platforms. For example, the same GPU hardware can be listed as
    an OpenCL* GPU device and a Level-Zero* GPU device.

* **Context**
    Holds the runtime information needed to operate on a device or a
    group of devices from the same platform.

* **Queue**
    Needed to schedule execution of computational tasks and memory operations
    on a device. A queue is associated with a device and a context targeting
    that device.

* **Event**
    Holds information related to a computational task scheduled for execution
    on a queue, such as its execution status, as well as profiling information
    if the queue the task was submitted to allowed
    for collection of such information. Events can be used to specify task
    dependencies as well as to synchronize host and devices.

* **Unified Shared Memory**
    Unified Shared Memory (USM) refers to pointer-based device memory management.
    USM allocations are bound to a context. This means a pointer representing a
    USM allocation can be unambiguously mapped to the data it represents only
    if the associated context is known. The runtime manages synchronization of
    the host's and device's views into shared allocations.
    The initial placement of the shared allocations is not defined.

* **Backend**
    Refers to an implementation of the :oneapi:`oneAPI <>` programming model
    using a lower-level heterogeneous programming API. Examples of backends
    include "cuda", "hip", "level_zero", and "opencl". In particular, a backend
    implements the platform abstraction.


Platform
--------

A platform abstracts one or more SYCL devices that are connected to
a host and can be programmed by the same underlying framework.

The :class:`dpctl.SyclPlatform` class represents a platform and
abstracts the :sycl_platform:`sycl::platform <>` SYCL runtime class.

To obtain all platforms available on a system programmatically, use the
:func:`dpctl.lsplatform` function. Refer to
:ref:`Enumerating available devices <beginners_guide_enumerating_devices>`
for more information.

It is possible to select devices from a specific backend, and hence belonging
to the same platform, by :ref:`using <beginners_guide_oneapi_device_selector>`
the ``ONEAPI_DEVICE_SELECTOR`` environment variable, or by using
a :ref:`filter selector string <filter_selector_string>`.
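
As an illustrative sketch (output depends on the installed drivers and
hardware; membership in ``dpctl.backend_type`` is assumed to follow dpctl's
enumeration of backends), available platforms and per-backend devices can be
inspected from Python:

.. code-block:: python

    >>> import dpctl
    >>> # print a summary of every platform visible to the runtime
    >>> dpctl.lsplatform()
    >>> # keep only devices exposed by the Level-Zero backend
    >>> lz_devices = [
    ...     d for d in dpctl.get_devices()
    ...     if d.backend == dpctl.backend_type.level_zero
    ... ]
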


Context
-------

A context is an entity associated with the state of a device as managed by
the backend. A context is required to unambiguously map a unified address
space pointer to the device where it was allocated.

In order for two DPC++-based Python extensions to share USM allocations, e.g.
as part of :ref:`DLPack exchange <dpctl_tensor_dlpack_support>`, they each must
use the *same* SYCL context when submitting for execution programs that would
access this allocation.

Since a ``sycl::context`` is dynamically constructed by each extension, sharing
a USM allocation, in general, requires sharing the ``sycl::context`` along with
the USM pointer, as is done in the ``__sycl_usm_array_interface__``
:ref:`attribute <suai_attribute>`.

Since DLPack itself does not provide for storing of the ``sycl::context``, the
proper working of the :func:`dpctl.tensor.from_dlpack` function is only
supported for devices of those platforms that support the default platform
context SYCL extension `sycl_ext_oneapi_default_platform_context`_, and only
for those allocations that are bound to this default context.

To query whether a particular device ``dev`` belongs to a platform that
implements the default context, check whether
``dev.sycl_platform.default_context`` returns an instance of
:class:`dpctl.SyclContext` or raises an exception.
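
The check described above can be sketched as follows (a hedged example: a
broad ``except`` is used since the exact exception type raised for platforms
without a default context may vary across dpctl versions):

.. code-block:: python

    >>> import dpctl
    >>> dev = dpctl.SyclDevice()  # default-selected device
    >>> try:
    ...     ctx = dev.sycl_platform.default_context
    ...     assert isinstance(ctx, dpctl.SyclContext)
    ...     print("default platform context is available")
    ... except Exception:
    ...     print("platform does not implement the default context")
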


.. _sycl_ext_oneapi_default_platform_context: https://github.com/intel/llvm/blob/sycl/sycl/doc/extensions/supported/sycl_ext_oneapi_default_context.asciidoc


.. _user_guide_queues:

Queue
-----

A SYCL queue is an entity associated with scheduling computational tasks for
execution on a targeted SYCL device and using some specific SYCL context.

The queue constructor generally requires both to be specified. For platforms
that support the default platform context, a shortcut queue constructor call
that specifies only a device will use the default platform context associated
with the platform the given device is a part of.

.. code-block:: python
    :caption: Queues constructed from a device instance, or from a filter string that selects it, have the same context

    >>> import dpctl
    >>> d = dpctl.SyclDevice("gpu")
    >>> q1 = dpctl.SyclQueue(d)
    >>> q2 = dpctl.SyclQueue("gpu")
    >>> q1.sycl_context == q2.sycl_context, q1.sycl_device == q2.sycl_device
    (True, True)
    >>> q1 == q2
    False

Even though ``q1`` and ``q2`` instances of :class:`dpctl.SyclQueue` target the
same device and use the same context, they do not compare equal, since they
correspond to two independent scheduling entities.

.. note::
    :class:`dpctl.tensor.usm_ndarray` objects, one associated with ``q1`` and
    another associated with ``q2``, could not be combined in a call to the same
    function that implements the
    :ref:`compute-follows-data programming model <dpctl_tensor_compute_follows_data>`
    in :mod:`dpctl.tensor`.


Event
-----

A SYCL event is an entity created when a task is submitted to a SYCL queue for
execution. Events are used by the DPC++ runtime to order the execution of
computational tasks. They may also contain profiling information associated
with the submitted task, provided the queue was created with the
"enable_profiling" property.

A SYCL event can be used to synchronize execution of the associated task with
execution on the host by using :meth:`dpctl.SyclEvent.wait`.

Methods :meth:`dpctl.SyclQueue.submit_async` and
:meth:`dpctl.SyclQueue.memcpy_async` return :class:`dpctl.SyclEvent` instances.
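
A minimal sketch of waiting on an event returned by an asynchronous memory
copy (assumptions: a default-selected device is available, the memory
constructors accept a ``queue`` keyword, and ``memcpy_async`` takes
destination, source, and a byte count in that order):

.. code-block:: python

    >>> import dpctl
    >>> import dpctl.memory as dpm
    >>> # "enable_profiling" lets events of this queue carry timing data
    >>> q = dpctl.SyclQueue(property="enable_profiling")
    >>> src = dpm.MemoryUSMDevice(8, queue=q)
    >>> dst = dpm.MemoryUSMDevice(8, queue=q)
    >>> ev = q.memcpy_async(dst, src, 8)
    >>> ev.wait()  # block the host until the copy completes
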

.. note::
    At this point, :mod:`dpctl.tensor` does not provide a public API for
    accessing the SYCL events associated with submission of computational
    tasks implementing operations on :class:`dpctl.tensor.usm_ndarray` objects.


Unified Shared Memory
---------------------

Unified Shared Memory allocations of each kind are represented through the
Python classes :class:`dpctl.memory.MemoryUSMDevice`,
:class:`dpctl.memory.MemoryUSMShared`, and :class:`dpctl.memory.MemoryUSMHost`.

These class constructors allow making USM allocations of the requested size
in bytes on the device targeted by the given SYCL queue; the allocation is
bound to the context from that queue. The queue argument is stored in the
instance of the class and is used to submit tasks when copying elements from
or to this allocation, or when filling the allocation with values.

Classes that represent host-accessible USM allocations, i.e. types USM-shared
and USM-host, expose the Python buffer interface.

.. code-block:: python

    >>> import dpctl.memory as dpm

    >>> # allocate 26 bytes of USM-device memory
    >>> mem_d = dpm.MemoryUSMDevice(26)
    >>> mem_d.copy_from_host(b"abcdefghijklmnopqrstuvwxyz")

    >>> mem_s = dpm.MemoryUSMShared(30)
    >>> mem_s.memset(value=ord(b"-"))
    >>> mem_s.copy_from_device(mem_d)

    >>> # since USM-shared is host-accessible,
    >>> # it implements the Python buffer protocol, which allows
    >>> # Python objects to read this USM allocation
    >>> bytes(mem_s)
    b'abcdefghijklmnopqrstuvwxyz----'
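
Because host-accessible allocations implement the buffer protocol, other
Python objects can view them without copying. A sketch, assuming NumPy is
installed and a SYCL device is available:

.. code-block:: python

    >>> import dpctl.memory as dpm
    >>> import numpy as np
    >>> mem_h = dpm.MemoryUSMHost(4)
    >>> mem_h.copy_from_host(b"\x01\x02\x03\x04")
    >>> # zero-copy NumPy view over the USM-host allocation
    >>> np.frombuffer(mem_h, dtype="u1")
    array([1, 2, 3, 4], dtype=uint8)
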

Backend
-------

Intel(R) oneAPI Data Parallel C++ compiler ships with two backends:

#. OpenCL backend
#. Level-Zero backend

Additional backends can be added to the compiler by installing CodePlay's
plugins:

#. CUDA backend: provided by `oneAPI for NVIDIA(R) GPUs <codeplay_nv_plugin_>`_ from `CodePlay`_
#. HIP backend: provided by `oneAPI for AMD GPUs <codeplay_amd_plugin_>`_ from `CodePlay`_

.. _codeplay_nv_plugin: https://developer.codeplay.com/products/oneapi/nvidia/
.. _codeplay_amd_plugin: https://developer.codeplay.com/products/oneapi/amd/
.. _CodePlay: https://codeplay.com/
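
Which backends are actually usable on a given system can be checked by
inspecting the devices dpctl enumerates (a sketch; ``SyclDevice.backend`` is
assumed to report a ``dpctl.backend_type`` enum member, and the resulting set
varies by system):

.. code-block:: python

    >>> import dpctl
    >>> # distinct backend names across all enumerated devices
    >>> sorted({d.backend.name for d in dpctl.get_devices()})
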

When building the open source `Intel LLVM <InteLlVmGh_>`_ compiler from source,
the project can be configured to enable different backends (see the
`Get Started Guide <GetStartedGuide_>`_ for further details).

.. _GetStartedGuide: https://github.com/intel/llvm/blob/sycl/sycl/doc/GetStartedGuide.md
.. _InteLlVmGh: https://github.com/intel/llvm