
Commit 1f1dd09

Consolidated many pages into "Heterogeneous Systems and Programming Concepts"
The page was extended with examples and nuances for SyclQueue, SyclContext, USM allocations, and Backends. Added an entry for the SYCL_PI_TRACE environment variable.
1 parent 9c541e0 commit 1f1dd09

13 files changed: +236, -522 lines changed

docs/doc_sources/api_reference/dpctl/memory.rst

Lines changed: 1 addition & 1 deletion
@@ -7,7 +7,7 @@
 Subpackage :py:mod:`dpctl.memory` exposes Unified Shared Memory(USM) operations.

 Unified Shared Memory is a pointer-based memory management in SYCL guaranteeing that
-all devices use a `unified address space <sycl_unified_address_space_>`_.
+the host and all devices use a `unified address space <sycl_unified_address_space_>`_.
 Quoting from the SYCL specification:

 .. _sycl_unified_address_space: https://registry.khronos.org/SYCL/specs/sycl-2020/html/sycl-2020.html#_unified_addressing
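
A minimal sketch of how such a USM allocation can be created and round-tripped through the host, assuming a default-constructed queue is available on the system:

.. code-block:: python

    import dpctl
    import dpctl.memory as dpm

    # queue targeting the default-selected device; its context owns the allocation
    q = dpctl.SyclQueue()

    # 16 bytes of device USM memory bound to the context of ``q``
    mem = dpm.MemoryUSMDevice(16, queue=q)
    mem.copy_from_host(b"0123456789abcdef")

    # copy back to host to verify the round trip
    host_buf = bytearray(16)
    mem.copy_to_host(host_buf)
    print(bytes(host_buf))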

docs/doc_sources/beginners_guides/managing_devices.rst

Lines changed: 2 additions & 0 deletions
@@ -226,6 +226,8 @@ as argument to the class constructor:
     # create GPU device, or CPU if GPU is not available
     dev_gpu_or_cpu = dpctl.SyclDevice("gpu,cpu")

+.. _beginners_guide_oneapi_device_selector_usecase:
+
 Selecting device using ``ONEAPI_DEVICE_SELECTOR``
 -------------------------------------------------
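
As an illustration of the fallback behavior of the ``"gpu,cpu"`` filter string above, a brief sketch that constructs a device and reports which one was actually selected; ``SyclDeviceCreationError`` is assumed to be the exception raised when no matching device exists:

.. code-block:: python

    import dpctl

    try:
        # ask for a GPU first, fall back to a CPU device
        dev = dpctl.SyclDevice("gpu,cpu")
    except dpctl.SyclDeviceCreationError:
        raise SystemExit("Neither a GPU nor a CPU SYCL device is available")

    print(dev.name)     # human-readable device name
    print(dev.backend)  # backend the selected device belongs to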

docs/doc_sources/index.rst

Lines changed: 0 additions & 11 deletions
@@ -10,17 +10,6 @@ computers using industry-standard :sycl_execution_model:`SYCL execution model <>
 facilitated by Intel(R) oneAPI :dpcpp_compiler:`DPC++ compiler <>` implementing
 :sycl_spec_2020:`SYCL 2020 standard <>`.

-..
-   :mod:`dpctl` leverages `oneAPI DPC++ compiler runtime <dpcpp_compiler>`_ to
-   answer the following three questions users of heterogenous platforms ask:
-
-   1. What are available compute devices?
-   2. How to specify the device a computation is to be offloaded to?
-   3. How to manage sharing of data between devices and Python?
-
-   :mod:`dpctl` implements Python classes and free functions mapping to DPC++
-   entities to answer these questions.
-
 :py:mod:`dpctl` provides a reference data-parallel implementation of
 array library :py:mod:`dpctl.tensor` conforming to Python Array API specification.
 The implementation adheres to a programming model affording clear control

docs/doc_sources/user_guides/basic_concepts.rst

Lines changed: 187 additions & 20 deletions
@@ -1,35 +1,40 @@
 .. _basic_concepts:

-Basic Concepts
-==============
+Heterogeneous Systems and Programming Concepts
+==============================================

-This section introduces the basic concepts for XPU management used by :py:mod:`dpctl`.
+This section introduces the basic concepts defined by the SYCL standard
+for programming heterogeneous systems, and used by :py:mod:`dpctl`.

 .. note::
    For SYCL-level details, refer to a more topical SYCL reference,
    such as the :sycl_spec_2020:`SYCL 2020 spec <>`.

+Definitions
+-----------
+
 * **Heterogeneous computing**
-  Refers to using multiple devices in a program.
+  Refers to computing on multiple devices in a program.

 * **Host**
-  Every program starts by running on a host, and most of the lines of code in
-  a program, in particular lines of code implementing the Python interpreter
-  itself, are usually for the host. Hosts are customarily CPUs.
+  Every program starts by running on a host, and most of the lines of code in
+  a program, in particular lines of code implementing the Python interpreter
+  itself, are usually for the host. Hosts are customarily CPUs.

 * **Device**
-  A device is an XPU connected to a host that is programmable with a specific
-  device driver. Different types of devices can have different architectures
-  (CPUs, GPUs, FPGA, ASICs, DSP) but are programmable using the same
-  :oneapi:`oneAPI <>` programming model.
+  A device is a processing unit connected to a host that is programmable
+  with a specific device driver. Different types of devices can have
+  different architectures (CPUs, GPUs, FPGA, ASICs, DSP) but are programmable
+  using the same :oneapi:`oneAPI <>` programming model.

 * **Platform**
-  A device driver installed on the system is called the platform. As multiple
-  devices of the same type can share the same device driver, a platform may
-  contain multiple devices. The same physical hardware (for example, GPU)
-  may be reflected as two separate devices if they can be programmed by more
-  than one platform. For example, the same GPU hardware can be listed as an
-  OpenCL* GPU device and a Level-Zero* GPU device.
+  A platform is an abstraction representing a collection of devices addressable
+  by the same lower-level framework. As multiple devices of the same type can be
+  programmed by the same framework, a platform may contain multiple devices.
+  The same physical hardware (for example, a GPU) may be programmable by different
+  lower-level frameworks, and hence be enumerated as part of different platforms.
+  For example, the same GPU hardware can be listed as an OpenCL* GPU device and
+  a Level-Zero* GPU device.

 * **Context**
   Holds the runtime information needed to operate on a device or a
@@ -50,7 +55,7 @@ This section introduces the basic concepts for XPU management used by :py:mod:`d
   for collection of such information. Events can be used to specify task
   dependencies as well as to synchronize host and devices.

-* **USM**
+* **Unified Shared Memory**
   Unified Shared Memory (USM) refers to pointer-based device memory management.
   USM allocations are bound to context. It means, a pointer representing
   USM allocation can be unambiguously mapped to the data it represents only
@@ -73,5 +78,167 @@ Runtime manages synchronization of the host's and device's view into shared allo
  The initial placement of the shared allocations is not defined.

 * **Backend**
-  Refers to the implementation of :oneapi:`oneAPI <>` programming model exposed
-  by the underlying runtime.
+  Refers to an implementation of the :oneapi:`oneAPI <>` programming model using a
+  lower-level heterogeneous programming API. Examples of backends include
+  "cuda", "hip", "level_zero", and "opencl". In particular, a backend implements
+  a platform abstraction.
+
+
+Platform
+--------
+
+A platform abstracts one or more SYCL devices that are connected to
+a host and can be programmed by the same underlying framework.
+
+The :class:`dpctl.SyclPlatform` class represents a platform and
+abstracts the :sycl_platform:`sycl::platform <>` SYCL runtime class.
+
+To obtain all platforms available on a system programmatically, use the
+:func:`dpctl.lsplatform` function. Refer to :ref:`Enumerating available devices <beginners_guide_enumerating_devices>`
+for more information.
+
+It is possible to select devices from a specific backend, and hence belonging to
+the same platform, by :ref:`using <beginners_guide_oneapi_device_selector>` the
+``ONEAPI_DEVICE_SELECTOR`` environment variable, or by using
+a :ref:`filter selector string <filter_selector_string>`.
+
+
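A short sketch of enumerating platforms and the devices they expose; ``lsplatform`` prints a summary to standard output, and each device reports the platform it belongs to:

.. code-block:: python

    import dpctl

    # print a summary of all platforms visible to the runtime
    dpctl.lsplatform()

    # higher verbosity also lists the devices in each platform
    dpctl.lsplatform(verbosity=2)

    # group devices by the platform they belong to
    for d in dpctl.get_devices():
        print(d.sycl_platform.name, "->", d.name)
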
+Context
+-------
+
+A context is an entity associated with the state of the device as managed by the
+backend. A context is required to unambiguously map a unified address space pointer
+to the device where it was allocated.
+
+In order for two DPC++-based Python extensions to share USM allocations, e.g.
+as part of :ref:`DLPack exchange <dpctl_tensor_dlpack_support>`, each must use
+the `same` SYCL context when submitting for execution programs that would access this
+allocation.
+
+Since a ``sycl::context`` is dynamically constructed by each extension, sharing a USM
+allocation, in general, requires sharing the ``sycl::context`` along with the USM pointer,
+as is done in the ``__sycl_usm_array_interface__`` :ref:`attribute <suai_attribute>`.
+
+Since DLPack itself does not provide for storing the ``sycl::context``, the proper
+working of the :func:`dpctl.tensor.from_dlpack` function is only supported for devices of those
+platforms that support the default platform context SYCL extension `sycl_ext_oneapi_default_platform_context`_,
+and only for those allocations that are bound to this default context.
+
+To query whether a particular device ``dev`` belongs to a platform that implements
+the default context, check whether ``dev.sycl_platform.default_context`` returns an instance
+of :class:`dpctl.SyclContext` or raises an exception.
+
+
+.. _sycl_ext_oneapi_default_platform_context: https://github.com/intel/llvm/blob/sycl/sycl/doc/extensions/supported/sycl_ext_oneapi_default_context.asciidoc
+
+
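A sketch of the query described above, checking whether a device's platform provides the default context before relying on zero-copy DLPack exchange:

.. code-block:: python

    import dpctl

    dev = dpctl.SyclDevice()  # default-selected device
    try:
        ctx = dev.sycl_platform.default_context
        print("Default context available:", isinstance(ctx, dpctl.SyclContext))
    except Exception:
        # platforms without the extension raise instead of returning a context
        print("Platform of", dev.name, "does not implement the default context")
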
+.. _user_guide_queues:
+
+Queue
+-----
+
+A SYCL queue is an entity associated with scheduling computational tasks for execution
+on a targeted SYCL device and using a specific SYCL context.
+
+The queue constructor generally requires both to be specified. For platforms that support the
+default platform context, a shortcut queue constructor call that specifies only a device would
+use the default platform context associated with the platform the given device is a part of.
+
+.. code-block:: python
+    :caption: Queues constructed from a device instance, or from a filter string that selects it, share the same context
+
+    >>> import dpctl
+    >>> d = dpctl.SyclDevice("gpu")
+    >>> q1 = dpctl.SyclQueue(d)
+    >>> q2 = dpctl.SyclQueue("gpu")
+    >>> q1.sycl_context == q2.sycl_context, q1.sycl_device == q2.sycl_device
+    (True, True)
+    >>> q1 == q2
+    False
+
+Even though ``q1`` and ``q2`` instances of :class:`dpctl.SyclQueue` target the same device and use the same context,
+they do not compare equal, since they correspond to two independent scheduling entities.
+
+.. note::
+    :class:`dpctl.tensor.usm_ndarray` objects, one associated with ``q1`` and another associated with ``q2``,
+    cannot be combined in a call to the same function that implements the
+    :ref:`compute-follows-data programming model <dpctl_tensor_compute_follows_data>` in :mod:`dpctl.tensor`.
+
+
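A hedged sketch of the note above: arrays bound to two distinct queues cannot be mixed even when the queues target the same device; the exception type assumed here is :class:`dpctl.utils.ExecutionPlacementError`:

.. code-block:: python

    import dpctl
    import dpctl.tensor as dpt

    q1 = dpctl.SyclQueue("gpu")
    q2 = dpctl.SyclQueue("gpu")  # same device and context, independent queue

    x = dpt.ones(100, sycl_queue=q1)
    y = dpt.ones(100, sycl_queue=q2)

    try:
        z = x + y  # execution placement is ambiguous across queues
    except dpctl.utils.ExecutionPlacementError:
        print("Arrays associated with different queues cannot be combined")
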
+Event
+-----
+
+A SYCL event is an entity created when a task is submitted to a SYCL queue for execution. Events are used
+by the DPC++ runtime to order execution of computational tasks. They may also contain profiling information
+associated with the submitted task, provided the queue was created with the "enable_profiling" property.
+
+A SYCL event can be used to synchronize execution of the associated task with execution on the host by using
+:meth:`dpctl.SyclEvent.wait`.
+
+Methods :meth:`dpctl.SyclQueue.submit_async` and :meth:`dpctl.SyclQueue.memcpy_async` return
+:class:`dpctl.SyclEvent` instances.
+
+.. note::
+    At this point, :mod:`dpctl.tensor` does not provide a public API for accessing SYCL events associated with
+    submission of computation tasks implementing operations on :class:`dpctl.tensor.usm_ndarray` objects.
+
+
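A minimal sketch of host/device synchronization through an event, assuming :meth:`dpctl.SyclQueue.memcpy_async` accepts a destination, a source, and a byte count, and returns a :class:`dpctl.SyclEvent`:

.. code-block:: python

    import dpctl
    import dpctl.memory as dpm

    q = dpctl.SyclQueue()
    src = dpm.MemoryUSMHost(64, queue=q)
    dst = dpm.MemoryUSMDevice(64, queue=q)
    src.copy_from_host(b"\x01" * 64)

    # asynchronous copy returns an event the host can wait on
    ev = q.memcpy_async(dst, src, 64)
    ev.wait()  # block until the copy has completed
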
+Unified Shared Memory
+---------------------
+
+Unified Shared Memory allocations of each kind are represented through the Python classes
+:class:`dpctl.memory.MemoryUSMDevice`, :class:`dpctl.memory.MemoryUSMShared`, and
+:class:`dpctl.memory.MemoryUSMHost`.
+
+These class constructors allow one to make USM allocations of the requested size in bytes
+on the device targeted by the given SYCL queue, bound to the context from that
+queue. The queue argument is stored in the class instance and is used to submit
+tasks when copying elements from or to this allocation or when filling
+the allocation with values.
+
+Classes that represent host-accessible USM allocations, i.e. USM-shared and USM-host,
+expose the Python buffer interface.
+
+.. code-block:: python
+
+    >>> import dpctl.memory as dpm
+    >>> import numpy as np
+
+    >>> # allocate 26 bytes of USM-device memory
+    >>> mem_d = dpm.MemoryUSMDevice(26)
+    >>> mem_d.copy_from_host(b"abcdefghijklmnopqrstuvwxyz")
+
+    >>> mem_s = dpm.MemoryUSMShared(30)
+    >>> mem_s.memset(value=ord(b"-"))
+    >>> mem_s.copy_from_device(mem_d)
+
+    >>> # since USM-shared is host-accessible,
+    >>> # it implements the Python buffer protocol that allows
+    >>> # Python objects to read this USM allocation
+    >>> bytes(mem_s)
+    b'abcdefghijklmnopqrstuvwxyz----'
+
+
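Because host-accessible allocations implement the buffer protocol, they can also be viewed with NumPy without copying; a short sketch continuing the example above:

.. code-block:: python

    >>> # zero-copy NumPy view over the USM-shared allocation
    >>> arr = np.frombuffer(mem_s, dtype="u1")
    >>> arr[:4]
    array([ 97,  98,  99, 100], dtype=uint8)
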
+Backend
+-------
+
+The Intel(R) oneAPI Data Parallel C++ compiler ships with two backends:
+
+#. OpenCL backend
+#. Level-Zero backend
+
+Additional backends can be added to the compiler by installing CodePlay's plugins:
+
+#. CUDA backend: provided by `oneAPI for NVIDIA(R) GPUs <codeplay_nv_plugin_>`_ from `CodePlay`_
+#. HIP backend: provided by `oneAPI for AMD GPUs <codeplay_amd_plugin_>`_ from `CodePlay`_
+
+.. _codeplay_nv_plugin: https://developer.codeplay.com/products/oneapi/nvidia/
+.. _codeplay_amd_plugin: https://developer.codeplay.com/products/oneapi/amd/
+.. _CodePlay: https://codeplay.com/
+
+When building the open-source `Intel LLVM <InteLlVmGh_>`_ compiler from source, the project can be
+configured to enable different backends (see the `Get Started Guide <GetStartedGuide_>`_ for
+further details).
+
+.. _GetStartedGuide: https://github.com/intel/llvm/blob/sycl/sycl/doc/GetStartedGuide.md
+.. _InteLlVmGh: https://github.com/intel/llvm
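
To see which backend a given device comes from at runtime, one can inspect the device's ``backend`` property; a brief sketch, assuming at least one SYCL device is available:

.. code-block:: python

    import dpctl

    for d in dpctl.get_devices():
        # backend is reported as a dpctl.backend_type enumeration member
        print(f"{d.name}: backend={d.backend}, type={d.device_type}")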
