Commit 51df933

Expanded tensor_intro to cover moving data between host and device.
Added user_guides/execution_model.
Moved license from user_guides/ to top-level.
1 parent 1f1dd09 commit 51df933

File tree

9 files changed: +404 additions, -91 deletions

docs/doc_sources/beginners_guides/index.rst

Lines changed: 14 additions & 7 deletions

@@ -4,6 +4,18 @@
 Beginner's guides
 =================
 
+Introduction
+------------
+
+:mod:`dpctl` brings a standards-based execution model for programming heterogeneous systems
+to Python, through invocations of oneAPI-based native libraries, their Python interfaces,
+or DPC++-based Python native extensions built using :mod:`dpctl` integration with
+Python native extension generators.
+
+The :py:mod:`dpctl` runtime is built on top of the C++ SYCL-2020 standard as implemented in
+the `Intel(R) oneAPI DPC++ compiler <dpcpp_compiler>`_ and is designed to be both vendor and
+architecture agnostic.
+
 Installation
 ------------
 
@@ -15,12 +27,6 @@ Working with devices
 
 * :ref:`Managing devices <beginners_guide_managing_devices>`
 
-..
-    * :ref:`Enumerating available devices <beginners_guide_enumerating_devices>`
-    * :ref:`Selecting a device <beginners_guide_device_selection>`
-    * :ref:`Querying information about device <beginners_guide_device_info>`
-    * :ref:`Can I influence which device is the default one? <beginners_guide_env_variables>`
-
 Introduction to array library
 -----------------------------
 
@@ -29,7 +35,8 @@ Introduction to array library
 Miscellaneous
 -------------
 
-* History of ``"dpctl"`` :ref:`name <beginners_guide_why_dpctl>`?
+* History of ``"dpctl"`` :ref:`name <beginners_guide_why_dpctl>`
+* Frequently asked questions
 
 .. toctree::
    :hidden:

docs/doc_sources/beginners_guides/tensor_intro.rst

Lines changed: 60 additions & 2 deletions

@@ -70,12 +70,11 @@ A created instance of :class:`usm_ndarray` has an associated :class:`dpctl.SyclQ
 using :attr:`dpctl.tensor.usm_ndarray.sycl_queue` property. The underlying USM allocation
 is allocated on :class:`dpctl.SyclDevice` and is bound to :class:`dpctl.SyclContext` targeted by this queue.
 
+.. _dpctl_tensor_compute_follows_data:
 
 Execution model
 ---------------
 
-.. _dpctl_tensor_compute_follows_data:
-
 When one or more instances of ``usm_ndarray`` objects are passed to a function in :py:mod:`dpctl.tensor` other than a creation function,
 a "compute follows data" execution model is followed.
 
@@ -92,6 +91,7 @@ each one corresponds to the same underlying ``sycl::queue`` object. In such a ca
 If input arrays do not conform to the compute-follows-data requirements, :py:exc:`dpctl.utils.ExecutionPlacementError` is raised.
 The user must explicitly migrate the data to unambiguously control the execution placement.
 
+.. _dpctl_tensor_array_migration:
 
 Migrating arrays
 ----------------
@@ -227,3 +227,61 @@ following this convention:
 
     # r3 has value "host"
    r3 = get_coerced_usm_type(["host", "host", "host"])
+
+Sharing data between devices and Python
+---------------------------------------
+
+Python objects, such as sequences of :class:`int`, :class:`float`, or :class:`complex` objects,
+or NumPy arrays, can be converted to :class:`dpctl.tensor.usm_ndarray` using the :func:`dpctl.tensor.asarray`
+function.
+
+.. code-block:: python
+
+    >>> from dpctl import tensor as dpt
+    >>> import numpy as np
+    >>> import mkl_random
+
+    >>> # Sample from true random number generator
+    >>> rs = mkl_random.RandomState(brng="nondeterm")
+    >>> x_np = rs.uniform(-1, 1, size=(6, 512)).astype(np.float32)
+
+    >>> # copy data to USM-device (default) allocated array
+    >>> x_usm = dpt.asarray(x_np)
+    >>> dpt.max(x_usm, axis=1)
+    usm_ndarray([0.9998379 , 0.9963589 , 0.99818915, 0.9975991 , 0.9999802 ,
+                 0.99851537], dtype=float32)
+    >>> np.max(x_np, axis=1)
+    array([0.9998379 , 0.9963589 , 0.99818915, 0.9975991 , 0.9999802 ,
+           0.99851537], dtype=float32)
+
+The content of a :class:`dpctl.tensor.usm_ndarray` may be copied into
+a NumPy array using the :func:`dpctl.tensor.asnumpy` function:
+
+.. code-block:: python
+
+    from dpctl import tensor as dpt
+    import numpy as np
+
+    def sieve_pass(r : dpt.usm_ndarray, v : dpt.usm_ndarray) -> dpt.usm_ndarray:
+        "Single pass of sieve of Eratosthenes"
+        m = dpt.min(r[r > v])
+        r[(r > m) & (r % m == 0)] = 0
+        return m
+
+    def sieve(n : int) -> dpt.usm_ndarray:
+        "Find primes <= n using sieve of Eratosthenes"
+        idt = dpt.int32
+        s = dpt.concat((
+            dpt.arange(2, 3, dtype=idt),
+            dpt.arange(3, n + 1, 2, dtype=idt)
+        ))
+        lb = dpt.zeros(tuple(), dtype=idt)
+        while lb * lb < n + 1:
+            lb = sieve_pass(s, lb)
+        return s[s > 0]
+
+    # get prime numbers <= a million into a NumPy array
+    # to save to disk
+    ps_np = dpt.asnumpy(sieve(10**6))
+
+    np.savetxt("primes.txt", ps_np, fmt="%d")
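The compute-follows-data rule described in tensor_intro.rst (all input arrays must share the same underlying queue, otherwise :py:exc:`dpctl.utils.ExecutionPlacementError` is raised) can be sketched as a toy model in plain Python. This is an illustrative sketch, not dpctl's implementation: the helper ``deduce_queue`` and the string-valued queues are assumptions; only the exception name mirrors the real :py:exc:`dpctl.utils.ExecutionPlacementError`.

```python
class ExecutionPlacementError(Exception):
    """Stand-in for dpctl.utils.ExecutionPlacementError in this sketch."""

def deduce_queue(queues):
    """Toy model of compute follows data.

    All input arrays must target the same underlying queue;
    otherwise the execution placement is ambiguous and an
    error is raised, forcing the user to migrate data explicitly.
    """
    unique = set(queues)
    if len(unique) != 1:
        raise ExecutionPlacementError(
            "Input arrays target different queues; migrate data explicitly."
        )
    return unique.pop()

# Same queue for every input: placement is unambiguous
print(deduce_queue(["gpu_queue", "gpu_queue"]))

# Inputs on different queues: placement cannot be deduced
try:
    deduce_queue(["gpu_queue", "cpu_queue"])
except ExecutionPlacementError:
    print("ExecutionPlacementError raised")
```

In real dpctl code the user would resolve the error with :meth:`usm_ndarray.to_device` or :func:`dpctl.tensor.asarray` before calling the function again.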

docs/doc_sources/contributor_guides/building.rst

Lines changed: 52 additions & 24 deletions

@@ -57,58 +57,86 @@ After building the Conda package, install it by executing:
 
    conda install dpctl
 
-.. note::
-
-   You can face issues with conda-build version 3.20. Use conda-build
-   3.18 instead.
-
 
 Build and Install with scikit-build
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
 To build using Python ``setuptools`` and ``scikit-build``, install the following Python packages:
 
-- ``cython``
-- ``numpy``
-- ``cmake``
-- ``scikit-build``
-- ``ninja``
-- ``gtest`` (optional to run C API tests)
-- ``gmock`` (optional to run C API tests)
-- ``pytest`` (optional to run Python API tests)
+- ``cython``
+- ``numpy``
+- ``cmake``
+- ``scikit-build``
+- ``ninja``
+- ``gtest`` (optional to run C API tests)
+- ``gmock`` (optional to run C API tests)
+- ``pytest`` (optional to run Python API tests)
 
 Once the prerequisites are installed, building using ``scikit-build`` involves the usual steps.
 
 To build and install, run:
 
-.. code-block:: bash
+.. tab-set::
+
+    .. tab-item:: Linux
+        :sync: lnx
+
+        .. code-block:: bash
+
+            python setup.py install -- -G Ninja -DCMAKE_C_COMPILER:PATH=icx -DCMAKE_CXX_COMPILER:PATH=icpx
 
-   python setup.py install -- -G Ninja -DCMAKE_C_COMPILER:PATH=icx -DCMAKE_CXX_COMPILER:PATH=icpx
+    .. tab-item:: Windows
+        :sync: win
+
+        .. code-block:: bat
+
+            python setup.py install -- -G Ninja -DCMAKE_C_COMPILER:PATH=icx -DCMAKE_CXX_COMPILER:PATH=icx
 
 
 To develop, run:
 
-.. code-block:: bash
+.. tab-set::
 
-   python setup.py develop -G Ninja -DCMAKE_C_COMPILER:PATH=icx -DCMAKE_CXX_COMPILER:PATH=icpx
+    .. tab-item:: Linux
+        :sync: lnx
 
-On Windows OS, use ``icx`` for both C and CXX compilers.
+        .. code-block:: bash
 
-To develop on Linux OS, use the driver script:
+            python setup.py develop -G Ninja -DCMAKE_C_COMPILER:PATH=icx -DCMAKE_CXX_COMPILER:PATH=icpx
 
-.. code-block:: bash
+    .. tab-item:: Windows
+        :sync: win
+
+        .. code-block:: bat
 
-   python scripts/build_locally.py
+            python setup.py develop -G Ninja -DCMAKE_C_COMPILER:PATH=icx -DCMAKE_CXX_COMPILER:PATH=icx
 
 
-Building Using Custom dpcpp
+Developing can be streamlined using the driver script:
+
+.. tab-set::
+
+    .. tab-item:: Linux
+        :sync: lnx
+
+        .. code-block:: bash
+
+            python scripts/build_locally.py --verbose
+
+    .. tab-item:: Windows
+        :sync: win
+
+        .. code-block:: bat
+
+            python scripts/build_locally.py --verbose
+
+
+Building Using Custom DPC++
 ---------------------------
 
 You can build dpctl from the source using the `DPC++ toolchain <https://github.com/intel/llvm/blob/sycl/sycl/doc/GetStartedGuide.md>`_
 instead of the DPC++ compiler that comes with oneAPI.
 
-Do this, to enable support for CUDA devices.
-
 Following steps in the `Build and install with scikit-build`_ use a command-line option to set
 the relevant CMake variables, for example:
 
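For the custom-toolchain build described at the end of building.rst, the compiler location is typically passed through the same CMake variables used in the scikit-build commands above. A hypothetical invocation is sketched below; the ``DPCPP_HOME`` variable, the workspace path, and the compiler file names are illustrative assumptions, not taken from this commit.

```shell
# Sketch only: assumes an open-source intel/llvm (sycl branch) build
# located under $DPCPP_HOME. Variable names and paths are illustrative.
export DPCPP_HOME=$HOME/sycl_workspace/llvm/build

# Point the scikit-build driven build at the custom toolchain via the
# same CMake variables used elsewhere in this guide.
python setup.py install -- -G Ninja \
    -DCMAKE_C_COMPILER:PATH=$DPCPP_HOME/bin/clang \
    -DCMAKE_CXX_COMPILER:PATH=$DPCPP_HOME/bin/clang++
```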
docs/doc_sources/index.rst

Lines changed: 4 additions & 3 deletions

@@ -4,11 +4,11 @@ Data Parallel Control
 
 .. _DpctlIntroduction:
 
-Python package :py:mod:`dpctl` enables Python users to engage with multiple
+Python package :py:mod:`dpctl` enables Python users to engage multiple
 compute devices commonly available in modern consumer- and server-grade
 computers using industry-standard :sycl_execution_model:`SYCL execution model <>`
-facilitated by Intel(R) oneAPI :dpcpp_compiler:`DPC++ compiler <>` implementing
-:sycl_spec_2020:`SYCL 2020 standard <>`.
+facilitated by :sycl_spec_2020:`SYCL 2020 standard <>`-compliant
+Intel(R) oneAPI :dpcpp_compiler:`DPC++ compiler <>`.
 
 :py:mod:`dpctl` provides a reference data-parallel implementation of
 array library :py:mod:`dpctl.tensor` conforming to Python Array API specification.

@@ -86,3 +86,4 @@ take place.
    user_guides/index
    api_reference/index
    contributor_guides/index
+   license

docs/doc_sources/user_guides/basic_concepts.rst

Lines changed: 2 additions & 2 deletions

@@ -58,8 +58,8 @@ Definitions
 * **Unified Shared Memory**
     Unified Shared Memory (USM) refers to pointer-based device memory management.
     USM allocations are bound to a context. This means that a pointer representing a
-    USM allocation can be unambiguously mapped to the data it represents only
-    if the associated context is known. USM allocations are accessible by
+    USM allocation can be unambiguously mapped to the data it represents *only
+    if* the associated context is known. USM allocations are accessible by
     computational kernels that are executed on a device, provided that the
     allocation is bound to the same context that is used to construct the queue
     where the kernel is scheduled for execution.
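The definition emphasized by this change, that a USM pointer maps to data unambiguously *only if* its context is known, can be illustrated with a toy pure-Python model. The ``Context`` class below is purely illustrative and is not a dpctl API; it only models the idea that each context owns its own address space.

```python
class Context:
    """Toy model of a SYCL context owning its own USM address space."""

    def __init__(self, name):
        self.name = name
        self._allocations = {}  # pointer value -> data

    def usm_alloc(self, ptr, data):
        # Record an allocation; the pointer is only meaningful here.
        self._allocations[ptr] = data

    def deref(self, ptr):
        # Mapping a pointer to data requires knowing the context.
        return self._allocations[ptr]

ctx_a = Context("gpu")
ctx_b = Context("cpu")

# The same numeric pointer value can exist in two contexts
# and name entirely different data in each.
ctx_a.usm_alloc(0x1000, "weights on GPU")
ctx_b.usm_alloc(0x1000, "staging buffer on CPU")

# The bare pointer 0x1000 is ambiguous; a (pointer, context) pair is not.
print(ctx_a.deref(0x1000))  # weights on GPU
print(ctx_b.deref(0x1000))  # staging buffer on CPU
```

This mirrors why, in dpctl, a kernel can access a USM allocation only when the allocation is bound to the same context used to construct the kernel's queue.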
