Getting started with :py:mod:`dpctl.tensor`
============================================

The tensor submodule provides an N-dimensional array object for a tensor whose values have the same data type
from the :ref:`following list <dpctl_tensor_data_types>`:

.. currentmodule:: dpctl.tensor

.. list-table::

    * -
      - :attr:`int8`
      - :attr:`int16`
      - :attr:`int32`
      - :attr:`int64`
      -
      - :attr:`float16`
      - :attr:`float32`
      - :attr:`complex64`

    * - :attr:`bool`
      - :attr:`uint8`
      - :attr:`uint16`
      - :attr:`uint32`
      - :attr:`uint64`
      -
      -
      - :attr:`float64`
      - :attr:`complex128`

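For instance, the data type of an array is available through its ``dtype`` attribute.
A minimal sketch (the integer type inferred from Python integers may differ across platforms):

.. code-block:: python

    from dpctl import tensor

    # dtype may be inferred from the input data or given explicitly
    x = tensor.asarray([1, 2, 3])
    y = tensor.asarray([1, 2, 3], dtype="float32")

    print(x.dtype)  # a default integer type, e.g. int64
    print(y.dtype)  # float32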

Creating an array
-----------------

Array :ref:`creation functions <dpctl_tensor_creation_functions>` support keyword arguments that
control the device where the array is allocated as well as aspects of
:ref:`Unified Shared Memory allocation <dpctl_memory_pyapi>` for the array.

These three keywords are:

.. list-table::
    :header-rows: 1

    * - Keyword argument
      - Default value
      - Description
    * - ``usm_type``
      - ``"device"``
      - Type of USM allocation to make
    * - ``device``
      - ``None``
      - :py:class:`dpctl.tensor.Device` instance
    * - ``sycl_queue``
      - ``None``
      - Instance of :class:`dpctl.SyclQueue` associated with the array

Arguments ``sycl_queue`` and ``device`` are complementary to each other, and
a user need only provide one of them.

A valid setting for the ``device`` keyword argument is any object that can be passed to :py:meth:`dpctl.tensor.Device.create_device`.
If both ``device`` and ``sycl_queue`` keyword arguments are specified, they must correspond to :class:`dpctl.SyclQueue` instances which
compare equal to one another.

A created instance of :class:`usm_ndarray` has an associated :class:`dpctl.SyclQueue` instance that can be retrieved
using the :attr:`dpctl.tensor.usm_ndarray.sycl_queue` property. The underlying USM allocation
is made on the :class:`dpctl.SyclDevice` and is bound to the :class:`dpctl.SyclContext` targeted by this queue.
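
For example, the following sketch (assuming a default-selected device is available) allocates an array
using the ``device`` and ``usm_type`` keywords and retrieves the associated queue back from the array:

.. code-block:: python

    import dpctl
    from dpctl import tensor

    # the device keyword accepts anything dpctl.tensor.Device.create_device accepts,
    # e.g. a dpctl.SyclDevice instance or a filter selector string
    dev = dpctl.SyclDevice()
    x = tensor.zeros((3, 4), dtype="float32", usm_type="shared", device=dev)

    # the queue used for the allocation can be retrieved from the array
    q = x.sycl_queue
    print(q.sycl_device.name, x.usm_type)

    # passing sycl_queue instead of device is an equivalent way
    # to control where the array is allocated
    y = tensor.zeros((3, 4), dtype="float32", usm_type="shared", sycl_queue=q)
    assert x.sycl_queue == y.sycl_queue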


Execution model
---------------

.. _dpctl_tensor_compute_follows_data:

When one or more instances of ``usm_ndarray`` are passed to a function in :py:mod:`dpctl.tensor` other than an array creation function,
a "compute follows data" execution model is followed.

The model requires that the :class:`dpctl.SyclQueue` instances associated with each array compare equal to one another, signifying that
each one corresponds to the same underlying ``sycl::queue`` object. In such a case, the output array is associated with the same
``sycl::queue`` and computations are scheduled for execution using this ``sycl::queue``.
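
A minimal sketch of conforming inputs, assuming a default-selected device is available:

.. code-block:: python

    import dpctl
    from dpctl import tensor

    q = dpctl.SyclQueue()  # queue targeting the default-selected device

    # both inputs are associated with the same queue ...
    x = tensor.arange(100, dtype="float32", sycl_queue=q)
    y = tensor.ones(100, dtype="float32", sycl_queue=q)

    # ... so the computation is scheduled on that queue and the result
    # is associated with it as well
    z = x + y
    assert z.sycl_queue == q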

.. note::
    Two instances of :class:`dpctl.SyclQueue` may target the same ``sycl::device`` and use the same ``sycl::context``, but correspond
    to different scheduling entities, and hence violate the compute-follows-data requirement. A common example of this is a pair of
    ``SyclQueue`` instances corresponding to the default-selected device and using the platform default context, but created with
    different properties, e.g. one with ``"enable_profiling"`` set and another without it.

If input arrays do not conform to the compute-follows-data requirements, :py:exc:`dpctl.utils.ExecutionPlacementError` is raised.
The user must explicitly migrate the data to unambiguously control the execution placement.
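
A sketch of such a violation and of the explicit migration that resolves it (two separately constructed
queues targeting the same device are distinct scheduling entities):

.. code-block:: python

    import dpctl
    import dpctl.utils
    from dpctl import tensor

    # two distinct queues targeting the default-selected device
    q1 = dpctl.SyclQueue()
    q2 = dpctl.SyclQueue()

    a = tensor.ones(1024, sycl_queue=q1)
    b = tensor.ones(1024, sycl_queue=q2)

    try:
        tensor.add(a, b)  # queues compare unequal: placement is ambiguous
    except dpctl.utils.ExecutionPlacementError:
        # migrating b to the queue of a makes the placement unambiguous
        c = tensor.add(a, b.to_device(a.device))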


Migrating arrays
----------------

Array content can be migrated to a different device :ref:`using <dpctl_tensor_usm_ndarray_to_device_example>`
either the :meth:`dpctl.tensor.usm_ndarray.to_device` method or the :func:`dpctl.tensor.asarray` function.

The ``arr.to_device(device=target_device)`` method will be zero-copy if ``arr.sycl_queue`` and the :class:`dpctl.SyclQueue`
instance associated with the new target device have the same underlying ``sycl::device`` and ``sycl::context`` instances.

.. _dpctl_tensor_usm_ndarray_to_device_example:

Here is an example of migration without a copy:

.. code-block:: python
    :caption: Using ``to_device`` to zero-copy migrate array content to be associated with a different ``sycl::queue``

    import dpctl
    from dpctl import tensor

    x = tensor.linspace(0, 1, num=10**8)
    q_prof = dpctl.SyclQueue(x.sycl_context, x.sycl_device, property="enable_profiling")

    timer = dpctl.SyclTimer()
    # no data migration takes place here,
    # but x and x1 arrays do not satisfy compute-follows-data requirements
    x1 = x.to_device(q_prof)

    with timer(q_prof):
        y = tensor.sin(2*x1)*tensor.exp(-tensor.square(x1))

    host_dt, device_dt = timer.dt
    print(f"Execution on device {x.sycl_device.name} took {device_dt} seconds, on host {host_dt} seconds")
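
Migration to a different device or context involves copying the data. A short sketch, assuming the system
exposes a device matching the ``"gpu"`` filter selector string:

.. code-block:: python

    from dpctl import tensor

    x = tensor.linspace(0, 1, num=1000)

    # assumes a device matching the "gpu" filter selector is available;
    # the data is copied because the target device/context differ from x's
    x_gpu = x.to_device("gpu")

    # dpctl.tensor.asarray can perform the same migration;
    # copy=True forces a copy even when the target queue matches
    x_dup = tensor.asarray(x, device="gpu", copy=True)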