Releases · IntelPython/dpctl

06 Jun 21:05

0.20.1

31d4c10

v0.20.1 Latest

This bug-fix release adds event dependencies that were missing in the roll and reshape Python bindings for size-1 input arrays; see gh-2095.

06 Jun 20:55

ndgrigorian

0.20.0

de4b977

v0.20.0

This release achieves compliance of dpctl.tensor with the Python Array API 2024.12 standard.

The dpctl namespace has also received a number of new features, including new Python classes dpctl.LocalAccessor, dpctl.WorkGroupMemory, and dpctl.RawKernelArg to be used as kernel argument types, support for peer access between dpctl.SyclDevice instances, and support for composite Level Zero devices.

Added

Added dpctl.WorkGroupMemory class representing sycl::ext::oneapi::experimental::work_group_memory, to be used as a kernel argument type gh-1984
Added dpctl.LocalAccessor class representing sycl::local_accessor, to be used as a kernel argument type gh-1991
Added dpctl.SyclPlatform.get_devices method for getting all dpctl.SyclDevices for the platform gh-1992
Added support for the composite devices extension for Level Zero devices, usable with some devices when setting ZE_FLAT_DEVICE_HIERARCHY=COMBINED gh-1993
Added out keyword to tensor.take gh-2010
Added dpctl.RawKernelArg class representing sycl::ext::oneapi::experimental::raw_kernel_arg, to be used as a kernel argument type gh-2038
Added dpctl.SyclDevice methods for querying, enabling, and disabling peer access between devices gh-2077, gh-2082
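
As a minimal sketch of the dpctl.SyclPlatform.get_devices method listed above, the following enumerates the devices of each platform. It assumes dpctl 0.20.0 or later is installed and that get_devices can be called without arguments; the helper name list_platform_devices is hypothetical, and it returns an empty list when dpctl cannot be imported.

```python
def list_platform_devices():
    """Return [(platform_name, [device_name, ...]), ...] for all SYCL platforms."""
    try:
        import dpctl  # assumed to be version 0.20.0 or later
    except ImportError:
        # dpctl is not available in this environment; nothing to enumerate
        return []
    listing = []
    for platform in dpctl.get_platforms():
        # get_devices is the method added in gh-1992
        devices = platform.get_devices()
        listing.append((platform.name, [dev.name for dev in devices]))
    return listing


for platform_name, device_names in list_platform_devices():
    print(platform_name, device_names)
```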

Changed

Updated Level Zero loader detection to no longer rely on reading libur_adapter_level_zero.so for the loader filename gh-2025
Updated integer array indexing to align with the 2024.12 array API specification gh-2032
Added support for the Boolean data type to dpctl.tensor.ceil, dpctl.tensor.floor, and dpctl.tensor.trunc gh-2033
Changed implementation of DPCTLPlatform_GetDefaultContext from using deprecated ext_oneapi_get_default_context to khr_get_default_context gh-2042
Updated supported array API specification version to 2024.12 gh-2047
Implementation struct for tensor.imag now uses a static member value for the imaginary part of real-valued inputs gh-2063
Updated repr to show the shape of the abbreviated arrays and show the shape and data type of zero-size arrays gh-2067
Changed tensor.__array_namespace_info__().capabilities()["max dimensions"] to None gh-2071
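
For code that consumes the capabilities dictionary, a None value for "max dimensions" needs handling distinct from an integer limit. The sketch below assumes that None means the library declares no upper bound, which is how the array API 2024.12 specification is read here. The helper name ndim_is_supported and the sample dictionaries are hypothetical stand-ins, not output captured from dpctl.

```python
def ndim_is_supported(capabilities, ndim):
    """Check a requested number of dimensions against a capabilities mapping."""
    limit = capabilities.get("max dimensions")
    # None is treated as "no declared limit" (assumption stated above)
    return limit is None or ndim <= limit


# Stand-in dictionaries shaped like the result of capabilities()
caps_unbounded = {
    "boolean indexing": True,
    "data-dependent shapes": True,
    "max dimensions": None,
}
caps_bounded = {**caps_unbounded, "max dimensions": 32}

print(ndim_is_supported(caps_unbounded, 100))
print(ndim_is_supported(caps_bounded, 33))
```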

Fixed

Refactored code common to accumulation operations (dpt.cumulative_sum, dpt.cumulative_prod, dpt.cumulative_logsumexp) and removed unnecessary event initialization gh-2011
Fixed incorrect results for dpt.cumulative_sum and dpt.cumulative_prod when dtype=dpt.bool gh-2018
Fixed a typo in dpctl.SyclPlatform repr gh-2035
Fixed a bug in tensor.asarray where order="K" could fail to produce an array sufficient for the internal copy operation for some edge cases, including a contiguous array with permuted dimensions gh-2058
Fixed a typo in dpctl.memory.USMAllocationError gh-2072

Maintenance

Document dpctl.device_type, dpctl.backend_type, dpctl.event_status_type, and dpctl.global_mem_cache_type enums gh-2019
Updated SYCL_INCLUDE_DIR_HINT in Conda recipe gh-2039
Updated expected dtypes in element-wise function docstrings gh-2041, gh-2048
Set ARRAY_API_TESTS_VERSION=2024.12 when running array API conformity job in CI gh-2046
Install hwloc when running CI job for nightly SYCL compiler gh-2050
Added cython-lint to pre-commit to improve style and readability of Cython code gh-2056
Skip upload jobs when GitHub CI is called from a forked repo gh-2059
Disable nightly tests run from forked repos gh-2060
Fixed a typo in beginner's guide example gh-2061
Updated bandit version gh-2075
Updated Conda installation instructions gh-2080, gh-2081
Fixed an incorrect link to changelog in package metadata gh-2085
Miscellaneous changes to continuous integration/delivery (CI/CD) supporting scripts gh-2020, gh-2034, gh-2043, gh-2044, gh-2065, gh-2066, gh-2068, gh-2070

New Contributors

@jharlow-intel made their first contribution in #2054
@david-cortes-intel made their first contribution in #2080

Contributors

jharlow-intel and david-cortes-intel

28 Feb 19:25

ndgrigorian

0.19.0

1336b31

v0.19.0

This release features official, out-of-the-box support for compiling dpctl for specified AMD GPU architectures, the addition of new function tensor.top_k, a radix-sort-based implementation of sorting functions, and improvements to interoperability with DLPack through tensor.dldevice_to_sycl_device and tensor.sycl_device_to_dldevice.

A number of adjustments were also made to improve the performance of dpctl reductions (e.g., sum, min, max), accumulators (e.g., cumulative_sum, cumulative_logsumexp), and copy-and-cast operations.

Added

Added support for compiling dpctl for a specified AMD GPU architecture using the Codeplay oneAPI plug-in gh-1731
Added tensor.top_k per Python Array API specification gh-1921
Added functions tensor.dldevice_to_sycl_device and tensor.sycl_device_to_dldevice for converting between DLPack and sycl devices, and a method get_device_id to dpctl.SyclDevice to improve interoperability with DLPack protocol gh-1953
Added DPCTL_OFFLOAD_COMPRESS cmake option (set to OFF by default) to toggle --offload-compress linker option when building dpctl gh-1961
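
To illustrate what a top-k query such as the tensor.top_k function listed above computes, here is a pure-Python sketch for a one-dimensional sequence. It does not use dpctl and is not its implementation; the names top_k_sketch and TopKResult, the mode keyword, and the (values, indices) result layout are assumptions made for illustration.

```python
from typing import List, NamedTuple


class TopKResult(NamedTuple):
    values: List
    indices: List[int]


def top_k_sketch(x, k, mode="largest"):
    """Return the k largest (or smallest) elements of x with their positions."""
    if mode not in ("largest", "smallest"):
        raise ValueError("mode must be 'largest' or 'smallest'")
    if not 0 <= k <= len(x):
        raise ValueError("k must be between 0 and len(x)")
    # Sort positions by element value; ties keep their original order
    order = sorted(range(len(x)), key=lambda i: x[i], reverse=(mode == "largest"))
    picked = order[:k]
    return TopKResult([x[i] for i in picked], picked)


res = top_k_sketch([3, 1, 4, 1, 5], 2)
print(res.values, res.indices)  # [5, 4] [4, 2]
```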

Changed

Improved performance of copy-and-cast operations from numpy.ndarray to tensor.usm_ndarray for contiguous inputs gh-1829
py_sort and py_argsort now throw py::value_error if inputs are not C-contiguous gh-1838
Improved performance of copying operation to C-/F-contig array, with optimization for batch of square matrices gh-1850
Improved performance of tensor.argsort function for all types gh-1859
Improved performance of tensor.sort and tensor.argsort for short arrays in the range [16, 64] elements gh-1866
Implemented radix sort algorithm to be used in dpt.sort and dpt.argsort gh-1867, gh-1883
Extended dpctl.SyclTimer with device_timer keyword, implementing different methods of collecting device times gh-1872
dpctl now sees GPU devices out of the box in a virtual environment on Windows gh-1922
Improved performance of tensor.cumulative_sum, tensor.cumulative_prod, tensor.cumulative_logsumexp as well as performance of boolean indexing gh-1923, gh-1942
Improved performance of tensor.min, tensor.max, tensor.logsumexp, tensor.reduce_hypot for floating point type arrays by at least 2x gh-1932, gh-1937
Updated Cython examples to use scikit-build gh-1935
Reduced binary size of _tensor_accumulation_impl by 13 MB gh-1957
Extended tensor.asarray to support objects that implement __usm_ndarray__ property to be interpreted as usm_ndarray objects gh-1959
tensor.usm_ndarray object disallows implicit conversions to NumPy array gh-1964
stream arguments in tensor.usm_ndarray methods now raise an error if stream is not a dpctl.SyclQueue gh-1969
dpctl initialization sets subprocess to use SPAWN method on Linux to enable gdb-oneapi to debug kernels submitted from Python applications gh-1971
Reduced binary size of _tensor_elementwise_impl gh-1976
Allow dpctl.SyclQueue.memcpy to and from multi-dimensional buffers gh-1985

Fixed

Fixed a bug in tensor.roll for very large values of shift gh-1869
Fix for tensor.result_type when all inputs are Python built-in scalars gh-1877
Improved error in constructors tensor.full and tensor.full_like when provided a non-numeric fill value gh-1878
Added a check for pointer alignment when copying to C-contiguous memory gh-1890, gh-1891
Fixed dpctl installed into virtual environment not finding DPC++ runtime libraries by adding DPCTL_WITH_REDIST cmake option (set to OFF by default) gh-1893
Fixed incorrect result (issue gh-1901) in tensor.cumulative_sum and in advanced indexing gh-1902
Fixed __setitem__() for tensor.usm_ndarray when passed an empty boolean mask gh-1915
tensor.from_dlpack docstring now shows that return type can be NumPy array and stipulates when this will be the case gh-1919
Fixed docstring in helper class in DLPack tests gh-1920
Fixed a bug in tensor.astype where copy=False would not be respected for 1d arrays when order keyword is specified gh-1928
Replaced deprecated CL/sycl.hpp with recommended sycl/sycl.hpp in examples gh-1933
Fixed tensor.take_along_axis and tensor.put_along_axis raising an error for tensor.uint64 indices when given an array of dimension greater than 1 gh-1934
Fixed unexpected results of tensor.sum with a requested output type of bool gh-1958
Use std::move to avoid unnecessary copying of temporary in triul_ctor.cpp gh-1960
Make stream a keyword-only argument in tensor.usm_ndarray.to_device per requirement by array API specification gh-1966
Improve efficiency of copy implementation and avoid an unnecessary kernel invocation in tensor.argsort for 1d input gh-1967
Corrected uses of NumPy constructors with tensor.usm_ndarray inputs in test suite gh-1968
Fixed array API namespace inspection utilities showing complex128 as a valid dtype on devices without double precision and device keywords not working with dpctl.SyclQueue or filter strings gh-1979
Fixed a bug in test_sycl_device_interface.cpp which would cause compilation to fail with Clang version 20.0 gh-1989
Fixed memory leaks in smart-pointer-managed USM temporaries in synchronizing kernel calls gh-2002
UsmNDArray_MakeSimpleFromPtr and UsmNDArray_MakeFromPtr now raise an error when provided an invalid typenum before attempting to create the array gh-2003
Fixed typos in tensor.from_numpy and tensor.astype gh-2006
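
As a reference for the expected behavior of a roll with a very large shift (the tensor.roll fix listed above), the following pure-Python sketch reduces the shift modulo the sequence length before rotating. It only illustrates the expected result and does not describe the actual change made in gh-1869; the name roll_sketch is hypothetical.

```python
def roll_sketch(seq, shift):
    """Rotate seq to the right by shift positions (negative shifts rotate left)."""
    items = list(seq)
    n = len(items)
    if n == 0:
        return items
    # Reducing modulo n keeps arbitrarily large shifts well-defined
    s = shift % n
    if s == 0:
        return items
    return items[-s:] + items[:-s]


print(roll_sketch([0, 1, 2, 3, 4], 2))           # [3, 4, 0, 1, 2]
print(roll_sketch([0, 1, 2, 3, 4], 10**30 + 2))  # [3, 4, 0, 1, 2]
```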

Maintenance

Revert pinning of cmake to 3.26 on Windows gh-1823
Update black version used in Python code style workflow gh-1828
Fixed CI/CD workflow for building conda packages on Windows gh-1831
Revert work-around in test_sycl_kernel_submit.py for problem in MKL 2024.2.0 gh-1836
Stop using the deprecated Mambaforge variant of miniforge gh-1844
Use pybind11=2.13.6 gh-1845
Remove unnecessary include in C++ header file gh-1846
Build translation unit "simplify_iteration_space.cpp", which was previously compiled multiple times, as a static library gh-1847
Add instructions for installing dpctl from Intel PyPi channel gh-1860
Fix warnings when generating docs gh-1855, gh-1861
Align conda recipe with conda-forge's {{ stdlib("c") }} migration gh-1868
Add missing include of SYCL header to "math_utils.hpp" gh-1899
Add support of CV-qualifiers in is_complex<T> helper gh-1900
Tuning work for elementwise functions with modest performance gains (under 10%) gh-1889
Reduce binary ...

Contributors

sommerlukas

07 Dec 18:21

oleksandr-pavlyk

0.18.3

69be39d

v0.18.3

This is a bug-fix release which supports use of dpctl in a virtual environment on Windows, resolving gh-1745.

03 Dec 20:58

oleksandr-pavlyk

0.18.2

7bac769

v0.18.2

This is a bug-fix release, see https://github.com/IntelPython/dpctl/milestone/15.

It backports fixes for

tensor.result_type behavior for scalars (see gh-1874) and
errors when using dpctl in virtual environment on Linux (gh-1892).

Changes from PR gh-1899 were also backported.

14 Oct 11:56

oleksandr-pavlyk

0.18.1

5e5513f

v0.18.1

This is an incremental release in which only the installation instructions in the README were updated. Relative to the 0.18.0 release, they reflect the new location of the index of Python packages built by Intel(R).

30 Sep 10:42

oleksandr-pavlyk

0.18.0

786365e

v0.18.0

This release reaches an important milestone: offloading is now fully asynchronous.

Calls to dpctl.tensor functions submit tasks for execution to the DPC++ runtime and return without waiting for these tasks to finish.
The sequential semantics that a user expects from the execution of a Python script are nevertheless preserved.
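
The idea of asynchronous submission with preserved sequential semantics can be sketched by analogy using only the Python standard library. In the example below, a single-worker executor stands in for an in-order queue: submissions return immediately, tasks still run in program order, and only reading a result blocks. This is an analogy under stated assumptions, not dpctl's actual mechanism, and the task names are hypothetical.

```python
import time
from concurrent.futures import ThreadPoolExecutor

# A single worker processes tasks in submission order (stand-in for an in-order queue)
queue = ThreadPoolExecutor(max_workers=1)
log = []


def task(name, delay):
    time.sleep(delay)  # pretend to do work on a device
    log.append(name)
    return name


first = queue.submit(task, "fill", 0.05)  # returns a future without waiting
second = queue.submit(task, "add", 0.0)   # queued behind "fill"

# Reading a result is the synchronization point
result = second.result()
queue.shutdown()

print(log)  # ['fill', 'add']
```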

The full list of changes that went into this release is: