v0.13.0
Added
- Implemented and deployed dedicated kernels for copying with casting #781, used in `__setitem__`, implementation of `asarray`, `dpctl.tensor.copy` functions.
- Implemented dedicated copying kernel for `dpctl.tensor.reshape` function #810, added support for `copy` keyword #807.
- Implemented dedicated kernel to copy with casting from `numpy.ndarray` into `dpctl.tensor.usm_ndarray` #817.
- Implemented `dpctl.tensor.permute_dims` function from array-API #787.
- Implemented `dpctl.tensor.expand_dims` function from array-API #788.
- Implemented `dpctl.tensor.squeeze` function from array-API #790.
- Implemented `dpctl.tensor.broadcast_to` function from array-API #791.
- Implemented `dpctl.tensor.broadcast_arrays` function from array-API #798.
- Implemented `dpctl.tensor.flip` function from array-API #801.
- Implemented `dpctl.tensor.usm_ndarray.mT` property per array-API #805.
- Implemented `dpctl.tensor.roll` function from array-API #809.
- Implemented `dpctl.tensor.arange` function from array-API #814.
- Implemented `dpctl.tensor.zeros` function from array-API #816.
- Implemented `dpctl.tensor.ones`, `dpctl.tensor.full`, `dpctl.tensor.empty_like`, `dpctl.tensor.zeros_like`, `dpctl.tensor.ones_like`, `dpctl.tensor.full_like` functions from array-API #822.
- Implemented `DPCTLQueue_Memset` function in SyclInterface library #812, and exposed it for `dpctl.memory.MemoryUSM*` classes #815.
- Implemented `dpctl.utils.get_coerced_usm_type` to deduce the USM type of the output array from the types of input arrays in the compute-follows-data execution model #797.
- Added `dpctl.SyclDevice.profiling_timer_resolution` property #825.
- Added `dpctl.SyclDevice.platform` and `dpctl.SyclPlatform.default_context` properties #827.
- Provided pybind11 example for functions working on `dpctl.tensor.usm_ndarray` container applying oneMKL functions #780, #793, #819. The example was expanded to demonstrate implementing iterative linear solvers (Chebyshev solver and Conjugate-Gradient solver) by asynchronously submitting individual SYCL kernels from Python #821, #833, #838.
- Wrote manual page about working with `dpctl.SyclQueue` #829.
- Added cmake scripts to dpctl package layout and a way to query the location #853.
- Implemented `dpctl.tensor.concat` function from array-API #867.
- Implemented `dpctl.tensor.stack` function from array-API #872.
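
The USM type deduction added in #797 can be illustrated with a small pure-Python model. This is a sketch under assumptions, not dpctl's code: the function name `coerce_usm_type_sketch` is hypothetical, and the precedence `"device"` over `"shared"` over `"host"` (with `None` for an empty input) is assumed here rather than stated in these notes.

```python
# Assumed precedence, highest first. This models the coercion rule only;
# it is not the implementation of dpctl.utils.get_coerced_usm_type.
_USM_PRECEDENCE = ("device", "shared", "host")


def coerce_usm_type_sketch(usm_types):
    """Return the USM type an output array would get from its inputs.

    Hypothetical helper: picks the highest-precedence type present,
    or None when there are no input types to deduce from.
    """
    usm_types = list(usm_types)
    if not usm_types:
        return None
    ranks = []
    for t in usm_types:
        if t not in _USM_PRECEDENCE:
            raise ValueError(f"unrecognized USM type: {t!r}")
        ranks.append(_USM_PRECEDENCE.index(t))
    # The smallest index corresponds to the highest-precedence type.
    return _USM_PRECEDENCE[min(ranks)]


if __name__ == "__main__":
    print(coerce_usm_type_sketch(["host", "shared"]))
    print(coerce_usm_type_sketch(["shared", "device", "host"]))
```

With dpctl installed, the corresponding call would be `dpctl.utils.get_coerced_usm_type` applied to the `usm_type` values of the input arrays.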
Changed
- Enhanced coverage collection for SyclInterface library by also collecting it during pytest run and combining traces with those collected during C-test run #818. This change also makes it unnecessary to rebuild the SyclInterface library when building the C-test executable.
- Exported `keep_args_alive` utility in `dpctl4pybind11.hpp` header #820. The utility uses `sycl::handler::host_task` to keep given Python arguments alive until each `sycl::event` from the given vector of events is complete. The host task is scheduled on the SYCL queue provided as the first argument.
- Changed the size of struct underlying `dpctl.SyclEvent` to avoid storing Python object previously used to keep kernel arguments scheduled with `dpctl.SyclQueue.submit` alive #823.
- Fixed docstring for `dpctl.SyclTimer` #824.
- Changed type of exceptions raised on failure to create `dpctl.SyclDevice` from `ValueError` to `dpctl.SyclDeviceCreationError` #826.
- Improved performance of pybind11 type casters #837.
- Changed implementation of `dpctl.SyclProgram` from using deprecated `sycl::program` to `sycl::kernel_bundle` #845.
- Removed deprecated device aspects, added new supported aspects #844.
- Updated vendored `dlpack.h` to version 0.7 #847.
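
Code that previously caught `ValueError` around device creation may need updating after #826. A minimal sketch, assuming a wrapper named `try_create_device` (hypothetical) and a filter selector string such as `"gpu"`; the import guard is there only so the snippet can be loaded on a machine without dpctl.

```python
try:
    import dpctl
except ImportError:
    # dpctl is not installed; the helper below then always returns None.
    dpctl = None


def try_create_device(filter_string):
    """Return a dpctl.SyclDevice, or None if it could not be created.

    Hypothetical helper: catches the exception type that device
    creation failures raise as of this release.
    """
    if dpctl is None:
        return None
    try:
        return dpctl.SyclDevice(filter_string)
    except dpctl.SyclDeviceCreationError:
        return None


if __name__ == "__main__":
    dev = try_create_device("gpu")
    print("no GPU device available" if dev is None else dev.name)
```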
Fixed
- Fixed `dpctl.lsplatform()` to work correctly when used from within Jupyter notebook #800.
- Fixed script to drive debug build #835 and fixed code to compile in debug mode #836.
- Fixed filter selector string produced in outputs of `dpctl.lsplatform(verbosity=2)` and `dpctl.SyclDevice.print_device_info` #866.
- Fixed issue with slicing reported in gh-870 in #871.
New contributor: @npolina4 contributed #867, #872 and reported #870