|
| 1 | +# August'20 release notes |
| 2 | + |
| 3 | +Release notes for the commit range 75b3dc2..414c1e5 |
| 4 | + |
| 5 | +## New features |
| 6 | + - Implemented basic support for the [Explicit SIMD extension](./sycl/doc/extensions/ExplicitSIMD/dpcpp-explicit-simd.md) |
| 7 | + for low-level GPU performance tuning [84bf234] [32bf607] [a lot of others] |
| 8 | + - Implemented support for the [SYCL_INTEL_usm_address_spaces extension](https://github.com/intel/llvm/pull/1840) |
| 9 | + - Implemented support for the [Use Pinned Host Memory Property extension](doc/extensions/UsePinnedMemoryProperty/UsePinnedMemoryPropery.adoc) [e5ea144][aee2d6c][396759d] |
| 10 | + - Implemented the aspects feature from the SYCL 2020 provisional specification
| 11 | + [89804af] |
| 12 | + |
| 13 | + |
| 14 | +## Improvements |
| 15 | +### SYCL Compiler |
| 16 | + - [CUDA BE] Removed unnecessary memory fence in the `sycl::group::barrier`
| 17 | + implementation which should improve performance [e2fc1b8] |
| 18 | + - [CUDA BE] Added support for the SYCL builtins from relational, geometric,
| 19 | + common and math categories [d4e7929] [d9bad0b] [0c9c9c0] [99957c5] |
| 20 | + - Added support for `C array` as a kernel parameter [00e7308] |
| 21 | + - [CUDA BE] Added support for kernel offset [c7bb288] |
| 22 | + - [CUDA BE] Added support for `sycl::half` type [8444189][8f39763] |
| 23 | + - Added support for SYCL kernel inheritance and nested arrays [0b2de9e] |
| 24 | + - Added a diagnostic on attempt to use const static data members that are not |
| 25 | + const-initialized [bde1085] |
| 26 | + - Added support for a set of standard library functions for AOT compilation |
| 27 | + [2bd5dab] |
| 28 | + - Allowed use of function declarators with empty parentheses [a4f2182] |
| 29 | + - The fallback implementation of standard library functions is now linked to
| 30 | + the device code only if such functions are used in kernels [9a8864c]
| 31 | + - Added support for recursive function calls in a constexpr context [06f667a] |
| 32 | + - Added a diagnostic on attempt to capture `this` as a kernel parameter |
| 33 | + [1b9f026] |
| 34 | + - Added [[intel::reqd_sub_group_size()]] attribute as a replacement for |
| 35 | + [[cl::reqd_sub_group_size()]] which is now deprecated [b2da2c8]
| 36 | + - Added propagation of attributes from transitive calls to the kernel [5c91609]
| 37 | + - Changed the driver to pass corresponding device specific options when `-g` |
| 38 | + or `-O0` is passed [31eb425] |
| 39 | + - The `sycl::usm_allocator` has been improved. Now it has equality operators |
| 40 | + and can be used with `std::allocate_shared`. Disallowed usage with |
| 41 | + device allocations [ce915ef] |
| 42 | + - Added support for lambda functions passed to reductions [115c1a0] |
| 43 | + |
| 44 | + |
| 45 | +### SYCL Library |
| 46 | + - Added support for braced-init-list or a number as range for |
| 47 | + `sycl::queue::parallel_for` family functions [17299ee] |
| 48 | + - Finished implementation of [parallel_for simplification extension](doc/extensions/ParallelForSimpification) [af792cb] |
| 49 | + - Added 64-bit type support to the `load` and `store` methods of
| 50 | + `sycl::intel::sub_group` [fe8d852] |
| 51 | + - [CUDA BE] Do not enable event profiling if it's not requested by passing |
| 52 | + `sycl::property::queue::enable_profiling` property [bbe8457] |
| 53 | + - Sub-group support has been aligned with the latest changes to the extension |
| 54 | + document [bea6aa2] |
| 55 | + - [CUDA BE] Optimized waiting for event completion by synchronizing with |
| 56 | + latest event for a queue [d7ee359] |
| 57 | + - Finished implementation of the [Host task with interop capabilities](https://github.com/codeplaysoftware/standards-proposals/blob/master/host_task/host_task.md) |
| 58 | + extension [f088e38] |
| 59 | + - Added builtins for one-element `sycl::vec` for host device [073a36b] |
| 60 | + - [L0 BE] Added support for specialization constants [be4e641] |
| 61 | + - Improved diagnostic on attempt to submit a kernel with local size which |
| 62 | + doesn't match the value specified in the `sycl::intel::reqd_work_group_size`
| 63 | + attribute for the kernel [03ef819] |
| 64 | + - [CUDA BE] Changed active context to be persistent [296fa1a] |
| 65 | + - [CUDA BE] Changed default GPU architecture for device code to `SM_50`
| 66 | + [800e452] |
| 67 | + - Added a diagnostic on attempt to create a device accessor from zero-sized |
| 68 | + buffer [80b2110] |
| 69 | + - Changed default backend to Level Zero [11ef88c]
| 70 | + - Improved performance of the SYCL graph cleanup [c099e47] |
| 71 | + - [L0 BE] Added support for `sycl::sampler` [f3b8cdf] |
| 72 | + - Added support for `TriviallyCopyable` types to the |
| 73 | + `sycl::intel::sub_group::shuffle` [d3c7b20] |
| 74 | + - Implemented range simplification for queue Shortcuts [4009b8b] |
| 75 | + - Changed `sycl::accessor::operator[]` to return const reference when access
| 76 | + mode is `sycl::access::mode::read_only` [03db009] |
| 77 | + - Exceptions thrown in a host task are now returned as asynchronous
| 78 | + exceptions [280b93c] |
| 79 | + - Fixed `sycl::buffer` constructor which takes a contiguous container to |
| 80 | + enable copy back on destruction. |
| 81 | + - Added support for user-defined sub-group reductions [728429a] |
| 82 | + - The `sycl::backend::level0` has been renamed to `sycl::backend::level_zero` |
| 83 | + [215f591] |
| 84 | + - Extended `sycl::broadcast` to support `TriviallyCopyable` types [df6d715] |
| 85 | + - Implemented `get_native` and `make_*` functions for Level Zero, which allow
| 86 | + querying native handles of SYCL objects and creating SYCL objects from
| 87 | + a native handle: platform, device, queue, program. The feature is described
| 88 | + in the SYCL 2020 provisional specification [a51c333]
| 89 | + - Added support for `sycl::intel::atomic_ref` from [SYCL_INTEL_extended_atomics extension](doc/extensions/ExtendedAtomics/SYCL_INTEL_extended_atomics.asciidoc) |
| 90 | + |
| 91 | + |
| 92 | +### Documentation |
| 93 | + - Added [SYCL_INTEL_accessor_properties](doc/extensions/accessor_properties/SYCL_INTEL_accessor_properties.asciidoc) extension specification [58fc414] |
| 94 | + - The documentation for the CUDA BE has been improved [928b815] |
| 95 | + - The [Queue Shortcuts extension](sycl/doc/extensions/QueueShortcuts/QueueShortcuts.adoc) |
| 96 | + document has been updated [defac3c2] |
| 97 | + - Added [Use Pinned Host Memory Property extension](doc/extensions/UsePinnedMemoryProperty/UsePinnedMemoryPropery.adoc) specification [e5ea144] |
| 98 | + - Updated the [SYCL_INTEL_extended_atomics extension](doc/extensions/ExtendedAtomics/SYCL_INTEL_extended_atomics.asciidoc) |
| 99 | + to describe `sycl::intel::atomic_accessor` [4968e7c] |
| 100 | + - The [SYCL_INTEL_sub_group extension](doc/extensions/SubGroup/SYCL_INTEL_sub_group.asciidoc) |
| 101 | + document has been updated [067536e] |
| 102 | + - Added [FPGA lsu extension](sycl/doc/extensions/IntelFPGA/FPGALsu.md) |
| 103 | + document [2c2b5f2] |
| 104 | + |
| 105 | + |
| 106 | +## Bug fixes |
| 107 | +### SYCL Compiler |
| 108 | + - Fixed the diagnostic on `cl::reqd_sub_group_size` attribute mismatches |
| 109 | + [75b3dc2] |
| 110 | + - Fixed the issue with empty input for -foffload-static-lib option [8c8137f] |
| 111 | + - Fixed a problem with template instantiation during integration header |
| 112 | + generation [4ba61d0] |
| 113 | + - Fixed a problem which could happen when using command lines with a large
| 114 | + number of files [87b94d5]
| 115 | + - Fixed a crash when a kernel object field is an array of structures [b00fb7c] |
| 116 | + - Fixed an issue which could prevent the use of structures with constant-sized
| 117 | + arrays as a kernel parameter [a4a7950] |
| 118 | + - Fixed a bug in the pass for lowering hierarchical parallelism code |
| 119 | + (SYCLLowerWGScope). The transformation was generating code where work items
| 120 | + hit the barrier in the loop a different number of times, which is illegal
| 121 | + [a4a7950] |
| 122 | + - Fixed crash on attempt to use objects of `sycl::experimental::spec_constant` |
| 123 | + in the struct [d5a7f20] |
| 124 | + |
| 125 | +### SYCL Library |
| 126 | + - Fixed problem with waiting on the same events several times which could |
| 127 | + happen when using USM [9bf602c] |
| 128 | + - Fixed a memory leak of `sycl::event` objects which happened when using USM
| 129 | + specific `sycl::queue` methods [a285b9d] |
| 130 | + - Fixed problem which could lead to a crash or deadlock when using |
| 131 | + `sycl::handler::codeplay_host_task` extension [e911de7] |
| 132 | + - Worked around the problem which happened when an application uses long kernel
| 133 | + names [b1b8510] |
| 134 | + - Fixed race which could happen when submitting the same kernel from multiple |
| 135 | + threads [95d3ec6] |
| 136 | + - [CUDA BE] Fixed a memory leak related to unreleased events [d0a148a] |
| 137 | + - [CUDA BE] Fixed diagnostic on attempt to fetch profiling info for commands |
| 138 | + for which profiling is not enabled [76bf2ed]
| 139 | + - [L0 BE] Fixed memory leaks of device objects [eae48f6][6acb812] |
| 140 | + - [CUDA BE] Fixed a problem where several operations were not profiled
| 141 | + when required [a420e7a]
| 142 | + - Fixed a possible race which could happen when an application builds an |
| 143 | + object of the `sycl::program` or submits kernels from multiple threads |
| 144 | + [363ad5f] |
| 145 | + - Fixed a memory leak of queue and context handles, which happened when |
| 146 | + backend is not OpenCL [9ddca50] |
| 147 | + - [CUDA BE] Fixed 3 dimensional buffer device to device copy [d917446] |
| 148 | + - Fixed one of the `sycl::queue` constructors which was ignoring |
| 149 | + `sycl::property::queue::enable_profiling` property [7863c0b] |
| 150 | + - Fixed an endless loop in `sycl::intel::reduction` for data types that do not have
| 151 | + fast atomics when the local size is 1 [e6b6ae7]
| 152 | + - Fixed a compilation error which happened when using |
| 153 | + `sycl::interop_handle::get_native_mem` method with an object of |
| 154 | + `sycl::accessor` created for host target [280b93c] |
| 155 | + - Fixed a deadlock which could happen when multiple threads try to build a |
| 156 | + program simultaneously |
| 157 | + - Aligned `sycl::handler::set_arg` with the SYCL specification [a6465c9] |
| 158 | + - Fixed an issue which could lead to "No kernel named was found" exception |
| 159 | + when using `sycl::handler::set_arg` method [a08674e] |
| 160 | + - Fixed `sycl::device::get_info<cl::sycl::info::device::sub_group_sizes>` |
| 161 | + which was returning incorrect data [e65841b]
| 162 | + |
| 163 | + |
| 164 | +## API/ABI breakages |
| 165 | + - The memory_manager API has changed |
| 166 | + - The layout of internal classes for `sycl::sampler` and `sycl::stream` has been
| 167 | + changed |
| 168 | + |
| 169 | +## Known issues |
| 170 | + - The format of the object files produced by the compiler can change between |
| 171 | + versions. The workaround is to rebuild the application. |
| 172 | + - The SYCL library doesn't guarantee stable API/ABI, so applications compiled |
| 173 | + with an older version of the SYCL library may not work with a newer one.
| 174 | + The workaround is to rebuild the application. |
| 175 | + [ABI policy guide](doc/ABIPolicyGuide.md) |
| 176 | + - Using `cl::sycl::program` API to refer to a kernel defined in another |
| 177 | + translation unit leads to undefined behavior |
| 178 | + - Linkage errors with the following message: |
| 179 | + `error LNK2005: "bool const std::_Is_integral<bool>" (??$_Is_integral@_N@std@@3_NB) already defined` |
| 180 | + can happen when a SYCL application is built using MS Visual Studio 2019 |
| 181 | + version below 16.3.0.
| 182 | + The workaround is to enable `-std=c++17` for the failing MSVC version. |
| 183 | + |
1 | 184 | # June'20 release notes
|
2 | 185 |
|
3 | 186 | Release notes for the commit range ba404be..24726df
|