|
| 1 | +# July'21 release notes |
| 2 | + |
| 3 | +Release notes for commit range 6a49170027fb..962909fe9e78 |
| 4 | + |
| 5 | +## New features |
| 6 | + - Implemented SYCL 2020 specialization constants [07b27965] [ba3d657] |
| 7 | + [bd8dcf4] [d15b841] |
| 8 | + - Provided SYCL 2020 function objects [24a2ad89] |
| 9 | + - Added support for ITT notification in SYCL Runtime [a7b8daf] [8d3921e3] |
| 10 | + - Implemented SYCL 2020 sub_group algorithms [e8caf6c3] |
| 11 | + - Implemented SYCL 2020 `sycl::handler::host_task` method [75e5a269] |
| 12 | + - Implemented SYCL 2020 `sycl::sub_group` class [19dcac79] |
| 13 | + - Added support for AMD GPU devices [ec612228] |
| 14 | + - Implemented SYCL 2020 `sycl::is_device_copyable` type trait [44c1cbcd] |
| 15 | + - Implemented SYCL 2020 USM features [1df6873d] |
| 16 | + - Implemented support for Device UUID from [Intel's Extensions for Device Information](doc/extensions/IntelGPU/IntelGPUDeviceInfo.md) [25aee287] |
| 17 | + - Implemented SYCL 2020 `sycl::atomic_fence` [dcd59547] |
| 18 | + - Implemented `intel::loop_count_min`, `intel::loop_count_max`, |
| 19 | + `intel::loop_count_avg` attributes that allow specifying the number of loop |
| 20 | + iterations for FPGA [f74b4ef] |
| 21 | + - Implemented generation of compiler report for kernel arguments [201f902] |
| 22 | + - Implemented SYCL 2020 `[[reqd_sub_group_size]]` attribute [347e41c] |
| 23 | + - Implemented support for `[[intel::named_sub_group_size(primary)]]` attribute |
| 24 | + from [sub-group extension](doc/extensions/SubGroup/SYCL_INTEL_sub_group.asciidoc#attributes) |
| 25 | + [347e41c] |
| 26 | + - Implemented SYCL 2020 interoperability API [e6733e4] |
| 27 | + - Added [group sorting algorithm](doc/extensions/GroupAlgorithms/SYCL_INTEL_group_sort.asciidoc) |
| 28 | + extension specification [edaee9b] |
| 29 | + - Added [initial draft](doc/extensions/LevelZeroBackend/LevelZeroBackend.md) |
| 30 | + for querying of free device memory in LevelZero backend extension [fa428bf] |
| 31 | + - Added [InvokeSIMD](doc/extensions/InvokeSIMD/InvokeSIMD.asciidoc) and |
| 32 | + [Uniform](doc/extensions/Uniform/Uniform.asciidoc) extensions [72e1611] |
| 33 | + - Added [Matrix Programming Extension for DPC++ document](doc/extensions/Matrix/dpcpp-joint-matrix.asciidoc) [ace4c733] |
| 34 | + - Implemented SYCL 2020 `sycl::span` [9356d53] |
| 35 | + - Added [device-if](doc/extensions/DeviceIf/device_if.asciidoc) extension |
| 36 | + [4fb95fc] |
| 37 | + - Added a [programming guide](doc/MultiTileCardWithLevelZero.md) for |
| 38 | + multi-tile and multi-card under Level Zero backend [d581178a] |
| 39 | + - Implemented SYCL 2020 `sycl::bit_cast` [d4b66bd] |
| 40 | + |
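As an illustration of the `sycl::bit_cast` entry above, the sketch below shows the semantics such a function is generally expected to have: the object representation of one trivially copyable value is reinterpreted as another type of the same size. This is plain standard C++ rather than SYCL code, and `bit_cast_sketch` and `demo_bit_cast` are hypothetical names introduced here, not part of the DPC++ headers.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <type_traits>

// Hypothetical helper (NOT the DPC++ implementation): copies the bytes of
// `from` into a freshly created object of type To and returns it.
template <typename To, typename From>
To bit_cast_sketch(const From &from) {
  static_assert(sizeof(To) == sizeof(From), "source and target sizes must match");
  static_assert(std::is_trivially_copyable<From>::value, "From must be trivially copyable");
  static_assert(std::is_trivially_copyable<To>::value, "To must be trivially copyable");
  To to;
  std::memcpy(&to, &from, sizeof(To));
  return to;
}

// Round-trips a float through its 32-bit representation.
void demo_bit_cast() {
  const float one = 1.0f;
  const std::uint32_t bits = bit_cast_sketch<std::uint32_t>(one);
  assert(bit_cast_sketch<float>(bits) == one);
}
```

In SYCL code the same round trip would presumably be written with `sycl::bit_cast<std::uint32_t>(one)`; the `memcpy` form is used here only so the example builds with any C++ compiler.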
| 41 | +## Improvements |
| 42 | +### SYCL Compiler |
| 43 | + - Use `opencl-aot` instead of `aoc` when AOT flow for FPGA Emulator is |
| 44 | + triggered [3a99558] |
| 45 | + - Allowed for using an external host compiler [6f0ad1a] |
| 46 | + - Cleaned up preprocessing output when `-fsycl` option is passed [3a18db6] |
| 47 | + - Allowed kernel names in anonymous namespace [e47dbad] |
| 48 | + - Set default value of -sycl-std to 2020 for SYCL enabled compilations |
| 49 | + [680adc0] |
| 50 | + - Added implication of `-fPIC` compilation for wrapped object when using |
| 51 | + `-shared` [1754934] |
| 52 | + - Added a diagnostic for `-fsycl` and `-ffreestanding` as non-supported |
| 53 | + combination [a36c6720] |
| 54 | + - [ESIMD] Renamed `simd::format` to `simd::bit_cast_view` [653dede1] |
| 55 | + - Allowed `[[sycl::work_group_size_hint]]` to accept constant expr args |
| 56 | + [ef8e4019] |
| 57 | + - Deprecated the old-style SYCL attributes according to the SYCL 2020 spec |
| 58 | + [001bbd42] |
| 59 | + - Deprecated `[[intel::reqd_work_group_size]]` attribute spelling, please use |
| 60 | + `[[sycl::reqd_work_group_size]]` instead [8ef7eacc] |
| 61 | + - Enabled native FP atomics by default. Defining the |
| 62 | + `SYCL_USE_NATIVE_FP_ATOMICS` macro explicitly is no longer required - it is |
| 63 | + now automatically defined for hardware targets with "native" support for |
| 64 | + atomic functions. [0bbb68ee] |
| 65 | + - Switched to ignoring `-O0` option for device code when compiling for FPGA |
| 66 | + with hardware [7d94edf4] |
| 67 | + - Allowed for known aliases to be used for `-fsycl-targets`. Passing |
| 68 | + `*-unknown-unknown-sycldevice` components of the SYCL target triple is no |
| 69 | + longer necessary. [9778952a] |
| 70 | + - [ESIMD] Added support for half type in ESIMD intrinsics [d5958ebf] |
| 71 | + - Implemented `sycl::kernel::get_kernel_bundle` method [69a68a6d] |
| 72 | + - Added a diagnostic in case of timing issues for FPGA AOT [c69a3115] |
| 73 | + - Added support for C `memcpy` usages in the device code [76051ccf] |
| 74 | + - [ESIMD] Added support for vectorizing scalar function [3fc66cc] |
| 75 | + - Disabled vectorization and loop transformation passes because loop unrolling |
| 76 | + in "SYCL optimization mode" used the default heuristic, which is tuned for |
| 77 | + CPU code and might not be profitable for other devices [ff6929e6] |
| 78 | +### SYCL Library |
| 79 | + - Added an exception throw if no matching device is found when |
| 80 | + `SYCL_DEVICE_FILTER` is set regardless of `device_selector` used [ef4e6dd] |
| 81 | + - Changed event status update to complete without waiting when run on CUDA |
| 82 | + devices [be7c1cb] |
| 83 | + - Improved performance when executing with dynamic batching on Level Zero |
| 84 | + backend [fa382d6] |
| 85 | + - Introduced pooling for USM and buffer allocations in Level Zero backend |
| 86 | + [4cffedd] |
| 87 | + - Added support for vectors with length of 3 and 16 elements in sub-group load |
| 88 | + and store operations [4e6452d] |
| 89 | + - Added interop types for images for Level Zero and OpenCL backends [a58cfef] |
| 90 | + - Improved plugins discovery - continue discovering even if a plugin fails to |
| 91 | + load [8c07803] |
| 92 | + - Implemented queries for IEEE rounded `sqrt`/`div` in Level Zero backend |
| 93 | + [91b35c4] |
| 94 | + - Added SYCL 2020 `interop_handle::get_backend()` method [041ca27] |
| 95 | + - [ESIMD] Deprecated `block_load`/`block_store` and |
| 96 | + `simd::copy_from`/`simd::copy_to` [5c41ed6] |
| 97 | + - Allowed for `const` and `volatile` pointer in sub-group `load` operation |
| 98 | + [50edee4] |
| 99 | + - Replaced use of `interop<>` with SYCL 2020 `backend_return_t<>` in |
| 100 | + `interop_handle` [d08c21a] |
| 101 | + - [ESIMD] Moved ESIMD APIs to `sycl::ext::intel::experimental::esimd` namespace |
| 102 | + [92da579] |
| 103 | + - Added global offset support for Level Zero backend [9ca2f911] |
| 104 | + - [ESIMD] Changed `simd::replicate` API by adding suffixes into the names to |
| 105 | + reflect the order of template arguments [e45408ad] |
| 106 | + - Introduced `SYCL_REDUCTION_DETERMINISTIC` macro which forces reduction |
| 107 | + algorithms to produce stable results [a3fc51a4] |
| 108 | + - Improved `SYCL_DEVICE_ALLOWLIST` format [9216b49d] |
| 109 | + - Added `SYCL_DISABLE_PARALLEL_FOR_RANGE_ROUNDING` macro to disable range |
| 110 | + rounding [5c4275ac] |
| 111 | + - Disabled range rounding by default when compiling for FPGA [5c4275ac] |
| 112 | + - Deprecated `sycl::buffer::get_count()`, please use `sycl::buffer::size()` |
| 113 | + instead [baf2ed9d] |
| 114 | + - Implemented `sycl::group_barrier` free function [48363902] |
| 115 | + - Added support of [SYCL_INTEL_enqueue_barrier extension](doc/extensions/EnqueueBarrier/enqueue_barrier.asciidoc) for CUDA backend [2e978482] |
| 116 | + - Deprecated `has_extension` method of `sycl::device` and `sycl::platform` |
| 117 | + classes, please use `has` method with aspects APIs instead [51c747da] |
| 118 | + - Deprecated `sycl::*_class` types, please use STL classes instead [51c747da] |
| 119 | + - Deprecated `sycl::ndrange` with an offset [51c747da] |
| 120 | + - Deprecated `barrier` and `mem_fence` methods of `sycl::nd_item` class, |
| 121 | + please use `sycl::group_barrier()` and `sycl::atomic_fence()` free functions |
| 122 | + instead [51c747da] |
| 123 | + - Deprecated `sycl::byte`, please use `std::byte` instead [51c747da] |
| 124 | + - Deprecated `sycl::info::device::max_constant_buffer_size` and |
| 125 | + `sycl::info::device::max_constant_args` [51c747da] |
| 126 | + - Deprecated `sycl::ext::intel::fpga_reg` taking non-trivially copyable |
| 127 | + structs [b4c322a8] |
| 128 | + - Added support for `sycl::property::queue::cuda::use_default_stream` queue |
| 129 | + property [08330525] |
| 130 | + - Switched to using atomic version of reductions if `sycl::aspect::atomic64` |
| 131 | + is available for a target [544fb7c8] |
| 132 | + - Added support for `sycl::aspect::fp16` for CUDA backend [db20bab3] |
| 133 | + - Deprecated `sycl::aspect::usm_system_allocator`, please use |
| 134 | + `sycl::aspect::usm_system_allocations` instead [000cc82d] |
| 135 | + - Optimized `sycl::queue::wait` to wait for batch of events rather than |
| 136 | + waiting for each event individually [7fe72dba] |
| 137 | + - Deprecated `sycl::ONEAPI::atomic_fence`, please use `sycl::atomic_fence` |
| 138 | + instead [dcd59547] |
| 139 | + - Added constexpr constructor for `sycl::half` type [5759e2a1] |
| 140 | + - Added support for more than 4GB device allocations in Level Zero backend |
| 141 | + [fb1808b8] |
| 142 | + |
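The `SYCL_REDUCTION_DETERMINISTIC` entry above concerns result stability. One common reason a parallel reduction is not bit-reproducible is that floating-point addition is not associative, so combining the same values in a different order can change the result. That motivation is an assumption here, not something the notes state. The sketch below is plain standard C++, not SYCL code, and all function names are hypothetical.

```cpp
#include <cassert>
#include <vector>

// Sums the values front to back.
double sum_forward(const std::vector<double> &v) {
  double acc = 0.0;
  for (double x : v) acc += x;
  return acc;
}

// Sums the same values back to front.
double sum_backward(const std::vector<double> &v) {
  double acc = 0.0;
  for (auto it = v.rbegin(); it != v.rend(); ++it) acc += *it;
  return acc;
}

// Same inputs, different combination order, different results.
void demo_reduction_order() {
  const std::vector<double> v = {1e20, -1e20, 1.0};
  assert(sum_forward(v) == 1.0);  // (1e20 + -1e20) + 1.0
  assert(sum_backward(v) == 0.0); // 1.0 is absorbed by -1e20 before the cancellation
}
```

A reduction that always combines partial results in the same fixed order avoids this kind of run-to-run variation, usually at some cost in performance.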
| 143 | +### Documentation |
| 144 | + - Updated [sub-group algorithms](doc/extensions/SubGroupAlgorithms/SYCL_INTEL_sub_group_algorithms.asciidoc) |
| 145 | + extension to use `marray` instead of `vec` [98715ae] |
| 146 | + - Updated data flow pipes extension to be based on SYCL 2020 [f22f2e0] |
| 147 | + - Updated [ESIMD documentation](doc/extensions/ExplicitSIMD/dpcpp-explicit-simd.md) |
| 148 | + reflecting recent API changes [1e0bd1ed] |
| 149 | + - Updated [devicelib](doc/extensions/C-CXX-StandardLibrary/C-CXX-StandardLibrary.rst) |
| 150 | + extension document with `scalbn`, `abs` and `div` (and their variants) as |
| 151 | + supported [febfb5a] |
| 152 | + - Addressed renaming of TBB dll to `tbb12.dll` in the |
| 153 | + [install script](tools/install.bat) [25433ba] |
| 154 | + |
| 155 | +## Bug fixes |
| 156 | +### SYCL Compiler |
| 157 | + - Fixed a crash that could happen in corner cases when a null attribute is created |
| 158 | + [cec6469] |
| 159 | + - Fixed crash when lowering `__sycl_allocateLocalMemory` [4960e71] |
| 160 | + - Fixed workflow for multi-file compilation in AOT mode [a0099a5] |
| 161 | + - Fixed problem with unbundling from object for device archives for FPGA |
| 162 | + [25ea6e1] |
| 163 | + - Stopped implying `defaultlib msvcrt` for Linux based driver on Windows |
| 164 | + [d3dc212d] |
| 165 | + - Fixed handling of `[[intel::max_global_work_dim()]]` in case of |
| 166 | + redeclarations [9b615928] |
| 167 | + - Fixed incorrect diagnostics in the presence of OpenMP [cbec0b5f] |
| 168 | + - Fixed an issue with incorrect output project report when using `-o` option |
| 169 | + with FPGA AOT enabled [18ac1723] |
| 170 | + - Removed the restriction that prevented applying the |
| 171 | + `[[intel::use_stall_enable_clusters]]` attribute to any function [15da879d] |
| 172 | + - Fixed bugs with recursion in SYCL kernels - diagnostics won't be emitted on |
| 173 | + using recursion in a discarded branch and in constexpr context [9a9a018c] |
| 174 | + - Fixed handling of `intel::use_stall_enable_clusters` attribute [06e4ebc7] |
| 175 | +### SYCL Library |
| 176 | + - Fixed build issue when CUDA 11 is used [f7224f1] |
| 177 | + - Fixed caching of sub-devices in Level Zero backend [4c34f93] |
| 178 | + - Fixed requesting of USM memory allocation info on CUDA [691f842] |
| 179 | + - Fixed [`joint_matrix_mad`](doc/extensions/Matrix/dpcpp-joint-matrix.asciidoc) |
| 180 | + behaviour to return `A*B+C` instead of assigning the result to `C` [ea59c2b] |
| 181 | + - Worked around an issue in Level Zero backend when an event isn't waited on |
| 182 | + for completion but is queried for its status in an infinite loop [bfef316] |
| 183 | + - Fixed persistent cache key comparison (esp. when there is `\0` symbol) |
| 184 | + [3e9ed1d] |
| 185 | + - Fixed a build issue when `sycl::kernel::get_native` is used [eb17836] |
| 186 | + - Fixed collisions of helper functions and SPIR-V operations when building with |
| 187 | + `-O0` or `-O1` [9f2fd98] [c2d6cfa] |
| 188 | + - [OpenCL] Fixed false-positive assertion trigger when allocation alignment is |
| 189 | + expected [3351916ad] |
| 190 | + - Aligned behavior of empty command groups with SYCL 2020 [1cf697bd] |
| 191 | + - Fixed build options handling when they come from different sources |
| 192 | + [67411472] |
| 193 | + - Fixed host task CUDA native memory handle [e9cf124b6] |
| 194 | + - Fixed a memory leak which could happen if a command submission fails |
| 195 | + [67eac4bd] |
| 196 | + - Fixed support for math functions `floor/rndd/rndu/rndz/rnde` in ESIMD mode |
| 197 | + [de694dd8] |
| 198 | + - Fixed memory allocations for multi-device contexts on Level Zero [f83c9356a] |
| 199 | + - Renamed `sycl::property::noinit` property to `sycl::property::no_init` in |
| 200 | + accordance with the final SYCL 2020 specification, the old spelling is deprecated |
| 201 | + [ad46b641] |
| 202 | + - Use local size specified in `[[sycl::reqd_work_group_size]]` if no local |
| 203 | + size explicitly passed [0a54bef2] |
| 204 | + - Disabled persistent device code caching by default since it doesn't reliably |
| 205 | + identify driver version change [48f6bc9e] |
| 206 | + - [ESIMD] Fixed a bug in `simd_view::operator--` [ccc97e23] |
| 207 | + - Fixed a memory leak for host USM allocations [c18c3456] |
| 208 | + - Fixed possible crashes that could happen when `sycl::free` is called while |
| 209 | + there are still running kernels [c74f05d6] |
| 210 | + |
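The persistent cache key fix above mentions keys that contain a `\0` symbol. The notes do not describe the actual defect, so the following is only a sketch of the general pitfall: a C-string comparison stops at the first `\0`, while a length-aware comparison looks at every byte. It is plain standard C++, and all names in it are hypothetical rather than taken from the SYCL runtime.

```cpp
#include <cassert>
#include <cstring>
#include <string>

// Compares keys as C strings: everything after the first '\0' is ignored.
bool keys_equal_cstr(const std::string &a, const std::string &b) {
  return std::strcmp(a.c_str(), b.c_str()) == 0;
}

// Compares keys as byte sequences of known length, embedded '\0' included.
bool keys_equal_bytes(const std::string &a, const std::string &b) {
  return a.size() == b.size() &&
         std::memcmp(a.data(), b.data(), a.size()) == 0;
}

// Two 9-byte keys that differ only after an embedded '\0'.
void demo_cache_keys() {
  const std::string k1("spv\0opt-A", 9);
  const std::string k2("spv\0opt-B", 9);
  assert(keys_equal_cstr(k1, k2));   // false match: comparison stops at '\0'
  assert(!keys_equal_bytes(k1, k2)); // the keys are correctly seen as different
}
```

With binary inputs such as device images, a false match of this kind could cause a stale cache entry to be returned for a different program, which is why a byte-wise comparison is the safer choice for cache keys.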
| 211 | +## API/ABI breakages |
| 212 | + - None |
| 213 | + |
| 214 | +## Known issues |
| 215 | + - [new] The compiler generates a temporary source file which is used during |
| 216 | + host compilation. This source file will appear to be a source dependency |
| 217 | + and could break build environments (such as Bazel) that closely keep track |
| 218 | + of the generated files during a compilation. Build environments such as |
| 219 | + these will need to be configured in the DPC++ space to expect an additional |
| 220 | + intermediate file to be part of the compilation flow. |
| 221 | + - User-defined functions with the name and signature matching those of any |
| 222 | + OpenCL C built-in function (i.e. an exact match of arguments, return type |
| 223 | + doesn't matter) can lead to Undefined Behavior. |
| 224 | + - A DPC++ system that has FPGAs installed does not support multi-process |
| 225 | + execution. Creating a context opens the device associated with the context |
| 226 | + and places a lock on it for that process. No other process may use that |
| 227 | + device. Some queries about the device through `device.get_info<>()` also |
| 228 | + open up the device and lock it to that process since the runtime needs |
| 229 | + to query the actual device to obtain that information. |
| 230 | + - The format of the object files produced by the compiler can change between |
| 231 | + versions. The workaround is to rebuild the application. |
| 232 | + - Using `sycl::program`/`sycl::kernel_bundle` API to refer to a kernel defined |
| 233 | + in another translation unit leads to undefined behavior. |
| 234 | + - Linkage errors with the following message: |
| 235 | + `error LNK2005: "bool const std::_Is_integral<bool>" (??$_Is_integral@_N@std@@3_NB) already defined` |
| 236 | + can happen when a SYCL application is built using MS Visual Studio 2019 |
| 237 | + version below 16.3.0 and user specifies `-std=c++14` or `/std:c++14`. |
| 238 | + - Printing internal defines isn't supported on Windows [50628db] |
| 239 | + |
| 1 | 240 | # May'21 release notes |
| 2 | 241 | |
| 3 | 242 | Release notes for commit range 2ffafb95f887..6a49170027fb |
|