|
| 1 | +# May'21 release notes |
| 2 | + |
| 3 | +Release notes for commit range 2ffafb95f887..6a49170027fb |
| 4 | + |
| 5 | +## New features |
| 6 | + - [ESIMD] Allowed ESIMD and regular SYCL kernels to coexist in the same |
| 7 | + translation unit and in the same program. The `-fsycl-explicit-simd` option |
| 8 | + is no longer required for compiling ESIMD code and was deprecated. DPCPP RT |
| 9 | + implicitly appends `-vc-codegen` compile option for ESIMD images. |
| 10 | + - [ESIMD] Added indirect read and write methods to ESIMD class [8208427] |
| 11 | + - Provided `sycl::ONEAPI::has_known_identity` type trait to determine if |
| 12 | + reduction interface supports user-defined type at compile-time [0c7bd24] |
| 13 | + [060fd50] |
| 14 | + - Added support for multiple reduction items [c042f9e] |
| 15 | + - Added support for `+=`, `*=`, `|=`, `^=`, `&=` operations for custom type |
| 16 | + reducers [b249099] |
| 17 | + - Added SYCL 2020 `sycl::kernel_bundle` support [5af118a] [dcfb6b1] [ae45333] |
| 18 | + [8335e17] |
| 19 | + - Added `sycl/sycl.hpp` entry header in compliance with SYCL 2020 [5edb228] |
| 20 | + [24d179c] |
| 21 | + - Added `__LIBSYCL_[MAJOR|MINOR|PATCH]_VERSION` macros, see |
| 22 | + [PreprocessorMacros](doc/PreprocessorMacros.md) for more information |
| 23 | + [9f3a74c] |
| 24 | + - Added support for SYCL 2020 reductions with `read_write` access mode to |
| 25 | + reduction variables [733d5e3] |
| 26 | + - Added support for SYCL 2020 reductions with |
| 27 | + `sycl::property::reduction::initialize_to_identity` property [3473c1a] |
| 28 | + - Implemented zero argument version of `sycl::buffer::reinterpret()` for |
| 29 | + SYCL 2020 [c0c3c80] |
| 30 | + - Added an initial AOT implementation of the experimental matrix extension on |
| 31 | + the CPU device to target AMX hardware. Base features are supported [35db973] |
| 32 | + - Added support for |
| 33 | + [SYCL_INTEL_local_memory extension](doc/extensions/LocalMemory/SYCL_INTEL_local_memory.asciidoc) |
| 34 | + [5a66fcb] [9a734f6] |
| 35 | + - Documented [Level Zero backend](doc/extensions/LevelZeroBackend/LevelZeroBackend.md) |
| 36 | + [8994e6d] |
| 37 | + |
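As a hedged illustration of the `sycl/sycl.hpp` entry header and the `__LIBSYCL_[MAJOR|MINOR|PATCH]_VERSION` macros listed above, the following minimal sketch reports the library version at compile time. It assumes the macros become visible once `sycl/sycl.hpp` is included; the helper name `libsycl_version_string` is hypothetical and not part of the SYCL library. The header is included only when the toolchain provides it, so the sketch also builds with a plain C++ compiler and then reports `unknown`.

```cpp
#include <string>

// Pull in the SYCL 2020 entry header only when the toolchain provides it,
// so this file also builds with a plain C++ compiler.
#if defined(__has_include)
#  if __has_include(<sycl/sycl.hpp>)
#    include <sycl/sycl.hpp>
#  endif
#endif

// Hypothetical helper (not part of the SYCL library): returns the library
// version as "major.minor.patch", or "unknown" when the
// __LIBSYCL_*_VERSION macros are not defined.
// Assumption: the macros are made visible by including sycl/sycl.hpp.
inline std::string libsycl_version_string() {
#if defined(__LIBSYCL_MAJOR_VERSION) && defined(__LIBSYCL_MINOR_VERSION) && \
    defined(__LIBSYCL_PATCH_VERSION)
  return std::to_string(__LIBSYCL_MAJOR_VERSION) + "." +
         std::to_string(__LIBSYCL_MINOR_VERSION) + "." +
         std::to_string(__LIBSYCL_PATCH_VERSION);
#else
  return "unknown";
#endif
}
```

A caller could print `libsycl_version_string()` at startup, or use the individual macros in `#if` checks to guard code that depends on a particular library version.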
| 38 | +## Improvements |
| 39 | +### SYCL Compiler |
| 40 | + - Added support for math built-ins: `fmax`, `fmin`, `isinf`, `isfinite`, |
| 41 | + `isnormal`, `fpclassify` [1040b94] |
| 42 | + - The FPGA initiation interval attribute spelling `[[intel::ii]]` is |
| 43 | + deprecated. The new spelling is `[[intel::initiation_interval]]`. In |
| 44 | + addition, `[[intel::initiation_interval]]` may now be used as a function |
| 45 | + attribute; formerly, its use was limited to a statement attribute [b04e6a0]
| 46 | + - Added support for function attributes `[[intel::disable_loop_pipelining]]`
| 47 | + and `[[intel::max_concurrency(n)]]` [7324b3e] |
| 48 | + - Enabled `-fsycl-id-queries-fit-in-int` by default [f27bb01] |
| 49 | + - Added support for stdlib functions: `abs`, `labs`, `llabs`, `div`, `ldiv`, |
| 50 | + `lldiv` [86716c5] [2e9d33c] |
| 51 | + - Enabled range rounding for ESIMD kernels [25b482b] [bb20b7b] |
| 52 | + - Improved diagnostics on invalid kernel names [0c0f4c5] |
| 53 | + - Improved compilation time by combining device code compilation and |
| 54 | + integration header generation into one step [f110dd4] |
| 55 | + - Added support for `sycl::queue::mem_advise` for the CUDA backend [2b56ac9] |
| 56 | +### SYCL Library |
| 57 | + - Specialized atomic `fetch_add`, `fetch_min` and `fetch_max` for |
| 58 | + floating-point types [37a9a2a] [59ceaf4] |
| 59 | + - Added support for accessors to array types [7ed4f58] |
| 60 | + - Added sub-group information queries on CUDA [c36fa65] |
| 61 | + - Added support for `sycl::queue::barrier` in Level Zero plugin [7c31f90] |
| 62 | + - Improved runtime memory usage in Level Zero plugin [c9d71d4] [2ce2ca6] |
| 63 | + [46e3c64] |
| 64 | + - Added Level Zero interoperability with the ability to specify ownership [41221e2]
| 65 | + - Improved runtime memory usage when using USM [461fa02] |
| 66 | + - Provided facility for user to control execution range rounding [f6ac45f] |
| 67 | + - Ensured correct access mode in `sycl::handler::copy()` method [b489479] |
| 68 | + - Disallowed atomic accessors in `sycl::handler::copy()` method [14437db]
| 69 | + - Provided move-assignability of `usm_allocator` class [05a805e] |
| 70 | + - Improved performance of copying data during native memory object creation |
| 71 | + on devices without host unified memory [ad8c9d1] |
| 72 | + - [ESIMD] Added implicit set up of fence before barrier as required by hardware |
| 73 | + [692228c] |
| 74 | + - Allowed using the interoperability program constructor with a multi-device
| 75 | + context [c7f7674]
| 76 | + - Allowed trace of Level Zero calls only with `SYCL_PI_TRACE=-1` [ea73219] |
| 77 | + - Added throwing of `feature_not_supported` on an attempt to create a
| 78 | + program using `create_program_with_source` with Level Zero or CUDA [ba77e3a]
| 79 | + - Added support for `inline` `cl` namespace in debugger [8e441d4] |
| 80 | + - Added support for build with GCC 7 [d8fea22] |
| 81 | + - Added in-memory caching of programs built with custom build options |
| 82 | + [86b0e8d] [e152b0d] |
| 83 | + - Improved range rounding heuristics [7efb692] |
| 84 | + - Added `get_backend` methods to SYCL classes [ee7e99f] |
| 85 | + - Added `sycl::sub_group::load` and `sycl::sub_group::store` versions that |
| 86 | + take raw pointers [248f550] |
| 87 | + - Enabled caching of devices in `sycl::device` interoperability constructors |
| 88 | + [d3aeb4a] |
| 89 | + - Added a warning on using SYCL 1.2.1 OpenCL interoperability API when |
| 90 | + compiling in SYCL 2020 mode. It can be suppressed by defining |
| 91 | + `SYCL2020_DISABLE_DEPRECATION_WARNINGS` [a249316] |
| 92 | + - Added support for blitter engine in Level Zero plugin. Some memory |
| 93 | + operations are submitted to a Level Zero copy queue now [11ba5b5] |
| 94 | + - Improved `sycl::INTEL::lsu::load` and `sycl::INTEL::lsu::store` to take |
| 95 | + `sycl::multi_ptr` [697469f] |
| 96 | + - Added a diagnostic on attempt to compile a SYCL application without dynamic |
| 97 | + C++ RT on Windows [d4180f4] |
| 98 | + - Added support for `Queue Order Properties` extension for Level Zero [50005c7] |
| 99 | + - Improved plugin discovery mechanism: if a plugin fails to initialize, the
| 100 | + others will still be discovered [d513074]
| 101 | + - Added support for `sycl::info::partition_affinity_domain::numa` in Level |
| 102 | + Zero plugin [2ba8e05] |
| 103 | +### Documentation |
| 104 | + - Updated TBB paths in `GetStartedGuide` [a9acb70] |
| 105 | + - Aligned linked allocation document with recent changes [22b9d01] |
| 106 | + - Updated `GetStartedGuide` for building with `libcxx` [d3a74c3] |
| 107 | + - Updated table of contents in `GetStartedGuide` [0f401bf] |
| 108 | + - Filled in address spaces handling section in design documentation [f782c2a] |
| 109 | + - Improved design document for program cache [ed4b4c4] |
| 110 | + - Updated compiler options [description](doc/UsersManual.md) [e56e576] |
| 111 | + - Updated |
| 112 | + [SYCL_INTEL_sub_group](doc/extensions/SubGroup/SYCL_INTEL_sub_group.asciidoc)
| 113 | + extension document to use `automatic` instead of `auto` [c4d08f5] |
| 114 | + |
| 115 | +## Bug fixes |
| 116 | +### SYCL Compiler |
| 117 | + - Suppressed link time warning on Windows that incorrectly diagnosed |
| 118 | + conflicting section names while linking device binaries [8e6a3ec] |
| 119 | + - Disabled code coverage for device compilations [12a0b11] |
| 120 | + - Fixed an issue when unbundling a fat static archive and targeting non-FPGA |
| 121 | + device [90c79c7] |
| 122 | + - Addressed inconsistencies when performing compilations by using the target |
| 123 | + triple for FPGA (`spir64_fpga-unknown-unknown-sycldevice`) vs using |
| 124 | + `-fintelfpga` [c9a65fc] |
| 125 | + - Fixed generation of the output report folder when performing FPGA AOT |
| 126 | + compilations from a previously generated AOCR archive [eab4791] |
| 127 | + - Addressed issues dealing with improper settings when performing |
| 128 | + preprocessing when offloading is enabled [d03de03] |
| 129 | + - Fixed issue when using `-fsycl-device-only` on Windows when specifying an |
| 130 | + output file with `/o` [d1d6c5d] |
| 131 | + - Fixed inlining functions called from an ESIMD kernel, which broke code |
| 132 | + generation in the Intel GPU vector back-end [65b459d] |
| 133 | + - Fixed JIT crash on ESIMD kernels compiled with `-fsycl-id-queries-fit-in-int` |
| 134 | + [ad86c34] |
| 135 | + - Fixed compiler crash on ESIMD kernels calling external functions with |
| 136 | + `gpu::simd` arguments [dfaaaed] |
| 137 | + - Fixed issue with generating preprocessed output when using |
| 138 | + `-fsycl-device-only` [3d2225a] |
| 139 | +### SYCL Library |
| 140 | + - Fixed a race condition that occurred on application exit [8eb00d7] [c9c1de9]
| 141 | + - Fixed faulty behaviour that happened when accessing a buffer in different |
| 142 | + contexts using `discard_*` access mode [f75b439] |
| 143 | + - Fixed support for `SYCL_PROGRAM_LINK_OPTIONS` and |
| 144 | + `SYCL_PROGRAM_COMPILE_OPTIONS` environment variables when compiling/linking |
| 145 | + through `sycl::program` class [9d74846] |
| 146 | + - Fixed deadlock in Level Zero plugin when batching is enabled [645db17]
| 147 | + - Fixed possible stack overflow in Level Zero plugin [ec6fbe1] |
| 148 | + - Fixed issues with empty wait list in Level Zero plugin [d8c8e08] |
| 149 | + - Added missing `double3` and `double4` support in geometric function `cross()` |
| 150 | + [b8afff4] |
| 151 | + - Fixed issue when using `std::vector<bool> &` argument for |
| 152 | + `sycl::buffer::set_final_data()` method [084d83a] [2a751bd]
| 153 | + - Fixed support for `long long` in `sycl::vec::convert()` on Windows [5b49cd3] |
| 154 | + - Aligned local and image accessor with specification by allowing for property |
| 155 | + list in their constructor [88fab25] |
| 156 | + - Fixed support for offset in `parallel_for` for host device [1958715] |
| 157 | + - Added missing constructors for `sycl::buffer` class [bdfad9e] |
| 158 | + - Fixed coordinate conversion for `sampler` class on host device [cd6529f] |
| 159 | + - Fixed support for local accessors in debugger [fdacb75] |
| 160 | + - Fixed dropping of kernel attributes when execution range rounding is used |
| 161 | + [496f9a0] [677a7ea] |
| 162 | + - Added support for interoperability tasks that use `get_mem()` methods with |
| 163 | + Level Zero plugin [149f08d] |
| 164 | + - Fixed sub-device caching in the Level Zero plugin [0b18b49] |
| 165 | + - Fixed `get_native` methods to retain reference counter in case of OpenCL |
| 166 | + backend [ee7e99f] |
| 167 | + - Fixed sporadic failure happening due to illegal destruction of events before |
| 168 | + they have been signaled [2a76b2a] |
| 169 | + - Resolved a pinned host memory specific performance regression on CUDA that |
| 170 | + was introduced with the host unified behavior dependent logic [3be63ab] |
| 171 | + - Fixed illegal accesses that could happen when an application that uses host |
| 172 | + tasks exits without waiting for host tasks completion [552a521] |
| 173 | + - Fixed `sycl::event::get_info` queries that were working incorrectly when |
| 174 | + called on an event without an encapsulated native handle [5d5a792]
| 175 | + - Fixed compilation error with using multidimensional subscript for |
| 176 | + `sycl::accessor` with atomic access mode [0bfd34e] |
| 177 | + - Fixed a crash that happened when an accessor passed to a reduction was |
| 178 | + destroyed immediately after [b80f13e] |
| 179 | + - Fixed `sycl::device::get_info` with `sycl::info::device::max_mem_alloc_size` |
| 180 | + which was returning incorrect value in case of Level Zero backend [8dbaa53] |
| 181 | + |
| 182 | +## API/ABI breakages |
| 183 | +- None |
| 184 | + |
| 185 | +## Known issues |
| 186 | + - GlobalWorkOffset is not supported by Level Zero backend [6f9e9a76] |
| 187 | + - User-defined functions with the same name and signature (exact match of
| 188 | + arguments; the return type doesn't matter) as an OpenCL C built-in
| 189 | + function can lead to undefined behavior.
| 190 | + - A DPC++ system that has FPGAs installed does not support multi-process |
| 191 | + execution. Creating a context opens the device associated with the context |
| 192 | + and places a lock on it for that process. No other process may use that |
| 193 | + device. Some queries about the device through device.get_info<>() also |
| 194 | + open up the device and lock it to that process since the runtime needs |
| 195 | + to query the actual device to obtain that information. |
| 196 | + - The format of the object files produced by the compiler can change between |
| 197 | + versions. The workaround is to rebuild the application. |
| 198 | + - Using `sycl::program`/`sycl::kernel_bundle` API to refer to a kernel defined |
| 199 | + in another translation unit leads to undefined behavior |
| 200 | + - Linkage errors with the following message: |
| 201 | + `error LNK2005: "bool const std::_Is_integral<bool>" (??$_Is_integral@_N@std@@3_NB) already defined` |
| 202 | + can happen when a SYCL application is built using MS Visual Studio 2019 |
| 203 | + version below 16.3.0 and user specifies `-std=c++14` or `/std:c++14`. |
| 204 | + - Printing internal defines isn't supported on Windows [50628db] |
| 205 | + |
1 | 206 | # January'21 release notes
|
2 | 207 |
|
3 | 208 | Release notes for commit range 5eebd1e4bfce..2ffafb95f887
|