Highlights
OpenMP Parallelization: `qsort`, `argsort` and `keyvalue_qsort` routines now support multi-threading with OpenMP. This speeds up sorting medium to large arrays by 3-4x on both AVX-512 and AVX2. OpenMP is not enabled by default and needs to be configured with Meson when building. Please refer to the README for details.
This feature has been contributed upstream to NumPy and is expected to be included in the upcoming 2.3.0 release. As with x86-simd-sort, OpenMP support in NumPy is not enabled by default; you will need to configure the Meson build system to enable OpenMP when building NumPy.
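To illustrate the general idea behind OpenMP-parallel sorting, here is a minimal, self-contained sketch of a task-based quicksort. It is not the library's implementation: x86-simd-sort partitions with SIMD kernels, whereas this sketch uses `std::partition` and `std::sort`. The names `task_qsort`, `parallel_qsort` and `kTaskCutoff`, and the cutoff value itself, are hypothetical and chosen only for illustration. The sketch also assumes the input contains no NaNs.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical cutoff: ranges at or below this size are sorted serially,
// because spawning a task for a tiny range costs more than it saves.
constexpr std::size_t kTaskCutoff = 1 << 14;

// Sorts [first, last). Large ranges are split around a pivot and the two
// outer parts are handed to OpenMP tasks. Assumes no NaNs in the input.
template <typename T>
void task_qsort(T *first, T *last)
{
    assert(first <= last);
    const std::size_t n = static_cast<std::size_t>(last - first);
    if (n <= kTaskCutoff) {
        std::sort(first, last);
        return;
    }
    // Median-of-three pivot taken from the first, middle and last element.
    const T a = first[0], b = first[n / 2], c = last[-1];
    const T pivot = std::max(std::min(a, b), std::min(std::max(a, b), c));

    // Three-way partition: [first, lt) < pivot, [lt, gt) == pivot, [gt, last) > pivot.
    // The middle part is never empty, so both recursive ranges shrink.
    T *lt = std::partition(first, last, [pivot](const T &x) { return x < pivot; });
    T *gt = std::partition(lt, last, [pivot](const T &x) { return !(pivot < x); });

    // first, lt, gt and last are local, so each task gets its own copy.
    #pragma omp task
    task_qsort(first, lt);
    #pragma omp task
    task_qsort(gt, last);
}

// Entry point: one thread seeds the recursion, the rest of the team picks
// up tasks. The barrier at the end of the parallel region waits for all tasks.
template <typename T>
void parallel_qsort(std::vector<T> &v)
{
    T *begin = v.data();
    T *end = begin + v.size();
    #pragma omp parallel
    #pragma omp single nowait
    task_qsort(begin, end);
}
```

With GCC or Clang, the pragmas take effect only when the file is compiled with `-fopenmp`. Without that flag they are ignored and the code runs serially, which mirrors the opt-in nature of the feature described above.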
Miscellaneous
- Accelerate `qsort`, `qselect` and `partial_qsort` for `_Float16` on ICX.
- Resolved a performance regression for 16-bit data types caused by the compiler dynamically allocating constant arrays.
- Improve `argsort` performance for already sorted arrays by adding early detection.
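The early-detection idea for `argsort` can be sketched in plain C++ as follows. This is an illustration under stated assumptions, not the library's code: the function name `argsort_with_early_exit` is hypothetical, and the fallback uses `std::sort` rather than the SIMD kernels. If the input is already sorted, the identity permutation is the answer and the sort can be skipped after a single linear scan.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Returns the indices that would sort `arr` in ascending order.
// Hypothetical helper for illustration; assumes no NaNs in the input.
template <typename T>
std::vector<std::size_t> argsort_with_early_exit(const std::vector<T> &arr)
{
    // Start from the identity permutation 0, 1, 2, ...
    std::vector<std::size_t> idx(arr.size());
    std::iota(idx.begin(), idx.end(), std::size_t{0});
    // Debug-only sanity check: one index per input element.
    assert(idx.size() == arr.size());

    // Early exit: an O(n) scan is much cheaper than an O(n log n) sort,
    // and for sorted input the identity permutation is already correct.
    if (std::is_sorted(arr.begin(), arr.end())) {
        return idx;
    }

    // General case: sort the indices by the values they point to.
    std::sort(idx.begin(), idx.end(),
              [&arr](std::size_t a, std::size_t b) { return arr[a] < arr[b]; });
    return idx;
}
```

The extra scan costs one additional pass over unsorted input, which is small relative to the sort itself, while sorted input skips the sort entirely.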
List of PRs merged
- Fix for MMX instructions being generated without emms by @sterrettm2 in #172
- Expose C-API's for some of the keyvalue qsort by @r-devulap in #173
- Fix and Cleanup C API code by @sterrettm2 in #181
- Fix kvsort/kvselect nan behavior and added tests for mixed nan/inf arrays by @sterrettm2 in #178
- Add defensive EMMS instructions to each SIMD sort function by @sterrettm2 in #183
- Hopefully fix scorecard.yml actions by @sterrettm2 in #186
- Update readme file for static methods by @r-devulap in #185
- Fix avx512fp16 build by @sterrettm2 in #187
- Change scorecard runner to ubuntu-latest by @sterrettm2 in #188
- Change 16-bit swizzle from vector to C arrays by @sterrettm2 in #190
- Adds OpenMP to qsort, should also improve test speed a bit by @sterrettm2 in #179
- Readme fix: argsort doesn't do 16-bit by @sterrettm2 in #192
- Fixes some tests/bugs, and adds a build with sanitizers, add ASAN CI run by @sterrettm2 in #182
- Add OpenMP support to argsort by @sterrettm2 in #195
- Add early exit for argsort if array is already sorted by @r-devulap in #197
- Avoid use of _mm512_set_epi16 for reversing 16-bit vectors by @sterrettm2 in #198
- Add Meson dependency declaration for use as a subproject by @blazingpretzel in #199
- Enable fp16 nonnative support for dynamic dispatch, make more ergonomic for static dispatch by @sterrettm2 in #200
- New helper function to determine number of threads when using openMP by @r-devulap in #203
- Prep for v7.0 and refactor openMP build flags by @r-devulap in #204
New Contributors
- @blazingpretzel made their first contribution in #199
Full Changelog: v6.0...v7.0