Releases · ggml-org/llama.cpp
b4231
build: update Makefile comments for C++ version change (#10598)
b4230
ggml-cpu: replace AArch64 NEON assembly with intrinsics in ggml_gemv_…
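This entry replaces hand-written AArch64 assembly with compiler intrinsics. As an illustration of the style of change only, here is a minimal int8 dot-product kernel written with NEON intrinsics; the function name is hypothetical and this is not the actual ggml_gemv code.

```cpp
// Illustrative only: a hypothetical int8 dot product written with AArch64
// NEON intrinsics rather than inline assembly (not the actual ggml kernel).
// Assumes n is a multiple of 16.
#include <arm_neon.h>
#include <cstdint>

int32_t dot_i8_neon(const int8_t *a, const int8_t *b, int n) {
    int32x4_t acc = vdupq_n_s32(0);
    for (int i = 0; i < n; i += 16) {
        int8x16_t va = vld1q_s8(a + i);
        int8x16_t vb = vld1q_s8(b + i);
        // widening multiplies on the low and high 8-lane halves
        int16x8_t lo = vmull_s8(vget_low_s8(va),  vget_low_s8(vb));
        int16x8_t hi = vmull_s8(vget_high_s8(va), vget_high_s8(vb));
        // pairwise widen to 32-bit lanes and accumulate
        acc = vaddq_s32(acc, vpaddlq_s16(lo));
        acc = vaddq_s32(acc, vpaddlq_s16(hi));
    }
    return vaddvq_s32(acc); // horizontal sum of the four 32-bit lanes
}
```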
b4227
vulkan: Dynamic subgroup size support for Q6_K mat_vec (#10536)
* subgroup 64 version with subgroup add; 15% faster scalable version, tested for subgroup sizes 16-128
* check that the subgroup size is a multiple of 16 and greater than 16
* subgroup sizes are always a power of 2 (https://github.com/KhronosGroup/GLSL/issues/45)
* force 16 sequential threads per block
* make the subgroup size of 16 a constant
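A rough sketch of the host-side check these constraints imply, using the standard Vulkan 1.1 subgroup query; the helper name is illustrative and this is not the actual llama.cpp dispatch logic.

```cpp
// Hypothetical sketch (not the actual llama.cpp code): query the device
// subgroup size via Vulkan 1.1 properties and decide whether the
// subgroup-add Q6_K mat_vec variant applies.
#include <vulkan/vulkan.h>
#include <cstdint>

bool q6k_subgroup_variant_ok(VkPhysicalDevice dev) {
    VkPhysicalDeviceSubgroupProperties sub = {};
    sub.sType = VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_SUBGROUP_PROPERTIES;

    VkPhysicalDeviceProperties2 props = {};
    props.sType = VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_PROPERTIES_2;
    props.pNext = &sub;
    vkGetPhysicalDeviceProperties2(dev, &props);

    const uint32_t s = sub.subgroupSize;
    // subgroup sizes are always powers of 2, so together with the
    // multiple-of-16 check this covers the tested range 16..128
    return s >= 16 && s % 16 == 0 && (s & (s - 1)) == 0;
}
```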
b4226
ggml : move AMX to the CPU backend (#10570)
Co-authored-by: Georgi Gerganov <[email protected]>
b4224
imatrix : support combine-only (#10492)
* imatrix-combine-only idea
* ensured that the behavior is consistent with the log
b4222
ggml : fix I8MM Q4_1 scaling factor conversion (#10562)
ggml-ci
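For orientation, a Q4_1 block stores an fp16 scale d and an fp16 min m alongside 32 packed 4-bit quants, and dequantization converts both to fp32 before computing x = d*q + m. A simplified reference sketch follows; the names are illustrative and this is not the fixed I8MM path itself.

```cpp
// Simplified Q4_1 dequantization reference (illustrative names; the real
// code lives in ggml's quantization sources). The fp16 -> fp32 scale
// conversion shown here is the kind of step the fix above concerns.
#include <cstdint>
#include <cstring>

#define QK4_1 32

struct block_q4_1 {
    uint16_t d;             // scale, stored as fp16
    uint16_t m;             // min, stored as fp16
    uint8_t  qs[QK4_1 / 2]; // two 4-bit quants per byte
};

// minimal fp16 -> fp32 for normal values (stand-in for ggml_fp16_to_fp32)
static float fp16_to_fp32(uint16_t h) {
    const uint32_t sign = (uint32_t)(h & 0x8000) << 16;
    const uint32_t exp  = (h >> 10) & 0x1F;
    const uint32_t mant = h & 0x3FF;
    uint32_t bits;
    if (exp == 0) {
        bits = sign;                              // zero; denormals dropped
    } else if (exp == 31) {
        bits = sign | 0x7F800000u | (mant << 13); // inf / NaN
    } else {
        bits = sign | ((exp + 112) << 23) | (mant << 13); // rebias 15 to 127
    }
    float out;
    std::memcpy(&out, &bits, sizeof out);
    return out;
}

void dequantize_q4_1(const block_q4_1 *b, float *y) {
    const float d = fp16_to_fp32(b->d); // the scale conversion step
    const float m = fp16_to_fp32(b->m);
    for (int j = 0; j < QK4_1 / 2; ++j) {
        y[j]             = d * (float)(b->qs[j] & 0x0F) + m; // low nibble
        y[j + QK4_1 / 2] = d * (float)(b->qs[j] >>   4) + m; // high nibble
    }
}
```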
b4221
ggml-cpu: fix typo in gemv/gemm iq4_nl_4_4 (#10580)
b4220
sycl : offload of get_rows set to 0 (#10432)
b4219
sycl : Reroute permuted mul_mats through oneMKL (#10408)
This PR fixes the failing MUL_MAT tests for the SYCL backend.
b4218
CANN: RoPE operator optimization (#10563)
* [cann] RoPE operator optimization
* [CANN] code formatting
Co-authored-by: noemotiovon <[email protected]>
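As background on the operator being optimized, here is a naive scalar reference of rotary position embedding (RoPE) in its textbook form; this is a sketch, not the CANN kernel.

```cpp
// Naive scalar RoPE reference: rotates consecutive pairs of an embedding
// of size n_dims by a position-dependent angle (textbook form only).
#include <cmath>

void rope_ref(float *x, int n_dims, int pos, float freq_base = 10000.0f) {
    for (int i = 0; i < n_dims; i += 2) {
        const float theta = pos * std::pow(freq_base, -(float)i / n_dims);
        const float c = std::cos(theta);
        const float s = std::sin(theta);
        const float x0 = x[i];
        const float x1 = x[i + 1];
        x[i]     = x0 * c - x1 * s;
        x[i + 1] = x0 * s + x1 * c;
    }
}
```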