Releases: ngxson/llama.cpp

b4093

15 Nov 22:01
4047be7
scripts: update compare-llama-bench.py (#10319)

b4091

15 Nov 15:45
cmake : fix ppc64 check (whisper/0)

ggml-ci

b4088

15 Nov 13:38
1842922
AVX BF16 and single scale quant optimizations (#10212)

* use 128-bit loads (I've tried 256->128 to death and it's slower)

* double accumulator

* avx bf16 vec dot

* +3% q4_0 inference

* +7% tg +5% pp compared to master

* slower f16c version, kept for reference

* 256b version, also slow. i tried :)

* revert f16

* faster with madd

* split to functions

* Q8_0 and IQ4_NL, 5-7% faster

* fix potential overflow (performance reduced)

* 16 bit add for q4_0 only

* merge
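
The "double accumulator" and "avx bf16 vec dot" items above can be illustrated with a scalar sketch. This is not the AVX code from the PR; it only shows the two underlying ideas: a bf16 value is the upper 16 bits of a 32-bit float, and splitting the sum across two independent accumulators shortens the dependency chain. All function names here (`bf16_to_f32`, `f32_to_bf16_trunc`, `vec_dot_bf16_ref`) are hypothetical and chosen for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper: widen a bf16 bit pattern to float.
// bf16 holds the upper 16 bits of an IEEE-754 binary32 value.
inline float bf16_to_f32(uint16_t h) {
    uint32_t bits = static_cast<uint32_t>(h) << 16;
    float f;
    std::memcpy(&f, &bits, sizeof f);
    return f;
}

// Hypothetical helper: truncate a float to bf16 (no rounding).
// Only used to build inputs for the example.
inline uint16_t f32_to_bf16_trunc(float f) {
    uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);
    return static_cast<uint16_t>(bits >> 16);
}

// Scalar reference dot product over bf16 inputs using two
// independent accumulators (even and odd elements).
inline float vec_dot_bf16_ref(const uint16_t* x, const uint16_t* y, size_t n) {
    assert(x != nullptr && y != nullptr);
    float acc0 = 0.0f;
    float acc1 = 0.0f;
    size_t i = 0;
    for (; i + 1 < n; i += 2) {
        acc0 += bf16_to_f32(x[i])     * bf16_to_f32(y[i]);
        acc1 += bf16_to_f32(x[i + 1]) * bf16_to_f32(y[i + 1]);
    }
    if (i < n) {
        // Odd-length tail goes into the first accumulator.
        acc0 += bf16_to_f32(x[i]) * bf16_to_f32(y[i]);
    }
    return acc0 + acc1;
}
```

A vectorized version would presumably keep one SIMD register per accumulator and reduce both at the end; the split into two sums is the part this sketch is meant to show.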

b4085

15 Nov 11:37
9901068
server : (web UI) add copy button for code block, fix api key (#10242)

* server : (web ui) add copy btn for code blocks

* fix problem with api key

* use settings-modal-short-input component

* always show copy btn for code snippet

b4082

15 Nov 04:44
5a54af4
sycl: Use syclcompat::dp4a (#10267)

* sycl: Use syclcompat::dp4a

* Using the syclcompat version allows the compiler to optimize the
  operation with a native function

* Update news section

* Update CI Windows oneAPI version to 2025.0

* Reword doc

* Call syclcompat::dp4a inside dpct::dp4a

This reverts commit 90cb61d692d61360b46954a1c7f780bd2e569b73.
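
For readers unfamiliar with dp4a, the operation is conventionally a four-way dot product of packed 8-bit values added to a 32-bit accumulator. The sketch below is a plain C++ reference for the signed case, not the syclcompat implementation; `dp4a_ref`, `pack4`, and `dp4a_ref_example` are hypothetical names used only here.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: pack four signed bytes into one 32-bit word,
// with b0 in the lowest byte.
inline int32_t pack4(int8_t b0, int8_t b1, int8_t b2, int8_t b3) {
    uint32_t u = static_cast<uint32_t>(static_cast<uint8_t>(b0))
               | static_cast<uint32_t>(static_cast<uint8_t>(b1)) << 8
               | static_cast<uint32_t>(static_cast<uint8_t>(b2)) << 16
               | static_cast<uint32_t>(static_cast<uint8_t>(b3)) << 24;
    return static_cast<int32_t>(u);
}

// Scalar reference: treat a and b as four signed 8-bit lanes each,
// multiply lane by lane, and add the products to c.
inline int32_t dp4a_ref(int32_t a, int32_t b, int32_t c) {
    const uint32_t ua = static_cast<uint32_t>(a);
    const uint32_t ub = static_cast<uint32_t>(b);
    for (int i = 0; i < 4; ++i) {
        const int8_t ai = static_cast<int8_t>((ua >> (8 * i)) & 0xFFu);
        const int8_t bi = static_cast<int8_t>((ub >> (8 * i)) & 0xFFu);
        c += static_cast<int32_t>(ai) * static_cast<int32_t>(bi);
    }
    return c;
}

// Usage example: 1*5 + 2*6 + 3*7 + 4*8 = 70, plus the accumulator 10.
inline void dp4a_ref_example() {
    assert(dp4a_ref(pack4(1, 2, 3, 4), pack4(5, 6, 7, 8), 10) == 80);
}
```

On hardware with a native instruction for this, a single call replaces the loop, which is the kind of optimization the commit message refers to.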

b4081

15 Nov 01:43
1607a5e
backend cpu: add online flow for aarch64 Q4_0 GEMV/GEMM kernels (#9921)

* backend-cpu: add online flow for aarch64 Q4_0 GEMV/GEMM kernels

---------

Co-authored-by: Diego Devesa <[email protected]>

b4080

14 Nov 18:50
ae8de6d
ggml : build backends as libraries (#10256)

* ggml : build backends as libraries

---------

Signed-off-by: Xiaodong Ye <[email protected]>
Co-authored-by: Georgi Gerganov <[email protected]>
Co-authored-by: R0CKSTAR <[email protected]>

b4079

14 Nov 13:41
4a8ccb3
CUDA: no -sm row for very small matrices (#10185)

b4078

14 Nov 11:38
2a82891
speculative : fix out-of-bounds access (#10289)

b4077

14 Nov 06:41
af148c9
vulkan: Optimize binary ops (#10270)

Reuse the index calculations across all of src0/src1/dst. Add a shader
variant for when src0/src1 have the same dimensions and the additional modulus
operations for src1 aren't needed. Div/mod are slow, so add "fast" div/mod that
have a fast path when the calculation isn't needed or can be done more
cheaply.
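
One way such "fast" div/mod helpers could look is sketched below in C++ rather than GLSL. This is an illustration of the idea described above, not necessarily the shader code from the PR: the modulus takes a cheap bitmask path when the divisor is a power of two (which includes 1, the common broadcast case), and the division returns 0 without dividing when the numerator is smaller than the divisor. The names `fastmod` and `fastdiv` are assumptions.

```cpp
#include <cassert>
#include <cstdint>

// Modulus with a fast path: if b is a power of two, a % b equals
// a & (b - 1). For b == 1 this yields 0 with no division at all.
inline uint32_t fastmod(uint32_t a, uint32_t b) {
    assert(b != 0);
    if ((b & (b - 1)) == 0) {
        return a & (b - 1);
    }
    return a % b;
}

// Division with a fast path: if a < b the quotient is 0,
// so the actual division can be skipped.
inline uint32_t fastdiv(uint32_t a, uint32_t b) {
    assert(b != 0);
    return (a < b) ? 0u : (a / b);
}
```

When broadcasting src1 against a larger src0, an index along a src1 dimension of size 1 becomes `fastmod(i, 1)`, which is always 0 and never reaches the `%` operator.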