Releases · ngxson/llama.cpp

15 Nov 22:01

4047be7

b4093

scripts: update compare-llama-bench.py (#10319)


15 Nov 15:45

github-actions

b4091

09ecbcb

cmake : fix ppc64 check (whisper/0)

ggml-ci


15 Nov 13:38

github-actions

b4088

1842922

AVX BF16 and single scale quant optimizations (#10212)

* use 128-bit loads (I've tried 256->128 to death and it's slower)

* double accumulator

* avx bf16 vec dot

* +3% q4_0 inference

* +7% tg +5% pp compared to master

* slower f16c version, kept for reference

* 256b version, also slow. i tried :)

* revert f16

* faster with madd

* split to functions

* Q8_0 and IQ4_NL, 5-7% faster

* fix potential overflow (performance reduced)

* 16 bit add for q4_0 only

* merge
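
To illustrate what a "bf16 vec dot" with a "double accumulator" computes, here is a minimal scalar sketch. It is not the AVX code from the PR: the names `bf16_to_f32`, `f32_to_bf16_trunc`, and `vec_dot_bf16` are hypothetical, and the two running sums only mimic the idea of keeping independent accumulators so consecutive multiply-adds do not depend on each other.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// bfloat16 is the upper 16 bits of an IEEE-754 binary32 value,
// so widening is a 16-bit left shift of the raw bits.
static inline float bf16_to_f32(uint16_t h) {
    uint32_t bits = static_cast<uint32_t>(h) << 16;
    float f;
    std::memcpy(&f, &bits, sizeof f);
    return f;
}

// Truncating conversion (no rounding); only used to build inputs here.
static inline uint16_t f32_to_bf16_trunc(float f) {
    uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);
    return static_cast<uint16_t>(bits >> 16);
}

// Dot product of two bf16 vectors using two independent accumulators.
// Hypothetical helper, a scalar stand-in for a vectorized kernel.
float vec_dot_bf16(const uint16_t* x, const uint16_t* y, int n) {
    assert(n >= 0);
    float acc0 = 0.0f;
    float acc1 = 0.0f;
    int i = 0;
    for (; i + 1 < n; i += 2) {
        acc0 += bf16_to_f32(x[i])     * bf16_to_f32(y[i]);
        acc1 += bf16_to_f32(x[i + 1]) * bf16_to_f32(y[i + 1]);
    }
    if (i < n) {  // odd-length tail
        acc0 += bf16_to_f32(x[i]) * bf16_to_f32(y[i]);
    }
    return acc0 + acc1;
}
```

A vectorized version would replace each scalar accumulator with a SIMD register and reduce both at the end; whether that is faster depends on the target CPU, as the notes above suggest.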


15 Nov 11:37

github-actions

b4085

9901068

server : (web UI) add copy button for code block, fix api key (#10242)

* server : (web ui) add copy btn for code blocks

* fix problem with api key

* use settings-modal-short-input component

* always show copy btn for code snippet


15 Nov 04:44

github-actions

b4082

5a54af4

sycl: Use syclcompat::dp4a (#10267)

* sycl: Use syclcompat::dp4a

* Using the syclcompat version allows the compiler to optimize the
  operation with a native function

* Update news section

* Update CI Windows oneAPI version to 2025.0

* Reword doc

* Call syclcompat::dp4a inside dpct::dp4a

This reverts commit 90cb61d692d61360b46954a1c7f780bd2e569b73.
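
For readers unfamiliar with dp4a, the operation is a four-way dot product of packed signed 8-bit integers with a 32-bit accumulate. The following is a portable reference sketch of that arithmetic only; `pack_i8x4`, `dp4a_ref`, and `dot_i8_packed` are hypothetical names, and this is not the `syclcompat::dp4a` or `dpct::dp4a` implementation.

```cpp
#include <cassert>
#include <cstdint>

// Pack four signed bytes into one 32-bit word, lowest byte first.
static inline uint32_t pack_i8x4(int8_t b0, int8_t b1, int8_t b2, int8_t b3) {
    return  static_cast<uint32_t>(static_cast<uint8_t>(b0))
         | (static_cast<uint32_t>(static_cast<uint8_t>(b1)) << 8)
         | (static_cast<uint32_t>(static_cast<uint8_t>(b2)) << 16)
         | (static_cast<uint32_t>(static_cast<uint8_t>(b3)) << 24);
}

// Reference dp4a: returns c + sum over the four 8-bit lanes of a_i * b_i,
// treating every lane as a signed 8-bit integer.
static inline int32_t dp4a_ref(uint32_t a, uint32_t b, int32_t c) {
    for (int lane = 0; lane < 4; ++lane) {
        const int8_t ai = static_cast<int8_t>(static_cast<uint8_t>(a >> (8 * lane)));
        const int8_t bi = static_cast<int8_t>(static_cast<uint8_t>(b >> (8 * lane)));
        c += static_cast<int32_t>(ai) * static_cast<int32_t>(bi);
    }
    return c;
}

// Dot product of two int8 vectors stored as packed 32-bit words.
int32_t dot_i8_packed(const uint32_t* a, const uint32_t* b, int n_words) {
    assert(n_words >= 0);
    int32_t acc = 0;
    for (int i = 0; i < n_words; ++i) {
        acc = dp4a_ref(a[i], b[i], acc);
    }
    return acc;
}
```

With lanes (1, -2, 3, -4) and (5, 6, -7, 8) and an accumulator of 10, `dp4a_ref` yields 5 - 12 - 21 - 32 + 10 = -50. A native instruction can do this in one step, which is the motivation the commit message gives for using the library version.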


15 Nov 01:43

github-actions

b4081

1607a5e

backend cpu: add online flow for aarch64 Q4_0 GEMV/GEMM kernels (#9921)

* backend-cpu: add online flow for aarch64 Q4_0 GEMV/GEMM kernels

---------

Co-authored-by: Diego Devesa


14 Nov 18:50

github-actions

b4080

ae8de6d

ggml : build backends as libraries (#10256)

* ggml : build backends as libraries

---------

Signed-off-by: Xiaodong Ye
Co-authored-by: Georgi Gerganov
Co-authored-by: R0CKSTAR


14 Nov 13:41

github-actions

b4079

4a8ccb3

CUDA: no -sm row for very small matrices (#10185)


14 Nov 11:38

github-actions

b4078

2a82891

speculative : fix out-of-bounds access (#10289)


14 Nov 06:41

github-actions

b4077

af148c9

vulkan: Optimize binary ops (#10270)

Reuse the index calculations across all of src0/src1/dst. Add a shader
variant for when src0 and src1 have the same dimensions, so the additional
modulus operations for src1 aren't needed. Div/mod are slow, so add "fast"
div/mod that take a fast path when the calculation isn't needed or can be
done more cheaply.
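
As a rough illustration of the "fast" div/mod idea, the sketch below shows a broadcasting add on the CPU in C++ rather than in a Vulkan shader. The helper names (`fast_mod`, `fast_div`, `add_broadcast_2d`), the particular fast paths chosen, and the `ne0`/`ne1`/`ne10`/`ne11` parameter names are assumptions for this example, not the shader code from the PR.

```cpp
#include <cassert>
#include <cstdint>

// Modulus with cheap paths: skip the work when a is already in range,
// and use a bit mask when b is a power of two.
static inline uint32_t fast_mod(uint32_t a, uint32_t b) {
    assert(b > 0);
    if (a < b) return a;
    if ((b & (b - 1)) == 0) return a & (b - 1);
    return a % b;
}

// Division with a cheap path: the quotient is zero when a < b.
static inline uint32_t fast_div(uint32_t a, uint32_t b) {
    assert(b > 0);
    return (a < b) ? 0u : a / b;
}

// dst[i1][i0] = src0[i1][i0] + src1[i1 % ne11][i0 % ne10]
// src0 and dst are ne1 rows by ne0 columns; src1 is ne11 rows by ne10
// columns and is broadcast (repeated) across src0.
void add_broadcast_2d(const float* src0, const float* src1, float* dst,
                      uint32_t ne0, uint32_t ne1,
                      uint32_t ne10, uint32_t ne11) {
    for (uint32_t idx = 0; idx < ne0 * ne1; ++idx) {
        // Index math is computed once and reused for src0 and dst.
        const uint32_t i1 = fast_div(idx, ne0);
        const uint32_t i0 = idx - i1 * ne0;
        // Only src1 needs the extra modulus, and only when it is smaller.
        const uint32_t j0 = fast_mod(i0, ne10);
        const uint32_t j1 = fast_mod(i1, ne11);
        dst[idx] = src0[idx] + src1[j1 * ne10 + j0];
    }
}
```

When src0 and src1 have identical dimensions, both `fast_mod` calls reduce to the first branch, which is the case the dedicated shader variant avoids entirely.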

