Releases: ngxson/llama.cpp

b5579

02 Jun 19:06
3637576
server : disable speculative decoding for SWA models (#13970)

* server : use swa-full for draft context

ggml-ci

* server : disable speculative decoding for SWA models

b5577

02 Jun 18:11
5582c49
gemma : more consistent attention scaling for v2 and v3 (#13951)

* gemma : fix attn scale for 27B

* cont : apply scale before attn

* cont : consistent attention scaling

b5576

02 Jun 17:46
c9bbc77
`server`: update deepseek reasoning format (pass reasoning_content as…

b5575

02 Jun 14:49
bfd3227
mtmd : fix memory leak in mtmd_helper_eval_chunk_single (#13961)

* mtmd : fix memory leak in mtmd_helper_eval_chunk_single

* mtmd-cli : fix mem leak

* Update tools/mtmd/mtmd-cli.cpp

Co-authored-by: Georgi Gerganov <[email protected]>

---------

Co-authored-by: Georgi Gerganov <[email protected]>

b5574

02 Jun 12:40
093e3f1
cmake : Handle mixed-case 'Power' strings in POWER CPU detection (#13…

b5573

02 Jun 09:32
663445b
sycl: quantize and reorder the input to q8_1 when reorder is enabled …

b5572

01 Jun 16:42
7675c55
gguf: fix failure on version == 0 (#13956)

b5571

01 Jun 16:30
5e1c3ae
convert : fix nomic-bert-moe mask token (#13757)

b5569

01 Jun 15:13
e57bb87
ggml: check if non-native endian model is being loaded (#13943)

* gguf: prevent non-native endian models from being loaded

Signed-off-by: Aaron Teo <[email protected]>

* gguf: update error message

Signed-off-by: Aaron Teo <[email protected]>

* gguf: make the non-native endian check more verbose

Signed-off-by: Aaron Teo <[email protected]>

* ggml: move ggml_assert location

Signed-off-by: Aaron Teo <[email protected]>

* ggml: reword the endianness check error message

Signed-off-by: Aaron Teo <[email protected]>

---------

Signed-off-by: Aaron Teo <[email protected]>

b5568

01 Jun 11:44
sync : ggml

ggml-ci