Releases: ngxson/llama.cpp
b5579
server : disable speculative decoding for SWA models (#13970)
* server : use swa-full for draft context
  ggml-ci
* server : disable speculative decoding for SWA models
b5577
gemma : more consistent attention scaling for v2 and v3 (#13951)
* gemma : fix attn scale for 27B
* cont : apply scale before attn
* cont : consistent attention scaling
b5576
`server`: update deepseek reasoning format (pass reasoning_content as…
b5575
mtmd : fix memory leak in mtmd_helper_eval_chunk_single (#13961)
* mtmd : fix memory in mtmd_helper_eval_chunk_single
* mtmd-cli : fix mem leak
* Update tools/mtmd/mtmd-cli.cpp
Co-authored-by: Georgi Gerganov
b5574
cmake : Handle mixed-case 'Power' strings in POWER CPU detection (#13…
b5573
sycl: quantize and reorder the input to q8_1 when reorder is enabled …
b5572
gguf: fix failure on version == 0 (#13956)
b5571
convert : fix nomic-bert-moe mask token (#13757)
b5569
ggml: check if non-native endian model is being loaded (#13943)
* gguf: prevent non-native endian models from being loaded
* gguf: update error message
* gguf: make the non-native endian check more verbose
* ggml: move ggml_assert location
* ggml: reword the endianness check error message
Signed-off-by: Aaron Teo
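For illustration only, the general idea behind an endianness check like this one can be sketched in a few lines. This is not the ggml implementation. It assumes the GGUF header starts with the 4-byte magic `GGUF` followed by a 32-bit version field. The helper names `read_version` and `looks_byte_swapped` and the low-16-bits heuristic are hypothetical.

```python
import struct

GGUF_MAGIC = b"GGUF"  # assumed 4-byte magic at the start of the file


def read_version(header: bytes, byteorder: str = "<") -> int:
    """Read the 32-bit version that follows the magic, in the given byte order."""
    if header[:4] != GGUF_MAGIC:
        raise ValueError("not a GGUF file")
    (version,) = struct.unpack_from(byteorder + "I", header, 4)
    return version


def looks_byte_swapped(version: int) -> bool:
    """Hypothetical heuristic: a small version number read in the wrong byte
    order has all of its low 16 bits set to zero. Version 0 is excluded here,
    since it is invalid in either byte order."""
    return version != 0 and (version & 0xFFFF) == 0


# A header written little-endian and one written big-endian, both version 3.
header_le = GGUF_MAGIC + struct.pack("<I", 3)
header_be = GGUF_MAGIC + struct.pack(">I", 3)

native = read_version(header_le, "<")
swapped = read_version(header_be, "<")  # big-endian file read as little-endian
```

Here `native` is a plausible small version, while `swapped` is a very large number whose low 16 bits are zero, which is the kind of signal a loader could use to report an endianness mismatch instead of an unsupported version.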
b5568
sync : ggml ggml-ci