Releases: ggml-org/llama.cpp
b5576
`server`: update deepseek reasoning format (pass reasoning_content as…
b5575
mtmd : fix memory leak in mtmd_helper_eval_chunk_single (#13961)
* mtmd : fix memory leak in mtmd_helper_eval_chunk_single
* mtmd-cli : fix mem leak
* Update tools/mtmd/mtmd-cli.cpp
Co-authored-by: Georgi Gerganov <[email protected]>
b5574
cmake : Handle mixed-case 'Power' strings in POWER CPU detection (#13…
b5573
sycl: quantize and reorder the input to q8_1 when reorder is enabled …
b5572
gguf: fix failure on version == 0 (#13956)
b5571
convert : fix nomic-bert-moe mask token (#13757)
b5569
ggml: check if non-native endian model is being loaded (#13943)
* gguf: prevent non-native endian models from being loaded
* gguf: update error message
* gguf: make the non-native endian check more verbose
* ggml: move ggml_assert location
* ggml: reword the endianness check error message
Signed-off-by: Aaron Teo <[email protected]>
b5568
sync : ggml ggml-ci
b5560
parallel : fix n_junk == 0 (#13952)
b5559
kv-cache : split implementation in separate sources (#13920) ggml-ci