Releases · ggml-org/llama.cpp
b5573
sycl: quantize and reorder the input to q8_1 when reorder is enabled …
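For context on what this touches: in q8_1, each block of 32 floats is stored as int8 quants plus a scale and a precomputed sum, which is what makes a pre-quantized, reordered q8_1 input cheap to dot against. A minimal sketch of that quantization step, assuming the usual 32-wide block layout (the struct stores the scale and sum as plain floats for clarity; this is not the SYCL backend's actual code):

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>

// Illustrative q8_1 block: 32 int8 quants plus a scale d and
// s = d * sum(quants), so dot products can fold in offset terms
// cheaply. (Sketch only; ggml stores d and s as fp16.)
struct block_q8_1 {
    float  d;       // scale
    float  s;       // d * sum of the 32 quants
    int8_t qs[32];  // quantized values
};

static block_q8_1 quantize_block_q8_1(const float * x) {
    float amax = 0.0f;
    for (int i = 0; i < 32; ++i) {
        amax = std::max(amax, std::fabs(x[i]));
    }
    block_q8_1 b{};
    b.d = amax / 127.0f;
    const float id = b.d != 0.0f ? 1.0f / b.d : 0.0f;
    int sum = 0;
    for (int i = 0; i < 32; ++i) {
        b.qs[i] = (int8_t) std::lround(x[i] * id);
        sum += b.qs[i];
    }
    b.s = b.d * sum;
    return b;
}
```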
b5572
gguf: fix failure on version == 0 (#13956)
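The GGUF header carries a uint32 version directly after the magic, and valid versions start at 1, so a loader can reject version 0 up front with a clear message instead of failing later. A minimal sketch of that kind of guard (the function is illustrative, not the actual gguf reader):

```cpp
#include <cstdint>
#include <cstdio>

// Hypothetical header check: version 0 has never been a valid GGUF
// version, so treat it as a corrupt or non-GGUF file explicitly.
static bool gguf_check_version(uint32_t version) {
    if (version == 0) {
        std::fprintf(stderr,
            "gguf: invalid version 0 (file is corrupt or not GGUF)\n");
        return false;
    }
    return true;
}
```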
b5571
convert : fix nomic-bert-moe mask token (#13757)
b5569
ggml: check if non-native endian model is being loaded (#13943)
* gguf: prevent non-native endian models from being loaded
* gguf: update error message
* gguf: make the non-native endian check more verbose
* ggml: move ggml_assert location
* ggml: reword the endianness check error message
Signed-off-by: Aaron Teo <[email protected]>
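A byte-swapped GGUF file can be spotted cheaply from the header's version field: real versions are small integers, so a value with bits set only in its high bytes almost certainly came from a machine of the opposite endianness. A minimal sketch of that heuristic (illustrative; the loader's real check and error wording differ):

```cpp
#include <cstdint>
#include <cstdio>

// Byte-swap helper for a 32-bit value.
static uint32_t bswap32(uint32_t v) {
    return (v >> 24) | ((v >> 8) & 0x0000FF00u) |
           ((v << 8) & 0x00FF0000u) | (v << 24);
}

// Heuristic endianness check on the GGUF version field: valid
// versions are small (1, 2, 3, ...), so a nonzero value whose low
// 16 bits are zero is almost certainly a byte-swapped file.
static bool gguf_version_native(uint32_t version) {
    if (version != 0 && (version & 0xFFFFu) == 0) {
        std::fprintf(stderr,
            "gguf: model appears to be byte-swapped (version reads as %u, "
            "%u after swapping); convert it to this machine's endianness "
            "before loading\n", version, bswap32(version));
        return false;
    }
    return true;
}
```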
b5568
sync : ggml ggml-ci
b5560
parallel : fix n_junk == 0 (#13952)
b5559
kv-cache : split implementation in separate sources (#13920) ggml-ci
b5558
threading: support for GGML_SCHED_PRIO_LOW, update thread info on Win…
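A low scheduler priority maps naturally onto the Win32 thread priority API. A minimal sketch of that mapping, assuming an enum in the spirit of ggml's GGML_SCHED_PRIO_* levels (the enum and function here are illustrative, not ggml's actual threading code, which also covers realtime levels and non-Windows platforms):

```cpp
#ifdef _WIN32
#include <windows.h>

// Illustrative priority levels in the style of GGML_SCHED_PRIO_*.
enum sched_prio { PRIO_LOW, PRIO_NORMAL, PRIO_HIGH };

// Map a scheduler priority onto SetThreadPriority for the calling
// thread; returns false if the Win32 call fails.
static bool set_current_thread_priority(sched_prio p) {
    int win_prio = THREAD_PRIORITY_NORMAL;
    switch (p) {
        case PRIO_LOW:    win_prio = THREAD_PRIORITY_BELOW_NORMAL; break;
        case PRIO_NORMAL: win_prio = THREAD_PRIORITY_NORMAL;       break;
        case PRIO_HIGH:   win_prio = THREAD_PRIORITY_ABOVE_NORMAL; break;
    }
    return SetThreadPriority(GetCurrentThread(), win_prio) != 0;
}
#endif
```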
b5556
server: allow unclosed thinking tags (#13931)
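Reasoning models wrap chain-of-thought in <think>...</think> tags, but a truncated or still-streaming generation may never emit the closing tag, so a tolerant parser treats everything after an unmatched opener as reasoning content rather than rejecting the response. A minimal sketch of that behavior, assuming these tag strings (the splitting function is illustrative, not the server's actual parser):

```cpp
#include <string>
#include <utility>

// Split model output into (reasoning, content). If </think> is
// missing -- e.g. generation was cut off -- treat everything after
// <think> as reasoning instead of erroring out.
static std::pair<std::string, std::string> split_thinking(const std::string & out) {
    const std::string open  = "<think>";
    const std::string close = "</think>";
    const size_t b = out.find(open);
    if (b == std::string::npos) {
        return {"", out}; // no thinking block at all
    }
    const size_t start = b + open.size();
    const size_t e = out.find(close, start);
    if (e == std::string::npos) {
        // unclosed tag: accept it as reasoning-only output
        return {out.substr(start), ""};
    }
    return {out.substr(start, e - start), out.substr(e + close.size())};
}
```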
b5555
llama : deprecate explicit kv_self defrag/update calls (#13921) ggml-ci