
Releases: ggml-org/llama.cpp

b5571 (01 Jun 16:43, commit 5e1c3ae)
convert : fix nomic-bert-moe mask token (#13757)

b5569 (01 Jun 15:10, commit e57bb87)
ggml: check if non-native endian model is being loaded (#13943)

* gguf: prevent non-native endian models from being loaded
* gguf: update error message
* gguf: make the non-native endian check more verbose
* ggml: move ggml_assert location
* ggml: reword the endianness check error message

Signed-off-by: Aaron Teo <[email protected]>
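The guard hinges on the fact that GGUF header fields only decode to sensible values in the file's own byte order. A minimal sketch of such a check follows, assuming a hypothetical check_gguf_endianness helper rather than the actual gguf loader code:

```cpp
// Illustrative only: a GGUF version is a small integer, so a value that
// only looks small after byte-swapping suggests the file was written on
// an opposite-endian machine.
#include <cstdint>
#include <cstdio>
#include <cstdlib>

static uint32_t byteswap32(uint32_t v) {
    return (v >> 24) | ((v >> 8) & 0x0000FF00u) |
           ((v << 8) & 0x00FF0000u) | (v << 24);
}

static void check_gguf_endianness(uint32_t version) { // hypothetical helper
    const uint32_t swapped = byteswap32(version);
    if (version > 0xFFFFu && swapped <= 0xFFFFu) {
        fprintf(stderr,
            "fatal: model endianness does not match this machine "
            "(version reads as %u, byte-swapped %u); refusing to load\n",
            version, swapped);
        exit(1);
    }
}
```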

b5568 (01 Jun 12:27)
sync : ggml

ggml-ci

b5560 (01 Jun 10:42, commit c046217)
parallel : fix n_junk == 0 (#13952)
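The title suggests a degenerate-input guard. A hedged guess at the class of bug, not taken from the patch itself: selecting a random junk token with a modulo divides by zero when n_junk is 0.

```cpp
// Hypothetical illustration of the failure class, not the actual fix:
// `rand() % n_junk` is undefined when n_junk == 0, so guard that case.
#include <cstdlib>

static int pick_junk_token(int n_junk) {
    if (n_junk <= 0) {
        return -1; // nothing to pick; previously this path divided by zero
    }
    return rand() % n_junk;
}
```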

b5559 (01 Jun 09:32, commit 0fc16b4)
kv-cache : split implementation in separate sources (#13920)

ggml-ci

b5558 (31 May 23:57, commit 053b153)
threading: support for GGML_SCHED_PRIO_LOW, update thread info on Win…
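The title names a new GGML_SCHED_PRIO_LOW priority level. Below is a sketch of how a low priority typically maps to OS primitives, as an assumption about the usual approach rather than ggml's actual code:

```cpp
// Assumed mapping of a "low" scheduling priority to OS calls; not ggml's
// actual implementation.
#ifdef _WIN32
#include <windows.h>
static void set_low_thread_priority(void) {
    SetThreadPriority(GetCurrentThread(), THREAD_PRIORITY_BELOW_NORMAL);
}
#else
#include <sys/resource.h>
static void set_low_thread_priority(void) {
    setpriority(PRIO_PROCESS, 0, 10); // higher nice value = lower priority
}
#endif
```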

b5556 (31 May 15:55, commit e15898d)
server: allow unclosed thinking tags (#13931)
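A sketch of the tolerant-parsing idea: when the closing tag never arrives, treat the remainder as reasoning content instead of rejecting the output. The tag name and helper below are illustrative assumptions, not the server's actual parser.

```cpp
// Illustrative: split model output into (reasoning, content), accepting an
// unclosed <think> tag.
#include <string>
#include <utility>

static std::pair<std::string, std::string> split_thinking(const std::string & out) {
    const std::string open  = "<think>";
    const std::string close = "</think>";
    const size_t b = out.find(open);
    if (b == std::string::npos) {
        return {"", out}; // no reasoning block at all
    }
    const size_t start = b + open.size();
    const size_t e = out.find(close, start);
    if (e == std::string::npos) {
        // unclosed tag: treat the rest as reasoning instead of failing
        return {out.substr(start), ""};
    }
    return {out.substr(start, e - start), out.substr(e + close.size())};
}
```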

b5555 (31 May 13:47, commit 803f8ba)
llama : deprecate explicit kv_self defrag/update calls (#13921)

ggml-ci
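A hedged sketch of the usage change; the llama_kv_self_* names come from the commit title and are assumed to match llama.h at this revision.

```cpp
// before (explicit maintenance, now deprecated):
//   llama_kv_self_defrag(ctx);
//   llama_kv_self_update(ctx);
//   llama_decode(ctx, batch);
//
// after (maintenance handled implicitly inside decode):
//   llama_decode(ctx, batch);
```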

b5554 (31 May 13:37, commit 3600cc2)
llama : use n_swa + n_ubatch cells for SWA cache (#13833)

* llama : use n_swa + n_ubatch cells for SWA cache

ggml-ci

* llama : add warning about multi-sequence SWA contexts
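A back-of-the-envelope illustration of why this shrinks the cache; the numbers are hypothetical:

```cpp
// Hypothetical sizes showing the effect of allocating n_swa + n_ubatch
// cells for the SWA cache instead of the full context.
#include <cstdio>

int main() {
    const int n_ctx    = 32768; // full context length
    const int n_swa    = 4096;  // sliding-window size
    const int n_ubatch = 512;   // micro-batch size
    printf("cells if sized to full context: %d\n", n_ctx);            // 32768
    printf("cells with this change:         %d\n", n_swa + n_ubatch); // 4608
    return 0;
}
```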

b5552 (31 May 10:56, commit 3f55f78)
llama : auto-batch preparation (#13845)

* llama : auto-batch

ggml-ci

* context : simplify if branching
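A hedged sketch of what "auto-batch" preparation can mean: if a batch cannot be placed as-is, retry with progressively smaller micro-batches. process_ubatch below is a hypothetical stand-in for the actual placement step.

```cpp
#include <algorithm>

// hypothetical placement step; returns false when the ubatch does not fit
static bool process_ubatch(int /*start*/, int /*n_tokens*/) { return true; }

// split the batch into micro-batches; on failure, halve the micro-batch
// size and retry instead of failing outright
static bool decode_with_auto_batch(int n_tokens, int n_ubatch) {
    while (n_ubatch >= 1) {
        bool ok = true;
        for (int i = 0; i < n_tokens && ok; i += n_ubatch) {
            ok = process_ubatch(i, std::min(n_ubatch, n_tokens - i));
        }
        if (ok) {
            return true;
        }
        n_ubatch /= 2; // no room at this size: halve and try again
    }
    return false;
}
```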