Releases: ggml-org/llama.cpp

b5494

26 May 14:47
f13847c
server: fix regression on streamed non-chat completion w/ stops (#13785)

* more forgiving message diffs: partial stop words aren't erased, full stops are

* Add (slow) server test for completion + stream + stop

b5493

26 May 11:30
79c137f
examples : allow extracting embeddings from decoder contexts (#13797)

ggml-ci

b5492

26 May 10:27
2222931
llama : clarify deprecation message (#13794)

b5490

26 May 04:19
fef693d
vulkan: mark IM2COL as supporting non-contig (#13783)

b5489

26 May 02:41
2d38b6e
CANN: Add the basic supports of Flash Attention kernel (#13627)

* cann: add the basic FA support

* cann: update the readme

* cann: update the FlashAttention with PSEShift

* cann: update the input parameters in FA

* cann: update the alibi with max_bias

* cann: add the constraints of softcap

* cann: update the docs CANN.md

* cann: update the docs CANN.md

* cann: fix typo of CANN.md

* cann: add some comments and update the CANN.md

* cann: update the CANN.md

* cann: update the inner precise for fusedInferAttention

* cann: update the constraints of flash_attn_ext on ggml-cann.cpp

* cann: clean the whitespace

* cann: clean the whitespace

* cann: add a new endline

b5488

26 May 00:05
e121edc
`server`: add `--reasoning-budget 0` to disable thinking (incl. qwen3…

b5486

25 May 15:04
aa50ba4
tests : improve UGM tokenizer test coverage (#13773)

b5484

25 May 13:19
c508256
rpc : Fix build on OpenBSD (#13541)

b5483

25 May 12:34
40aaa8a
mtmd : add support for Qwen2-Audio and SeaLLM-Audio (#13760)

* mtmd : add Qwen2-Audio support

* small clean up

* update discussion link

* clarify mtmd_get_output_embd

* clarification in multimodal.md

* fix ultravox bug

* ggml_cont

b5481

25 May 10:51
d785f9c
server: fix/test add_generation_prompt (#13770)

Co-authored-by: ochafik