Releases · ggml-org/llama.cpp
b5494
server: fix regression on streamed non-chat completion w/ stops (#13785)
* more forgiving message diffs: partial stop words aren't erased, full stops are
* add a (slow) server test for completion + stream + stop
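The gist of the fix, per the commit bullets: a trailing fragment that could still grow into a stop word is held back from the stream rather than erased, and only a completed stop word is stripped. A minimal C++ sketch of that buffering logic (hypothetical helper names, not the server's actual code):

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Length of the longest suffix of `text` that is a proper prefix of any stop
// word. That many trailing bytes must be held back from the stream, since the
// next token may still complete the stop word.
static size_t partial_stop_len(const std::string & text, const std::vector<std::string> & stops) {
    size_t best = 0;
    for (const auto & stop : stops) {
        if (stop.empty()) {
            continue;
        }
        for (size_t n = std::min(text.size(), stop.size() - 1); n > 0; --n) {
            if (text.compare(text.size() - n, n, stop, 0, n) == 0) {
                best = std::max(best, n);
                break;
            }
        }
    }
    return best;
}

// Per-token streaming step: returns the chunk that is now safe to emit.
// A full stop word is erased from the output; a partial one is only delayed.
static std::string next_chunk(std::string & acc, const std::string & piece,
                              const std::vector<std::string> & stops, bool & done) {
    acc += piece;
    for (const auto & stop : stops) {
        const size_t pos = acc.find(stop);
        if (pos != std::string::npos) {
            done = true;
            const std::string out = acc.substr(0, pos); // stop word itself is dropped
            acc.clear();
            return out;
        }
    }
    done = false;
    const size_t hold = partial_stop_len(acc, stops);
    const std::string out = acc.substr(0, acc.size() - hold);
    acc.erase(0, out.size()); // keep only the possibly-partial stop suffix
    return out;
}
```

With stop word "END", streaming "the " then "EN" then "Q" emits "the ", then nothing (the "EN" is held back, not erased), then "ENQ" once the stop can no longer complete.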
b5493
examples : allow extracting embeddings from decoder contexts (#13797) ggml-ci
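For context, extracting an embedding from a decoder context boils down to enabling embeddings on the context and reading the pooled vector back after decoding. A minimal sketch assuming the llama.h API around these releases; error handling is mostly omitted, and exact names such as llama_init_from_model and llama_model_n_embd should be checked against your llama.h revision:

```cpp
#include "llama.h"

#include <vector>

// Sketch: pull a pooled embedding out of a decoder-only model's context.
// Assumes `model` is already loaded and `tokens` holds the tokenized prompt.
std::vector<float> embed_prompt(llama_model * model, const std::vector<llama_token> & tokens) {
    llama_context_params cparams = llama_context_default_params();
    cparams.embeddings   = true;                    // store embeddings in the context
    cparams.pooling_type = LLAMA_POOLING_TYPE_MEAN; // pool over the whole sequence

    llama_context * ctx = llama_init_from_model(model, cparams);

    llama_batch batch = llama_batch_init((int32_t) tokens.size(), 0, 1);
    for (int32_t i = 0; i < (int32_t) tokens.size(); ++i) {
        batch.token   [i]    = tokens[i];
        batch.pos     [i]    = i;
        batch.n_seq_id[i]    = 1;
        batch.seq_id  [i][0] = 0;
        batch.logits  [i]    = true; // request output for every position
    }
    batch.n_tokens = (int32_t) tokens.size();

    std::vector<float> out;
    if (llama_decode(ctx, batch) == 0) {
        const float * emb = llama_get_embeddings_seq(ctx, 0); // pooled vector for seq 0
        out.assign(emb, emb + llama_model_n_embd(model));
    }

    llama_batch_free(batch);
    llama_free(ctx);
    return out;
}
```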
b5492
llama : clarify deprecation message (#13794)
b5490
vulkan: mark IM2COL as supporting non-contig (#13783)
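Background on the operator: IM2COL unfolds each input patch the kernel would touch into one contiguous row, turning convolution into a single matrix multiply; accepting non-contiguous sources lets the backend consume views and permutations without an extra copy. A scalar 1-D sketch of the operation itself (illustrative only, unrelated to the Vulkan shader):

```cpp
#include <vector>

// 1-D im2col: copy the k samples each output position's kernel window touches
// into one contiguous row, so convolution becomes a plain matrix multiply
// between this (n_out x k) matrix and the flattened kernel weights.
std::vector<float> im2col_1d(const std::vector<float> & x, int k, int stride) {
    const int n_out = ((int) x.size() - k) / stride + 1;
    std::vector<float> cols((size_t) n_out * (size_t) k);
    for (int i = 0; i < n_out; ++i) {
        for (int j = 0; j < k; ++j) {
            cols[(size_t) i * k + j] = x[(size_t) i * stride + j];
        }
    }
    return cols;
}
```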
b5489
CANN: add basic support for the Flash Attention kernel (#13627)
* cann: add the basic FA support
* cann: update the readme
* cann: update the FlashAttention with PSEShift
* cann: update the input parameters in FA
* cann: update the alibi with max_bias
* cann: add the constraints of softcap
* cann: update the docs CANN.md
* cann: fix typo in CANN.md
* cann: add some comments and update CANN.md
* cann: update the inner precise for fusedInferAttention
* cann: update the constraints of flash_attn_ext in ggml-cann.cpp
* cann: clean up whitespace
* cann: add a trailing newline
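For reference, the ALiBi (max_bias) and softcap pieces listed above both act on the pre-softmax attention scores. Roughly, in my reading of ggml's flash-attention semantics (not text from the PR; with logit_softcap c = 0 the tanh capping is skipped, and the per-head slope m_h is derived from max_bias):

```latex
% Pre-softmax score for query i and key j in head h (sketch):
s_{ij} = c \,\tanh\!\left( \frac{q_i \cdot k_j}{c\,\sqrt{d}} \right) + m_h\,(j - i),
\qquad
\mathrm{Attn}(Q, K, V)_i = \sum_j \operatorname{softmax}_j\!\left( s_{ij} \right) v_j
```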
b5488
`server`: add `--reasoning-budget 0` to disable thinking (incl. qwen3…
b5486
tests : improve UGM tokenizer test coverage (#13773)
b5484
rpc : Fix build on OpenBSD (#13541)
b5483
mtmd : add support for Qwen2-Audio and SeaLLM-Audio (#13760)
* mtmd : add Qwen2-Audio support
* small clean up
* update discussion link
* clarify mtmd_get_output_embd
* clarification in multimodal.md
* fix ultravox bug
* ggml_cont
b5481
server: fix/test add_generation_prompt (#13770)
Co-authored-by: ochafik
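For context, add_generation_prompt controls whether the rendered chat template ends with an open assistant turn, which primes the model to answer rather than to continue the last message. A minimal ChatML-style sketch (hypothetical helper, not the server's Jinja path):

```cpp
#include <string>
#include <vector>

struct chat_msg { std::string role, content; };

// Render ChatML-style: each message is wrapped in <|im_start|>/<|im_end|>.
// With add_generation_prompt the result ends in an open assistant turn,
// so generation produces a reply instead of continuing the last message.
std::string render_chatml(const std::vector<chat_msg> & msgs, bool add_generation_prompt) {
    std::string out;
    for (const auto & m : msgs) {
        out += "<|im_start|>" + m.role + "\n" + m.content + "<|im_end|>\n";
    }
    if (add_generation_prompt) {
        out += "<|im_start|>assistant\n"; // left open: the model fills it in
    }
    return out;
}
```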