Releases: ggml-org/llama.cpp

b5585

04 Jun 07:50
0b4be4c
CUDA: fix FTZ in FA for Gemma 3 (#13991)

b5584

04 Jun 07:45
e0e806f
kv-cache : fix unified::seq_rm to work with seq_id < 0 (#13985)

ggml-ci

b5581

03 Jun 00:49
71e74a3
opencl: add `backend_synchronize` (#13939)

* This is not needed by the normal use where the result is read
  using `tensor_get`, but it allows perf mode of `test-backend-ops`
  to properly measure performance.

b5580

03 Jun 00:38
bfb1e01
OpenCL: Add concat, tsembd, upscale, tanh, pad and repeat (#13840)

* add concat, pad, repeat, tsembd, tanh, upscale

* small fixes

b5579

02 Jun 19:30
3637576
server : disable speculative decoding for SWA models (#13970)

* server : use swa-full for draft context

ggml-ci

* server : disable speculative decoding for SWA models

b5578

02 Jun 19:26
ea394d7
metal : use F32 accumulators in FA kernels (#13975)

ggml-ci

b5577

02 Jun 18:16
5582c49
gemma : more consistent attention scaling for v2 and v3 (#13951)

* gemma : fix attn scale for 27B

* cont : apply scale before attn

* cont : consistent attention scaling

b5576

02 Jun 17:42
c9bbc77
`server`: update deepseek reasoning format (pass reasoning_content as…

b5575

02 Jun 15:06
bfd3227
mtmd : fix memory leak in mtmd_helper_eval_chunk_single (#13961)

* mtmd : fix memory leak in mtmd_helper_eval_chunk_single

* mtmd-cli : fix mem leak

* Update tools/mtmd/mtmd-cli.cpp

Co-authored-by: Georgi Gerganov <[email protected]>

---------

Co-authored-by: Georgi Gerganov <[email protected]>

b5574

02 Jun 12:39
093e3f1
cmake : Handle mixed-case 'Power' strings in POWER CPU detection (#13…