
Releases: ggml-org/llama.cpp

b4396

29 Dec 09:11
fdd2188
vulkan: Use push constant offset to handle misaligned descriptors (#1…

b4394

28 Dec 15:45
16cdce7
server : fix token duplication when streaming with stop strings (#10997)
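The bug class here is a classic streaming pitfall: once a partial stop-string match fails, text that was already sent to the client can be emitted again. A minimal C++ sketch of the underlying buffering technique (the helper name is illustrative, not the server's actual code): hold back any tail of the output that could still turn into the stop string, and only release it once it can no longer match.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Stream text chunks while holding back any suffix that could be the start
// of the stop string, so a partial match is never emitted twice.
// Illustrative sketch of the buffering technique, not llama-server code.
static std::string stream_with_stop(const std::vector<std::string> & chunks,
                                    const std::string & stop) {
    std::string emitted, buf;
    for (const std::string & c : chunks) {
        buf += c;
        size_t idx = buf.find(stop);
        if (idx != std::string::npos) {   // full stop string seen: truncate
            emitted += buf.substr(0, idx);
            return emitted;
        }
        // longest suffix of buf that is a prefix of stop must be held back
        size_t hold = 0;
        size_t maxk = std::min(stop.size(), buf.size());
        for (size_t k = 1; k <= maxk; ++k) {
            if (stop.compare(0, k, buf, buf.size() - k, k) == 0) {
                hold = k;
            }
        }
        emitted += buf.substr(0, buf.size() - hold);
        buf = buf.substr(buf.size() - hold);
    }
    emitted += buf;                       // stream ended without a stop match
    return emitted;
}
```

With stop string "END" and chunks "Hello E" / "ND world", the "E" is buffered rather than sent, so the client receives exactly "Hello " once the match completes.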

b4393

26 Dec 16:29
d79d8f3
vulkan: multi-row k quants (#10846)

* multi-row k-quant shaders

* better row selection

* more row choices

* readjust row selection

* rm_kq=2 by default

b4392

26 Dec 14:32
d283d02
examples, ggml : fix GCC compiler warnings (#10983)

Warning types fixed (observed under MSYS2 GCC 14.2.0):
* format '%ld' expects argument of type 'long int', but argument has type 'size_t'
* llama.cpp/ggml/src/ggml-vulkan/vulkan-shaders/vulkan-shaders-gen.cpp:81:46: warning: missing initializer for member '_STARTUPINFOA::lpDesktop' [-Wmissing-field-initializers]  (emitted for all struct fields except the first)
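Both warning classes have mechanical fixes, sketched below with a hypothetical `info` struct standing in for `_STARTUPINFOA`: use `%zu` (the portable `size_t` conversion) instead of `%ld`, and value-initialize structs with empty braces rather than listing only some members.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// "%ld" assumes size_t == long, which breaks on LLP64 targets such as
// 64-bit Windows (the MSYS2 build in the commit); "%zu" is the C99/C++11
// conversion specifier defined for size_t.
static std::string format_size(std::size_t n) {
    char buf[32];
    std::snprintf(buf, sizeof(buf), "size = %zu", n);
    return buf;
}

// Hypothetical stand-in for _STARTUPINFOA; not the real Win32 struct.
struct info { int cb; const char * desktop; int flags; };

// Initializing only the first member, e.g. `info si = { 1 };`, triggers
// -Wmissing-field-initializers for the rest; empty braces value-initialize
// every member (pointers to nullptr, integers to zero) without the warning.
static info make_info() {
    info si = {};
    si.cb = (int) sizeof(si);
    return si;
}
```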

b4391

24 Dec 21:06
9ba399d
server : add support for "encoding_format": "base64" to the */embeddi…
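In the OpenAI-compatible embeddings API, `"encoding_format": "base64"` returns each vector as a base64 string of packed little-endian float32 values instead of a JSON array. A hedged C++ sketch of client-side decoding (the helper name is illustrative; assumes a little-endian host):

```cpp
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Decode a base64 string into raw bytes, then reinterpret them as packed
// little-endian float32 values -- the layout base64-encoded embeddings use.
// Illustrative client-side helper, not part of llama.cpp.
static std::vector<float> decode_b64_embedding(const std::string & b64) {
    static const std::string tbl =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    std::vector<uint8_t> bytes;
    uint32_t val = 0;
    int bits = 0;
    for (char c : b64) {
        if (c == '=') break;                  // padding: done
        size_t pos = tbl.find(c);
        if (pos == std::string::npos) continue; // skip whitespace/newlines
        val = (val << 6) | (uint32_t) pos;    // accumulate 6 bits per symbol
        bits += 6;
        if (bits >= 8) {
            bits -= 8;
            bytes.push_back((uint8_t) ((val >> bits) & 0xFF));
        }
    }
    std::vector<float> out(bytes.size() / sizeof(float));
    std::memcpy(out.data(), bytes.data(), out.size() * sizeof(float));
    return out;
}
```

For example, the four bytes of the float 1.0f (00 00 80 3F) encode to "AACAPw==".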

b4390

24 Dec 18:32
2cd43f4
ggml : more performance with llamafile tinyblas on x86_64 (#10714)

* more performance with llamafile tinyblas on x86_64.

- add bf16 support
- change dispatch strategy (thanks:
https://github.com/ikawrakow/ik_llama.cpp/pull/71 )
- reduce memory bandwidth

simpler tinyblas dispatch, more cache-friendly

* tinyblas dynamic dispatching

* sgemm: add M blocks.

* - git 2.47 uses short ids of length 9.
- show-progress is not part of GNU Wget2

* remove unstable test

b4389

24 Dec 17:27
09fe2e7
server : allow filtering llama server response fields (#10940)

* llama_server_response_fields

* llama_server_response_fields_fix_issues

* params fixes

* fix

* clarify docs

* change to "response_fields"

---------

Co-authored-by: Xuan Son Nguyen <[email protected]>
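Conceptually, `response_fields` lets a client ask for only a subset of the top-level fields in the server's reply. A simplified C++ sketch of the filtering idea, using flat string maps in place of real JSON (names are illustrative, not the server's code):

```cpp
#include <map>
#include <set>
#include <string>

// Keep only the requested top-level fields of a response; an empty request
// list means "return everything". Illustrative sketch of the response_fields
// idea using string maps instead of JSON; not llama-server code.
static std::map<std::string, std::string> filter_fields(
        const std::map<std::string, std::string> & response,
        const std::set<std::string> & response_fields) {
    if (response_fields.empty()) {
        return response;          // no filter requested
    }
    std::map<std::string, std::string> out;
    for (const auto & kv : response) {
        if (response_fields.count(kv.first)) {
            out.insert(kv);       // field was explicitly requested
        }
    }
    return out;
}
```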

b4388

24 Dec 08:50
30caac3
llama : the WPM vocabs use the CLS token as BOS (#10930)

* llama : the WPM vocabs use the CLS token as BOS

ggml-ci

* llama : add comment

b4387

24 Dec 04:00
60cfa72
ggml : use wstring for backend search paths (#10960)

ggml-ci

b4386

24 Dec 03:54
3327bb0
ggml : fix arm enabled features check (#10961)