Releases · ggml-org/llama.cpp

29 Dec 09:11

fdd2188

b4396

vulkan: Use push constant offset to handle misaligned descriptors (#1…
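
The title describes a general technique: when a buffer offset does not meet the device's descriptor alignment, bind at the nearest aligned offset below it and hand the leftover to the shader. A minimal sketch of that arithmetic follows. `SplitOffset` and `split_offset` are hypothetical names used for illustration, not code from this release, and the alignment is assumed to be a power of two such as `VkPhysicalDeviceLimits::minStorageBufferOffsetAlignment`.

```cpp
#include <cstdint>

// Result of splitting a byte offset into a part that satisfies the device's
// alignment requirement and a remainder the shader has to add back itself.
struct SplitOffset {
    uint64_t aligned_base;  // offset to bind the descriptor at
    uint32_t remainder;     // leftover bytes, e.g. passed in a push constant
};

// `alignment` is assumed to be a nonzero power of two.
inline SplitOffset split_offset(uint64_t offset, uint64_t alignment) {
    const uint64_t mask = alignment - 1;
    return SplitOffset{offset & ~mask, static_cast<uint32_t>(offset & mask)};
}
```

With an alignment of 64, an offset of 100 would be bound at 64 with a remainder of 36 for the shader to apply.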

28 Dec 15:45

github-actions

b4394

16cdce7

server : fix token duplication when streaming with stop strings (#10997)
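
For context, streaming with stop strings usually means holding back any tail of the pending text that could still grow into a stop string, so the same characters are never sent twice or leaked before a stop. The sketch below shows one common way to compute how much is safe to send. `safe_prefix_len` is a hypothetical helper, not the server's actual fix, and it assumes the caller has already checked for a complete stop-string match.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Returns how many bytes at the start of `pending` are safe to send.
// The tail is held back when it matches the beginning of any stop string.
inline std::size_t safe_prefix_len(const std::string & pending,
                                   const std::vector<std::string> & stops) {
    std::size_t hold = 0;
    for (const std::string & stop : stops) {
        if (stop.empty()) {
            continue;
        }
        // Longest proper prefix of `stop` that is also a suffix of `pending`.
        const std::size_t max_len = std::min(pending.size(), stop.size() - 1);
        for (std::size_t len = max_len; len > 0; --len) {
            if (pending.compare(pending.size() - len, len, stop, 0, len) == 0) {
                hold = std::max(hold, len);
                break;
            }
        }
    }
    return pending.size() - hold;
}
```

A caller would send the first `safe_prefix_len(...)` bytes and keep the rest buffered until the next token arrives.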

26 Dec 16:29

github-actions

b4393

d79d8f3

vulkan: multi-row k quants (#10846)

* multi row k quant shaders!

* better row selection

* more row choices

* readjust row selection

* rm_kq=2 by default

26 Dec 14:32

github-actions

b4392

d283d02

examples, ggml : fix GCC compiler warnings (#10983)

Warning types fixed (observed under MSYS2 GCC 14.2.0):
* format '%ld' expects argument of type 'long int', but argument has type 'size_t'
* llama.cpp/ggml/src/ggml-vulkan/vulkan-shaders/vulkan-shaders-gen.cpp:81:46: warning: missing initializer for member '_STARTUPINFOA::lpDesktop' [-Wmissing-field-initializers] (emitted for all struct fields except the first)
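
The first warning appears when a `size_t` is printed with `%ld`, which is wrong wherever `size_t` is not `long` (64-bit Windows, for example). A minimal sketch of the usual remedy, the `%zu` conversion, is shown below. `format_size` is a hypothetical helper, a C99-conforming `printf` family is assumed, and the release may have resolved the warning differently.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// Formats a size_t with the matching %zu conversion instead of %ld.
inline std::string format_size(std::size_t n) {
    char buf[32];  // large enough for any 64-bit value plus the terminator
    std::snprintf(buf, sizeof(buf), "%zu", n);
    return std::string(buf);
}
```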

24 Dec 21:06

github-actions

b4391

9ba399d

server : add support for "encoding_format": "base64" to the */embeddi…
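
A client that requests `"encoding_format": "base64"` has to turn the returned string back into numbers. The sketch below assumes the payload is a packed array of little-endian 32-bit floats, which is the common convention for this option but is not stated in the title, and that the host is little-endian as well. `base64_decode` and `decode_embedding` are hypothetical helper names.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Decodes standard base64 (RFC 4648 alphabet, '=' padding) into raw bytes.
// Characters outside the alphabet are skipped; this sketch does no validation.
inline std::vector<uint8_t> base64_decode(const std::string & in) {
    static const std::string alphabet =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    std::vector<uint8_t> out;
    uint32_t acc = 0;  // bit accumulator; only the low bits are ever read
    int bits = 0;      // number of not-yet-emitted bits in acc
    for (char c : in) {
        if (c == '=') {
            break;
        }
        const std::size_t idx = alphabet.find(c);
        if (idx == std::string::npos) {
            continue;
        }
        acc = (acc << 6) | static_cast<uint32_t>(idx);
        bits += 6;
        if (bits >= 8) {
            bits -= 8;
            out.push_back(static_cast<uint8_t>((acc >> bits) & 0xFF));
        }
    }
    return out;
}

// Reinterprets the decoded bytes as float32 values (assumed little-endian
// payload on a little-endian host). Trailing bytes that do not fill a whole
// float are ignored.
inline std::vector<float> decode_embedding(const std::string & b64) {
    const std::vector<uint8_t> bytes = base64_decode(b64);
    std::vector<float> values(bytes.size() / sizeof(float));
    if (!values.empty()) {
        std::memcpy(values.data(), bytes.data(), values.size() * sizeof(float));
    }
    return values;
}
```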

24 Dec 18:32

github-actions

b4390

2cd43f4

ggml : more perfo with llamafile tinyblas on x86_64 (#10714)

* more performance with llamafile tinyblas on x86_64.

- add bf16 support
- change dispatch strategy (thanks: https://github.com/ikawrakow/ik_llama.cpp/pull/71)
- reduce memory bandwidth

simple tinyblas dispatch and more cache-friendly

* tinyblas dynamic dispatching

* sgemm: add M blocks.

* - git 2.47 uses short ids of length 9.
- show-progress is not part of GNU Wget2

* remove unstable test

24 Dec 17:27

github-actions

b4389

09fe2e7

server: allow filtering llama server response fields (#10940)

* llama_server_response_fields

* llama_server_response_fields_fix_issues

* params fixes

* fix

* clarify docs

* change to "response_fields"

---------

Co-authored-by: Xuan Son Nguyen

24 Dec 08:50

github-actions

b4388

30caac3

llama : the WPM vocabs use the CLS token as BOS (#10930)

* llama : the WPM vocabs use the CLS token as BOS

ggml-ci

* llama : add comment

24 Dec 04:00

github-actions

b4387

60cfa72

ggml : use wstring for backend search paths (#10960)

ggml-ci

24 Dec 03:54

github-actions

b4386

3327bb0

ggml : fix arm enabled features check (#10961)
