Releases: ggml-org/llama.cpp
Releases · ggml-org/llama.cpp
b4396
vulkan: Use push constant offset to handle misaligned descriptors (#1…
b4394
server : fix token duplication when streaming with stop strings (#10997)
b4393
vulkan: multi-row k quants (#10846) * multi row k quant shaders! * better row selection * more row choices * readjust row selection * rm_kq=2 by default
b4392
examples, ggml : fix GCC compiler warnings (#10983) Warning types fixed (observed under MSYS2 GCC 14.2.0): * format '%ld' expects argument of type 'long int', but argument has type 'size_t' * llama.cpp/ggml/src/ggml-vulkan/vulkan-shaders/vulkan-shaders-gen.cpp:81:46: warning: missing initializer for member '_STARTUPINFOA::lpDesktop' [-Wmissing-field-initializers] (emitted for all struct field except first)
b4391
server : add support for "encoding_format": "base64" to the */embeddi…
b4390
ggml : more perfo with llamafile tinyblas on x86_64 (#10714) * more perfo with llamafile tinyblas on x86_64. - add bf16 suport - change dispache strategie (thanks: https://github.com/ikawrakow/ik_llama.cpp/pull/71 ) - reduce memory bandwidth simple tinyblas dispache and more cache freindly * tinyblas dynamic dispaching * sgemm: add M blocs. * - git 2.47 use short id of len 9. - show-progress is not part of GNU Wget2 * remove not stable test
b4389
server: allow filtering llama server response fields (#10940) * llama_server_response_fields * llama_server_response_fields_fix_issues * params fixes * fix * clarify docs * change to "response_fields" --------- Co-authored-by: Xuan Son Nguyen <[email protected]>
b4388
llama : the WPM vocabs use the CLS token as BOS (#10930) * llama : the WPM vocabs use the CLS token as BOS ggml-ci * llama : add comment
b4387
ggml : use wstring for backend search paths (#10960) ggml-ci
b4386
ggml : fix arm enabled features check (#10961)