Releases: ngxson/llama.cpp
b4034
llama : add <|tool_call|> formatting to Granite template (#10177)
Branch: GraniteToolCallTemplate
Signed-off-by: Gabe Goodhart <[email protected]>
b4033
ggml : fix arch check in bf16_to_fp32 (#10164)
b4027
cuda : clear error after changing peer access (#10153)
b4024
CANN : adjust for the backend registry refactor (#10158)
Remove `buffer->iface.get_name`, which was used by the CANN backend, as it was removed in the backend registry refactor PR.
b4023
sync : ggml
b4020
ggml : move CPU backend to a separate file (#10144)
b4019
metal : minor fixup in FA kernel (#10143)
- metal : minor fixup in FA kernel ggml-ci
- metal : use the unrolled loop variable
- metal : remove unused var
b4016
server : fix slot selection by lru (#10126)
- server : fix slot selection by lru, migrate lcs to `size_t`
- minor debug log fix
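The slot-selection fix above combines prefix matching with least-recently-used eviction. A minimal sketch of that policy, with a hypothetical simplified `Slot` struct (the real server state holds much more), taking `lcs` to mean the length of the prompt prefix shared with the incoming request:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical simplified slot record; illustrative only.
struct Slot {
    int64_t t_last_used; // timestamp of last use (smaller = older)
    size_t  lcs;         // shared-prefix length with the incoming prompt
};

// Prefer the slot whose cached prompt shares the longest prefix with the
// request; break ties by evicting the least recently used slot.
// Returns -1 if there are no slots.
int pick_slot(const std::vector<Slot> & slots) {
    int best = -1;
    for (size_t i = 0; i < slots.size(); ++i) {
        if (best < 0 ||
            slots[i].lcs > slots[best].lcs ||
            (slots[i].lcs == slots[best].lcs &&
             slots[i].t_last_used < slots[best].t_last_used)) {
            best = (int) i;
        }
    }
    return best;
}
```

Storing `lcs` as `size_t` (rather than a signed int) matches the type tokenizer and container sizes use, avoiding sign-comparison pitfalls.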
b4014
llama : adjust default context size + print warnings (#10136)
- llama : adjust default context size + print warnings ggml-ci
- ggml-ci : add missing gpu-layers + adjust context sizes
b4013
simple-chat : only add bos on first prompt (#10129)
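The fix above ensures the BOS token is prepended only once per conversation: later turns append to a context that already begins with BOS. A minimal sketch under that assumption, with a hypothetical stand-in tokenizer (token id 1 for BOS is illustrative, not the actual vocabulary):

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in tokenizer: maps bytes to ids and optionally
// prepends a BOS token. Real tokenization in llama.cpp differs.
std::vector<int> tokenize(const std::string & text, bool add_bos) {
    std::vector<int> toks;
    if (add_bos) {
        toks.push_back(1); // illustrative BOS token id
    }
    for (char c : text) {
        toks.push_back((int) (unsigned char) c);
    }
    return toks;
}

// In a chat loop, only the very first user turn gets BOS; duplicating
// it on later turns would corrupt the model's view of the context.
std::vector<int> encode_turn(const std::string & text, bool is_first_turn) {
    return tokenize(text, /*add_bos=*/is_first_turn);
}
```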