Releases · ggml-org/llama.cpp

09 Dec 07:55

1a05004

b4292

cmake : simplify msvc charsets (#10672)

08 Dec 22:51

github-actions

b4291

ce8784b

server : fix format_infill (#10724)

* server : fix format_infill

* fix

* rename

* update test

* use another model

* update test

* update test

* test_invalid_input_extra_req

08 Dec 20:45

github-actions

b4290

e52522b

server : bring back info of final chunk in stream mode (#10722)

* server : bring back info to final chunk in stream mode

* clarify a bit

* trailing space

08 Dec 12:11

github-actions

b4288

43ed389

llama : use cmake for swift build (#10525)

* llama : use cmake for swift build

* swift : <> -> ""

* ci : remove make

* ci : disable ios build

* Revert "swift : <> -> """

This reverts commit d39ffd9556482b77d4ea5b118b453fc1c097a31d.

* ci : try fix ios build

* ci : cont

* ci : cont

---------

Co-authored-by: Georgi Gerganov <[email protected]>

08 Dec 08:43

github-actions

b4287

ecc93d0

vulkan: compile a test shader in cmake to check for coopmat2 support …

07 Dec 20:20

github-actions

b4285

3573fa8

server : (refactor) no more json in server_task input (#10691)

* server : (refactor) no more json in server_task input

* add test for slots endpoint

* add tests for /props and /slots

* remove task inf_type

* fix CI by adding safe_json_to_str

* add "model_path" to /props

* update readme

07 Dec 17:12

github-actions

b4284

d9c3ba2

ggml : disable iq4_nl interleave size 8 (#10709)

ggml-ci

07 Dec 16:50

github-actions

b4283

ce4a7b8

server : various fixes (#10704)

* server : various fixes

ggml-ci

* server : show current seed in slot_params

ggml-ci

* fix /slots endpoint

* Update examples/server/server.cpp

Co-authored-by: Georgi Gerganov <[email protected]>

* server : reflect endpoint response changes in the readme

ggml-ci

---------

Co-authored-by: Xuan Son Nguyen <[email protected]>

07 Dec 13:31

github-actions

b4282

19d8762

ggml : refactor online repacking (#10446)

* rename ggml-cpu-aarch64.c to .cpp

* reformat extra cpu backend.

- clean Q4_0_N_M and IQ4_0_N_M
  - remove from "file" tensor type
  - allow only with dynamic repack

- extract cpu extra bufts and convert to C++
  - hbm
  - "aarch64"

- more generic use of extra buffer
  - generalise extra_supports_op
  - new API for "cpu-accel":
     - amx
     - aarch64

* clang-format

* Clean Q4_0_N_M ref

Enable restrict on C++

* add op GGML_OP_MUL_MAT_ID for Q4_0_N_M with runtime repack

* added/corrected control on tensor size for Q4 repacking.

* Update ggml/src/ggml-cpu/ggml-cpu-aarch64.cpp

Co-authored-by: Georgi Gerganov <[email protected]>

* Update ggml/src/ggml-cpu/ggml-cpu-aarch64.cpp

Co-authored-by: Georgi Gerganov <[email protected]>

* add debug logs on repacks.

---------

Co-authored-by: Georgi Gerganov <[email protected]>

07 Dec 10:35

github-actions

b4281

c2a16c0

server : fix free of spec context and batch (#10651)

ggml-ci
