Releases: ggml-org/llama.cpp

b4411

04 Jan 09:39
c31fc8b
fix: Vulkan shader gen binary path (#11037)

b4409

03 Jan 10:16
e7da954
metal : avoid uint (#11019)

b4406

02 Jan 14:41
0da5d86
server : allow using LoRA adapters per-request (#10994)

* slot.can_batch_with

* lora per request

* test: force disable cache prompt

* move can_batch_with check

* fix condition

* add slow test with llama 8b

* update docs

* move lora change task to queue

* Apply suggestions from code review

Co-authored-by: Georgi Gerganov

* lora_base

* remove redundant check

---------

Co-authored-by: Georgi Gerganov
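
As a rough illustration of what a per-request LoRA selection might look like from the client side, the sketch below builds a JSON body for a completion request. The `lora` field name and its `{"id", "scale"}` entry shape are assumptions based on the server documentation as recalled, not taken from this release note; `build_completion_request` is a hypothetical helper. Check the server README for the build you run before relying on it.

```python
import json


def build_completion_request(prompt, lora_scales=None, n_predict=32):
    """Build a JSON body for a llama.cpp server completion request.

    lora_scales maps adapter id -> scale. The "lora" field name and the
    {"id", "scale"} entry shape are assumptions; verify them against the
    server documentation for your build.
    """
    body = {"prompt": prompt, "n_predict": n_predict}
    if lora_scales:
        # One entry per adapter, sorted by id for a stable payload.
        body["lora"] = [
            {"id": adapter_id, "scale": scale}
            for adapter_id, scale in sorted(lora_scales.items())
        ]
    return body


# Request adapter 0 at half strength and adapter 1 at full strength.
body = build_completion_request("Hello", {0: 0.5, 1: 1.0})
print(json.dumps(body, indent=2))
```

The `slot.can_batch_with` item in the commit list suggests that requests with differing adapter settings are not placed in the same batch, so mixing many different scales across concurrent requests may reduce batching.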

b4404

31 Dec 15:14
0827b2c
ggml : fixes for AVXVNNI instruction set with MSVC and Clang (#11027)

* Fixes for clang AVX VNNI

* enable AVX VNNI and alder lake build for MSVC

* Apply suggestions from code review

---------

Co-authored-by: slaren

b4403

31 Dec 15:08
45095a6
server : clean up built-in template detection (#11026)

* server : clean up built-in template detection

* fix compilation

* add chat template test

* fix condition

b4402

31 Dec 12:26
5896c65
server : add OAI compat for /v1/completions (#10974)

* server : add OAI compat for /v1/completions

* add test

* add docs

* better docs
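
A minimal client-side sketch of calling the OpenAI-compatible route, using only the Python standard library. It assumes the server listens on `http://localhost:8080` and that the response follows the OpenAI completions shape (`choices[0].text`); the helper names `make_completions_request` and `first_choice_text` are hypothetical. The request is only constructed here, not sent.

```python
import json
import urllib.request


def make_completions_request(base_url, prompt, max_tokens=16):
    """Build a POST request for the OpenAI-style /v1/completions route."""
    payload = {"prompt": prompt, "max_tokens": max_tokens}
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def first_choice_text(response_body):
    """Extract the generated text, assuming an OpenAI-style response body."""
    return json.loads(response_body)["choices"][0]["text"]


# Base URL is an assumption; adjust host and port to your server.
req = make_completions_request("http://localhost:8080", "Hello")

# To actually send it against a running server:
# with urllib.request.urlopen(req) as resp:
#     print(first_choice_text(resp.read()))
```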

b4400

31 Dec 01:19
6e1531a
common, examples, ggml : fix MSYS2 GCC compiler errors and warnings w…

b4399

30 Dec 18:02
716bd6d
vulkan: optimize mul_mat for small values of N (#10991)

Make the mul_mat_vec shaders support N>1 (as a spec constant, NUM_COLS) where
the batch_strides are overloaded to hold the row strides. Put the loads from the
B matrix in the innermost loop because it should cache better.

Share some code for reducing the result values to memory in mul_mat_vec_base.
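
The loop order described above can be sketched on the CPU as follows. This is not the shader code: it is a plain Python reference that mirrors the idea of keeping one accumulator per column and placing the B loads in the innermost loop. The function name `mul_mat_small_n` and the data layout (`b[j][k]` holding column `j` of B) are assumptions made for illustration.

```python
def mul_mat_small_n(a, b, num_cols):
    """Multiply A (M x K) by a small number of B columns.

    a[i][k] is row i of A; b[j][k] is column j of B (length K).
    Returns d with d[j][i] = sum over k of a[i][k] * b[j][k].
    """
    m, k_len = len(a), len(a[0])
    d = [[0.0] * m for _ in range(num_cols)]
    for i in range(m):                  # one output row at a time
        acc = [0.0] * num_cols          # one accumulator per B column
        for k in range(k_len):
            a_ik = a[i][k]              # the A value is loaded once per k
            for j in range(num_cols):   # B loads sit in the innermost loop
                acc[j] += a_ik * b[j][k]
        for j in range(num_cols):       # write the reduced results out
            d[j][i] = acc[j]
    return d


a = [[1, 2], [3, 4]]
b = [[1, 0], [0, 1]]  # two columns of B
result = mul_mat_small_n(a, b, num_cols=2)
print(result)
```

With `num_cols` fixed ahead of time (a spec constant in the shader), each A value is reused across all columns, which is the reuse the commit message is aiming for when N is small.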

b4398

30 Dec 13:42
c250ecb
android : fix llama_batch free (#11014)

b4397

29 Dec 09:54
a813bad
vulkan: im2col and matmul optimizations for stable diffusion (#10942)

* tests: Add im2col perf tests

* vulkan: optimize im2col, more elements per thread

* vulkan: increase small tile size for NV_coopmat2

* vulkan: change im2col to 512 elements per workgroup