Releases: ggml-org/llama.cpp
b4411
fix: Vulkan shader gen binary path (#11037)
b4409
metal : avoid uint (#11019)
b4406
server : allow using LoRA adapters per-request (#10994)
* slot.can_batch_with
* lora per request
* test: force disable cache prompt
* move can_batch_with check
* fix condition
* add slow test with llama 8b
* update docs
* move lora change task to queue
* Apply suggestions from code review (Co-authored-by: Georgi Gerganov <[email protected]>)
* lora_base
* remove redundant check

Co-authored-by: Georgi Gerganov <[email protected]>
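As a sketch of what per-request LoRA selection looks like from the client side: the request body gains a `lora` field listing adapter ids and scales to apply for that request only. The exact field names (`lora`, `id`, `scale`) and the `/completion` endpoint shape are assumptions based on the PR description, not verified against the server docs.

```python
import json

# Hypothetical request body for llama-server's /completion endpoint with a
# per-request LoRA adapter. Adapter ids refer to adapters loaded at server
# startup; "scale" weights the adapter for this request only (assumption).
payload = {
    "prompt": "Write a haiku about mountains.",
    "n_predict": 64,
    "lora": [
        {"id": 0, "scale": 0.75},  # apply adapter 0 at 0.75 strength
    ],
}

body = json.dumps(payload)
print(body)
```

A request without the `lora` field would run against the base model, which is what makes the adapter choice per-request rather than server-global.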
b4404
ggml : fixes for AVXVNNI instruction set with MSVC and Clang (#11027)
* Fixes for clang AVX VNNI
* enable AVX VNNI and alder lake build for MSVC
* Apply suggestions from code review

Co-authored-by: slaren <[email protected]>
b4403
server : clean up built-in template detection (#11026)
* server : clean up built-in template detection
* fix compilation
* add chat template test
* fix condition
b4402
server : add OAI compat for /v1/completions (#10974)
* server : add OAI compat for /v1/completions
* add test
* add docs
* better docs
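For illustration, an OpenAI-compatible `/v1/completions` request body follows the standard OpenAI completions schema (`model`, `prompt`, `max_tokens`, `temperature`); a minimal sketch, assuming llama-server accepts these fields as this change describes:

```python
import json

# Sketch of an OpenAI-style /v1/completions request body. llama-server serves
# whatever model it was launched with, so the "model" value is nominal here.
payload = {
    "model": "llama",
    "prompt": "Say hello in French:",
    "max_tokens": 16,
    "temperature": 0.7,
}

body = json.dumps(payload)
print(body)
```

The point of the compat layer is that existing OpenAI client code can point its base URL at llama-server without changing the request shape.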
b4400
common, examples, ggml : fix MSYS2 GCC compiler errors and warnings w…
b4399
vulkan: optimize mul_mat for small values of N (#10991) Make the mul_mat_vec shaders support N>1 (as a spec constant, NUM_COLS) where the batch_strides are overloaded to hold the row strides. Put the loads from the B matrix in the innermost loop because it should cache better. Share some code for reducing the result values to memory in mul_mat_vec_base.
b4398
android : fix llama_batch free (#11014)
b4397
vulkan: im2col and matmul optimizations for stable diffusion (#10942)
* tests: Add im2col perf tests
* vulkan: optimize im2col, more elements per thread
* vulkan: increase small tile size for NV_coopmat2
* vulkan: change im2col to 512 elements per workgroup