Releases: ggml-org/llama.cpp

b4217

29 Nov 07:03
f095a64
vulkan: get the first command buffer submitted sooner (#10499)

This is an incremental improvement over #9118 to get work to the GPU a bit
sooner. The first part is to start with a smaller number of nodes before
the first submit, and ramp it up to the current 100 nodes/submit. The
second part is to reduce the dryrun overhead for all the nodes that just
need to request descriptor space.

With these changes I get around 1-2% speedup on RTX 4070 combined with my
old Haswell-era CPU.
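
The ramp-up described in the b4217 notes can be illustrated with a small standalone sketch. This is not the Vulkan backend's code: the helper name `plan_submits`, the doubling policy, and the starting batch size used in the usage note are assumptions made for illustration. Only the 100 nodes/submit ceiling comes from the notes above. The idea shown is that the first submit fires after a small number of graph nodes so the GPU starts working early, and later submits grow toward the ceiling.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of llama.cpp): returns the indices of the
// graph nodes after which a command buffer would be submitted.
//   n_nodes     - total number of nodes in the graph
//   first_batch - nodes recorded before the first submit (assumed value)
//   max_batch   - steady-state ceiling, e.g. 100 nodes/submit
std::vector<std::size_t> plan_submits(std::size_t n_nodes,
                                      std::size_t first_batch,
                                      std::size_t max_batch) {
    std::vector<std::size_t> submit_at;
    max_batch = std::max<std::size_t>(max_batch, 1);  // avoid a zero ceiling
    std::size_t batch = std::max<std::size_t>(1, std::min(first_batch, max_batch));
    std::size_t pending = 0;  // nodes recorded since the last submit

    for (std::size_t i = 0; i < n_nodes; ++i) {
        ++pending;
        const bool last = (i + 1 == n_nodes);
        if (pending >= batch || last) {
            submit_at.push_back(i);
            pending = 0;
            // Assumed ramp policy: double the batch size up to the ceiling.
            batch = std::min(batch * 2, max_batch);
        }
    }
    return submit_at;
}
```

With 250 nodes, an assumed first batch of 20, and a ceiling of 100, `plan_submits(250, 20, 100)` yields submits after node indices 19, 59, 139, 239, and 249, so the GPU receives its first work after 20 nodes instead of 100.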

b4216

29 Nov 00:58
678d799
llava: return false instead of exit (#10546)

b4215

28 Nov 19:53
dc22344
ggml : remove redundant copyright notice + update authors

b4214

28 Nov 19:37
4c0a95b
llama : add missing model types

b4212

28 Nov 17:48
8907193
common: fix warning message when no GPU found (#10564)

b4210

28 Nov 15:41
e90688e
ci : fix tag name in cuda and hip releases (#10566)

b4209

28 Nov 14:38
76b27d2
ggml : fix row condition for i8mm kernels (#10561)

ggml-ci

b4208

28 Nov 14:37
eea986f
cmake : fix ARM feature detection (#10543)

ggml-ci

b4206

28 Nov 12:49
2025fa6
kompute : improve backend to pass test_backend_ops (#10542)

* kompute: op_unary: reject unsupported parameters

Signed-off-by: Sergio Lopez <[email protected]>

* kompute: softmax: implement ALiBi support

Signed-off-by: Sergio Lopez <[email protected]>

* kompute: rope: implement neox and phi3 support

Signed-off-by: Sergio Lopez <[email protected]>

* kompute: op_mul_mat_q4_k permuted support

Signed-off-by: Sergio Lopez <[email protected]>

* kompute: op_mul_mat_[q4_0|q4_1|q8_0] permuted support

Signed-off-by: Sergio Lopez <[email protected]>

* kompute: op_mul_mat_f16 permuted support

Signed-off-by: Sergio Lopez <[email protected]>

* kompute: op_mul_mat_q6_k permuted support

Signed-off-by: Sergio Lopez <[email protected]>

---------

Signed-off-by: Sergio Lopez <[email protected]>

b4204

28 Nov 08:04
605fa66
CANN: Fix SOC_TYPE compile bug (#10519)

* CANN: Fix the build failure on Ascend 310P in two cases:
1) SOC_TYPE is specified manually
2) Some unusual compile environments

* Update the CANN backend News content: support F16 and F32 data type models on the Ascend 310P NPU.

* Fix CANN compile failure: assert in Ascend kernel functions is not supported on some CANN versions