Releases · ggml-org/llama.cpp

29 Nov 07:03

f095a64

b4217

vulkan: get the first command buffer submitted sooner (#10499)

This is an incremental improvement over #9118 to get work to the GPU a bit
sooner. The first part is to start with a smaller number of nodes before
the first submit, and ramp it up to the current 100 nodes/submit. The
second part is to reduce the dryrun overhead for all the nodes that just
need to request descriptor space.
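Below is a minimal Python sketch of the ramp-up idea in the first part. Only the steady-state 100 nodes per submit comes from the note above. The initial batch sizes (20, then 50) and the helper name `submit_batches` are hypothetical. The sketch models the submit schedule only, not actual Vulkan command-buffer recording.

```python
# Hypothetical ramp-up schedule. The first two batch sizes are illustrative
# assumptions; only the steady-state 100 nodes/submit comes from the release note.
RAMP = (20, 50)
STEADY = 100


def submit_batches(n_nodes: int, ramp=RAMP, steady=STEADY) -> list[int]:
    """Return how many graph nodes go into each command-buffer submit."""
    batches = []
    pending = 0
    for i in range(n_nodes):
        pending += 1  # record node i into the current command buffer
        # Use the ramp sizes for the first submits, then the steady-state size.
        limit = ramp[len(batches)] if len(batches) < len(ramp) else steady
        if pending >= limit or i == n_nodes - 1:
            batches.append(pending)  # submit: the GPU can start on this batch
            pending = 0
    return batches


if __name__ == "__main__":
    print(submit_batches(300))  # [20, 50, 100, 100, 30]
```

With a flat 100 nodes per submit (`ramp=()`), the GPU receives nothing until 100 nodes have been recorded. With the ramp, the first submit goes out after 20 nodes, and the CPU keeps recording while the GPU works.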

With these changes I get a speedup of around 1-2% on an RTX 4070 paired with my
old Haswell-era CPU.


29 Nov 00:58

github-actions

b4216

678d799


llava: return false instead of exit (#10546)


28 Nov 19:53

github-actions

b4215

dc22344


ggml : remove redundant copyright notice + update authors


28 Nov 19:37

github-actions

b4214

4c0a95b


llama : add missing model types


28 Nov 17:48

github-actions

b4212

8907193


common: fix warning message when no GPU found (#10564)


28 Nov 15:41

github-actions

b4210

e90688e


ci : fix tag name in cuda and hip releases (#10566)


28 Nov 14:38

github-actions

b4209

76b27d2


ggml : fix row condition for i8mm kernels (#10561)

ggml-ci


28 Nov 14:37

github-actions

b4208

eea986f


cmake : fix ARM feature detection (#10543)

ggml-ci


28 Nov 12:49

github-actions

b4206

2025fa6


kompute : improve backend to pass test_backend_ops (#10542)

* kompute: op_unary: reject unsupported parameters
* kompute: softmax: implement ALiBi support
* kompute: rope: implement neox and phi3 support
* kompute: op_mul_mat_q4_k permuted support
* kompute: op_mul_mat_[q4_0|q4_1|q8_0] permuted support
* kompute: op_mul_mat_f16 permuted support
* kompute: op_mul_mat_q6_k permuted support

---------

Signed-off-by: Sergio Lopez <[email protected]>
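The next sketch is a rough Python reference for what ALiBi support in a softmax op involves. It is modeled on how ggml's CPU `soft_max` derives per-head slopes from `max_bias`, as far as I understand it. It is not the Kompute shader, and the function names are hypothetical. Each head gets a slope, and each row is computed as softmax(scale * logits + slope * mask). The assumption here is that, for ALiBi models in llama.cpp, the mask holds the negated query/key distance.

```python
import math


def alibi_slopes(n_head: int, max_bias: float) -> list[float]:
    """Per-head ALiBi slopes derived from max_bias (sketch of the ggml-style scheme)."""
    n_head_log2 = 1 << int(math.floor(math.log2(n_head)))  # largest power of two <= n_head
    m0 = 2.0 ** (-max_bias / n_head_log2)
    m1 = 2.0 ** (-(max_bias / 2.0) / n_head_log2)
    slopes = []
    for h in range(n_head):
        if h < n_head_log2:
            slopes.append(m0 ** (h + 1))
        else:
            # heads beyond the largest power of two get interleaved odd powers of m1
            slopes.append(m1 ** (2 * (h - n_head_log2) + 1))
    return slopes


def softmax_alibi(logits, mask, slope, scale=1.0):
    """softmax(scale * logits + slope * mask) for one row.

    mask is assumed to hold -|i - j| for visible keys (a linear distance
    penalty) and -inf for masked-out keys, which end up with probability 0.
    """
    z = [scale * x + slope * m for x, m in zip(logits, mask)]
    z_max = max(z)  # subtract the row max for numerical stability
    e = [math.exp(v - z_max) for v in z]
    total = sum(e)
    return [v / total for v in e]
```

For 8 heads and `max_bias = 8`, the slopes come out as 1/2, 1/4, ..., 1/256, the geometric sequence from the ALiBi paper.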
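The next sketch shows the difference between the two rotation layouts that ggml distinguishes. "Normal" RoPE rotates adjacent pairs (x[2i], x[2i+1]). NeoX-style RoPE pairs element i with element i + d/2. The sketch assumes that all d dimensions are rotated and that no frequency scaling is applied. `rope` is a hypothetical helper, and phi3-specific handling is not modeled.

```python
import math


def rope(x, pos, base=10000.0, neox=False):
    """Rotate one head vector x (even length d) for token position pos.

    normal: rotates adjacent pairs (x[2i], x[2i+1])
    neox:   rotates pairs split across halves (x[i], x[i + d/2])
    Both use theta_i = pos * base ** (-2i / d).
    """
    d = len(x)
    assert d % 2 == 0, "head dimension must be even"
    out = list(x)
    for i in range(d // 2):
        theta = pos * base ** (-2.0 * i / d)
        c, s = math.cos(theta), math.sin(theta)
        a, b = (i, i + d // 2) if neox else (2 * i, 2 * i + 1)
        # 2-D rotation of the pair (x[a], x[b]) by theta
        out[a] = x[a] * c - x[b] * s
        out[b] = x[a] * s + x[b] * c
    return out
```

For d = 2 the two layouts coincide. From d = 4 upward they pair different elements, which is why a backend needs a separate NeoX code path.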


28 Nov 08:04

github-actions

b4204

605fa66


CANN: Fix SOC_TYPE compile bug (#10519)

* CANN: Fix build failure on Ascend310P in two cases:
1) when SOC_TYPE is specified manually
2) in some unusual compile environments

* Update the CANN backend News section: support F16 and F32 data type models on the Ascend 310P NPU.

* Fix CANN compile failure: asserts in Ascend kernel functions are not supported on some CANN versions
