Releases · ngxson/llama.cpp

29 May 12:00

54a2c7a

b5535

arm64: optimize q4_k_q8_k kernel with i8mm (#13886)

This PR speeds up the q4_k_q8_k GEMM kernel by using the arm64 i8mm instructions.
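For readers unfamiliar with i8mm, the sketch below emulates in scalar Python what one of its instructions, SMMLA, computes: a 2x8 block of signed 8-bit values multiplied by the transpose of another 2x8 block, accumulated into a 2x2 block of 32-bit sums. It is an illustration only and not the kernel code from this PR; the function name `smmla` and the sample inputs are made up here, and 32-bit overflow wrapping is not modelled.

```python
def smmla(acc, a, b):
    """Scalar emulation of one SMMLA step (illustrative, hypothetical helper).

    a and b each hold 2 rows of 8 signed 8-bit integers; acc is a 2x2
    list of accumulators. Returns acc + a @ transpose(b).
    """
    for row in (*a, *b):
        assert len(row) == 8 and all(-128 <= v <= 127 for v in row)
    return [
        [acc[i][j] + sum(a[i][k] * b[j][k] for k in range(8)) for j in range(2)]
        for i in range(2)
    ]


# Sample inputs chosen for easy mental arithmetic.
a = [[1, 2, 3, 4, 5, 6, 7, 8], [-1, -2, -3, -4, -5, -6, -7, -8]]
b = [[1, 1, 1, 1, 1, 1, 1, 1], [0, 1, 0, 1, 0, 1, 0, 1]]

result = smmla([[0, 0], [0, 0]], a, b)
print(result)  # [[36, 20], [-36, -20]]
```

Each call yields four dot products of length 8 at once, which is why a block-wise integer GEMM such as q4_k_q8_k can benefit from it.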

Tested on Neoverse N2 with a Llama 3 8B model quantized to Q4_K_M:
- 34% to 50% S_PP (prompt processing speed) uplift at all batch sizes
- 12% to 37% S_TG (text generation speed) uplift at batch size 4 and above

Perplexity doesn't change with this PR.

```
// tested on neoverse-n2
$ llama-batched-bench \
      -m Meta-Llama-3-8B-Instruct-Q4_K_M.gguf \
      --no-mmap -fa \
      -c 8192 -b 4096 -ub 512 -npp 128 -ntg 128 \
      -npl 1,2,4,8,16,32 \
      -t 64

---------------------------------------------------------------------
|    PP |     TG |    B |       S_PP t/s      |       S_TG t/s      |
|       |        |      | original |  this pr | original |  this pr |
|-------|--------|------|----------|----------|----------|----------|
|   128 |    128 |    1 |   110.12 |   147.83 |    24.36 |    24.28 |
|   128 |    128 |    2 |   121.16 |   172.42 |    46.36 |    47.93 |
|   128 |    128 |    4 |   120.15 |   169.75 |    74.68 |    84.00 |
|   128 |    128 |    8 |   130.97 |   196.81 |    91.04 |   114.74 |
|   128 |    128 |   16 |   131.01 |   196.88 |   101.43 |   135.79 |
|   128 |    128 |   32 |   130.85 |   196.51 |   106.97 |   147.29 |
---------------------------------------------------------------------
```
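The uplift ranges quoted above can be rechecked from the table. The following is a small sketch that copies the table values by hand and computes the relative change per batch size; the helper name `uplift_pct` is hypothetical and not part of llama.cpp.

```python
# (batch size, S_PP original, S_PP this PR, S_TG original, S_TG this PR), all in t/s,
# copied from the llama-batched-bench table above.
rows = [
    (1, 110.12, 147.83, 24.36, 24.28),
    (2, 121.16, 172.42, 46.36, 47.93),
    (4, 120.15, 169.75, 74.68, 84.00),
    (8, 130.97, 196.81, 91.04, 114.74),
    (16, 131.01, 196.88, 101.43, 135.79),
    (32, 130.85, 196.51, 106.97, 147.29),
]


def uplift_pct(before, after):
    """Relative change from before to after, in percent."""
    return (after / before - 1.0) * 100.0


# Map batch size -> (S_PP uplift %, S_TG uplift %).
uplifts = {
    batch: (uplift_pct(pp0, pp1), uplift_pct(tg0, tg1))
    for batch, pp0, pp1, tg0, tg1 in rows
}

for batch, (pp, tg) in uplifts.items():
    print(f"B={batch:>2}  S_PP {pp:+6.1f}%  S_TG {tg:+6.1f}%")
```

For batch sizes 1 and 2 the S_TG change is small (slightly negative at batch size 1), which matches the note that the text generation gain applies from batch size 4 upward.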

Assets 18

29 May 11:09

github-actions

b5534

21fcc21

cmake: Factor out CPU architecture detection (#13883)

* cmake: Define function for querying architecture

The tests and results exactly match those of ggml/src/CMakeLists.txt

* Switch arch detection over to new function

Assets 18

29 May 10:00

github-actions

b5533

dd8ba93

ggml: aarch64: Implement SVE F32 kernels for Mamba Sequential Scan Al…

Assets 18

29 May 06:50

github-actions

b5530

6385b84

llama : add RobertaForSequenceClassification reranker support (#13875)

Assets 18

29 May 06:26

github-actions

b5529

1b8fb81

ggml: aarch64: Implement SVE F32 kernels for vector functions (#13843)

* F32-Mamba-SVE

* F32-Mamba-SVE

* Resolve test errors-1

* Resolve test errors-2

* F32-vec-SVE

* F32-vec-SVE

* F32-vec-SVE

Assets 18

28 May 21:02

github-actions

b5527

763d06e

llama : fix KV shift for qwen2vl (#13870)

* llama : fix KV shift for qwen2vl

* add ref to the PR

Assets 18

28 May 17:24

github-actions

b5524

e0e3aa2

llama : add support for BertForSequenceClassification reranker (#13858)

* convert: add support for BertForSequenceClassification

* add support for reranking using BertForSequenceClassification

* merge checks of eos and sep

* fix lint

---------

Co-authored-by: dinhhuy

Assets 18

28 May 15:07

github-actions

b5523

aa6dff0

convert: small addition to support LlamaModel (#13838)

Co-authored-by: dinhhuy

Assets 18

28 May 12:05

github-actions

b5519

a682474

CUDA: fix FA tg at long context for CC >= 8.9 (#13852)

Assets 18

28 May 04:12

github-actions

b5517

1e8659e

CANN: Add SOC TYPE printing in cmake configuration (#13837)

Assets 18
