
ggml : Implementations for Q4_0_8_8 quantization based functions - RISC-V vector version #10029


Merged
merged 7 commits into ggml-org:master on Oct 30, 2024

Conversation

xctan
Collaborator

@xctan xctan commented Oct 24, 2024

I've tested Mistral 7B in qemu, and it just worked. I'm still choosing a suitable 3B model for my dev board, which has only 4 GB of RAM (Banana Pi BPI-F3), so I can't give a performance evaluation yet, and any help is welcome! BTW, Mistral 7B could run on the BPI-F3 with an extra 4 GB of swap enabled, but it was much slower than even qemu.

@github-actions github-actions bot added the ggml changes relating to the ggml tensor library for machine learning label Oct 24, 2024
@xctan
Collaborator Author

xctan commented Oct 24, 2024

Model: https://huggingface.co/CobraMamba/mamba-gpt-3b-v4
Compiler: GCC 13.2.0

| model | size | params | backend | threads | test | t/s | speedup | commit |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| llama 3B Q4_0_8_8 | 1.84 GiB | 3.43 B | CPU | 8 | pp512 | 1.82 ± 0.00 | 271% | 78c78e2 |
| llama 3B Q4_0 | 1.84 GiB | 3.43 B | CPU | 8 | pp512 | 1.04 ± 0.00 | 112% | 66c2c93 |
| llama 3B Q4_0_8_8 | 1.84 GiB | 3.43 B | CPU | 8 | pp512 | 0.49 ± 0.00 | | 66c2c93 |
| llama 3B Q4_0_8_8 | 1.84 GiB | 3.43 B | CPU | 8 | tg128 | 2.25 ± 0.10 | 350% | 78c78e2 |
| llama 3B Q4_0 | 1.84 GiB | 3.43 B | CPU | 8 | tg128 | 1.27 ± 0.03 | 154% | 66c2c93 |
| llama 3B Q4_0_8_8 | 1.84 GiB | 3.43 B | CPU | 8 | tg128 | 0.50 ± 0.01 | | 66c2c93 |

@xctan
Collaborator Author

xctan commented Oct 30, 2024

Is there anything wrong with this PR? Should I provide more test results, such as llama-perplexity? @ggerganov
I'm trying `./llama-perplexity -m <model_name> -f wikitext-2-raw/wiki.test.raw` for the Q4_0_8_8 GEMM implementations both with and without this PR, but it would take ~400 hours in total to finish.

@ggerganov ggerganov merged commit fc83a9e into ggml-org:master Oct 30, 2024
51 of 52 checks passed
arthw pushed a commit to arthw/llama.cpp that referenced this pull request Nov 15, 2024
* ggml : RISC-V vector gemv for q4_0_8x8

* ggml : Added WIP rvv q4_0_8x8 gemm

* ggml : Added initial implementation of rvv gemm

* ggml : optimize gemm to avoid register spillover

* ggml : Fix GCC rvv load alignment issue

* ggml : Format gemm rvv code

* ggml : Fix a typo in RVV q4_0_8_8 GEMM
arthw pushed a commit to arthw/llama.cpp that referenced this pull request Nov 18, 2024