Vulkan: Tune Vulkan mmq int dot shader for performance #12767

Merged
merged 1 commit into from
Apr 5, 2025

Conversation

0cc4m (Collaborator) commented Apr 5, 2025

Retune the DP4A matmul shaders I added in #12135 to extract more performance from them, after a bugfix in #12722 reduced their throughput. Here are my test results:

RTX 3090:

| model | size | params | backend | ngl | test | t/s fp16 | t/s int dot master | t/s int dot PR | t/s coopmat1 | t/s coopmat2 | t/s CUDA |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| llama 8B Q4_0 | 4.33 GiB | 8.03 B | Vulkan | 99 | pp512 | 1025.95 ± 1.11 | 1928.35 ± 8.34 | 1925.12 ± 6.65 | 3138.60 ± 28.29 | 4247.01 ± 60.18 | 5069.38 ± 18.59 |
| llama 8B Q8_0 | 7.95 GiB | 8.03 B | Vulkan | 99 | pp512 | 1000.95 ± 2.89 | 1898.78 ± 4.69 | 1894.99 ± 7.38 | 2749.46 ± 3.65 | 4329.58 ± 16.31 | 4932.49 ± 14.11 |

AMD Radeon RX 6800 XT:

| model | size | params | backend | ngl | test | t/s fp16 | t/s int dot master | t/s int dot PR | t/s ROCm |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| llama 8B Q4_0 | 4.33 GiB | 8.03 B | Vulkan | 99 | pp512 | 922.14 ± 1.04 | 1284.95 ± 2.76 | 1463.92 ± 1.74 | 1678.14 ± 2.28 |
| llama 8B Q8_0 | 7.95 GiB | 8.03 B | Vulkan | 99 | pp512 | 902.82 ± 0.80 | 1060.93 ± 0.74 | 1250.73 ± 1.23 | 1618.84 ± 1.25 |

AMD Radeon Pro VII:

| model | size | params | backend | ngl | test | t/s fp16 | t/s int dot master | t/s int dot PR | t/s ROCm |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| llama 8B Q4_0 | 4.33 GiB | 8.03 B | Vulkan | 99 | pp512 | 313.00 ± 0.60 | 390.98 ± 1.52 | 588.09 ± 0.42 | 1012.39 ± 0.46 |
| llama 8B Q8_0 | 7.95 GiB | 8.03 B | Vulkan | 99 | pp512 | 309.01 ± 0.48 | 317.19 ± 0.91 | 501.33 ± 0.55 | 398.53 ± 0.06 |

Intel A770:

| model | size | params | backend | ngl | test | t/s fp16 | t/s int dot master | t/s int dot PR | t/s SYCL |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| llama 8B Q4_0 | 4.33 GiB | 8.03 B | Vulkan | 99 | pp512 | 165.47 ± 0.14 | 508.05 ± 0.95 | 734.05 ± 1.60 | 917.26 ± 6.47 |
| llama 8B Q8_0 | 7.95 GiB | 8.03 B | Vulkan | 99 | pp512 | 157.58 ± 0.17 | 493.80 ± 0.76 | 678.78 ± 0.75 | 893.41 ± 4.35 |

github-actions bot added the Vulkan (Issues specific to the Vulkan backend) and ggml (changes relating to the ggml tensor library for machine learning) labels Apr 5, 2025
0cc4m requested a review from jeffbolznv April 5, 2025 07:34
LostRuins added a commit to LostRuins/koboldcpp that referenced this pull request Apr 5, 2025
0cc4m merged commit 6bf28f0 into master Apr 5, 2025
51 checks passed
0cc4m deleted the 0cc4m/vulkan-mmq-dp4a-tune branch April 5, 2025 16:04
colout pushed a commit to colout/llama.cpp that referenced this pull request Apr 29, 2025
timwu pushed a commit to timwu/llama.cpp that referenced this pull request May 5, 2025