Vulkan: Tune Vulkan mmq int dot shader for performance #12767

Merged
merged 1 commit into from
Apr 5, 2025

Conversation

0cc4m (Collaborator) commented Apr 5, 2025

Retune the DP4A matmul shaders I added in #12135 to extract more performance from them, after a bugfix in #12722 reduced their throughput. Here are my test results:

RTX 3090:

| model | size | params | backend | ngl | test | t/s fp16 | t/s int dot master | t/s int dot PR | t/s coopmat1 | t/s coopmat2 | t/s CUDA |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| llama 8B Q4_0 | 4.33 GiB | 8.03 B | Vulkan | 99 | pp512 | 1025.95 ± 1.11 | 1928.35 ± 8.34 | 1925.12 ± 6.65 | 3138.60 ± 28.29 | 4247.01 ± 60.18 | 5069.38 ± 18.59 |
| llama 8B Q8_0 | 7.95 GiB | 8.03 B | Vulkan | 99 | pp512 | 1000.95 ± 2.89 | 1898.78 ± 4.69 | 1894.99 ± 7.38 | 2749.46 ± 3.65 | 4329.58 ± 16.31 | 4932.49 ± 14.11 |

AMD Radeon RX 6800 XT:

| model | size | params | backend | ngl | test | t/s fp16 | t/s int dot master | t/s int dot PR | t/s ROCm |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| llama 8B Q4_0 | 4.33 GiB | 8.03 B | Vulkan | 99 | pp512 | 922.14 ± 1.04 | 1284.95 ± 2.76 | 1463.92 ± 1.74 | 1678.14 ± 2.28 |
| llama 8B Q8_0 | 7.95 GiB | 8.03 B | Vulkan | 99 | pp512 | 902.82 ± 0.80 | 1060.93 ± 0.74 | 1250.73 ± 1.23 | 1618.84 ± 1.25 |

AMD Radeon Pro VII:

| model | size | params | backend | ngl | test | t/s fp16 | t/s int dot master | t/s int dot PR | t/s ROCm |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| llama 8B Q4_0 | 4.33 GiB | 8.03 B | Vulkan | 99 | pp512 | 313.00 ± 0.60 | 390.98 ± 1.52 | 588.09 ± 0.42 | 1012.39 ± 0.46 |
| llama 8B Q8_0 | 7.95 GiB | 8.03 B | Vulkan | 99 | pp512 | 309.01 ± 0.48 | 317.19 ± 0.91 | 501.33 ± 0.55 | 398.53 ± 0.06 |

Intel A770:

| model | size | params | backend | ngl | test | t/s fp16 | t/s int dot master | t/s int dot PR | t/s SYCL |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| llama 8B Q4_0 | 4.33 GiB | 8.03 B | Vulkan | 99 | pp512 | 165.47 ± 0.14 | 508.05 ± 0.95 | 734.05 ± 1.60 | 917.26 ± 6.47 |
| llama 8B Q8_0 | 7.95 GiB | 8.03 B | Vulkan | 99 | pp512 | 157.58 ± 0.17 | 493.80 ± 0.76 | 678.78 ± 0.75 | 893.41 ± 4.35 |

github-actions bot added the Vulkan (Issues specific to the Vulkan backend) and ggml (changes relating to the ggml tensor library for machine learning) labels Apr 5, 2025
0cc4m requested a review from jeffbolznv April 5, 2025 07:34
LostRuins added a commit to LostRuins/koboldcpp that referenced this pull request Apr 5, 2025
0cc4m merged commit 6bf28f0 into master Apr 5, 2025
51 checks passed
0cc4m deleted the 0cc4m/vulkan-mmq-dp4a-tune branch April 5, 2025 16:04
colout pushed a commit to colout/llama.cpp that referenced this pull request Apr 29, 2025
timwu pushed a commit to timwu/llama.cpp that referenced this pull request May 5, 2025