
vulkan: initial support for IQ4_XS quantization #11501


Merged: 1 commit merged into ggml-org:master on Feb 6, 2025

Conversation

remyoudompheng (Contributor):

As a follow-up to #11360, this PR adds support for IQ4_XS quantization.

Note that coopmat2 correctness was not tested due to lack of compatible hardware.
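
For reference, here is a minimal CPU-side sketch of the IQ4_XS block layout and dequantization that the new shaders mirror. It is written from my reading of ggml-common.h / ggml-quants.c, so the exact field layout and the non-linear lookup-table values are assumptions to verify against the ggml source rather than an authoritative copy:

```c
#include <math.h>
#include <stdint.h>

#define QK_K 256  // weights per IQ4_XS super-block

// Assumed super-block layout: 2 + 2 + 4 + 128 = 136 bytes for 256 weights = 4.25 bpw
// (136 also matches nb0 of the iq4_xs src0 tensor in the MUL_MAT dump later in this thread).
typedef struct {
    uint16_t d;                  // fp16 super-block scale
    uint16_t scales_h;           // upper 2 bits of the eight 6-bit sub-block scales
    uint8_t  scales_l[QK_K/64];  // lower 4 bits of the sub-block scales, two per byte
    uint8_t  qs[QK_K/2];         // 4-bit indices into the non-linear codebook
} block_iq4_xs;

// Non-linear 4-bit codebook shared with IQ4_NL (values as I recall them from ggml).
static const int8_t kvalues_iq4nl[16] = {
    -127, -104, -83, -65, -49, -35, -22, -10, 1, 13, 25, 38, 53, 69, 89, 113,
};

// Stand-in for ggml's GGML_FP16_TO_FP32.
static float fp16_to_fp32(uint16_t h) {
    const int   exp  = (h >> 10) & 0x1f;
    const int   man  = h & 0x3ff;
    const float sign = (h & 0x8000) ? -1.0f : 1.0f;
    if (exp == 0)  return sign * ldexpf((float)man, -24);  // zero / subnormal
    if (exp == 31) return sign * (man ? NAN : INFINITY);   // inf / nan
    return sign * ldexpf((float)(man | 0x400), exp - 25);  // normal
}

// Dequantize one super-block into 256 floats.
static void dequant_block_iq4_xs(const block_iq4_xs *x, float *y) {
    const float    d  = fp16_to_fp32(x->d);
    const uint8_t *qs = x->qs;
    for (int ib = 0; ib < QK_K/32; ++ib) {  // 8 sub-blocks of 32 weights
        // Reassemble the 6-bit sub-block scale (stored with a bias of 32).
        const int ls = ((x->scales_l[ib/2] >> 4*(ib%2)) & 0xf)
                     | (((x->scales_h >> 2*ib) & 3) << 4);
        const float dl = d * (ls - 32);
        for (int j = 0; j < 16; ++j) {
            y[j +  0] = dl * kvalues_iq4nl[qs[j] & 0xf];  // low nibbles: weights 0..15
            y[j + 16] = dl * kvalues_iq4nl[qs[j] >>  4];  // high nibbles: weights 16..31
        }
        y  += 32;
        qs += 16;
    }
}
```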

Performance numbers

ggml_vulkan: Found 1 Vulkan devices:
ggml_vulkan: 0 = AMD Radeon 780M (RADV GFX1103_R1) (radv) | uma: 1 | fp16: 1 | warp size: 64 | matrix cores: KHR_coopmat
| model                          |       size |     params | backend    | ngl |          test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | ------------: | -------------------: |
| qwen2 3B IQ4_XS - 4.25 bpw     |   1.61 GiB |     3.09 B | Vulkan     |  99 |         pp512 |        512.53 ± 0.26 |
| qwen2 3B IQ4_XS - 4.25 bpw     |   1.61 GiB |     3.09 B | Vulkan     |  99 |         tg128 |         34.78 ± 0.31 |
| qwen2 3B Q4_K - Medium         |   1.79 GiB |     3.09 B | Vulkan     |  99 |         pp512 |       463.05 ± 28.33 |
| qwen2 3B Q4_K - Medium         |   1.79 GiB |     3.09 B | Vulkan     |  99 |         tg128 |         33.99 ± 0.15 |

github-actions bot added the Vulkan and ggml labels on Jan 29, 2025
jeffbolznv (Collaborator) left a comment:

LGTM other than the one tiny fix.

remyoudompheng (Contributor, Author) commented:

Branch updated with suggested change

0cc4m self-requested a review on January 30, 2025 08:28
netrunnereve (Collaborator) left a comment:

Looks good and runs well on my end.

jeffbolznv (Collaborator) commented:

@0cc4m I know you're busy, but any concern if we go ahead and merge this? @remyoudompheng has a few changes stacked up and I'd like to unblock them.

0cc4m (Collaborator) commented on Feb 5, 2025:

> @0cc4m I know you're busy, but any concern if we go ahead and merge this? @remyoudompheng has a few changes stacked up and I'd like to unblock them.

It's been a very busy week for me, yeah. For this PR I just want to do a sanity check with the hardware I have. I'll do that today.

0cc4m (Collaborator) commented on Feb 5, 2025:

I have a single test failure on AMD Vega20. Not sure what is going on, maybe something in relation to workgroup size 64? Here is the output of GGML_VULKAN_CHECK_RESULTS, maybe it helps:

  MUL_MAT(type_a=iq4_xs,type_b=f32,m=16,n=9,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): ERROR: avg_err=1.4903 in MUL_MAT (check 171)
tensor=0x618a77caf2a0 tensor->name=out tensor->type: f32 ne0=16 nb0=4 ne1=9 nb1=64 ne2=1 nb2=576 ne3=1 nb3=576 offset=0
src0=0x618a77caece0 op=NONE type=iq4_xs ne0=256 nb0=136 ne1=16 nb1=136 ne2=1 nb2=2176 ne3=1 nb3=2176 offset=0
src1=0x618a77caefc0 op=NONE type=f32 ne0=256 nb0=4 ne1=9 nb1=1024 ne2=1 nb2=9216 ne3=1 nb3=9216 offset=0
First error: result=8.16406 correct=9.26931 i3=0 i2=0 i1=0 i0=4

Result:
               0       1       2       3       4       5       6       7       8       9
      0:  -10.00   -6.59    0.77   -5.95   -5.74    0.50   -2.28    0.74    2.34
      1:    7.98   -8.41   -0.19   -2.18    3.74  -12.41    1.11   -2.37   -7.66
      2:   -7.83   -6.92    2.15    4.31   -1.06   -4.96    1.58    0.00    3.05
      3:    7.27   -2.11    2.84    0.75    8.48   -4.01   -3.87   -0.42    8.42
      4:    8.16    8.05   -9.32    0.61  -17.23   -0.00   -7.27   17.83    9.52
      5:    1.29    2.61   -1.70   -5.21   -0.35   -5.15   -4.14   -1.82    4.06
      6:   -1.17    2.94   -7.18  -11.21    2.69   -3.84   -3.15    8.77   -1.53
      7:    1.29   -6.46   -5.88    3.50    5.39    1.54   -1.94    1.17   -8.61
      8:    4.46    9.68   -8.83    0.30    8.71   -8.28    1.60    5.41   -4.36
      9:    0.37   -5.25   -4.61   -5.13   -1.46    1.76   -6.34    0.50   -1.74

Correct:
               0       1       2       3       4       5       6       7       8       9
      0:   -9.97   -6.61    0.78   -5.98   -5.72    0.50   -2.26    0.78    2.30
      1:    7.95   -8.39   -0.22   -2.18    3.72  -12.45    1.09   -2.34   -7.66
      2:   -7.88   -6.92    2.14    4.34   -1.06   -4.95    1.59    0.01    3.05
      3:    7.28   -2.08    2.79    0.71    8.43   -3.95   -3.89   -0.44    8.44
      4:    9.27   -1.41    1.42    3.69  -15.42    3.07   -2.48    4.72    3.12
      5:   -1.18   -3.72   -8.36   -3.61   -1.48   -1.41   -1.89   -1.44    0.21
      6:    2.84   -5.72   -5.06   -4.26    7.67   -5.38    4.31    1.09   -5.65
      7:    2.58   -4.30    5.54   -0.48    7.44   -4.51   -3.46    5.91   -5.20
      8:    3.34   10.60   -7.52    0.53    4.86   -8.32    3.78    3.48   -6.79
      9:   -1.08   -8.28   -4.67   -4.46   -0.21    0.58   -5.51    1.42   -1.37

MUL_MAT gpu=0
 NONE gpu=0
 NONE gpu=0
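
The case name on the first line of that dump follows test-backend-ops naming, so the failing shape can presumably be re-run in isolation with the op filter, something like `./bin/test-backend-ops test -o MUL_MAT` on a build with GGML_VULKAN_CHECK_RESULTS enabled; the exact invocation is an assumption, not taken from this thread.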

jeffbolznv (Collaborator) commented:

Yeah, I can reproduce that failure with the subgroup size forced to 64 and coopmat2 disabled.

jeffbolznv (Collaborator) commented:

Hmm, actually I get a bunch of other failures with the subgroup size forced to 64, so I'm not sure that experiment is meaningful.

netrunnereve (Collaborator) commented on Feb 5, 2025:

> I have a single test failure on AMD Vega20. Not sure what is going on, maybe something in relation to workgroup size 64

That's strange as my RX 470 and W8100 aren't seeing this and those are size 64 cards. Maybe try running the Vega 20 in fp32 mode and see what happens?

remyoudompheng (Contributor, Author) commented:

I'm not seeing this failure either on the Ryzen 5500U iGPU (which should also be GCN 5.1).

0cc4m (Collaborator) commented on Feb 6, 2025:

It once again only happens on very recent Mesa versions, so it might be another RADV Vega20 bug. I'll try to resolve that with Mesa; this PR is fine.

0cc4m (Collaborator) left a comment:

Thank you!

0cc4m merged commit 8a7e3bf into ggml-org:master on Feb 6, 2025
45 checks passed
tinglou pushed a commit to tinglou/llama.cpp that referenced this pull request Feb 13, 2025
remyoudompheng deleted the vulkan-iq4xs branch on February 17, 2025 23:16
orca-zhang pushed a commit to orca-zhang/llama.cpp that referenced this pull request Feb 26, 2025
arthw pushed a commit to arthw/llama.cpp that referenced this pull request Feb 26, 2025
mglambda pushed a commit to mglambda/llama.cpp that referenced this pull request Mar 8, 2025
Labels: ggml (changes relating to the ggml tensor library for machine learning), Vulkan (issues specific to the Vulkan backend)