
vulkan: initial support for IQ4_XS quantization #11501


Merged: 1 commit merged into ggml-org:master on Feb 6, 2025

Conversation

remyoudompheng (Contributor):

As a follow-up to #11360, this PR adds support for IQ4_XS quantization.

Note that coopmat2 correctness was not tested due to lack of compatible hardware.
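
For reference, here is a minimal CPU-side sketch of the IQ4_XS block layout and dequantization that the new shaders mirror. It is written from my reading of ggml-common.h / ggml-quants.c, so the exact field layout and the non-linear lookup-table values are assumptions to verify against the ggml source rather than an authoritative copy:

```c
#include <math.h>
#include <stdint.h>

#define QK_K 256  // weights per IQ4_XS super-block

// Assumed super-block layout: 2 + 2 + 4 + 128 = 136 bytes for 256 weights = 4.25 bpw
// (136 also matches nb0 of the iq4_xs src0 tensor in the MUL_MAT dump later in this thread).
typedef struct {
    uint16_t d;                  // fp16 super-block scale
    uint16_t scales_h;           // upper 2 bits of the eight 6-bit sub-block scales
    uint8_t  scales_l[QK_K/64];  // lower 4 bits of the sub-block scales, two per byte
    uint8_t  qs[QK_K/2];         // 4-bit indices into the non-linear codebook
} block_iq4_xs;

// Non-linear 4-bit codebook shared with IQ4_NL (values as I recall them from ggml).
static const int8_t kvalues_iq4nl[16] = {
    -127, -104, -83, -65, -49, -35, -22, -10, 1, 13, 25, 38, 53, 69, 89, 113,
};

// Stand-in for ggml's GGML_FP16_TO_FP32.
static float fp16_to_fp32(uint16_t h) {
    const int   exp  = (h >> 10) & 0x1f;
    const int   man  = h & 0x3ff;
    const float sign = (h & 0x8000) ? -1.0f : 1.0f;
    if (exp == 0)  return sign * ldexpf((float)man, -24);  // zero / subnormal
    if (exp == 31) return sign * (man ? NAN : INFINITY);   // inf / nan
    return sign * ldexpf((float)(man | 0x400), exp - 25);  // normal
}

// Dequantize one super-block into 256 floats.
static void dequant_block_iq4_xs(const block_iq4_xs *x, float *y) {
    const float    d  = fp16_to_fp32(x->d);
    const uint8_t *qs = x->qs;
    for (int ib = 0; ib < QK_K/32; ++ib) {  // 8 sub-blocks of 32 weights
        // Reassemble the 6-bit sub-block scale (stored with a bias of 32).
        const int ls = ((x->scales_l[ib/2] >> 4*(ib%2)) & 0xf)
                     | (((x->scales_h >> 2*ib) & 3) << 4);
        const float dl = d * (ls - 32);
        for (int j = 0; j < 16; ++j) {
            y[j +  0] = dl * kvalues_iq4nl[qs[j] & 0xf];  // low nibbles: weights 0..15
            y[j + 16] = dl * kvalues_iq4nl[qs[j] >>  4];  // high nibbles: weights 16..31
        }
        y  += 32;
        qs += 16;
    }
}
```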

Performance numbers

ggml_vulkan: Found 1 Vulkan devices:
ggml_vulkan: 0 = AMD Radeon 780M (RADV GFX1103_R1) (radv) | uma: 1 | fp16: 1 | warp size: 64 | matrix cores: KHR_coopmat
| model                          |       size |     params | backend    | ngl |          test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | ------------: | -------------------: |
| qwen2 3B IQ4_XS - 4.25 bpw     |   1.61 GiB |     3.09 B | Vulkan     |  99 |         pp512 |        512.53 ± 0.26 |
| qwen2 3B IQ4_XS - 4.25 bpw     |   1.61 GiB |     3.09 B | Vulkan     |  99 |         tg128 |         34.78 ± 0.31 |
| qwen2 3B Q4_K - Medium         |   1.79 GiB |     3.09 B | Vulkan     |  99 |         pp512 |       463.05 ± 28.33 |
| qwen2 3B Q4_K - Medium         |   1.79 GiB |     3.09 B | Vulkan     |  99 |         tg128 |         33.99 ± 0.15 |

github-actions bot added the Vulkan and ggml labels on Jan 29, 2025
jeffbolznv (Collaborator) left a comment:

LGTM other than the one tiny fix.

remyoudompheng (Contributor, Author) commented:

Branch updated with suggested change

0cc4m self-requested a review on January 30, 2025 08:28
netrunnereve (Collaborator) left a comment:

Looks good and runs well on my end.

jeffbolznv (Collaborator) commented:

@0cc4m I know you're busy, but any concern if we go ahead and merge this? @remyoudompheng has a few changes stacked up and I'd like to unblock them.

0cc4m (Collaborator) commented on Feb 5, 2025:

> @0cc4m I know you're busy, but any concern if we go ahead and merge this? @remyoudompheng has a few changes stacked up and I'd like to unblock them.

It's been a very busy week for me, yeah. For this PR I just want to do a sanity check with the hardware I have. I'll do that today.

0cc4m (Collaborator) commented on Feb 5, 2025:

I have a single test failure on AMD Vega20. Not sure what is going on, maybe something in relation to workgroup size 64? Here is the output of GGML_VULKAN_CHECK_RESULTS, maybe it helps:

  MUL_MAT(type_a=iq4_xs,type_b=f32,m=16,n=9,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): ERROR: avg_err=1.4903 in MUL_MAT (check 171)
tensor=0x618a77caf2a0 tensor->name=out tensor->type: f32 ne0=16 nb0=4 ne1=9 nb1=64 ne2=1 nb2=576 ne3=1 nb3=576 offset=0
src0=0x618a77caece0 op=NONE type=iq4_xs ne0=256 nb0=136 ne1=16 nb1=136 ne2=1 nb2=2176 ne3=1 nb3=2176 offset=0
src1=0x618a77caefc0 op=NONE type=f32 ne0=256 nb0=4 ne1=9 nb1=1024 ne2=1 nb2=9216 ne3=1 nb3=9216 offset=0
First error: result=8.16406 correct=9.26931 i3=0 i2=0 i1=0 i0=4

Result:
               0       1       2       3       4       5       6       7       8       9
      0:  -10.00   -6.59    0.77   -5.95   -5.74    0.50   -2.28    0.74    2.34
      1:    7.98   -8.41   -0.19   -2.18    3.74  -12.41    1.11   -2.37   -7.66
      2:   -7.83   -6.92    2.15    4.31   -1.06   -4.96    1.58    0.00    3.05
      3:    7.27   -2.11    2.84    0.75    8.48   -4.01   -3.87   -0.42    8.42
      4:    8.16    8.05   -9.32    0.61  -17.23   -0.00   -7.27   17.83    9.52
      5:    1.29    2.61   -1.70   -5.21   -0.35   -5.15   -4.14   -1.82    4.06
      6:   -1.17    2.94   -7.18  -11.21    2.69   -3.84   -3.15    8.77   -1.53
      7:    1.29   -6.46   -5.88    3.50    5.39    1.54   -1.94    1.17   -8.61
      8:    4.46    9.68   -8.83    0.30    8.71   -8.28    1.60    5.41   -4.36
      9:    0.37   -5.25   -4.61   -5.13   -1.46    1.76   -6.34    0.50   -1.74

Correct:
               0       1       2       3       4       5       6       7       8       9
      0:   -9.97   -6.61    0.78   -5.98   -5.72    0.50   -2.26    0.78    2.30
      1:    7.95   -8.39   -0.22   -2.18    3.72  -12.45    1.09   -2.34   -7.66
      2:   -7.88   -6.92    2.14    4.34   -1.06   -4.95    1.59    0.01    3.05
      3:    7.28   -2.08    2.79    0.71    8.43   -3.95   -3.89   -0.44    8.44
      4:    9.27   -1.41    1.42    3.69  -15.42    3.07   -2.48    4.72    3.12
      5:   -1.18   -3.72   -8.36   -3.61   -1.48   -1.41   -1.89   -1.44    0.21
      6:    2.84   -5.72   -5.06   -4.26    7.67   -5.38    4.31    1.09   -5.65
      7:    2.58   -4.30    5.54   -0.48    7.44   -4.51   -3.46    5.91   -5.20
      8:    3.34   10.60   -7.52    0.53    4.86   -8.32    3.78    3.48   -6.79
      9:   -1.08   -8.28   -4.67   -4.46   -0.21    0.58   -5.51    1.42   -1.37

MUL_MAT gpu=0
 NONE gpu=0
 NONE gpu=0
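
The case name on the first line of that dump follows test-backend-ops naming, so the failing shape can presumably be re-run in isolation with the op filter, something like `./bin/test-backend-ops test -o MUL_MAT` on a build with GGML_VULKAN_CHECK_RESULTS enabled; the exact invocation is an assumption, not taken from this thread.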

jeffbolznv (Collaborator) commented:

Yeah, I can reproduce that failure with the subgroup size forced to 64 and coopmat2 disabled.

jeffbolznv (Collaborator) commented:

Hmm, actually I get a bunch of other failures with the subgroup size forced to 64, so I'm not sure that experiment is meaningful.

netrunnereve (Collaborator) commented on Feb 5, 2025:

> I have a single test failure on AMD Vega20. Not sure what is going on, maybe something in relation to workgroup size 64

That's strange as my RX 470 and W8100 aren't seeing this and those are size 64 cards. Maybe try running the Vega 20 in fp32 mode and see what happens?

remyoudompheng (Contributor, Author) commented:

I'm not seeing this failure either on the Ryzen 5500U iGPU (which should also be GCN 5.1).

0cc4m (Collaborator) commented on Feb 6, 2025:

It once again only happens on very recent Mesa versions, so it might be another RADV Vega20 bug. I'll try to resolve that with Mesa; this PR is fine.

0cc4m (Collaborator) left a comment:

Thank you!

0cc4m merged commit 8a7e3bf into ggml-org:master on Feb 6, 2025
45 checks passed
tinglou pushed a commit to tinglou/llama.cpp that referenced this pull request Feb 13, 2025
remyoudompheng deleted the vulkan-iq4xs branch on February 17, 2025 23:16
orca-zhang pushed a commit to orca-zhang/llama.cpp that referenced this pull request Feb 26, 2025
arthw pushed a commit to arthw/llama.cpp that referenced this pull request Feb 26, 2025
mglambda pushed a commit to mglambda/llama.cpp that referenced this pull request Mar 8, 2025
Labels: ggml (changes relating to the ggml tensor library for machine learning), Vulkan (issues specific to the Vulkan backend)