vulkan: initial support for IQ4_XS quantization #11501
Conversation
LGTM other than the one tiny fix.
c91d4e2 to 743cfdf
Branch updated with suggested change.
Looks good and runs well on my end.
@0cc4m I know you're busy, but any concern if we go ahead and merge this? @remyoudompheng has a few changes stacked up and I'd like to unblock them.
It's been a very busy week for me, yeah. For this PR I just want to do a sanity check with the hardware I have. I'll do that today.
I have a single test failure on AMD Vega20. Not sure what is going on, maybe something related to workgroup size 64? Here is the output:
Yeah, I can reproduce that failure with subgroup size forced to 64 and coopmat2 disabled.
Hmm, actually I get a bunch of other failures with subgroup size forced to 64, so I'm not sure that experiment is meaningful.
That's strange, as my RX 470 and W8100 aren't seeing this, and those are subgroup-size-64 cards. Maybe try running the Vega20 in fp32 mode and see what happens?
I'm not seeing this failure either on the Ryzen 5500U iGPU (which should also be GCN 5.1).
It once again only happens on very recent Mesa versions, so it might be another RADV Vega20 bug. I'll try to resolve that with Mesa; this PR is fine.
Thank you!
As a follow-up to #11360, this PR adds support for IQ4_XS quantization.
Note that coopmat2 correctness was not tested due to lack of compatible hardware.
Performance numbers