cuda : fix 2-bit quants on amd hip #5105

Merged · 2 commits · Jan 24, 2024

Conversation

Engininja2 (Contributor)

Fixes evaluation of iq2_xxs and iq2_xs quants on AMD HIP, where HIP's half2 components appear to be shorts instead of half. Without this fix, main would generate '######' and some MUL_MAT_ID and MUL_MAT ops would fail.

@JohannesGaessler (Collaborator) left a comment


There are functions __low2half and __high2half that do the exact same thing as .x and .y on NVIDIA but, notably, also do the correct thing on AMD. I suggest you use those instead; check the rest of the code for examples.
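The pattern under discussion can be sketched as below. This is an illustrative device function, not the actual llama.cpp kernel code; __low2float, __low2half, and __high2half are real CUDA half-precision intrinsics, but the function name and context here are hypothetical.

```cuda
#include <cuda_fp16.h>

// Hypothetical helper illustrating portable half2 component access.
// On NVIDIA, h.x / h.y are __half, so direct member access works; under
// HIP, the half2 components can behave like shorts, so a cast such as
// (float)h.x may reinterpret the bit pattern instead of converting the
// half-precision value. The intrinsics below are correct on both.
__device__ float scale_from_half2(const half2 h) {
    // Portable: extract the low half and convert it to float in one step.
    return __low2float(h);  // equivalent to __half2float(__low2half(h))
}

__device__ float sum_half2(const half2 h) {
    // Portable access to both components via the intrinsics.
    return __half2float(__low2half(h)) + __half2float(__high2half(h));
}
```

Since the extracted value is immediately consumed as a float in the affected kernels, __low2float avoids a separate conversion step.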

@Artefact2 (Collaborator)

Thanks for this! I can confirm this works on gfx1030 on Linux/ROCm 5.7.1.

@sorasoras

Tested on Windows with ROCm 5.7.1. I can confirm it works on my 7900 XTX.

@Engininja2 (Contributor, Author)

I used __low2float since the value was immediately being cast to float, and that function is already used elsewhere in the code.

@JohannesGaessler JohannesGaessler merged commit cd4fddb into ggml-org:master Jan 24, 2024
@Engininja2 Engininja2 deleted the fix-2bit-quants-amd branch January 31, 2024 14:29
jordankanter pushed a commit to jordankanter/llama.cpp that referenced this pull request Feb 3, 2024
* cuda : fix 2-bit quants on amd hip

* use __low2float intrinsic function for new quants
hodlen pushed a commit to hodlen/llama.cpp that referenced this pull request Apr 1, 2024
* cuda : fix 2-bit quants on amd hip

* use __low2float intrinsic function for new quants