Try fix quantized k-cache on ROCm #6205

Closed
wants to merge 1 commit into from

ikawrakow
Contributor

@Artefact2

Does this fix the issue you reported in #6183?

@Artefact2
Collaborator

Unfortunately not: perplexity -ctk q4_0 still returns NaN.

@Engininja2
Contributor

If you force use_mul_mat_q off, then it looks like it works.

I also tried some really short prompts (below 32 tokens). At some lengths, the same prompt and seed would produce NaNs on some runs and not on others, so I think the root issue is memory being accessed incorrectly somewhere, or at least the ROCm compiler not liking how it's done.

@ikawrakow
Contributor Author

If this doesn't solve the problem, someone with an AMD device needs to look at it.

@ikawrakow ikawrakow closed this Mar 22, 2024