AVX2 optimization for vec_dot_q4_2_q8_0 #1068

Merged 1 commit on Apr 20, 2023
sw (Contributor) commented Apr 19, 2023

Adding a missing piece to the puzzle...

sw marked this pull request as ready for review on April 19, 2023, 19:01.

Green-Sky (Collaborator) commented Apr 20, 2023

It's fast, almost as fast as OpenBLAS.

7B q4_2, OpenBLAS:

perplexity : calculating perplexity over 655 chunks, batch_size=512
25.03 seconds per pass - ETA 4.55 hours

7B q4_2, AVX2 only (this PR):

perplexity : calculating perplexity over 655 chunks, batch_size=512
27.31 seconds per pass - ETA 4.97 hours

7B q4_2, before this PR:

perplexity : calculating perplexity over 655 chunks, batch_size=512
162.62 seconds per pass - ETA 29.59 hours
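
For anyone who wants to check the numbers above, here is a small sketch of the arithmetic behind them. It assumes the reported ETA is simply the number of chunks times the seconds per pass; that assumption reproduces all three ETAs in the logs, but it is inferred from the figures, not taken from the perplexity tool's source. The helper names `eta_hours` and `speedup` are made up for this illustration.

```c
/* Hypothetical helpers for sanity-checking the timings quoted above.
 * Assumption: ETA = chunks * seconds_per_pass, converted to hours. */

/* Estimated total runtime in hours for `chunks` passes. */
static double eta_hours(int chunks, double sec_per_pass) {
    return chunks * sec_per_pass / 3600.0;
}

/* Ratio of two per-pass timings: how many times faster `after` is
 * than `before` (both given in seconds per pass). */
static double speedup(double before, double after) {
    return before / after;
}
```

With the figures from the logs, `eta_hours(655, 27.31)` comes out at roughly 4.97 hours, matching the AVX2 run. `speedup(162.62, 27.31)` is a little under 6, so the AVX2 path is about six times faster than the code before this PR. `speedup(27.31, 25.03)` is about 1.09, meaning OpenBLAS is still roughly 9% faster per pass.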

3 participants