Replies: 1 comment
-
The GPU performance increase has been even greater, like 5 times faster or more.
-
CPU only - i9 9900
I ran some tests with a 2-month-old llama.cpp build and a 65B ggml model, also 2 months old.
2-month-old 65B q5_1 model with the old llama.cpp: around 1700 ms/token
Current 65B q4_K_M (similar perplexity to q5_1) with current llama.cpp: around 1000 ms/token
I also tested my new Ryzen 7950X3D :P
Current 65B q4_K_M (similar perplexity to q5_1) with current llama.cpp: around 600 ms/token (haven't tested AVX-512 yet)
So on the same CPU (the i9 9900), the combination of new models and new builds gives more than 60% better performance than 2 months ago: 1700 ms/token down to 1000 ms/token is about 1.7x the throughput.
That's awesome!
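
For anyone who wants to redo the arithmetic, here is a minimal Python sketch that converts the ms/token figures above into tokens per second and a speedup factor. The function names are hypothetical (not part of llama.cpp), and the inputs are just the rounded numbers quoted in this post, not new measurements.

```python
# Rough arithmetic on the ms/token figures quoted in the post.
# The helper names are made up for illustration; nothing here calls llama.cpp.

def tokens_per_second(ms_per_token: float) -> float:
    """Convert a latency in ms/token into throughput in tokens/s."""
    return 1000.0 / ms_per_token


def speedup(old_ms: float, new_ms: float) -> float:
    """How many times faster the new run is (ratio of throughputs)."""
    return old_ms / new_ms


def time_saved_percent(old_ms: float, new_ms: float) -> float:
    """Percentage reduction in time per token."""
    return (1.0 - new_ms / old_ms) * 100.0


# Rounded figures from the post (65B model, CPU only).
I9_OLD_BUILD_Q5_1 = 1700.0    # i9 9900, 2-month-old build, q5_1
I9_NEW_BUILD_Q4_K_M = 1000.0  # i9 9900, current build, q4_K_M
RYZEN_NEW_BUILD = 600.0       # Ryzen 7950X3D, current build, q4_K_M

if __name__ == "__main__":
    # Software-only gain: same CPU, old vs. new build and model format.
    sw = speedup(I9_OLD_BUILD_Q5_1, I9_NEW_BUILD_Q4_K_M)
    print(f"i9 9900, old -> new build: {sw:.2f}x faster, "
          f"{time_saved_percent(I9_OLD_BUILD_Q5_1, I9_NEW_BUILD_Q4_K_M):.1f}% less time per token")

    # Hardware-only gain: same build, i9 9900 vs. Ryzen 7950X3D.
    hw = speedup(I9_NEW_BUILD_Q4_K_M, RYZEN_NEW_BUILD)
    print(f"current build, i9 9900 -> 7950X3D: {hw:.2f}x faster")

    for label, ms in [("i9 old", I9_OLD_BUILD_Q5_1),
                      ("i9 new", I9_NEW_BUILD_Q4_K_M),
                      ("7950X3D new", RYZEN_NEW_BUILD)]:
        print(f"{label}: {tokens_per_second(ms):.2f} tokens/s")
```

Note that "60% faster" and "60% less time" are not the same thing: going from 1700 to 1000 ms/token is 1.7x the throughput (70% more tokens per second) but only about 41% less time per token. The 600 ms/token figure also mixes in a hardware change, so it should be compared against the 1000 ms/token run rather than the 1700 ms/token one.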