Replies: 1 comment
-
I forgot that every layer in the model is traversed equally, so the access count for every layer should be the same.
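The point above can be sketched with a toy counting loop (this is a hypothetical model of decoding, not llama.cpp internals): in a standard decoder-only transformer, generating each token runs every layer exactly once, so per-layer access counts can never differ.

```python
from collections import Counter

# Per-layer "access" counts, as the profiling idea proposes to collect.
counts = Counter()

def decode(n_layers: int, n_tokens: int) -> None:
    # Autoregressive decoding: each generated token passes through
    # every layer exactly once, in order.
    for _ in range(n_tokens):
        for layer in range(n_layers):
            counts[layer] += 1

decode(n_layers=32, n_tokens=100)

# All layers end up with identical counts, so a frequency-based
# offloading policy has no signal to prefer one layer over another.
assert len(set(counts.values())) == 1
```

Since every count is identical, choosing which layers to keep on the GPU by access frequency degenerates to an arbitrary choice.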
-
I found that llama.cpp always offloads the first n layers to the GPU.
Is there any way to offload the most frequently used layers to the GPU instead, for better performance?
For example, we could run the model on the CPU to count how many times each layer is accessed and write the result to a file. Then llama.cpp could use this file to decide which layers should stay on the GPU for the theoretically best performance.