llama-bench: add -d depth arg #13096
Conversation
This is fine too. Please fix the trailing whitespace and I'll merge.
@JohannesGaessler Can you merge this?
Yes, I was just waiting for the CI to finish.
I think there is a problem with the test statistics for non-zero depths:

```
./bin/llama-bench -m ../models/llama-3.2-1b-instruct/ggml-model-q8_0.gguf -fa 1 -p 1,2,3,4,4,4,4,5,6,7,8 -d 0,1024 -n 32 -t 1
```

build: f9cd683 (5503)

Notice how the uncertainty of the results for …
If I remember correctly, we currently calculate the means and standard deviations of the t/s values rather than of the runtimes. As long as the differences between runs are small I think this is fine, but for large differences (such as when individual runs are very short) it is not quite correct and could lead to bad estimates of the uncertainty. If you want to be fancy you could also do Rao-Blackwellization to get a tighter estimate of the uncertainty, but I don't think that is needed.
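To make the point above concrete, here is a minimal Python sketch (not llama-bench code; the token count and runtimes below are invented for illustration) comparing the two estimators:

```python
# Minimal sketch: averaging per-run t/s values vs. deriving throughput
# from the mean runtime. All numbers are made up for illustration.
import statistics

n_tokens = 32                                    # tokens generated per run
runtimes = [0.020, 0.024, 0.035, 0.021, 0.060]   # seconds per run (invented)

# (a) mean/stdev of the per-run rates, as described in the comment above
rates = [n_tokens / t for t in runtimes]
mean_of_rates = statistics.mean(rates)           # ~1181 t/s
std_of_rates = statistics.stdev(rates)

# (b) throughput derived from the mean runtime (total tokens / total time)
rate_from_mean_time = n_tokens / statistics.mean(runtimes)  # 1000 t/s

print(f"mean of per-run t/s: {mean_of_rates:.1f} +/- {std_of_rates:.1f}")
print(f"t/s from mean time : {rate_from_mean_time:.1f}")
```

When the runtimes are nearly identical the two estimators agree, but for short, noisy runs the mean of per-run rates is biased high relative to total-tokens-over-total-time (Jensen's inequality, since t/s is a convex function of the runtime), which is consistent with the misleading uncertainties observed above.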
Add `-d` or `--n-depth` arg in llama-bench to run tests with a prefilled KV cache context.

Relevant discussion: #12874
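For intuition, here is a hedged, self-contained Python sketch of what running a test at a given depth means (`DummyContext` and its methods are stand-ins invented for this illustration, not llama.cpp APIs): the cache is prefilled with `n_depth` tokens outside the timed region, so only performance at that depth is measured.

```python
import random
import time

# Hedged sketch of what "-d / --n-depth" means conceptually. DummyContext
# only simulates per-token decode cost; it is NOT a real llama.cpp API.
class DummyContext:
    def __init__(self):
        self.kv_tokens = 0

    def decode(self, tokens):
        # pretend decode cost grows with KV cache depth
        # (attention over a larger cache)
        time.sleep(1e-6 * (len(tokens) + 0.001 * self.kv_tokens))
        self.kv_tokens += len(tokens)

def bench_gen(ctx, n_depth, n_gen):
    """Tokens/s for generating n_gen tokens with n_depth tokens prefilled."""
    # Prefill happens OUTSIDE the timed region: only generation speed
    # at the given cache depth is measured.
    ctx.decode([random.randrange(32000) for _ in range(n_depth)])
    t0 = time.perf_counter()
    for _ in range(n_gen):
        ctx.decode([0])  # one token at a time, like the tg tests
    return n_gen / (time.perf_counter() - t0)

for depth in (0, 1024):
    print(f"d={depth}: {bench_gen(DummyContext(), depth, 32):.0f} t/s")
```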
Sample output