what is "prompt eval time" and "eval time" ? #2260

Xiang-cd · 2023-07-16T04:33:32Z

What are eval time and prompt eval time? I found that these two parts of the timing are quite slow in my generation:

hardware: Intel(R) Xeon(R) Platinum 8336C CPU @ 2.30GHz
gcc: gcc version 9.4.0 (Ubuntu 9.4.0-1ubuntu1~20.04.1)

./main --numa -m ./models/llama-7b-ggml-f16.bin -p "Building a website can be done in 10 simple steps:" -n 512

system info:

system_info: n_threads = 56 / 112 | AVX = 1 | AVX2 = 1 | AVX512 = 1 | AVX512_VBMI = 1 | AVX512_VNNI = 1 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | VSX = 0 | 
sampling: repeat_last_n = 64, repeat_penalty = 1.100000, presence_penalty = 0.000000, frequency_penalty = 0.000000, top_k = 40, tfs_z = 1.000000, top_p = 0.950000, typical_p = 1.000000, temp = 0.800000, mirostat = 0, mirostat_lr = 0.100000, mirostat_ent = 5.000000
generate: n_ctx = 512, n_batch = 512, n_predict = 512, n_keep = 0

llama_print_timings:        load time =   907.71 ms
llama_print_timings:      sample time =   301.80 ms /   512 runs   (    0.59 ms per token,  1696.48 tokens per second)
llama_print_timings: prompt eval time =  6294.68 ms /   271 tokens (   23.23 ms per token,    43.05 tokens per second)
llama_print_timings:        eval time = 115222.19 ms /   510 runs   (  225.93 ms per token,     4.43 tokens per second)
llama_print_timings:       total time = 121963.55 ms

Answered by abc-nix

abc-nix · 2023-07-16T08:04:43Z

Edit: found the thread: #1323 (comment)

~~I was not able to find the discussion I read in the past which explains the meaning of each llama_print_timings, but~~ what I understand is that:

load time: time it takes for the model to load.
sample time: time it takes to "tokenize" (sample) the prompt message for it to be processed by the program.
prompt eval time: time it takes to process the tokenized prompt message. If this isn't done, there would be no context for the model to know what token to predict next.
eval time: time needed to generate all tokens as the response to the prompt (excludes all pre-processing time, and it only measures the time since it starts outputting tokens).
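The per-token and tokens-per-second figures in the log are just each total divided by its token (or run) count. Here is a rough sketch that parses the timing lines from the question and recomputes them. The regular expression and the helper names `parse_timings` and `per_token` are my own, not part of llama.cpp, and the pattern only assumes the line format shown in the log above.

```python
import re

# Timing lines copied from the log in the question.
LOG = """\
llama_print_timings:        load time =   907.71 ms
llama_print_timings:      sample time =   301.80 ms /   512 runs   (    0.59 ms per token,  1696.48 tokens per second)
llama_print_timings: prompt eval time =  6294.68 ms /   271 tokens (   23.23 ms per token,    43.05 tokens per second)
llama_print_timings:        eval time = 115222.19 ms /   510 runs   (  225.93 ms per token,     4.43 tokens per second)
llama_print_timings:       total time = 121963.55 ms
"""

# Hypothetical pattern: "<name> time = <ms> ms", optionally "/ <count> runs|tokens".
TIMING_RE = re.compile(
    r"llama_print_timings:\s*(?P<name>[a-z ]+?) time\s*=\s*(?P<ms>[\d.]+) ms"
    r"(?:\s*/\s*(?P<count>\d+) (?:runs|tokens))?"
)


def parse_timings(log):
    """Return {name: (total_ms, count_or_None)} for every timing line found."""
    timings = {}
    for m in TIMING_RE.finditer(log):
        count = int(m.group("count")) if m.group("count") else None
        timings[m.group("name")] = (float(m.group("ms")), count)
    return timings


def per_token(total_ms, count):
    """Return (ms per token, tokens per second) for one timing entry."""
    return total_ms / count, count / (total_ms / 1000.0)


for name, (total_ms, count) in parse_timings(LOG).items():
    if count is None:
        print(f"{name}: {total_ms:.2f} ms")
    else:
        ms, tps = per_token(total_ms, count)
        print(f"{name}: {ms:.2f} ms per token, {tps:.2f} tokens per second")
```

For this log, prompt eval (processing the prompt) runs at roughly ten times the tokens-per-second rate of eval (generating the response), which is why the two are reported separately.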

I recommend two things:

If you don't have an issue to report, and just have a simple question to ask, you should use the “discussions” space in this repo instead.
If you want to improve prompt processing speed (related to prompt eval time), read the main README of the GitHub project. You will see many options there, but maybe none fit your exact hardware. On my non-multithreading machine, I used OpenBLAS and MKL to speed up prompt processing, before I started using a dedicated GPU for this. In your case, BLAS will probably not be good enough compared to llama.cpp's current CPU prompt processing. As described in this reddit post, you will need to find the optimal number of threads to speed up prompt processing (token generation depends mainly on memory access speed).
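One way to look for a good thread count is to run the same prompt with several `-t` values and compare the reported prompt eval speed. The sketch below is only an illustration: the helper names are mine, the binary and model paths are taken from the command in the question, and `-n 16` is an arbitrary choice to keep each run short. It captures both stdout and stderr so it does not matter which stream the timings are printed to.

```python
import re
import subprocess

# Matches e.g. "prompt eval time =  6294.68 ms /   271 tokens".
PROMPT_EVAL_RE = re.compile(
    r"prompt eval time\s*=\s*([\d.]+) ms\s*/\s*(\d+) tokens"
)


def prompt_tokens_per_second(log):
    """Extract prompt-processing speed from a run's output, or None if absent."""
    m = PROMPT_EVAL_RE.search(log)
    if m is None:
        return None
    total_ms, n_tokens = float(m.group(1)), int(m.group(2))
    return n_tokens / (total_ms / 1000.0)


def run_main(n_threads, binary="./main",
             model="./models/llama-7b-ggml-f16.bin"):
    """Run llama.cpp's main once with the given thread count (paths are assumptions)."""
    cmd = [binary, "-m", model, "-t", str(n_threads), "-n", "16",
           "-p", "Building a website can be done in 10 simple steps:"]
    proc = subprocess.run(cmd, capture_output=True, text=True)
    return proc.stdout + proc.stderr


def best_thread_count(thread_counts, run=run_main):
    """Try each thread count and return (best_count, {count: tokens_per_second})."""
    results = {}
    for n in thread_counts:
        tps = prompt_tokens_per_second(run(n))
        if tps is not None:
            results[n] = tps
    best = max(results, key=results.get) if results else None
    return best, results


# Example call (needs the binary and model to exist):
# best, results = best_thread_count([8, 16, 28, 56])
```

A single run per setting is noisy, so repeat each measurement a few times before trusting small differences.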

1 reply

SlyEcho Jul 21, 2023
Collaborator

Actually "sample time" is not tokenization time. I understood wrong back then. It is the time taken to find the next likely token using all the rules and parameters that the user sets up on the command line.

Xiang-cd · 2023-07-17T03:53:35Z

Author

thank you very much, it was very helpful!
