
Commit d6aea3d

Di Xu (SWE) authored and facebook-github-bot committed
Support more breakdown of latency metrics/stats for Llama (#6072)
Summary:
Pull Request resolved: #6072

Support more breakdown of latency metrics/stats for Llama. This is needed when debugging the Frame-LLM project across teams.

Reviewed By: cccclai

Differential Revision: D64139460

fbshipit-source-id: ec92ee2e15621705e7b8aa28d53e54e66c45a7cc
1 parent 83c95df commit d6aea3d

File tree

1 file changed: +7 -0 lines changed


extension/llm/runner/stats.h

Lines changed: 7 additions & 0 deletions
@@ -29,7 +29,14 @@ struct Stats {
   long model_load_end_ms;
   // inference_start_ms: Immediately after the model is loaded (or we check
   // for model load), measure the inference time.
+  // NOTE: It's actually the tokenizer encode + model execution time.
   long inference_start_ms;
+  // End of the tokenizer encode time.
+  long token_encode_end_ms;
+  // Start of the model execution (forward function) time.
+  long model_execution_start_ms;
+  // End of the model execution (forward function) time.
+  long model_execution_end_ms;
   // prompt_eval_end_ms: Prompt array allocation and tokenization. Ends right
   // before the inference loop starts
   long prompt_eval_end_ms;
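For context, the new fields split the overall inference window (which, per the added NOTE, covers tokenizer encode plus model execution) into two measurable phases. Below is a minimal C++ sketch of how a runner might populate these timestamps and derive the per-phase breakdown the commit message refers to. The now_ms() clock helper and the report_breakdown() function are illustrative assumptions, not part of the actual ExecuTorch runner; only the Stats field names come from the patch.

#include <chrono>
#include <cstdio>

// Mirror of the relevant timestamps from extension/llm/runner/stats.h.
// Field names match the patch; the rest of this file is a hypothetical sketch.
struct Stats {
  long inference_start_ms = 0;       // encode + execution window begins
  long token_encode_end_ms = 0;      // tokenizer encode finished
  long model_execution_start_ms = 0; // forward() begins
  long model_execution_end_ms = 0;   // forward() ends
};

// Hypothetical millisecond clock, assuming a steady_clock-based source.
static long now_ms() {
  using namespace std::chrono;
  return duration_cast<milliseconds>(
             steady_clock::now().time_since_epoch())
      .count();
}

// Hypothetical reporting helper: computes each phase as the difference of
// its end and start timestamps.
static void report_breakdown(const Stats& s) {
  long encode_ms = s.token_encode_end_ms - s.inference_start_ms;
  long execute_ms = s.model_execution_end_ms - s.model_execution_start_ms;
  std::printf("tokenizer encode: %ld ms, model execution: %ld ms\n",
              encode_ms, execute_ms);
}

int main() {
  Stats stats;
  stats.inference_start_ms = now_ms();
  // ... tokenizer encode would run here ...
  stats.token_encode_end_ms = now_ms();
  stats.model_execution_start_ms = now_ms();
  // ... model forward() would run here ...
  stats.model_execution_end_ms = now_ms();
  report_breakdown(stats);
  return 0;
}

Note that both phases are sub-intervals of the existing inference window, so the old inference_start_ms semantics are unchanged; callers that only read the original fields are unaffected by this patch.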

0 commit comments
