
Commit 7c942c4

[LLava] Fix stats for C++ runner
Before:

I 00:00:28.414816 executorch:stats.h:84] Prompt Tokens: 616 Generated Tokens: 33
I 00:00:28.414826 executorch:stats.h:90] Model Load Time: 9.244000 (seconds)
I 00:00:28.414835 executorch:stats.h:100] Total inference time: 0.000000 (seconds) Rate: inf (tokens/second)
I 00:00:28.414838 executorch:stats.h:108] Prompt evaluation: 0.000000 (seconds) Rate: inf (tokens/second)
I 00:00:28.414839 executorch:stats.h:119] Generated 33 tokens: 0.000000 (seconds) Rate: inf (tokens/second)
I 00:00:28.414841 executorch:stats.h:127] Time to first generated token: 0.000000 (seconds)
I 00:00:28.414842 executorch:stats.h:134] Sampling time over 649 tokens: 0.002000 (seconds)

With real image on M1:

I 00:00:34.231017 executorch:stats.h:84] Prompt Tokens: 616 Generated Tokens: 33
I 00:00:34.231028 executorch:stats.h:90] Model Load Time: 9.108000 (seconds)
I 00:00:34.231038 executorch:stats.h:100] Total inference time: 25.103000 (seconds) Rate: 1.314584 (tokens/second)
I 00:00:34.231040 executorch:stats.h:108] Prompt evaluation: 11.544000 (seconds) Rate: 53.361053 (tokens/second)
I 00:00:34.231042 executorch:stats.h:119] Generated 33 tokens: 13.559000 (seconds) Rate: 2.433808 (tokens/second)
I 00:00:34.231043 executorch:stats.h:127] Time to first generated token: 11.544000 (seconds)
I 00:00:34.231045 executorch:stats.h:134] Sampling time over 649 tokens: 0.000000 (seconds)

With bogus image (same dims) on Android S23:

I 00:00:34.649120 executorch:stats.h:84] Prompt Tokens: 616 Generated Tokens: 33
I 00:00:34.649128 executorch:stats.h:90] Model Load Time: 12.337000 (seconds)
I 00:00:34.649169 executorch:stats.h:100] Total inference time: 22.301000 (seconds) Rate: 1.479754 (tokens/second)
I 00:00:34.649174 executorch:stats.h:108] Prompt evaluation: 17.964000 (seconds) Rate: 34.290804 (tokens/second)
I 00:00:34.649179 executorch:stats.h:119] Generated 33 tokens: 4.337000 (seconds) Rate: 7.608946 (tokens/second)
I 00:00:34.649183 executorch:stats.h:127] Time to first generated token: 17.964000 (seconds)
I 00:00:34.649186 executorch:stats.h:134] Sampling time over 649 tokens: 0.001000 (seconds)
1 parent 99fbca3 commit 7c942c4

File tree

1 file changed: +6 −1 lines


examples/models/llava/runner/llava_runner.cpp

Lines changed: 6 additions & 1 deletion
@@ -105,6 +105,8 @@ Error LlavaRunner::generate_from_pos(

   uint64_t prefill_next_token =
       ET_UNWRAP(prefill_prompt(prompt, start_pos, /*bos=*/0, /*eos*/ 0));
+  stats_.first_token_ms = util::time_in_ms();
+  stats_.prompt_eval_end_ms = util::time_in_ms();
   stats_.num_prompt_tokens = start_pos;

   // Generate tokens
@@ -113,7 +115,6 @@ Error LlavaRunner::generate_from_pos(

   // Bookkeeping
   stats_.num_generated_tokens = num_generated_tokens;
-  ::executorch::llm::print_report(stats_);
   if (stats_callback) {
     stats_callback(stats_);
   }
@@ -147,6 +148,7 @@ Error LlavaRunner::generate(
   };

   int64_t pos = 0;
+  stats_.inference_start_ms = util::time_in_ms();

   // prefill preset prompt
   prefill_prompt(kPresetPrompt, pos, /*bos=*/1, /*eos*/ 0);
@@ -163,6 +165,9 @@ Error LlavaRunner::generate(
   Error err =
       generate_from_pos(prompt, seq_len, pos, wrapped_callback, stats_callback);

+  stats_.inference_end_ms = util::time_in_ms();
+  ::executorch::llm::print_report(stats_);
+
   ET_LOG(
       Info,
       "RSS after finishing text generation: %f MiB (0 if unsupported)",
