[LLava] Fix stats for C++ runner #5147
Merged
Before (timings were never recorded, so elapsed times report as 0.000000 and rates as `inf`):
I 00:00:28.414816 executorch:stats.h:84] Prompt Tokens: 616 Generated Tokens: 33
I 00:00:28.414826 executorch:stats.h:90] Model Load Time: 9.244000 (seconds)
I 00:00:28.414835 executorch:stats.h:100] Total inference time: 0.000000 (seconds) Rate: inf (tokens/second)
I 00:00:28.414838 executorch:stats.h:108] Prompt evaluation: 0.000000 (seconds) Rate: inf (tokens/second)
I 00:00:28.414839 executorch:stats.h:119] Generated 33 tokens: 0.000000 (seconds) Rate: inf (tokens/second)
I 00:00:28.414841 executorch:stats.h:127] Time to first generated token: 0.000000 (seconds)
I 00:00:28.414842 executorch:stats.h:134] Sampling time over 649 tokens: 0.002000 (seconds)
After, with a real image on M1:
I 00:00:34.231017 executorch:stats.h:84] Prompt Tokens: 616 Generated Tokens: 33
I 00:00:34.231028 executorch:stats.h:90] Model Load Time: 9.108000 (seconds)
I 00:00:34.231038 executorch:stats.h:100] Total inference time: 25.103000 (seconds) Rate: 1.314584 (tokens/second)
I 00:00:34.231040 executorch:stats.h:108] Prompt evaluation: 11.544000 (seconds) Rate: 53.361053 (tokens/second)
I 00:00:34.231042 executorch:stats.h:119] Generated 33 tokens: 13.559000 (seconds) Rate: 2.433808 (tokens/second)
I 00:00:34.231043 executorch:stats.h:127] Time to first generated token: 11.544000 (seconds)
I 00:00:34.231045 executorch:stats.h:134] Sampling time over 649 tokens: 0.000000 (seconds)
After, with a bogus image (same dims) on an Android S23:
I 00:00:34.649120 executorch:stats.h:84] Prompt Tokens: 616 Generated Tokens: 33
I 00:00:34.649128 executorch:stats.h:90] Model Load Time: 12.337000 (seconds)
I 00:00:34.649169 executorch:stats.h:100] Total inference time: 22.301000 (seconds) Rate: 1.479754 (tokens/second)
I 00:00:34.649174 executorch:stats.h:108] Prompt evaluation: 17.964000 (seconds) Rate: 34.290804 (tokens/second)
I 00:00:34.649179 executorch:stats.h:119] Generated 33 tokens: 4.337000 (seconds) Rate: 7.608946 (tokens/second)
I 00:00:34.649183 executorch:stats.h:127] Time to first generated token: 17.964000 (seconds)
I 00:00:34.649186 executorch:stats.h:134] Sampling time over 649 tokens: 0.001000 (seconds)