Add an activity for benchmarking only #4443
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/4443
Note: Links to docs will display an error until the docs builds have been completed.
❌ 1 New Failure, 1 Unrelated Failure
As of commit 2644fac with merge base d9cfd6a:
NEW FAILURE - The following job has failed:
BROKEN TRUNK - The following job failed but was present on the merge base:
👉 Rebase onto the `viable/strict` branch to avoid these failures
This comment was automatically generated by Dr. CI and updates every 15 minutes.
@kirklandsign has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
Summary:
Example usage:
```
adb shell am start -n com.example.executorchllamademo/com.example.executorchllamademo.Benchmarking --es "model_path" "/data/local/tmp/llama/stories_kv_sdpa_fp32_xnn.pte" --es "tokenizer_path" "/data/local/tmp/llama/tokenizer.bin"
```
Then
```
adb shell run-as com.example.executorchllamademo cat files/benchmark_results.txt
```
See a result like
```
loadStart: 1722275116708
loadEnd: 1722275117629
generateStart: 1722275117629
generateEnd: 1722275118834
tokens/second: 105.445114
```
Note: We use an activity because we assume it has higher RAM priority than a background service.

Differential Revision: D60399589

Pulled By: kirklandsign
Force-pushed from ebb6936 to 005f88f.
This pull request was exported from Phabricator. Differential Revision: D60399589
Force-pushed from 005f88f to 1dda6b5.
Force-pushed from 43e80fd to 42dfff6.
android:name=".Benchmarking"
android:exported="true">
<intent-filter>
  <action android:name="com.example.executorchllamademo.BENCHMARK" />
1. Can we give it a more generic name, e.g. LLM benchmark runner?
2. Can we later move the entire app under executorch/extension/llm as an extension for LLM benchmarking?
2) is not a blocker for this PR. Once you have addressed 1), this PR should be ready to go.
Fixed 1) now. I'm working on moving it out of llamademoapp and into a separate app (one that can be used for generic models as well).
long loadEnd;
long generateStart;
long generateEnd;
String tokens;
We will want to dump this in a standard, portable format later, e.g. JSON, ideally something we can reuse from AIBench.
That's a good idea and I plan to do that on the API as well
There is a task for it: T197322159. You can coordinate with Varun on it.
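For illustration only, the JSON dump discussed in this thread could look roughly like the sketch below. The class `BenchmarkStats`, its `toJson` method, and the JSON key names are hypothetical; they mirror the keys in the example output, not any AIBench schema or the PR's actual code. On Android, `org.json.JSONObject` would be the more natural choice; plain string formatting is used here only to keep the sketch dependency-free.

```java
import java.util.Locale;

// Hypothetical container (not from the PR): field names follow the keys
// printed in the example benchmark_results.txt output.
class BenchmarkStats {
    long loadStart;
    long loadEnd;
    long generateStart;
    long generateEnd;
    double tokensPerSecond;

    // Serializes the stats as a flat JSON object. All values are numeric,
    // so no string escaping is needed in this sketch.
    String toJson() {
        return String.format(Locale.ROOT,
            "{\"loadStart\":%d,\"loadEnd\":%d,\"generateStart\":%d,"
                + "\"generateEnd\":%d,\"tokensPerSecond\":%s}",
            loadStart, loadEnd, generateStart, generateEnd,
            Double.toString(tokensPerSecond));
    }

    public static void main(String[] args) {
        BenchmarkStats stats = new BenchmarkStats();
        stats.loadStart = 1722275116708L;
        stats.loadEnd = 1722275117629L;
        stats.generateStart = 1722275117629L;
        stats.generateEnd = 1722275118834L;
        stats.tokensPerSecond = 105.445114;
        System.out.println(stats.toJson());
    }
}
```

A flat object like this keeps the file easy to consume from CI tooling; the real schema would need to be agreed on in the follow-up task.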
In a follow-up PR, you may want to connect this new APK to android-perf.yml here https://github.com/pytorch/executorch/blob/main/.github/workflows/android-perf.yml#L160-L162 and see if the test-spec can recognize it.
Force-pushed from 42dfff6 to 73316a7.
Summary:
Example usage:
```
adb shell am start -n com.example.executorchllamademo/com.example.executorchllamademo.LlmBenchmarkRunner --es "model_path" "/data/local/tmp/llama/stories_kv_sdpa_fp32_xnn.pte" --es "tokenizer_path" "/data/local/tmp/llama/tokenizer.bin"
```
Then
```
adb shell run-as com.example.executorchllamademo cat files/benchmark_results.txt
```
See a result like
```
loadStart: 1722275116708
loadEnd: 1722275117629
generateStart: 1722275117629
generateEnd: 1722275118834
tokens/second: 105.445114
```
Note: We use an activity because we assume it has higher RAM priority than a background service.

Differential Revision: D60399589

Pulled By: kirklandsign
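To read the example output, it can help to turn the raw timestamps into durations. The sketch below is a hypothetical helper, not part of this PR: `BenchmarkResultParser` and its methods are made-up names, and it assumes the file holds one `key: value` pair per line and that the timestamps are epoch milliseconds.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper (not from the PR): parses "key: value" lines as shown
// in the example output and derives durations from the timestamp fields.
class BenchmarkResultParser {

    // Splits each line at its first ':' into a trimmed key and value.
    static Map<String, String> parse(String text) {
        Map<String, String> fields = new LinkedHashMap<>();
        for (String line : text.split("\\R")) {
            int sep = line.indexOf(':');
            if (sep < 0) {
                continue; // skip lines that carry no key
            }
            fields.put(line.substring(0, sep).trim(), line.substring(sep + 1).trim());
        }
        return fields;
    }

    // Difference between two timestamp fields (assumed epoch milliseconds).
    static long elapsedMillis(Map<String, String> fields, String startKey, String endKey) {
        return Long.parseLong(fields.get(endKey)) - Long.parseLong(fields.get(startKey));
    }

    public static void main(String[] args) {
        String sample = String.join("\n",
            "loadStart: 1722275116708",
            "loadEnd: 1722275117629",
            "generateStart: 1722275117629",
            "generateEnd: 1722275118834",
            "tokens/second: 105.445114");
        Map<String, String> fields = parse(sample);
        System.out.println("load ms: " + elapsedMillis(fields, "loadStart", "loadEnd"));
        System.out.println("generate ms: " + elapsedMillis(fields, "generateStart", "generateEnd"));
    }
}
```

For the sample values above, this gives 921 ms for model load and 1205 ms for generation.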
Force-pushed from 73316a7 to 2644fac.