Summary:
To be able to display the benchmark results, we need the following information:
1. About the model
* Name, e.g. `mv2`
* The backend it uses, e.g. `xnnpack`
* The quantization (dtype) applied, e.g. `q8`
2. About the metric
* Name, e.g. `token_per_sec`. Note that this needs to be flexible enough to cover future metrics
* Value
* An optional target (so that we can highlight regressions when they happen)
3. More metadata
* The device name, e.g. `samsung`
* The device model and its Android version
* More fields can be added here as needed
I codified these fields in a new `BenchmarkMetric` class, so that the benchmark results can be expressed as a list of different metrics in the result JSON.
NB: at the moment, the information about the model is extracted from its file name, i.e. `NAME_BACKEND_QUANTIZATION.pte`, but it would be better to get it from the file itself instead. Achieving that needs a bit more research.
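To make the shape concrete, here is a minimal sketch of what such a class could look like. The field names mirror the result JSON examples below; everything else (visibility, types, nesting) is an assumption for illustration, not the actual implementation.
```
// Minimal sketch only: field names follow the result JSON below; the rest is assumed.
public class BenchmarkMetric {
  public static class BenchmarkModel {
    public String name;          // e.g. "mv2"
    public String backend;       // e.g. "xnnpack"
    public String quantization;  // e.g. "q8"
  }

  public BenchmarkModel benchmarkModel;
  public String metric;  // e.g. "token_per_sec"; a free-form string so future metrics can be added
  public double actual;  // measured value
  public double target;  // optional target, 0 when unset; used to flag regressions
  public String device;  // e.g. "samsung"
  public String arch;    // device model and OS version, e.g. "SM-S901U1 / 12"
}
```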
### Testing
https://github.com/pytorch/executorch/actions/runs/10843580072
* The JSON for `llama2`:
```
[
{
"actual": 247,
"arch": "SM-S901U1 / 12",
"benchmarkModel": {
"backend": "",
"name": "llama2",
"quantization": ""
},
"device": "samsung",
"metric": "model_load_time(ms)",
"target": 0
},
{
"actual": 367,
"arch": "SM-S901U1 / 12",
"benchmarkModel": {
"backend": "",
"name": "llama2",
"quantization": ""
},
"device": "samsung",
"metric": "generate_time(ms)",
"target": 0
},
{
"actual": 342.69662,
"arch": "SM-S901U1 / 12",
"benchmarkModel": {
"backend": "",
"name": "llama2",
"quantization": ""
},
"device": "samsung",
"metric": "token_per_sec",
"target": 0
}
]
```
* The JSON for `mv2_xnnpack_q8`. I keep the average latency here as the final number to show on the dashboard later on.
```
[
{
"actual": 91.1,
"arch": "SM-S908U1 / 12",
"benchmarkModel": {
"backend": "xnnpack",
"name": "mv2",
"quantization": "q8"
},
"device": "samsung",
"metric": "avg_inference_latency(ms)",
"target": 0
}
]
```
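For illustration, here is a hedged sketch of how a list of these metrics could be turned into JSON like the output above, assuming Gson is used for serialization (the `BenchmarkResultWriter` scaffolding and the use of Gson are assumptions, not necessarily how the runner does it).
```
// Hypothetical usage: serialize a list of BenchmarkMetric objects to a JSON array with Gson.
// Gson and the surrounding scaffolding are assumptions for illustration only.
import com.google.gson.Gson;
import java.util.Collections;
import java.util.List;

public class BenchmarkResultWriter {
  public static void main(String[] args) {
    BenchmarkMetric.BenchmarkModel model = new BenchmarkMetric.BenchmarkModel();
    model.name = "mv2";
    model.backend = "xnnpack";
    model.quantization = "q8";

    BenchmarkMetric latency = new BenchmarkMetric();
    latency.benchmarkModel = model;
    latency.metric = "avg_inference_latency(ms)";
    latency.actual = 91.1;
    latency.target = 0;
    latency.device = "samsung";
    latency.arch = "SM-S908U1 / 12";

    List<BenchmarkMetric> results = Collections.singletonList(latency);
    System.out.println(new Gson().toJson(results));  // prints a JSON array like the one above
  }
}
```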
Pull Request resolved: #5332
Reviewed By: guangy10, kirklandsign
Differential Revision: D62624549
Pulled By: huydhn
fbshipit-source-id: 5c1a605c1012396ff904c148e9a99967c83321f6