Add script to fetch benchmark results for execuTorch #11734
Signed-off-by: Yang Wang <[email protected]>
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/11734
Note: Links to docs will display an error until the docs builds have been completed.
❌ 1 Cancelled Job, 2 Pending, 3 Unrelated Failures
As of commit 1a3795e with merge base da36d8a:
CANCELLED JOB - The following job was cancelled. Please retry:
FLAKY - The following jobs failed but were likely due to flakiness present on trunk:
BROKEN TRUNK - The following job failed but was present on the merge base:
👉 Rebase onto the `viable/strict` branch to avoid these failures
This comment was automatically generated by Dr. CI and updates every 15 minutes.
FYI, this method could be more general, but since only execuTorch uses it, I made it execuTorch-specific. @huydhn
Excel limits sheet names to at most 31 characters, which could easily break in the future. @huydhn @guangy10, I think instead of generating one file per category, maybe we can generate a list of Excel files stored in [private, public] folders. For now, with the hard-coded abbreviations, this works fine. The Excel sheet option is there in case people want to use it.
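Because Excel caps worksheet names at 31 characters and rejects the characters `: \ / ? * [ ]`, one alternative to hard-coded abbreviations is to sanitize and truncate names just before writing them. This is a minimal sketch; `safe_sheet_name` is a hypothetical helper, not part of the PR:

```python
# Hypothetical helper (not from the PR): derive an Excel-compatible sheet name.
# Excel limits sheet names to 31 characters and forbids : \ / ? * [ ].
import re

EXCEL_SHEET_NAME_MAX = 31
_INVALID_CHARS = re.compile(r"[:\\/?*\[\]]")


def safe_sheet_name(name: str, used: set) -> str:
    """Return a unique sheet name of at most 31 characters derived from `name`."""
    base = _INVALID_CHARS.sub("_", name)[:EXCEL_SHEET_NAME_MAX] or "sheet"
    candidate = base
    counter = 1
    # Excel compares sheet names case-insensitively, so track lowercase names.
    while candidate.lower() in used:
        suffix = f"_{counter}"
        candidate = base[: EXCEL_SHEET_NAME_MAX - len(suffix)] + suffix
        counter += 1
    used.add(candidate.lower())
    return candidate
```

Pass the same `used` set for every sheet in one workbook, so two long table names that truncate to the same prefix get a numeric suffix instead of colliding.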
Stamped to unblock! Let's start using the script and improve it along the way.
I think we should make it more flexible, since new models, recipes, and devices are added all the time. For example, we recently added more models from … (see the dashboard). Similarly, as new "devices" or "backends" become available, we want to be able to query their results via the script as well.
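One way to keep the filters open-ended, as suggested above, is to match against whatever values the fetched data contains rather than a hard-coded list. A minimal sketch, assuming each record is a dict with hypothetical `model`, `backend`, and `device_pool` keys (the real HUD payload and the PR's script may use different field names):

```python
# Sketch with a hypothetical record layout: {"model", "backend", "device_pool", ...}.
from typing import Iterable, List, Optional


def filter_records(
    records: Iterable[dict],
    models: Optional[Iterable[str]] = None,
    backends: Optional[Iterable[str]] = None,
    device_pools: Optional[Iterable[str]] = None,
) -> List[dict]:
    """Keep records matching every provided filter; None or empty means 'no filter'."""
    criteria = {"model": models, "backend": backends, "device_pool": device_pools}
    # Normalize each active allow-list to a lowercase set for case-insensitive matching.
    active = {key: {v.lower() for v in values} for key, values in criteria.items() if values}
    return [
        rec
        for rec in records
        if all(str(rec.get(key, "")).lower() in allowed for key, allowed in active.items())
    ]


def available_values(records: Iterable[dict], key: str) -> List[str]:
    """List the distinct values present in the data, e.g. to show valid filter choices."""
    return sorted({str(rec[key]) for rec in records if key in rec})
```

Because nothing is hard-coded, a newly added model or backend becomes filterable as soon as it appears in the fetched data, and `available_values` can list it for users.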
Yeah, I noticed the limit when I manually created the Excel sheet. Ideally I'd like to get rid of the Excel sheet by wiring the outputs from the DB directly into the analysis script. Given what this PR currently supports, what does the workflow look like if I want to rerun the analysis? That is, how does this script interface with the analysis script?
Please review this again. The script is now synced with the analysis script through Excel output. I tested it on the mv3 data and posted the analysis example in the comment. For the samples I checked, the results generated by the new script are similar to those posted in #10982.
- `--device-pools`: Filter by private device pool names (e.g., "samsung-galaxy-s22-5g", "samsung-galaxy-s22plus-5g")
- `--backends`: Filter by specific backend names (e.g., "xnnpack_q8")
- `--models`: Filter by specific model names (e.g., "mv3", "meta-llama-llama-3.2-1b-instruct-qlora-int4-eo8")
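A sketch of how these filter flags, together with the `--startTime`/`--endTime` flags from the PR description, could be parsed. The flag names come from the PR text; accepting several values per flag (`nargs="+"`) and treating naive timestamps as UTC are assumptions, not the script's actual parser:

```python
# Sketch only: flag names follow the README and PR description; nargs="+" and
# the UTC handling are assumptions, not the script's real argument parser.
import argparse
from datetime import datetime, timezone
from typing import List, Optional


def parse_utc(value: str) -> datetime:
    """Parse an ISO-8601 timestamp, treating naive values as UTC."""
    dt = datetime.fromisoformat(value)
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=timezone.utc)
    return dt.astimezone(timezone.utc)


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Fetch execuTorch benchmark data (sketch)")
    parser.add_argument("--startTime", type=parse_utc, required=True)
    parser.add_argument("--endTime", type=parse_utc, required=True)
    # Each filter accepts one or more names; omitting a flag means "no filter".
    parser.add_argument("--device-pools", nargs="+", default=None)
    parser.add_argument("--backends", nargs="+", default=None)
    parser.add_argument("--models", nargs="+", default=None)
    args = parser.parse_args(argv)
    if args.startTime >= args.endTime:
        parser.error("--startTime must be earlier than --endTime")
    return args
```

For example, `parse_args(["--startTime", "2025-04-29T09:48:57", "--endTime", "2025-05-13T22:00:00", "--models", "mv3"])` yields `models == ["mv3"]` and leaves `device_pools` and `backends` as `None`.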
Note that the example names are still incorrect.
Linter error to fix.
# Summary
Provide methods and a script to fetch all execuTorch benchmark data from the HUD API into two datasets, private and public. The script will:
- fetch all data from the HUD API for the input time range (UTC)
- clean out records and tables that contain only FAILURE_REPORT entries due to job-level failures
- get all private table metrics, generate `table_name`, and find the intersecting public table metrics
- generate private and public table groups
- output the data

OutputType:
- run with Excel-sheet export
- run with CSV export
- run with DataFrame-format print
- run with JSON-format print

See README.md for more guidance.

The data is similar to the Excel sheet generated manually in pytorch#10982.

The result should match the HUD per-model data table:
<img width="1480" alt="image" src="https://github.com/user-attachments/assets/7c6cc12e-50c5-4ce2-ac87-5cac650486e3" />

## helper methods: common.py
common.py provides helper methods to convert CSV and Excel sheets back to the {"groupInfo":{}, "df":df.DataFrame} format.
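The cleaning and grouping steps in the summary can be sketched as follows. Everything about the data shape is an assumption here: records are dicts with hypothetical `model`, `backend`, `device`, `os`, `private`, and `metric` fields, FAILURE_REPORT is modeled as a metric name, and the table-name format is made up for illustration:

```python
# Sketch under assumptions: record fields and the table-name format are
# hypothetical; FAILURE_REPORT is modeled as a metric name for failed jobs.
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

FAILURE_METRIC = "FAILURE_REPORT"
Key = Tuple[str, str, str, str]


def group_key(rec: dict) -> Key:
    """Identify a table by model, backend, device and OS (pool type excluded)."""
    return (rec["model"], rec["backend"], rec["device"], rec["os"])


def table_name(key: Key) -> str:
    """Hypothetical table-name format; the PR's real naming scheme may differ."""
    model, backend, device, os_name = key
    return f"{model}({backend})|{device}|{os_name}"


def build_groups(records: Iterable[dict]) -> Tuple[Dict[str, List[dict]], Dict[str, List[dict]]]:
    """Split records into private/public tables, keeping public tables that match a private one."""
    private: Dict[Key, List[dict]] = defaultdict(list)
    public: Dict[Key, List[dict]] = defaultdict(list)
    for rec in records:
        if rec["metric"] == FAILURE_METRIC:
            continue  # skip failure rows; a table with only failures is never created
        (private if rec["private"] else public)[group_key(rec)].append(rec)
    private_groups = {table_name(k): rows for k, rows in private.items()}
    # Intersect: keep only public tables whose key also exists on the private side.
    public_groups = {table_name(k): rows for k, rows in public.items() if k in private}
    return private_groups, public_groups
```

Keying on (model, backend, device, OS) without the private/public marker is what lets a private table find its public counterpart; device names that embed a marker such as "(private)" would need to be normalized first.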
# run with
``` bash
python3 .ci/scripts/benchmark_tooling/get_benchmark_analysis_data.py \
  --startTime "2025-04-29T09:48:57" \
  --endTime "2025-05-13T22:00:00" \
  --outputType "excel" \
  --models "mv3"

python3 .ci/scripts/benchmark_tooling/analyze_benchmark_stability.py \
  --primary-file private.xlsx \
  --reference-file public.xlsx
```
Generated Excel files: [private.xlsx](https://github.com/user-attachments/files/20844977/private.xlsx) [public.xlsx](https://github.com/user-attachments/files/20844978/public.xlsx)

For instance, you can find the result for mv3 xnnq_q8 on the S22 Ultra with Android 14:
```
Latency Stability Analysis: table10 (Primary)
================================================================================
Model: mv3(xnnpack_q8)
Device: Samsung Galaxy S22 Ultra 5G (private)(Android 14)

Dataset Overview:
  - Number of samples: 88
  - Date range: 2025-04-29 09:48:57+00:00 to 2025-05-13 21:08:36+00:00

Central Tendency Metrics:
  - Mean latency: 2.91 ms
  - Median latency (P50): 2.54 ms
  - Mean trimmed latency: 2.41 ms
  - Median trimmed latency: 2.15 ms

Dispersion Metrics:
  - Standard deviation: 1.14 ms
  - Coefficient of variation (CV): 39.08%
  - Interquartile range (IQR): 0.82 ms
  - Trimmed standard deviation: 0.76 ms
  - Trimmed coefficient of variation: 31.60%

Percentile Metrics:
  - P50 (median): 2.54 ms
  - P90: 3.88 ms
  - P95: 4.60 ms
  - P99: 5.91 ms

Inter-Jitter Metrics (variability between runs):
  - Max/Min ratio: 5.6103
  - P99/P50 ratio: 2.3319
  - Mean rolling std (window=5): 0.79 ms

Intra-Jitter Metrics (variability within runs):
  - Mean trimming effect ratio: 15.37%
  - Max trimming effect ratio: 38.83%

Stability Assessment:
  - Overall stability score: 0.0/100
  - Overall stability rating: Poor

Interpretation:
The benchmark shows poor stability (score: 0.0/100) with significant variation between runs (CV: 39.08%). Performance is unpredictable and may lead to inconsistent user experience.

The significant difference between raw and trimmed means suggests considerable intra-run jitter (15.4%) with occasional outliers within benchmark runs. The max/min ratio of 5.61 indicates substantial performance differences between the best and worst runs. The P99/P50 ratio of 2.33 suggests occasional latency spikes that could affect tail latency sensitive applications.
```
---------
Signed-off-by: Yang Wang <[email protected]>
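Several of the raw (untrimmed) metrics in the sample report above can be reproduced from a chronologically ordered list of latencies. This stdlib sketch is not the analysis script's code: it assumes linear-interpolation percentiles (`statistics.quantiles` with `method="inclusive"`) and sample (n-1) standard deviations, so its numbers may differ slightly from the script's output. The trimmed metrics and the stability score are omitted because their definitions are not given here:

```python
# Sketch, not the PR's analysis code: percentiles use linear interpolation
# (statistics.quantiles, method="inclusive"); standard deviations are sample (n-1).
import statistics


def latency_summary(latencies_ms, window=5):
    """Compute dispersion and jitter metrics for a chronologically ordered series."""
    data = list(latencies_ms)
    if window < 2 or len(data) < window:
        raise ValueError("need window >= 2 and at least `window` samples")
    cuts = statistics.quantiles(data, n=100, method="inclusive")  # P1..P99
    p50, p90, p95, p99 = cuts[49], cuts[89], cuts[94], cuts[98]
    q1, q3 = cuts[24], cuts[74]
    mean = statistics.fmean(data)
    stdev = statistics.stdev(data)
    # Inter-run jitter: std dev over each run of `window` consecutive samples, averaged.
    rolling = [statistics.stdev(data[i:i + window]) for i in range(len(data) - window + 1)]
    return {
        "mean": mean,
        "median": statistics.median(data),
        "stdev": stdev,
        "cv_pct": 100.0 * stdev / mean,
        "iqr": q3 - q1,
        "p50": p50, "p90": p90, "p95": p95, "p99": p99,
        "max_min_ratio": max(data) / min(data),
        "p99_p50_ratio": p99 / p50,
        "mean_rolling_std": statistics.fmean(rolling),
    }
```

For example, `latency_summary([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])` gives a P50 of 5.5, a P99 of 9.91, and an IQR of 4.5.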