
Commit 6ddeabd

[Benchmark] Generate benchmark record for job failure (#9247)
# Description

Compose a failure benchmark record.

# Related Query File

https://github.com/pytorch/test-infra/blob/main/torchci/clickhouse_queries/oss_ci_benchmark_llms/query.sql

# Details

When a job fails at the git-job level, or a device fails during the benchmark test, we return a benchmark record indicating the failure, so that the HUD UI can properly distinguish metrics that did not run from metrics that ran with failures.

This PR may temporarily introduce `Unknown` values in fields of the HUD execubench table; a follow-up fix in the HUD UI will handle the special benchmark value.

For both levels of failure, the metric name is "FAILURE_REPORT". In the HUD, we mainly use this special metric name to identify failures; if more information is needed, the level at which the job failed is recorded in benchmark.extra_info.

## Step Failure

When a failure is detected, we try to extract the model info from the git job name; the step fails if the model info cannot be extracted.

# Example of benchmark record for a failed benchmark

## When a job fails at the device-job level

- device_name: taken from job_report.name, for instance `iPhone 15`
- device_os: job_report.os with prefix "Android" or "iOS"; this matches both the Android and iOS settings
- model.name: extracted from the git job name
- model.backend: extracted from the git job name
- metric.name: "FAILURE_REPORT"

```
{
  "benchmark": {
    "name": "ExecuTorch",
    "mode": "inference",
    "extra_info": {
      "app_type": "IOS_APP",
      "job_conclusion": "FAILED",
      "failure_type": "DEVICE_JOB",
      "job_report": "..."
    }
  },
  "model": {
    "name": "ic4",
    "type": "OSS model",
    "backend": "mps"
  },
  "metric": {
    "name": "FAILURE_REPORT",
    "benchmark_values": 0,
    "target_value": 0,
    "extra_info": {
      "method": ""
    }
  },
  "runners": [
    {
      "name": "iPhone 15",
      "type": "iOS 18.0"
    }
  ]
}
```

## When a job fails at the git-job level (there are no job_reports)

This happens when a job fails before it runs the benchmark job.

- device_name: device_pool_name from the git job name, for example `samsung_galaxy_s22`
- device_os: "Android" or "iOS"
- model.name: extracted from the git job name
- model.backend: extracted from the git job name
- metric.name: "FAILURE_REPORT"

The failure benchmark record looks like:

```
{
  "benchmark": {
    "name": "ExecuTorch",
    "mode": "inference",
    "extra_info": {
      "app_type": "IOS_APP",
      "job_conclusion": "FAILURE",
      "failure_type": "GIT_JOB",
      "job_report": "{}"
    }
  },
  "model": {
    "name": "ic4",
    "type": "OSS model",
    "backend": "mps"
  },
  "metric": {
    "name": "FAILURE_REPORT",
    ...
  },
  "runners": [
    {
      "name": "samsung_galaxy_s22",
      "type": "Android",
      ...
    }
  ]
}
```
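The record shape above can be sketched as a small helper. This is a hypothetical illustration (not the actual PR code): the function name and signature are assumptions, and only the field names and the special "FAILURE_REPORT" metric name come from the examples in this description.

```python
from typing import Any, Dict


def compose_failure_record(
    model_name: str,
    backend: str,
    device_name: str,
    device_os: str,
    failure_type: str,  # "DEVICE_JOB" or "GIT_JOB", per the two failure levels
    job_report: str = "{}",  # empty report for git-job-level failures
) -> Dict[str, Any]:
    """Build a benchmark record that marks a failed job for the HUD."""
    return {
        "benchmark": {
            "name": "ExecuTorch",
            "mode": "inference",
            "extra_info": {
                "app_type": "IOS_APP",
                "job_conclusion": "FAILURE",
                "failure_type": failure_type,
                "job_report": job_report,
            },
        },
        "model": {"name": model_name, "type": "OSS model", "backend": backend},
        # The HUD keys on this special metric name to identify failures.
        "metric": {
            "name": "FAILURE_REPORT",
            "benchmark_values": 0,
            "target_value": 0,
            "extra_info": {"method": ""},
        },
        "runners": [{"name": device_name, "type": device_os}],
    }


# Example: a git-job-level failure, as in the second record above.
record = compose_failure_record("ic4", "mps", "samsung_galaxy_s22", "Android", "GIT_JOB")
```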
1 parent 77752a4

File tree

5 files changed

+712
-47
lines changed


.ci/scripts/gather_benchmark_configs.py

Lines changed: 2 additions & 1 deletion
```diff
@@ -263,7 +263,8 @@ def is_valid_huggingface_model_id(model_name: str) -> bool:
 def get_benchmark_configs() -> Dict[str, Dict]:  # noqa: C901
     """
     Gather benchmark configurations for a given set of models on the target operating system and devices.
-
+    CHANGE IF this function's return changed:
+    extract_model_info() in executorch/.github/scripts/extract_benchmark_results.py IF YOU CHANGE THE RESULT OF THIS FUNCTION.

     Args:
         None
```
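The docstring note above points at extract_model_info(), which parses model info out of the git job name (and fails the step when it cannot). A minimal sketch of that kind of parsing, assuming a hypothetical job-name format `benchmark (model, backend, device_pool)` purely for illustration:

```python
import re
from typing import Optional, Tuple


def extract_model_info(git_job_name: str) -> Optional[Tuple[str, str]]:
    """Return (model_name, backend) parsed from a git job name, or None.

    The "(model, backend, ...)" layout is an assumed format for this
    sketch, not the repository's actual job-name convention.
    """
    m = re.search(r"\(([^,]+),\s*([^,)]+)", git_job_name)
    if not m:
        # Per the PR description, the caller fails the step in this case.
        return None
    return m.group(1).strip(), m.group(2).strip()


info = extract_model_info("benchmark-on-device (ic4, mps, samsung_galaxy_s22)")
# info == ("ic4", "mps")
```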
