
[Benchmark] Generate benchmark record for job failure #9247


Merged

merged 19 commits into main on Mar 17, 2025

Conversation

@yangw-dev (Contributor) commented Mar 13, 2025

Description

Compose a failure benchmark record.

Issue: pytorch/test-infra#6294

Related Query File

https://github.com/pytorch/test-infra/blob/main/torchci/clickhouse_queries/oss_ci_benchmark_llms/query.sql

Details

When a job fails at the git_job level, or a device fails during the benchmark test, we return a benchmark record indicating the failure, so that the HUD UI can properly distinguish metrics that did not run from metrics that ran but failed.

This PR may temporarily introduce `Unknown` values in fields of the HUD execubench table; a follow-up will update the HUD UI to handle the special benchmark value.

For both levels of failure, the metric name will be "FAILURE_REPORT". In HUD, we mainly use this special metric name to identify failures; if more information is needed, the level at which the job failed is available in benchmark.extra_info (a small sketch of this consumer-side check follows).
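For illustration, here is a minimal Python sketch of how a consumer might split these records, assuming each record is a dict shaped like the examples below; the helper name and field access are hypothetical, not HUD's actual code:

```python
# Hypothetical sketch: separate real metric records from failure reports
# using the special metric name, then read the failure level
# (GIT_JOB or DEVICE_JOB) from benchmark.extra_info.
FAILURE_METRIC = "FAILURE_REPORT"

def split_records(records: list[dict]) -> tuple[list[dict], list[tuple[str, dict]]]:
    metrics, failures = [], []
    for rec in records:
        if rec["metric"]["name"] == FAILURE_METRIC:
            level = rec["benchmark"]["extra_info"].get("failure_type", "UNKNOWN")
            failures.append((level, rec))
        else:
            metrics.append(rec)
    return metrics, failures
```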

Step Failure

When a failure is detected, we try to extract the model info from git_job_name; the step itself fails if the model info cannot be extracted (a minimal sketch of the extraction follows).
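A minimal sketch of that extraction, assuming a job-name layout like `benchmark-on-device (ic4, mps, apple_iphone_15)`; the regex and job-name format here are assumptions, since the real format is produced by .ci/scripts/gather_benchmark_configs.py and parsed in extract_benchmark_results.py:

```python
import re

# Hypothetical job-name pattern: "... (<model>, <backend>, <device_pool>, ...)".
# The real format comes from .ci/scripts/gather_benchmark_configs.py.
JOB_NAME_RE = re.compile(r"\((?P<model>[^,]+),\s*(?P<backend>[^,]+)")

def extract_model_info(git_job_name: str) -> dict:
    m = JOB_NAME_RE.search(git_job_name)
    if not m:
        # Fail loudly so a job-name format change is caught immediately.
        raise ValueError(
            f"cannot extract model info from job name {git_job_name!r}; "
            "check gather_benchmark_configs.py for the current format"
        )
    return {"name": m.group("model").strip(), "backend": m.group("backend").strip()}
```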

Example benchmark records for failures:

When a job fails at the device-job level, the fields are composed as follows (a code sketch follows the example record):

  • device_name: taken from job_report.name, for instance iPhone 15
  • device_os: job_report.os with prefix "Android" or "iOS"; this matches both Android and iOS settings
  • model.name: extracted from the git job name
  • model.backend: extracted from the git job name
  • metric.name: "FAILURE_REPORT"
```
{
  "benchmark": {
    "name": "ExecuTorch",
    "mode": "inference",
    "extra_info": {
      "app_type": "IOS_APP",
      "job_conclusion": "FAILED",
      "failure_type": "DEVICE_JOB",
      "job_report": "..."
    }
  },
  "model": {
    "name": "ic4",
    "type": "OSS model",
    "backend": "mps"
  },
  "metric": {
    "name": "FAILURE_REPORT",
    "benchmark_values": 0,
    "target_value": 0,
    "extra_info": {
      "method": ""
    }
  },
  "runners": [
    {
      "name": "iPhone 15",
      "type": "iOS 18.0"
    }
  ]
}
```
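A minimal sketch of composing the device-job-level record above, reusing the hypothetical extract_model_info helper from earlier; this illustrates the record's shape under those assumptions, not the PR's actual implementation:

```python
# Illustrative sketch only; assumes the extract_model_info helper from the
# earlier sketch and a job_report dict carrying "name" and "os" fields.
def compose_device_failure_record(job_report: dict, git_job_name: str) -> dict:
    model = extract_model_info(git_job_name)
    is_ios = job_report["os"].startswith("iOS")
    return {
        "benchmark": {
            "name": "ExecuTorch",
            "mode": "inference",
            "extra_info": {
                "app_type": "IOS_APP" if is_ios else "ANDROID_APP",
                "job_conclusion": "FAILED",
                "failure_type": "DEVICE_JOB",
                # The raw device-job report is kept for debugging.
                "job_report": str(job_report),
            },
        },
        "model": {
            "name": model["name"],
            "type": "OSS model",
            "backend": model["backend"],
        },
        "metric": {
            "name": "FAILURE_REPORT",
            "benchmark_values": 0,
            "target_value": 0,
            "extra_info": {"method": ""},
        },
        # device_name from job_report.name, device_os from job_report.os.
        "runners": [{"name": job_report["name"], "type": job_report["os"]}],
    }
```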

When a job fails at the git-job level (there are no job_reports):

This happens when a job fails before it runs the benchmark job.

  • device_name: device_pool_name from the git job name, for example samsung_galaxy_s22
  • device_os: "Android" or "iOS"
  • model.name: extracted from the git job name
  • model.backend: extracted from the git job name
  • metric.name: "FAILURE_REPORT"

The failure benchmark record looks like this (a composition sketch follows the example record):

```
{
  "benchmark": {
    "name": "ExecuTorch",
    "mode": "inference",
    "extra_info": {
      "app_type": "IOS_APP",
      "job_conclusion": "FAILURE",
      "failure_type": "GIT_JOB",
      "job_report": "{}"
    }
  },
  "model": {
    "name": "ic4",
    "type": "OSS model",
    "backend": "mps"
  },
  "metric": {
    "name": "FAILURE_REPORT",
    ...
  },
  "runners": [
    {
      "name": "samsung_galaxy_s22",
      "type": "Android",
      ...
    }
  ]
}
```
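A matching sketch for the git-job-level case, reusing the hypothetical helpers above; there is no job_report at this level, so it is recorded as an empty JSON object:

```python
# Illustrative sketch only; reuses the hypothetical helpers above.
def compose_git_job_failure_record(
    git_job_name: str, device_pool_name: str, device_os: str
) -> dict:
    record = compose_device_failure_record(
        {"name": device_pool_name, "os": device_os}, git_job_name
    )
    # Same shape as the device-job record, but the failure is attributed to
    # the git job itself and there is no device report to attach.
    record["benchmark"]["extra_info"].update(
        {
            "job_conclusion": "FAILURE",
            "failure_type": "GIT_JOB",
            "job_report": "{}",
        }
    )
    return record
```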

pytorch-bot (bot) commented Mar 13, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/9247

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (1 Unrelated Failure)

As of commit 69c42fa with merge base dd9a85a:

BROKEN TRUNK - The following job failed but was present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot added the CLA Signed label Mar 13, 2025
@yangw-dev requested a review from huydhn March 13, 2025 23:04
@yangw-dev changed the title from "add test" to "[Benchmark] Generate benchmark record for job failure" Mar 14, 2025
@yangw-dev marked this pull request as ready for review March 14, 2025 06:06
@yangw-dev requested a review from ZainRizvi March 14, 2025 06:06
The review comment below refers to this snippet from the PR:

```python
    A job can fail at two levels: GIT_JOB and DEVICE_JOB. If any job fails,
    generate a failure benchmark record.
    """
    artifacts = content.get("artifacts")
    git_job_name = content["git_job_name"]
```
@huydhn (Contributor) commented:

A word of caution when trying to extract information about the run from the job name. The job name comes from this line: https://github.com/pytorch/executorch/blob/main/.ci/scripts/gather_benchmark_configs.py#L335. So, I think:

  • Make the error raised when extract_model_info fails to parse the job name clearer, by referring to the gather_benchmark_configs script. Most likely, that script was updated without updating extract_benchmark_results.
  • Add a comment to both scripts saying they need to stay in sync (see the sketch below).
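For instance, the cross-reference the reviewer asks for could be as small as a paired comment in the two scripts (wording hypothetical):

```python
# In .ci/scripts/gather_benchmark_configs.py:
# NOTE: the job-name format produced here is parsed by
# extract_benchmark_results.py; keep the two scripts in sync.

# In extract_benchmark_results.py:
# NOTE: job names are produced by gather_benchmark_configs.py; if parsing
# fails here, check whether that script changed the format.
```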

@yangw-dev (Contributor, Author) replied:

sounds good!

@yangw-dev (Contributor, Author) commented Mar 14, 2025

I also raise exceptions in get_app_type and get_device_os_type, and added unit tests for those cases too.

@yangw-dev (Contributor, Author) commented Mar 14, 2025

@huydhn
Not urgent: maybe we can add a unit test to check that this works as expected. We could create a common lib to share some configs between this script and the other one.

I also added a comment in perf.yml about the job-name step change. If this script is ever used outside those yml files, we can make the regex prefix more flexible and let the user pass the step name to check against.

@yangw-dev requested a review from huydhn March 14, 2025 22:00
@huydhn (Contributor) commented Mar 15, 2025

While checking the results from your branch on the dashboard, I noticed a curious issue where older commits from your branch have newer timestamps:

[Screenshot 2025-03-14 at 19 33 42]

It seems like an issue for later.

@huydhn left a review comment:

LGTM! Let's take an action item to watch for the next failure from these workflows after this lands, to double-check the results on the dashboard before resolving the issue.

@yangw-dev (Contributor, Author) replied:

> While checking the results from your branch on the dashboard, I noticed a curious issue where older commits from your branch have newer timestamps. It seems like an issue for later.

Interesting, created an issue here: pytorch/test-infra#6427

@yangw-dev (Contributor, Author) replied:

> LGTM! Let's take an action item to watch for the next failure from these workflows after this lands, to double-check the results on the dashboard before resolving the issue.

Sounds good, I also need to update the UI.

@yangw-dev yangw-dev merged commit 6ddeabd into main Mar 17, 2025
181 of 182 checks passed
@yangw-dev yangw-dev deleted the addFake branch March 17, 2025 20:17
DannyYuyang-quic pushed a commit to CodeLinaro/executorch that referenced this pull request Apr 2, 2025
Labels: CLA Signed, topic: not user facing