[Executorch][perf-ci] Fix perf ci #8374
Conversation
Summary: Previous PR #7927 decoupled max_seq_length from the KV cache. That broke the perf CI workflow. Fix that. Test Plan: Trigger it manually and check. Reviewers: Subscribers: Tasks: Tags: [ghstack-poisoned]
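For context on the breakage, here is a minimal conceptual sketch of what the decoupling means. The class and argument names are illustrative assumptions, not the actual ExecuTorch implementation: the point is only that after #7927 the KV cache capacity is driven by a context length that is separate from max_seq_length, so export paths (such as the ones the perf CI invokes) need to supply both values.

```python
# Illustrative sketch only -- not the actual ExecuTorch code.
# After the decoupling, the KV cache is sized by a context length
# that is independent of max_seq_length.
import torch


class StaticKVCache(torch.nn.Module):
    def __init__(self, max_context_length: int, n_heads: int, head_dim: int):
        super().__init__()
        # Cache capacity comes from max_context_length, not max_seq_length.
        self.register_buffer(
            "k_cache", torch.zeros(1, n_heads, max_context_length, head_dim)
        )
        self.register_buffer(
            "v_cache", torch.zeros(1, n_heads, max_context_length, head_dim)
        )

    def update(self, start_pos: int, k: torch.Tensor, v: torch.Tensor):
        # Write the new key/value slices at the current position.
        seq_len = k.shape[2]
        self.k_cache[:, :, start_pos:start_pos + seq_len] = k
        self.v_cache[:, :, start_pos:start_pos + seq_len] = v
        return self.k_cache, self.v_cache
```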
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/8374
Note: Links to docs will display an error until the docs builds have been completed.
❌ 1 New Failure as of commit 553d875 with merge base 78752a0.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
The linked job in the PR summary doesn't run with SpinQuant and QLoRA. You need to trigger the job using the model IDs on Hugging Face:
Also the README pages? https://github.com/pytorch/executorch/blob/main/examples/models/llama/README.md
Let me do this in a follow-up PR. Actually, let me just do it here.
What does this mean? Is there a description of how to trigger this? I followed the steps here: https://github.com/pytorch/executorch/tree/main/extension/benchmark
Summary: Previous PR #7927 decoupled max_seq_length from the KV cache. That broke the perf CI workflow. Fix that. Test Plan: Trigger it manually and check: apple perf: https://github.com/pytorch/executorch/actions/runs/13267110949 android perf: https://github.com/pytorch/executorch/actions/runs/13267110908 Reviewers: Subscribers: Tasks: Tags: cc guangy10 huydhn kirklandsign shoumikhin [ghstack-poisoned]
This is updated. But I think I am going to have to do one more round of scrubbing in subsequent PRs for the various incarnations of Llama.
You need to specify the models you want to benchmark against explicitly, separated by ",". In this case, they are "meta-llama/Llama-3.2-1B-Instruct-SpinQuant_INT4_EO8,meta-llama/Llama-3.2-1B-Instruct-QLORA_INT4_EO8". See the screenshot for an example. Updated the screenshot. You need to run against your branch, not on main.
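For anyone who prefers not to use the web UI, here is a hedged sketch of dispatching the on-demand benchmark workflow through the GitHub REST API. The workflow file name (android-perf.yml), the "models" input name, and the branch name are assumptions drawn from this thread, not verified against the repository; the endpoint itself is the standard GitHub Actions workflow_dispatch API.

```python
# Sketch: manually dispatching the on-demand benchmark workflow.
# Assumptions (not verified): workflow file "android-perf.yml", input "models",
# and the branch name "my-perf-fix-branch".
import requests

GITHUB_TOKEN = "ghp_..."  # personal access token with workflow scope

resp = requests.post(
    "https://api.github.com/repos/pytorch/executorch/actions/"
    "workflows/android-perf.yml/dispatches",
    headers={
        "Authorization": f"Bearer {GITHUB_TOKEN}",
        "Accept": "application/vnd.github+json",
    },
    json={
        # Run against your branch, not the default branch.
        "ref": "my-perf-fix-branch",
        "inputs": {
            # Comma-separated Hugging Face model IDs, as described above.
            "models": "meta-llama/Llama-3.2-1B-Instruct-SpinQuant_INT4_EO8,"
            "meta-llama/Llama-3.2-1B-Instruct-QLORA_INT4_EO8",
        },
    },
    timeout=30,
)
resp.raise_for_status()  # 204 No Content on success
```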
The changes look good to me
Oh, you are right. I forgot about that step.
Stack from ghstack (oldest at bottom):
Summary:
Previous PR #7927 decoupled max_seq_length from the KV cache. That broke
the perf CI workflow. Fix that.
Test Plan:
Trigger it manually and check
apple perf: https://github.com/pytorch/executorch/actions/runs/13267110949
android perf: https://github.com/pytorch/executorch/actions/runs/13267110908
Reviewers:
Subscribers:
Tasks:
Tags:
cc @guangy10 @huydhn @kirklandsign @shoumikhin