Add HuggingFace Llama3.2 1B to benchmark #5368
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/5368
Note: Links to docs will display an error until the docs builds have been completed.
❌ 1 New Failure, 1 Cancelled Job. As of commit e2779ee with merge base 8460d42.
NEW FAILURE - The following job has failed:
This comment was automatically generated by Dr. CI and updates every 15 minutes.
Force-pushed from 449b4d1 to b48035a.
Force-pushed from 53e7756 to a13a44b.
@guangy10 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
Force-pushed from a13a44b to 97050c2.
@guangy10 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
Uploading the model artifacts to GitHub was skipped: https://github.com/pytorch/executorch/actions/runs/10858058150/job/30136354800. I don't see the reason in the log. The model artifacts are placed under …
Oops, the exported model is 11+ GB, I think. Uploading such a large file to GitHub takes too long, and the job timed out.
I need to rework the upload step here, as GitHub doesn't scale for artifacts this large, so we should go straight to S3.
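For reference, here's a minimal sketch of what a direct-to-S3 upload could look like, using boto3's managed multipart transfer. The bucket name, key, and file path are hypothetical placeholders, not the actual benchmark infra settings:

```python
import boto3
from boto3.s3.transfer import TransferConfig

# Hypothetical locations -- the real bucket/key layout used by the
# benchmark infra may differ.
BUCKET = "gha-artifacts"
KEY = "executorch/model-artifacts/llama3_2-1b/model.zip"
LOCAL_PATH = "model.zip"

# Managed multipart transfer: an 11+ GB file is split into 100 MB
# parts uploaded by 8 concurrent threads, so no single long-running
# request has to carry the whole object.
config = TransferConfig(
    multipart_threshold=100 * 1024 * 1024,
    multipart_chunksize=100 * 1024 * 1024,
    max_concurrency=8,
)

s3 = boto3.client("s3")
s3.upload_file(LOCAL_PATH, BUCKET, KEY, Config=config)
print(f"uploaded to s3://{BUCKET}/{KEY}")
```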
Force-pushed from cd4c507 to 60b62d3.
@guangy10 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
Force-pushed from 60b62d3 to 009f932.
@guangy10 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
Tried running gemma-2b on a Google Pixel 8 Pro (with 12 GB RAM). The failure is the same: some I/O failures when connecting to the device in the pool: https://github.com/pytorch/executorch/actions/runs/10908663134/job/30277474048. In the stack trace I see there is a call …
I'm checking the AWS docs on this (https://docs.aws.amazon.com/devicefarm/latest/developerguide/limits.html), and they mention a 4 GB limit, but that's for the size of the app, not the extra data archive. Let me run this manually using the AWS UI and see if it accepts the model. The archive size is 5.4 GB: https://github.com/pytorch/executorch/actions/runs/10908663134/job/30278173066#step:11:38. IIRC, llama2 7B works, but it's only ~3 GB.
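As a side note, that manual check could also be scripted. Below is a rough sketch using boto3's Device Farm API, assuming the archive goes up as upload type `EXTERNAL_DATA`; the project ARN and archive path are hypothetical placeholders:

```python
import os

import boto3
import requests

PROJECT_ARN = "arn:aws:devicefarm:us-west-2:123456789012:project:EXAMPLE"  # hypothetical
ARCHIVE = "model_artifacts.zip"  # the ~5.4 GB extra-data archive

size_gb = os.path.getsize(ARCHIVE) / 1024**3
print(f"archive size: {size_gb:.1f} GB")  # the documented 4 GB limit applies to the app

# Device Farm is only available in us-west-2.
df = boto3.client("devicefarm", region_name="us-west-2")
upload = df.create_upload(
    projectArn=PROJECT_ARN,
    name=os.path.basename(ARCHIVE),
    type="EXTERNAL_DATA",
)["upload"]

# create_upload returns a presigned S3 URL; PUT the archive there.
with open(ARCHIVE, "rb") as f:
    requests.put(upload["url"], data=f).raise_for_status()

# Check the upload status; a FAILED status here would tell us whether
# the archive size is what Device Farm is rejecting.
status = df.get_upload(arn=upload["arn"])["upload"]["status"]
print(f"upload status: {status}")
```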
Force-pushed from 9e89593 to b2d837e.
Force-pushed from f936584 to 7b55bb9.
Force-pushed from 7b55bb9 to 6cb6af9.
Force-pushed from cb3efe3 to bedecd8.
SpinQuant and QLoRA are passing.
Original BF16 is passing:
Force-pushed from bedecd8 to e2779ee.
@guangy10 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
Decided to leave the logic for running the 1B model in scheduled jobs to a separate PR, to simplify the review, as it will require significant refactoring of the workflow.
Just a data point: I see your test run shows up on the dashboard at https://hud.pytorch.org/benchmark/llms?startTime=Wed%2C%2011%20Dec%202024%2004%3A07%3A58%20GMT&stopTime=Wed%2C%2018%20Dec%202024%2004%3A07%3A58%20GMT&granularity=hour&lBranch=add_hf_model_to_benchinfra&lCommit=e2779ee5cbe666072a2d0f7a6821d640a11d1ad9&rBranch=add_hf_model_to_benchinfra&rCommit=e2779ee5cbe666072a2d0f7a6821d640a11d1ad9&repoName=pytorch%2Fexecutorch&modelName=All%20Models&backendName=All%20Backends&dtypeName=All%20DType&deviceName=All%20Devices, but the extraction logic looks wrong in that the llama model has the backend and benchmark configs swapped. I guess this is what you meant by introducing the new …
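To illustrate the kind of swap described above, here is a minimal sketch of an extraction function, assuming a record name shaped like `model(backend,config)`; the format, names, and helper are hypothetical and not the actual dashboard code:

```python
import re

# Hypothetical record format: "<model>(<backend>,<benchmark config>)".
RECORD = "llama-3.2-1B(xnnpack,spinquant)"

def extract(record: str) -> dict:
    model, fields = re.fullmatch(r"(.+)\((.+)\)", record).groups()
    backend, config = (part.strip() for part in fields.split(","))
    # The bug described above would amount to returning these two
    # fields in the wrong order, e.g. backend=spinquant, config=xnnpack.
    return {"model": model, "backend": backend, "config": config}

print(extract(RECORD))
# {'model': 'llama-3.2-1B', 'backend': 'xnnpack', 'config': 'spinquant'}
```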
Add llama3.2 1B from Hugging Face to the benchmark with the following configs:
Switched to the memory-intensive runners in the benchmark workflow to reduce operational cost.