Add torchao mps lowbit ops to llama runner #7037
Conversation
Helpful links: see artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/7037. Note: links to docs will display an error until the docs builds have completed.
CI status as of commit d47141f (merge base daf9aee): 2 new failures, 1 cancelled job (please retry the cancelled job).
Force-pushed ad4dbaf to cd9a5fa.
Some nits, but this largely looks good. Please include the llama runner output in the summary.
Consider fixing the nits. In particular, I don't think we should add TORCHAO_MPS; just overload the existing TORCHAO one.
    if verbose:
        print("quantized model:", model)
Please use logging instead.
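To illustrate the reviewer's suggestion, here is a minimal sketch of the reviewed snippet rewritten with the standard logging module (the helper name log_quantized_model is hypothetical, not from the PR):

```python
import logging

logger = logging.getLogger(__name__)

def log_quantized_model(model, verbose: bool = False) -> None:
    # Hypothetical helper: route the diagnostic through logging
    # instead of print(), so output respects the logger configuration.
    if verbose:
        logger.info("quantized model: %s", model)
```

Passing the model via "%s" defers string formatting until the record is actually emitted, which avoids the cost of stringifying a large model when the log level filters the message out.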
Force-pushed b8f9360 to b7f8f84.
Force-pushed b7f8f84 to 8a10740.
Set up ET: https://pytorch.org/executorch/stable/getting-started-setup
Install llama runner requirements
Build ET with MPS ON:
Build llama runner with torchao mps ops
Export model. Note: qmode can be any of torchao:fpa1w ... torchao:fpa7w, and group_size can be any of 32, 64, 128, 256.
Run:
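The steps above can be sketched as a shell session. This is a hypothetical sketch, not the PR's exact commands: the CMake option EXECUTORCH_BUILD_MPS, the export_llama module path, and the -qmode/--group_size flag names are assumptions; check the linked setup guide for your version of ExecuTorch.

```shell
# Assumed CMake option name for enabling MPS; verify against your checkout.
cmake -DCMAKE_BUILD_TYPE=Release -DEXECUTORCH_BUILD_MPS=ON -Bcmake-out .
cmake --build cmake-out -j

# Export with a torchao lowbit qmode (flag names are assumptions;
# fpa4w is one value in the torchao:fpa1w ... torchao:fpa7w range).
python -m examples.models.llama.export_llama \
    -qmode torchao:fpa4w \
    --group_size 32
```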