[MPS] Add support for Int4 groupwise quantization #4623
Conversation
🔗 Helpful Links: 🧪 see artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/4623
Note: links to docs will display an error until the docs builds have completed.
❌ 2 new failures, 3 unrelated failures as of commit 033c562 with merge base 6efc222.
NEW FAILURES — the following jobs have failed.
FLAKY — the following jobs failed but were likely due to flakiness present on trunk.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
@larryliu0820 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
This is awesome! It seems this PR includes all the changes in #4574?
This PR needs to land after the 4 GB serialization PR.
Thanks for adding the PR. Really glad to have it enable Llama models. A separate question: it looks like we're using the source transform from
Force-pushed from 3eda42a to f2dcffc.
Force-pushed from f2dcffc to 38c3a01.
Force-pushed from 38c3a01 to 7a8a2c6.
@shoumikhin has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
Force-pushed from 79acc62 to 033c562.
Differential Revision: D61032289 Pull Request resolved: pytorch#4623
Add support for MPS Int4 per-channel group-wise quantization through MPSGraph.
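For readers unfamiliar with the scheme, per-channel group-wise Int4 quantization splits each output channel (row) of a weight matrix into fixed-size groups and gives every group its own scale and zero point. The sketch below is an illustrative NumPy reference of that math only; it is not the PR's MPSGraph implementation, and the function names and the asymmetric-affine parameterization are assumptions for demonstration.

```python
import numpy as np

def quantize_int4_groupwise(w, group_size=32):
    """Illustrative group-wise asymmetric Int4 quantization (not the PR's code).

    Each row of the 2D weight `w` is split into groups of `group_size`;
    each group gets its own float scale and integer zero point, and values
    are mapped onto the signed 4-bit range [-8, 7].
    """
    rows, cols = w.shape
    assert cols % group_size == 0, "columns must be divisible by group_size"
    g = w.reshape(rows, cols // group_size, group_size).astype(np.float32)
    wmin = g.min(axis=-1, keepdims=True)
    wmax = g.max(axis=-1, keepdims=True)
    scale = (wmax - wmin) / 15.0                  # 16 representable levels
    scale = np.where(scale == 0, 1e-8, scale)     # guard constant groups
    zero_point = np.round(-8.0 - wmin / scale)    # maps wmin near -8
    q = np.clip(np.round(g / scale) + zero_point, -8, 7).astype(np.int8)
    return q, scale, zero_point

def dequantize_int4_groupwise(q, scale, zero_point):
    """Inverse mapping back to float; reshapes groups into full rows."""
    g = (q.astype(np.float32) - zero_point) * scale
    return g.reshape(g.shape[0], -1)
```

With a group size of 32, storage drops to 4 bits per weight plus one scale and zero point per 32 elements, while the per-group parameters keep the reconstruction error bounded by roughly one quantization step within each group.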
Testing:
- AOT export
- Runtime (note that macOS 15.0 (Sequoia) or iOS/iPadOS 18 is required for Int4 quantization)
Note: this is dependent on #4574 being merged first!
cc: @cccclai, @larryliu0820, @kimishpatel