[MPS] Add support for Int4 groupwise quantization #4623

Merged

DenisVieriu97
Collaborator

@DenisVieriu97 DenisVieriu97 commented Aug 9, 2024

Add support for MPS Int4 per channel group-wise quantization through MPSGraph.
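For readers unfamiliar with the scheme, here is a minimal pure-Python sketch of what group-wise Int4 weight quantization computes. It is an illustration only, not the MPSGraph code in this PR: it assumes symmetric quantization to the signed 4-bit range [-8, 7] with one scale per group of 32 consecutive weights in an output channel (the group size used by `-G 32` in the export command). The function names `quantize_int4_groupwise` and `dequantize_int4_groupwise` are hypothetical.

```python
def quantize_int4_groupwise(row, group_size=32):
    """Quantize one weight row (one output channel) to int4 values.

    Assumption: symmetric quantization, one float scale per group.
    Returns (q_values, scales).
    """
    if len(row) % group_size != 0:
        raise ValueError("row length must be a multiple of group_size")
    q_values, scales = [], []
    for start in range(0, len(row), group_size):
        group = row[start:start + group_size]
        max_abs = max(abs(x) for x in group)
        # Map the largest magnitude in the group to 7; avoid dividing by zero.
        scale = max_abs / 7.0 if max_abs > 0 else 1.0
        scales.append(scale)
        for x in group:
            q = round(x / scale)
            # Clamp to the signed 4-bit range.
            q_values.append(max(-8, min(7, q)))
    return q_values, scales


def dequantize_int4_groupwise(q_values, scales, group_size=32):
    """Recover approximate float weights from int4 values and group scales."""
    return [q * scales[i // group_size] for i, q in enumerate(q_values)]


# Example: a 64-element row splits into two groups, each with its own scale.
row = [0.01 * (i - 32) for i in range(64)]
q_values, scales = quantize_int4_groupwise(row, group_size=32)
restored = dequantize_int4_groupwise(q_values, scales, group_size=32)
```

Because each group carries its own scale, an outlier weight only reduces precision within its own group of 32 values rather than across the whole channel.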


Testing:
AOT export

python -m examples.models.llama2.export_llama --checkpoint /Volumes/Source/weights/llama2/llama2-7b/llama-2-7b/consolidated.00.pth --params /Volumes/Source/weights/llama2/llama2-7b/llama-2-7b/params.json -kv --use_sdpa_with_kv_cache --mps -d fp32 --disable_dynamic_shape -qmode 8da4w -G 32

Runtime (note that macOS 15.0 (Sequoia) or iOS/iPadOS 18 is required for Int4 quantization):

~/tools/buck2_old2 run examples/models/llama2:main -- --model_path=mps_llama2_q.pte --tokenizer_path=tokenizer_llama2.bin --prompt="What is the best place to visit in New York?"  --temperature=0

Answer:

What is the best place to visit in New York?
New York is a city that has something for everyone. Whether you’re looking for a place to relax and enjoy the sights, or you’re looking for a place to party and have a good time, New York has it all.
There are so many different places to visit in New York, it can be hard to decide where to go. But don’t worry, we’ve got you covered. We’ve compiled a list of the best

Note: this depends on #4574, which must be merged first!

cc: @cccclai, @larryliu0820, @kimishpatel

pytorch-bot bot commented Aug 9, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/4623

Note: Links to docs will display an error until the docs builds have been completed.

❌ 2 New Failures, 3 Unrelated Failures

As of commit 033c562 with merge base 6efc222 (image):

NEW FAILURES - The following jobs have failed:

FLAKY - The following jobs failed but were likely due to flakiness present on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot facebook-github-bot added the CLA Signed label Aug 9, 2024
@DenisVieriu97 DenisVieriu97 changed the title from "Denis/mps int4 quantization" to "Add support for Int4 groupwise quantization" Aug 9, 2024
@DenisVieriu97 DenisVieriu97 changed the title from "Add support for Int4 groupwise quantization" to "[MPS] Add support for Int4 groupwise quantization" Aug 9, 2024
@larryliu0820 larryliu0820 requested a review from lucylq August 9, 2024 18:36
@facebook-github-bot
Contributor

@larryliu0820 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@larryliu0820 larryliu0820 requested a review from shoumikhin August 9, 2024 18:38
@larryliu0820
Contributor

This is awesome! It seems this PR includes all the changes in #4574?

@cccclai
Contributor

cccclai commented Aug 9, 2024

this pr needs to be landed after the 4GB serialization pr.

@cccclai
Contributor

cccclai commented Aug 9, 2024

Thanks for adding the pr. Really glad to have it enable llama models.

A separate question: it looks like we're using the source transform from -qmode 8da4w. If we apply this PR to stories, are we using the GPU or the ANE?

@DenisVieriu97 DenisVieriu97 force-pushed the denis/mps_int4_quantization branch 2 times, most recently from 3eda42a to f2dcffc Compare August 10, 2024 00:52
@facebook-github-bot
Contributor

@larryliu0820 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@DenisVieriu97 DenisVieriu97 force-pushed the denis/mps_int4_quantization branch from f2dcffc to 38c3a01 Compare August 12, 2024 19:56
@DenisVieriu97 DenisVieriu97 force-pushed the denis/mps_int4_quantization branch from 38c3a01 to 7a8a2c6 Compare August 13, 2024 21:33
@facebook-github-bot
Contributor

@shoumikhin has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@DenisVieriu97 DenisVieriu97 force-pushed the denis/mps_int4_quantization branch from 79acc62 to 033c562 Compare August 14, 2024 20:12
@facebook-github-bot
Contributor

@shoumikhin has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@facebook-github-bot facebook-github-bot merged commit 5c4a2a2 into pytorch:main Aug 15, 2024
87 of 93 checks passed
kirklandsign pushed a commit to kirklandsign/executorch that referenced this pull request Aug 15, 2024
Differential Revision: D61032289

Pull Request resolved: pytorch#4623
Labels
ciflow/trunk, CLA Signed