Add CoreML Quantize #5228
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/5228
Note: Links to docs will display an error until the docs builds have been completed.
❌ 2 New Failures, 1 Unrelated Failure as of commit 554382a with merge base 7e374d7.
NEW FAILURES - The following jobs have failed:
FLAKY - The following job failed but was likely due to flakiness present on trunk:
This comment was automatically generated by Dr. CI and updates every 15 minutes.
Force-pushed from 3bc734f to 455cea4
Force-pushed from 455cea4 to 06dba4b
@cccclai 🙏
Hey, could you share the command to run the script? The arg list is getting longer and it's hard to guess...
Sure
This is not the final command, though: we are adding fused sdpa
@cccclai has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
Hey, could you rebase? We ran into a land race: another PR touching the same file merged first...
Force-pushed from f639e7c to 554382a
Rebased ✅ GitHub is not showing a conflict yet, though. Is the conflicting change Meta-internal only for now? (And I need to wait until it gets exported?)
@cccclai has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
Motivation
Short term: TorchAO int4 quantization yields a float zero point, which CoreML does not yet support well, so we need CoreML's own int4 quantization for now.
Intermediate term: Until torch implements all CoreML-supported quantizations (e.g. palettization, sparsification, joint compression...), it is useful to have a way to use and experiment with those CoreML quantizations.
Solution
In CoreML preprocess, we add the CoreML quantization config as a compile spec.
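A minimal sketch of the idea, with the caveat that the class shape, key name, and config fields below are illustrative assumptions rather than the PR's actual API: the quantization config is serialized into a (key, bytes) compile spec entry, which the CoreML preprocess step can later deserialize and act on.

```python
# Hypothetical sketch: passing a CoreML quantization config through a
# compile spec. The "quantization_config" key, the config fields, and this
# CompileSpec dataclass are assumptions for illustration; the PR defines
# its own names inside the CoreML backend.
import json
from dataclasses import dataclass


@dataclass
class CompileSpec:
    # Mirrors the general (key, value-bytes) compile-spec shape.
    key: str
    value: bytes


def make_quantization_compile_spec(nbits: int, mode: str) -> CompileSpec:
    """Serialize a quantization config into a compile spec entry."""
    config = {"nbits": nbits, "mode": mode}
    return CompileSpec("quantization_config", json.dumps(config).encode("utf-8"))


def read_quantization_config(spec: CompileSpec) -> dict:
    """What preprocess would do: recover the config from the spec bytes."""
    return json.loads(spec.value.decode("utf-8"))


spec = make_quantization_compile_spec(nbits=4, mode="linear_symmetric")
print(read_quantization_config(spec))  # {'nbits': 4, 'mode': 'linear_symmetric'}
```

Carrying the config as serialized bytes keeps the compile-spec interface generic: the ahead-of-time export code does not need to know which quantization options the backend supports, and new options can be added without changing the spec plumbing.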