Update quantize.py to use torchao Quantizers #882
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/torchchat/882
Note: Links to docs will display an error until the doc builds have completed.
❌ 1 Cancelled Job as of commit f85339a with merge base ee681bf. The following job was cancelled, please retry:
This comment was automatically generated by Dr. CI and updates every 15 minutes.
Looking good contextually, we'll want a stamp from the AO side though
Can you update the README and the --help output to explain the --quantize flag?
Also, when you pass --quantize during generate, what exactly happens if the model on disk is fp16 and you want 8-bit? Does it quantize then and there? Does it get saved for future use? We need enough information in the help and README for users to know what to expect and how to do what they want.
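For what it's worth, here is a minimal sketch of the generate-time behavior as I understand it. Everything below is illustrative: `load_checkpoint` and `quantize_model` are hypothetical stand-ins for torchchat internals, not its real API, so treat this as an assumption to be confirmed in the docs.

```python
# Illustrative sketch only: NOT torchchat's real code path, just the
# shape of "quantize at generate time, in memory".
import json
import torch.nn as nn

def load_checkpoint(path: str) -> nn.Module:
    # Hypothetical stand-in for torchchat's model loading; pretend this
    # returns the fp16 model read from disk.
    return nn.Sequential(nn.Linear(64, 64), nn.Linear(64, 8)).half()

def quantize_model(model: nn.Module, config: dict) -> nn.Module:
    # Hypothetical stand-in for quantize.py; the real code dispatches
    # config keys (e.g. "linear:a8w4dq") to torchao Quantizers.
    for key, kwargs in config.items():
        print(f"would apply {key} with {kwargs}")
    return model

# What `generate ... --quantize '...'` plausibly does:
cfg = json.loads('{"linear:a8w4dq": {"groupsize": 256}}')
model = load_checkpoint("model.pth")  # fp16 weights read from disk
model = quantize_model(model, cfg)    # quantized in memory, per run
# Generation then uses `model`; the on-disk checkpoint is unchanged,
# so the quantized weights are not saved for future use.
```

If that reading is right, persisting a quantized model is a separate export step rather than a side effect of generate.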
Good question, we will make sure that is added.
I just pushed out a fix for the failing tests; all compile tests now pass for x86. We should be able to get this across the line now. Approved pending re-review from @Jack-Khuu and @byjlw.
Early approval: I thought only the test I fixed was failing. This needs more work to pass the tests.
Summary: Remove duplicate code for Int4WeightOnlyQuantizer and Int8DynActInt4WeightQuantizer and use the torchao API.

Test Plan:
```
python torchchat.py generate llama2 --quantize '{"linear:int4": {"groupsize": 256}, "precision": {"dtype":"float16"}, "executor":{"accelerator":"cpu"}}' --prompt "Once upon a time," --max-new-tokens 256
python torchchat.py generate llama2 --quantize '{"linear:a8w4dq": {"groupsize": 256}, "precision": {"dtype":"float16"}, "executor":{"accelerator":"cpu"}}' --prompt "Once upon a time," --max-new-tokens 256
```

Reviewers:
Subscribers:
Tasks:
Tags:
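For reviewers following along, a hedged sketch of the torchao Quantizer interface this PR switches to. The import locations and constructor arguments have shifted across torchao releases, so the exact paths below are my assumption rather than a pinned reference.

```python
# Hedged sketch of the torchao Quantizer API adopted in quantize.py.
# Import locations vary by torchao release; adjust to the pinned version.
import torch
from torchao.quantization.quant_api import (
    Int4WeightOnlyQuantizer,
    Int8DynActInt4WeightQuantizer,
)

def quantize_int4_weight_only(model: torch.nn.Module, groupsize: int = 256):
    # Corresponds to the "linear:int4" key in the --quantize config.
    return Int4WeightOnlyQuantizer(groupsize=groupsize).quantize(model)

def quantize_a8w4dq(model: torch.nn.Module, groupsize: int = 256):
    # Corresponds to "linear:a8w4dq": 8-bit dynamic activations with
    # 4-bit grouped weights.
    return Int8DynActInt4WeightQuantizer(groupsize=groupsize).quantize(model)
```

Both classes expose a `quantize(model)` method that returns the transformed module, which is what lets torchchat delete its duplicated implementations.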
Looks legit, I can check on the perf when it lands.
@byjlw We will add documentation in the next PR. Any other concerns?
Summary:
Remove duplicate code for Int8DynActInt4WeightQuantizer and use the torchao API.
Test Plan:
Reviewers:
Subscribers:
Tasks:
Tags:
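As an illustration of the summary above, a small sketch of how quantize.py can route --quantize config keys to the torchao Quantizers. `QUANTIZER_REGISTRY` and `apply_quantize_config` are hypothetical names for this sketch, not identifiers from the actual diff.

```python
# Hypothetical registry mapping --quantize JSON keys to torchao
# Quantizer classes; the names here are illustrative, not torchchat's.
from torchao.quantization.quant_api import (
    Int4WeightOnlyQuantizer,
    Int8DynActInt4WeightQuantizer,
)

QUANTIZER_REGISTRY = {
    "linear:int4": Int4WeightOnlyQuantizer,
    "linear:a8w4dq": Int8DynActInt4WeightQuantizer,
}

def apply_quantize_config(model, config: dict):
    for key, kwargs in config.items():
        quantizer_cls = QUANTIZER_REGISTRY.get(key)
        if quantizer_cls is None:
            continue  # e.g. "precision"/"executor" keys handled elsewhere
        model = quantizer_cls(**kwargs).quantize(model)
    return model
```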