Fix LinearInt8 recursive quantization #791


Merged: 1 commit merged into main on May 14, 2024
Conversation

@malfet (Contributor) commented on May 13, 2024

Fixed by pairing the `else:` with the right `if`, namely `if isinstance(child, nn.Linear):`, so the recursion into child modules happens in the intended branch.
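
A minimal sketch of the corrected control flow, assuming a recursive replacement helper of roughly this shape (the names `replace_linear_int8` and `WeightOnlyInt8Linear` are illustrative stand-ins, not necessarily torchchat's exact identifiers):

```python
import torch.nn as nn


class WeightOnlyInt8Linear(nn.Module):
    """Illustrative stand-in for the int8 weight-only replacement module."""

    def __init__(self, in_features: int, out_features: int, groupsize: int = 0):
        super().__init__()
        self.in_features = in_features
        self.out_features = out_features
        self.groupsize = groupsize


def replace_linear_int8(module: nn.Module, groupsize: int = 0) -> None:
    """Recursively swap nn.Linear leaves for their quantized counterparts."""
    for name, child in module.named_children():
        if isinstance(child, nn.Linear):
            # Leaf case: replace the Linear in place and stop descending.
            setattr(
                module,
                name,
                WeightOnlyInt8Linear(child.in_features, child.out_features, groupsize),
            )
        else:
            # The fix: this `else:` must pair with the
            # `isinstance(child, nn.Linear)` check above, so the helper only
            # recurses into container modules, never into (already replaced)
            # Linear leaves.
            replace_linear_int8(child, groupsize)
```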
Test plan:
```
% python3 torchchat.py generate llama2 --dtype float16 --quantize '{"linear:int8": {"groupsize": 0}}' --prompt "Once upon a time," --device mps
Using device=mps 
Loading model...
Time to load model: 29.03 seconds
Quantizing the model with: {'linear:int8': {'groupsize': 0}}
Time to quantize model: 14.37 seconds
```

Fixes #788
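
For context on the `"groupsize": 0` setting in the test plan, here is a hedged sketch of per-channel weight-only int8 quantization, assuming (as in similar codebases) that a groupsize of 0 means one scale per output channel rather than per-group scales; this is illustrative math, not torchchat's exact implementation:

```python
import torch


def quantize_per_channel_int8(w: torch.Tensor) -> tuple[torch.Tensor, torch.Tensor]:
    """Symmetric int8 quantization with one scale per output channel.

    w: float weight of shape [out_features, in_features].
    Returns (q, scales); dequantization is q.float() * scales.
    """
    # One scale per row (output channel); clamp avoids division by zero.
    scales = w.abs().amax(dim=1, keepdim=True).clamp(min=1e-8) / 127.0
    q = torch.round(w / scales).clamp(-127, 127).to(torch.int8)
    return q, scales
```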
@malfet requested a review from @mikekgfb on May 13, 2024 at 23:57
@pytorch-bot (bot) commented on May 13, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/torchchat/791

Note: Links to docs will display an error until the docs builds have been completed.

❌ 2 New Failures, 1 Pending

As of commit e9e1b24 with merge base 50e61b5:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot added the CLA Signed label (managed by the Meta Open Source bot) on May 13, 2024
@mikekgfb (Contributor) left a comment:

* T H A N K Y O U !!! *

@malfet merged commit 1298983 into main on May 14, 2024
@malfet deleted the malfet-patch-2 branch on May 14, 2024 at 00:41
malfet added commits that referenced this pull request on Jul 17, 2024 (each with the same "Test plan / Fixes #788" message shown above).
Labels: CLA Signed (this label is managed by the Meta Open Source bot)
Projects: None yet
Development: Successfully merging this pull request may close these issues: --quantize is doing something surprising (#788)
3 participants