Fix LinearInt8 recursive quantization #791


Merged: 1 commit merged into main on May 14, 2024
Conversation

@malfet (Contributor) commented on May 13, 2024

Fixed by pairing the `else:` with the right `if`, namely `if isinstance(child, nn.Linear):`, so the recursion into child modules happens in the intended branch.
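
A minimal sketch of the corrected control flow, assuming a recursive replacement helper of roughly this shape (the names `replace_linear_int8` and `WeightOnlyInt8Linear` are illustrative stand-ins, not necessarily torchchat's exact identifiers):

```python
import torch.nn as nn


class WeightOnlyInt8Linear(nn.Module):
    """Illustrative stand-in for the int8 weight-only replacement module."""

    def __init__(self, in_features: int, out_features: int, groupsize: int = 0):
        super().__init__()
        self.in_features = in_features
        self.out_features = out_features
        self.groupsize = groupsize


def replace_linear_int8(module: nn.Module, groupsize: int = 0) -> None:
    """Recursively swap nn.Linear leaves for their quantized counterparts."""
    for name, child in module.named_children():
        if isinstance(child, nn.Linear):
            # Leaf case: replace the Linear in place and stop descending.
            setattr(
                module,
                name,
                WeightOnlyInt8Linear(child.in_features, child.out_features, groupsize),
            )
        else:
            # The fix: this `else:` must pair with the
            # `isinstance(child, nn.Linear)` check above, so the helper only
            # recurses into container modules, never into (already replaced)
            # Linear leaves.
            replace_linear_int8(child, groupsize)
```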
Test plan:
```
% python3 torchchat.py generate llama2 --dtype float16 --quantize '{"linear:int8": {"groupsize": 0}}' --prompt "Once upon a time," --device mps
Using device=mps 
Loading model...
Time to load model: 29.03 seconds
Quantizing the model with: {'linear:int8': {'groupsize': 0}}
Time to quantize model: 14.37 seconds
```

Fixes #788
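
For context on the `"groupsize": 0` setting in the test plan, here is a hedged sketch of per-channel weight-only int8 quantization, assuming (as in similar codebases) that a groupsize of 0 means one scale per output channel rather than per-group scales; this is illustrative math, not torchchat's exact implementation:

```python
import torch


def quantize_per_channel_int8(w: torch.Tensor) -> tuple[torch.Tensor, torch.Tensor]:
    """Symmetric int8 quantization with one scale per output channel.

    w: float weight of shape [out_features, in_features].
    Returns (q, scales); dequantization is q.float() * scales.
    """
    # One scale per row (output channel); clamp avoids division by zero.
    scales = w.abs().amax(dim=1, keepdim=True).clamp(min=1e-8) / 127.0
    q = torch.round(w / scales).clamp(-127, 127).to(torch.int8)
    return q, scales
```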
@malfet requested a review from @mikekgfb on May 13, 2024 at 23:57
@pytorch-bot (bot) commented on May 13, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/torchchat/791

Note: Links to docs will display an error until the docs builds have been completed.

❌ 2 New Failures, 1 Pending

As of commit e9e1b24 with merge base 50e61b5:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot added the CLA Signed label (managed by the Meta Open Source bot) on May 13, 2024
@mikekgfb (Contributor) left a comment:

* T H A N K Y O U !!! *

@malfet merged commit 1298983 into main on May 14, 2024
@malfet deleted the malfet-patch-2 branch on May 14, 2024 at 00:41
malfet added commits that referenced this pull request on Jul 17, 2024 (each with the same "Test plan / Fixes #788" message shown above).
Labels: CLA Signed (this label is managed by the Meta Open Source bot)
Projects: None yet
Development: Successfully merging this pull request may close these issues: --quantize is doing something surprising (#788)
3 participants