
Commit d3582a0

mikekgfb authored and kimishpatel committed
Fix quantization doc to specify dtype limitation on a8w4dq (#629)
Summary:
Test Plan:
Reviewers:
Subscribers:
Tasks:
Tags:

Co-authored-by: Kimish Patel <[email protected]>
1 parent fe75b16 commit d3582a0

1 file changed: +5 −3 lines changed

docs/quantization.md

Lines changed: 5 additions & 3 deletions
@@ -11,7 +11,7 @@ While quantization can potentially degrade the model's performance, the methods
 | compression | FP Precision | bitwidth| group size | dynamic activation quantization | Eager | AOTI | ExecuTorch |
 |--|--|--|--|--|--|--|--|
 | linear (asymmetric) | fp32, fp16, bf16 | [8, 4]* | [32, 64, 128, 256]** | ||||
-| linear with dynamic activations (symmetric) | | | [32, 64, 128, 256]** | a8w4dq | | ||
+| linear with dynamic activations (symmetric) | fp32^ | | [32, 64, 128, 256]** | a8w4dq | 🚧 |🚧 ||
 | linear with GPTQ*** (asymmetric) | | |[32, 64, 128, 256]** | ||||
 | linear with HQQ*** (asymmetric) | | |[32, 64, 128, 256]** | ||||

@@ -22,6 +22,8 @@ Due to the larger vocabulary size of llama3, we also recommend quantizing the em
 |--|--|--|--|--|--|--|--|
 | embedding (symmetric) | fp32, fp16, bf16 | [8, 4]* | [32, 64, 128, 256]** | ||||
 
+^The a8w4dq quantization scheme requires the model to be converted to fp32, due to lack of support for fp16 and bf16.
+
 *These are the only valid bitwidth options.
 
 **There are many valid group size options, including 512, 1024, etc. Note that smaller groupsize tends to be better for preserving model quality and accuracy, and larger groupsize for further improving performance. Set 0 for channelwise quantization.
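
As a quick illustration of how the options in these notes combine, here is a minimal sketch that follows the export command shape used later in this doc; the channelwise embedding setting (groupsize 0) and its pairing with a8w4dq are illustrative assumptions, not commands taken from this commit:

```
# Hypothetical sketch: channelwise (groupsize 0) 4-bit embedding quantization combined with
# a8w4dq linear quantization; --dtype fp32 is needed because a8w4dq lacks fp16/bf16 support.
python3 torchchat.py export llama3 --dtype fp32 \
  --quantize '{"embedding": {"bitwidth": 4, "groupsize": 0}, "linear:a8w4dq": {"groupsize": 256}}' \
  --output-pte-path llama3.pte
```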
@@ -65,13 +67,13 @@ python3 generate.py [--compile] llama3 --prompt "Hello, my name is" --quantize '
 ```
 ### AOTI
 ```
-python3 torchchat.py export llama3 --quantize '{"embedding": {"bitwidth": 4, "groupsize":32}, "linear:a8w4dq": {"groupsize" : 256}}' --output-dso-path llama3.dso
+python3 torchchat.py export llama3 --quantize '{"embedding": {"bitwidth": 4, "groupsize":32}, "linear:int4": {"groupsize" : 256}}' --output-dso-path llama3.dso
 
 python3 generate.py --dso-path llama3.dso --prompt "Hello my name is"
 ```
 ### ExecuTorch
 ```
-python3 torchchat.py export llama3 --quantize '{"embedding": {"bitwidth": 4, "groupsize":32}, "linear:a8w4dq": {"groupsize" : 256}}' --output-pte-path llama3.pte
+python3 torchchat.py export llama3 --dtype fp32 --quantize '{"embedding": {"bitwidth": 4, "groupsize":32}, "linear:a8w4dq": {"groupsize" : 256}}' --output-pte-path llama3.pte
 
 python3 generate.py --pte-path llama3.pte --prompt "Hello my name is"
 ```
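
For reference, a hedged sketch of an eager-mode run using the same quantize spec as the AOTI export above; the generate.py invocation shape is taken from the hunk header (`python3 generate.py [--compile] llama3 --prompt ... --quantize ...`), and pairing it with linear:int4 here is an assumption rather than part of this commit:

```
# Hypothetical eager-mode run reusing the embedding + linear:int4 quantize spec from the AOTI export.
python3 generate.py llama3 --prompt "Hello, my name is" \
  --quantize '{"embedding": {"bitwidth": 4, "groupsize": 32}, "linear:int4": {"groupsize": 256}}'
```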
