
Commit 09d6889

jerryzh168 authored and malfet committed

Update quantization.md (#483)

* Update quantization.md
* Update quantization.md

1 parent c0f3caf commit 09d6889

File tree

1 file changed: +2 −4 lines

docs/quantization.md

Lines changed: 2 additions & 4 deletions

````diff
@@ -239,15 +239,13 @@ python3 generate.py [ --pte-path ${MODEL_OUT}/${MODEL_NAME}_a8w4dq.pte | ...dso.
 ## 4-bit Integer Linear Quantization with GPTQ (gptq)
 Compression offers smaller memory footprints (to fit on memory-constrained accelerators and mobile/edge devices) and reduced memory bandwidth (for better performance), but often at the price of quality degradation. GPTQ 4-bit integer quantization may be used to reduce the quality impact. To achieve good accuracy, we recommend the use of groupwise quantization where (small to mid-sized) groups of int4 weights share a scale.
 
-**TODO (Jerry): GPTQ quantization documentation [#336](https://github.com/pytorch/torchchat/issues/336) **
-
 We can use GPTQ with eager execution, optionally in conjunction with torch.compile:
 ```
-python3 generate.py [--compile] --checkpoint-path ${MODEL_PATH} --prompt "Hello, my name is" --quantize '{"linear:int4" : {"groupsize": 32}}' --device [ cpu | cuda | mps ]
+python3 generate.py [--compile] --checkpoint-path ${MODEL_PATH} --prompt "Hello, my name is" --quantize '{"linear:int4-gptq" : {"groupsize": 32}}' --device [ cpu | cuda | mps ]
 ```
 
 ```
-python3 export.py --checkpoint-path ${MODEL_PATH} -d fp32 --quantize '{"linear:gptq": {"groupsize" : 32} }' [ --output-pte-path ${MODEL_OUT}/${MODEL_NAME}_gptq.pte | ...dso... ]
+python3 export.py --checkpoint-path ${MODEL_PATH} -d fp32 --quantize '{"linear:int4-gptq": {"groupsize" : 32} }' [ --output-pte-path ${MODEL_OUT}/${MODEL_NAME}_gptq.pte | ...dso... ]
 ```
 Now you can run your model with the same command as before:
````

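The groupwise scheme the quoted paragraph recommends — small groups of int4 weights sharing one floating-point scale — can be sketched in plain Python. This is an illustrative sketch only: `quantize_groupwise_int4` is a hypothetical helper using simple symmetric round-to-nearest, not torchchat's or GPTQ's actual implementation (GPTQ additionally minimizes layer output error when choosing the quantized values).

```python
def quantize_groupwise_int4(weights, groupsize=32):
    """Symmetric groupwise int4 quantization sketch: each group of
    `groupsize` consecutive weights shares one float scale.
    Hypothetical illustration, not torchchat's implementation."""
    assert len(weights) % groupsize == 0
    q, scales = [], []
    for i in range(0, len(weights), groupsize):
        group = weights[i:i + groupsize]
        # One scale per group, mapping the group's max magnitude to 7
        # (symmetric int4 values are clamped to [-8, 7]).
        scale = max(abs(w) for w in group) / 7 or 1.0
        scales.append(scale)
        q.extend(max(-8, min(7, round(w / scale))) for w in group)
    return q, scales

def dequantize_groupwise_int4(q, scales, groupsize=32):
    """Reconstruct approximate weights: int value times its group's scale."""
    return [q[i] * scales[i // groupsize] for i in range(len(q))]
```

Smaller `groupsize` values give each scale fewer weights to cover, which lowers quantization error at the cost of storing more scales; the `"groupsize": 32` in the commands above is on the small end of that trade-off.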