
Commit 93db116

quantization
1 parent 16d7d83 commit 93db116

File tree

3 files changed: 7 additions & 7 deletions

README.md

Lines changed: 1 addition & 0 deletions
@@ -233,6 +233,7 @@ python3 torchchat.py export stories15M --output-pte-path stories15M.pte
 # Execute
 python3 torchchat.py generate --device cpu --pte-path stories15M.pte --prompt "Hello my name is"
 ```
+* Note: to export a Llama model, it is recommended to quantize with `--quantize '{"embedding": {"bitwidth": 4, "groupsize": 32}, "linear:a8w4dq": {"groupsize": 256}}'`

 See below under [Mobile Execution](#run-mobile) if you want to deploy and execute a model in your iOS or Android app.
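As a worked example of that note, a minimal sketch of the recommended export invocation (the `llama2` model alias and the output filename are assumptions, not from this commit; substitute the model you actually downloaded):

```
python3 torchchat.py export llama2 --quantize '{"embedding": {"bitwidth": 4, "groupsize": 32}, "linear:a8w4dq": {"groupsize": 256}}' --output-pte-path llama2.pte
```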

docs/quantization.md

Lines changed: 6 additions & 6 deletions
@@ -214,12 +214,12 @@ python3 generate.py [--compile] --checkpoint-path ${MODEL_PATH} --prompt "Hello,
 ```

 ```
-python3 export.py --checkpoint-path ${MODEL_PATH} -d fp32 --quantize '{"linear:int4": {"groupsize" : 32} }' [ --output-pte-path ${MODEL_OUT}/${MODEL_NAME}_int4-gw32.pte | --output-dso-path ${MODEL_OUT}/${MODEL_NAME}_int4-gw32.dso]
+python3 export.py --checkpoint-path ${MODEL_PATH} -d fp32 --quantize '{"linear:int4": {"groupsize" : 32} }' --output-dso-path ${MODEL_OUT}/${MODEL_NAME}_int4-gw32.dso
 ```
 Now you can run your model with the same command as before:

 ```
-python3 generate.py [ --pte-path ${MODEL_OUT}/${MODEL_NAME}_int4-gw32.pte | --dso-path ${MODEL_OUT}/${MODEL_NAME}_int4-gw32.dso] --prompt "Hello my name is"
+python3 generate.py --dso-path ${MODEL_OUT}/${MODEL_NAME}_int4-gw32.dso --prompt "Hello my name is"
 ```

 ## 4-Bit Integer Linear Quantization (a8w4dq)
@@ -247,12 +247,12 @@ python3 generate.py [--compile] --checkpoint-path ${MODEL_PATH} --prompt "Hello,
 ```

 ```
-python3 export.py --checkpoint-path ${MODEL_PATH} -d fp32 --quantize '{"linear:gptq": {"groupsize" : 32} }' [ --output-pte-path ${MODEL_OUT}/${MODEL_NAME}_gptq.pte | ...dso... ]
+python3 export.py --checkpoint-path ${MODEL_PATH} -d fp32 --quantize '{"linear:gptq": {"groupsize" : 32} }' --output-dso-path ${MODEL_OUT}/${MODEL_NAME}_gptq.dso
 ```
 Now you can run your model with the same command as before:

 ```
-python3 generate.py [ --pte-path ${MODEL_OUT}/${MODEL_NAME}_gptq.pte | ...dso...] --prompt "Hello my name is"
+python3 generate.py --dso-path ${MODEL_OUT}/${MODEL_NAME}_gptq.dso --prompt "Hello my name is"
 ```

 ## 4-bit Integer Linear Quantization with HQQ (hqq)
@@ -267,12 +267,12 @@ python3 generate.py [--compile] --checkpoint-path ${MODEL_PATH} --prompt "Hello,
 ```

 ```
-python3 export.py --checkpoint-path ${MODEL_PATH} -d fp32 --quantize '{"linear:hqq": {"groupsize" : 32} }' [ --output-pte-path ${MODEL_OUT}/${MODEL_NAME}_hqq.pte | ...dso... ]
+python3 export.py --checkpoint-path ${MODEL_PATH} -d fp32 --quantize '{"linear:hqq": {"groupsize" : 32} }' --output-dso-path ${MODEL_OUT}/${MODEL_NAME}_hqq.dso
 ```
 Now you can run your model with the same command as before:

 ```
-python3 generate.py [ --pte-path ${MODEL_OUT}/${MODEL_NAME}_hqq.pte | ...dso...] --prompt "Hello my name is"
+python3 generate.py --dso-path ${MODEL_OUT}/${MODEL_NAME}_hqq.dso --prompt "Hello my name is"
 ```

 ## Adding additional quantization schemes
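After this change each scheme documents a single DSO flow. As a usage sketch for the int4 variant (the three variable values below are placeholder assumptions, not from the docs), the commands run end to end as:

```
MODEL_NAME=llama2
MODEL_PATH=checkpoints/${MODEL_NAME}/model.pth
MODEL_OUT=exports
python3 export.py --checkpoint-path ${MODEL_PATH} -d fp32 --quantize '{"linear:int4": {"groupsize" : 32} }' --output-dso-path ${MODEL_OUT}/${MODEL_NAME}_int4-gw32.dso
python3 generate.py --dso-path ${MODEL_OUT}/${MODEL_NAME}_int4-gw32.dso --prompt "Hello my name is"
```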

download.py

Lines changed: 0 additions & 1 deletion
@@ -93,7 +93,6 @@ def download_and_convert(
         # overwriting if necessary.
         if os.path.isdir(model_dir):
             shutil.rmtree(model_dir)
-        os.makedirs(model_dir, exist_ok=True)
         shutil.move(temp_dir, model_dir)

     finally:
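The deleted `os.makedirs` call interacts badly with `shutil.move`: `shutil.move(src, dst)` renames `src` to `dst` only when `dst` does not exist, but if `dst` is an existing directory, `src` is moved *inside* it. That is presumably why this commit drops the call. A minimal standalone sketch of the corrected sequence (the demo paths are hypothetical):

```
import os
import shutil
import tempfile

src = tempfile.mkdtemp()   # stands in for temp_dir (the freshly downloaded tree)
dst = "model_dir_demo"     # stands in for model_dir

# Clear any previous download, overwriting if necessary.
if os.path.isdir(dst):
    shutil.rmtree(dst)

# With os.makedirs(dst) here, shutil.move would nest src under dst
# (producing dst/<basename-of-src>) instead of renaming src to dst.
shutil.move(src, dst)      # dst now *is* the downloaded tree
```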
