Commit 587c8a1

create dir on download
1 parent f6e99bb commit 587c8a1

3 files changed: 8 additions & 8 deletions

README.md

Lines changed: 1 addition & 2 deletions
@@ -41,7 +41,7 @@ Most models use HuggingFace as the distribution channel, so you will need to create
 account.
 
 Create a HuggingFace user access token [as documented here](https://huggingface.co/docs/hub/en/security-tokens).
-Run `huggingface-cli login`, which will prompt for the newly created token.
+Run `huggingface-cli login`, which will prompt for the newly created token.
 
 Once this is done, torchchat will be able to download model artifacts from
 HuggingFace.
@@ -63,7 +63,6 @@ python3 torchchat.py download llama3
 * in Chat mode
 * in Generate mode
 * [Exporting for mobile via ExecuTorch](#export-executorch)
-* in Chat mode
 * in Generate mode
 * [Running exported executorch file on iOS or Android](#run-mobile)
 
docs/quantization.md

Lines changed: 6 additions & 6 deletions
@@ -214,7 +214,7 @@ python3 generate.py [--compile] --checkpoint-path ${MODEL_PATH} --prompt "Hello,
 ```
 
 ```
-python3 export.py --checkpoint-path ${MODEL_PATH} -d fp32 --quantize "{'linear:int4': {'groupsize' : 32} }" [ --output-pte-path ${MODEL_OUT}/${MODEL_NAME}_int4-gw32.pte | --output-dso-path ${MODEL_OUT}/${MODEL_NAME}_int4-gw32.dso]
+python3 export.py --checkpoint-path ${MODEL_PATH} -d fp32 --quantize '{"linear:int4": {"groupsize" : 32} }' [ --output-pte-path ${MODEL_OUT}/${MODEL_NAME}_int4-gw32.pte | --output-dso-path ${MODEL_OUT}/${MODEL_NAME}_int4-gw32.dso]
 ```
 Now you can run your model with the same command as before:
 
@@ -227,7 +227,7 @@ To compress your model even more, 4-bit integer quantization may be used. To ach
 
 **TODO (Digant): a8w4dq eager mode support [#335](https://github.com/pytorch/torchchat/issues/335) **
 ```
-python3 export.py --checkpoint-path ${MODEL_PATH} -d fp32 --quantize "{'linear:a8w4dq': {'groupsize' : 7} }" [ --output-pte-path ${MODEL_OUT}/${MODEL_NAME}_8da4w.pte | ...dso... ]
+python3 export.py --checkpoint-path ${MODEL_PATH} -d fp32 --quantize '{"linear:a8w4dq": {"groupsize" : 7} }' [ --output-pte-path ${MODEL_OUT}/${MODEL_NAME}_8da4w.pte | ...dso... ]
 ```
 
 Now you can run your model with the same command as before:
@@ -237,7 +237,7 @@ python3 generate.py [ --pte-path ${MODEL_OUT}/${MODEL_NAME}_a8w4dq.pte | ...dso.
 ```
 
 ## 4-bit Integer Linear Quantization with GPTQ (gptq)
-Compression offers smaller memory footprints (to fit on memory-constrained accelerators and mobile/edge devices) and reduced memory bandwidth (for better performance), but often at the price of quality degradation. GPTQ 4-bit integer quantization may be used to reduce the quality impact. To achieve good accuracy, we recommend the use of groupwise quantization where (small to mid-sized) groups of int4 weights share a scale.
+Compression offers smaller memory footprints (to fit on memory-constrained accelerators and mobile/edge devices) and reduced memory bandwidth (for better performance), but often at the price of quality degradation. GPTQ 4-bit integer quantization may be used to reduce the quality impact. To achieve good accuracy, we recommend the use of groupwise quantization where (small to mid-sized) groups of int4 weights share a scale.
 
 **TODO (Jerry): GPTQ quantization documentation [#336](https://github.com/pytorch/torchchat/issues/336) **
 
@@ -247,7 +247,7 @@ python3 generate.py [--compile] --checkpoint-path ${MODEL_PATH} --prompt "Hello,
 ```
 
 ```
-python3 export.py --checkpoint-path ${MODEL_PATH} -d fp32 --quantize "{'linear:gptq': {'groupsize' : 32} }" [ --output-pte-path ${MODEL_OUT}/${MODEL_NAME}_gptq.pte | ...dso... ]
+python3 export.py --checkpoint-path ${MODEL_PATH} -d fp32 --quantize '{"linear:gptq": {"groupsize" : 32} }' [ --output-pte-path ${MODEL_OUT}/${MODEL_NAME}_gptq.pte | ...dso... ]
 ```
 Now you can run your model with the same command as before:
 
@@ -257,7 +257,7 @@ python3 generate.py [ --pte-path ${MODEL_OUT}/${MODEL_NAME}_gptq.pte | ...dso...
 
 ## 4-bit Integer Linear Quantization with HQQ (hqq)
 
-Compression offers smaller memory footprints (to fit on memory-constrained accelerators and mobile/edge devices) and reduced memory bandwidth (for better performance), but often at the price of quality degradation. GPTQ 4-bit integer quantization may be used to reduce the quality impact, but at the cost of significant additional computation time. HQQ Quantization balances performance, accuracy, and runtime, we recommend the use of groupwise quantization where (small to mid-sized) groups of int4 weights share a scale.
+Compression offers smaller memory footprints (to fit on memory-constrained accelerators and mobile/edge devices) and reduced memory bandwidth (for better performance), but often at the price of quality degradation. GPTQ 4-bit integer quantization may be used to reduce the quality impact, but at the cost of significant additional computation time. HQQ Quantization balances performance, accuracy, and runtime, we recommend the use of groupwise quantization where (small to mid-sized) groups of int4 weights share a scale.
 
 **TODO (Zhengxu): HQQ quantization documentation [#337](https://github.com/pytorch/torchchat/issues/336) **
 
@@ -267,7 +267,7 @@ python3 generate.py [--compile] --checkpoint-path ${MODEL_PATH} --prompt "Hello,
 ```
 
 ```
-python3 export.py --checkpoint-path ${MODEL_PATH} -d fp32 --quantize "{'linear:hqq': {'groupsize' : 32} }" [ --output-pte-path ${MODEL_OUT}/${MODEL_NAME}_hqq.pte | ...dso... ]
+python3 export.py --checkpoint-path ${MODEL_PATH} -d fp32 --quantize '{"linear:hqq": {"groupsize" : 32} }' [ --output-pte-path ${MODEL_OUT}/${MODEL_NAME}_hqq.pte | ...dso... ]
 ```
 Now you can run your model with the same command as before:
 
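The same quoting swap appears in each of the four `export.py` commands above: the outer shell quotes become single quotes and the keys inside the `--quantize` payload become double-quoted. Assuming the option is parsed as strict JSON (for example with Python's `json.loads`, which rejects single-quoted keys), a minimal sketch of why the old form fails and the new one parses:

```python
# Minimal sketch: strict JSON parsing accepts the new double-quoted form and
# rejects the old single-quoted form. This assumes the --quantize string is
# fed to a JSON parser; the argument strings are copied from the diff above.
import json

old_arg = "{'linear:int4': {'groupsize' : 32} }"   # old: double quotes outside, single inside
new_arg = '{"linear:int4": {"groupsize" : 32} }'   # new: single quotes outside, double inside

try:
    json.loads(old_arg)
except json.JSONDecodeError as err:
    print("old form rejected:", err)

print("new form parsed:", json.loads(new_arg))
```

Single quotes around the whole argument also stop the shell from expanding anything inside it, so the JSON reaches the parser untouched.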

download.py

Lines changed: 1 addition & 0 deletions
@@ -92,6 +92,7 @@ def download_and_convert(
         # overwriting if necessary.
         if os.path.isdir(model_dir):
            shutil.rmtree(model_dir)
+        os.makedirs(model_dir, exist_ok=True)
         shutil.move(temp_dir, model_dir)
 
     finally:
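For context, a minimal sketch of the pattern this hunk touches: download into a temporary directory, then move the result into place, with `os.makedirs(..., exist_ok=True)` ensuring the destination path and any missing parent directories exist before the move. This is not the repo's exact code; the function and variable names are hypothetical.

```python
# Hypothetical sketch of the move-into-place step, not torchchat's actual
# download_and_convert implementation.
import os
import shutil

def install_downloaded(temp_dir: str, model_dir: str) -> None:
    # Drop any stale copy of the target directory, overwriting if necessary.
    if os.path.isdir(model_dir):
        shutil.rmtree(model_dir)
    # Create the destination, including any missing parent directories
    # (e.g. a per-organization folder); exist_ok=True makes this a no-op
    # when the directory is already present.
    os.makedirs(model_dir, exist_ok=True)
    # Move the downloaded artifacts into the destination.
    for name in os.listdir(temp_dir):
        shutil.move(os.path.join(temp_dir, name), os.path.join(model_dir, name))
    shutil.rmtree(temp_dir, ignore_errors=True)
```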
