Commit ea62e84

clean up gguf doc (#416)

* clean up gguf doc
* update readme
* typo
* switch to python3

1 parent 06ac1da

File tree: 3 files changed, +30 −13 lines


README.md

Lines changed: 2 additions & 2 deletions
```diff
@@ -3,7 +3,7 @@ Torchchat is a small codebase to showcase running large language models (LLMs) w
 
 ## Highlights
 - Command line interaction with popular LLMs such as Llama 3, Llama 2, Stories, Mistral and more
-- Supporting both GGUF fp32/16 and the Hugging Face checkpoint format
+- Supporting [some GGUF files](docs/GGUF.md) and the Hugging Face checkpoint format
 - PyTorch-native execution with performance
 - Supports popular hardware and OS
 - Linux (x86)
@@ -41,7 +41,7 @@ Most models use HuggingFace as the distribution channel, so you will need to cre
 account.
 
 Create a HuggingFace user access token [as documented here](https://huggingface.co/docs/hub/en/security-tokens).
-Run `huggingface-cli login`, which will prompt for the newly created token.
+Run `huggingface-cli login`, which will prompt for the newly created token.
 
 Once this is done, torchchat will be able to download model artifacts from
 HuggingFace.
```
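
Once the token is stored by `huggingface-cli login`, the same artifacts can also be fetched programmatically. Below is a sketch using the public `huggingface_hub` API with the GGUF file from the doc changes further down; torchchat's own download path may differ:

```python
from huggingface_hub import hf_hub_download

# Assumes `huggingface-cli login` has already stored a valid token
# (this particular repo is public, so the token is optional here).
path = hf_hub_download(
    repo_id="TheBloke/TinyLlama-1.1B-1T-OpenOrca-GGUF",
    filename="tinyllama-1.1b-1t-openorca.Q4_0.gguf",
)
print(path)  # local cache path of the downloaded GGUF file
```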

build/gguf_loader.py

Lines changed: 1 addition & 3 deletions
```diff
@@ -15,11 +15,9 @@
 
 import torch
 
-wd = Path(__file__).parent.resolve()
-sys.path.append(str(wd))
 
 from gguf import GGUFValueType
-from model import ModelArgs, Transformer
+from .model import ModelArgs, Transformer
 from quantize import pack_scales_and_zeros, WeightOnlyInt4Linear
 
 from build.gguf_util import Q4_0, to_float
```
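
For context on that last import: `Q4_0` and `to_float` handle dequantization of GGUF tensors. The following is a minimal sketch of what standard GGML Q4_0 dequantization involves (an 18-byte block holding an fp16 scale plus 32 packed 4-bit values); it is illustrative only, not the actual `build/gguf_util.py` implementation:

```python
import struct

import numpy as np

QK4_0 = 32  # values per Q4_0 block

def dequantize_q4_0(block: bytes) -> np.ndarray:
    """Dequantize one GGML Q4_0 block: fp16 scale + 16 bytes of nibbles."""
    assert len(block) == 2 + QK4_0 // 2
    (d,) = struct.unpack("<e", block[:2])          # fp16 scale factor
    qs = np.frombuffer(block[2:], dtype=np.uint8)  # packed 4-bit quants
    lo = (qs & 0x0F).astype(np.int8) - 8           # elements 0..15
    hi = (qs >> 4).astype(np.int8) - 8             # elements 16..31
    return np.concatenate([lo, hi]).astype(np.float32) * d
```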

docs/GGUF.md

Lines changed: 27 additions & 8 deletions
````diff
@@ -1,15 +1,19 @@
 # Using GGUF Models
-We currently support the following models
+We support parsing [GGUF](https://github.com/ggerganov/ggml/blob/master/docs/gguf.md) files with the following tensor types:
 - F16
 - F32
 - Q4_0
 - Q6_K
 
+If an unsupported type is encountered while parsing a GGUF file, an exception is raised.
 
-### Download
-First download a GGUF model and tokenizer. In this example, we use GGUF Q4_0 format.
+We now go over an example of using GGUF files in the torchchat flow.
+
+### Download resources
+First download a GGUF model and tokenizer. In this example, we use a Q4_0 GGUF file. (Note that Q4_0 is only the dominant tensor type in the file, but the file also contains GGUF tensors of types Q6_K, F16, and F32.)
 
 ```
+# Download resources
 mkdir -p ggufs/open_orca
 cd ggufs/open_orca
 wget -O open_orca.Q4_0.gguf "https://huggingface.co/TheBloke/TinyLlama-1.1B-1T-OpenOrca-GGUF/resolve/main/tinyllama-1.1b-1t-openorca.Q4_0.gguf?download=true"
@@ -19,20 +23,35 @@ cd ../..
 
 export GGUF_MODEL_PATH=ggufs/open_orca/open_orca.Q4_0.gguf
 export GGUF_TOKENIZER_PATH=ggufs/open_orca/tokenizer.model
+
+# Define export paths for examples below
+export GGUF_SO_PATH=/tmp/gguf_model.so
 export GGUF_PTE_PATH=/tmp/gguf_model.pte
 ```
 
-### Generate eager
+### Eager generate
+We can generate text in eager mode as we did before, but we now pass a GGUF file path.
 ```
-python torchchat.py generate --gguf-path ${GGUF_MODEL_PATH} --tokenizer-path ${GGUF_TOKENIZER_PATH} --temperature 0 --prompt "In a faraway land" --max-new-tokens 20
+python3 torchchat.py generate --gguf-path ${GGUF_MODEL_PATH} --tokenizer-path ${GGUF_TOKENIZER_PATH} --temperature 0 --prompt "Once upon a time" --max-new-tokens 15
 ```
 
-### ExecuTorch export + generate
+### AOTI export + generate
 ```
 # Convert the model for use
-python torchchat.py export --gguf-path ${GGUF_MODEL_PATH} --output-pte-path ${GGUF_PTE_PATH}
+python3 torchchat.py export --gguf-path ${GGUF_MODEL_PATH} --output-dso-path ${GGUF_SO_PATH}
 
 # Generate using the PTE model that was created by the export command
-python torchchat.py generate --gguf-path ${GGUF_MODEL_PATH} --pte-path ${GGUF_PTE_PATH} --tokenizer-path ${GGUF_TOKENIZER_PATH} --temperature 0 --prompt "In a faraway land" --max-new-tokens 20
+python3 torchchat.py generate --gguf-path ${GGUF_MODEL_PATH} --dso-path ${GGUF_SO_PATH} --tokenizer-path ${GGUF_TOKENIZER_PATH} --temperature 0 --prompt "Once upon a time" --max-new-tokens 15
+
+```
+
 
+### ExecuTorch export + generate
+Before running this example, you must first [Set-up ExecuTorch](executorch_setup.md).
+```
+# Convert the model for use
+python3 torchchat.py export --gguf-path ${GGUF_MODEL_PATH} --output-pte-path ${GGUF_PTE_PATH}
+
+# Generate using the PTE model that was created by the export command
+python3 torchchat.py generate --gguf-path ${GGUF_MODEL_PATH} --pte-path ${GGUF_PTE_PATH} --tokenizer-path ${GGUF_TOKENIZER_PATH} --temperature 0 --prompt "Once upon a time" --max-new-tokens 15
 ```
````
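
The behavior the new doc describes (raising on unsupported tensor types) amounts to a strict allow-list at parse time. A hypothetical sketch of that check, not torchchat's actual loader code:

```python
# Hypothetical sketch: only a fixed set of GGUF tensor types is
# convertible; anything else fails loudly rather than producing a
# silently wrong model.
SUPPORTED_GGML_TYPES = {"F16", "F32", "Q4_0", "Q6_K"}

def check_gguf_tensor_type(ggml_type: str) -> None:
    """Raise if a GGUF tensor uses a type the loader cannot parse."""
    if ggml_type not in SUPPORTED_GGML_TYPES:
        raise ValueError(f"Unsupported GGUF tensor type: {ggml_type}")
```

Failing loudly here is preferable to skipping tensors, since a partially loaded checkpoint would generate garbage.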
