Commit ea62e84

clean up gguf doc (#416)

* clean up gguf doc
* update readme
* typo
* switch to python3

1 parent 06ac1da

File tree: 3 files changed, +30 −13 lines


README.md

Lines changed: 2 additions & 2 deletions
```diff
@@ -3,7 +3,7 @@ Torchchat is a small codebase to showcase running large language models (LLMs) w
 
 ## Highlights
 - Command line interaction with popular LLMs such as Llama 3, Llama 2, Stories, Mistral and more
-- Supporting both GGUF fp32/16 and the Hugging Face checkpoint format
+- Supporting [some GGUF files](docs/GGUF.md) and the Hugging Face checkpoint format
 - PyTorch-native execution with performance
 - Supports popular hardware and OS
 - Linux (x86)
@@ -41,7 +41,7 @@ Most models use HuggingFace as the distribution channel, so you will need to cre
 account.
 
 Create a HuggingFace user access token [as documented here](https://huggingface.co/docs/hub/en/security-tokens).
-Run `huggingface-cli login`, which will prompt for the newly created token.
+Run `huggingface-cli login`, which will prompt for the newly created token.
 
 Once this is done, torchchat will be able to download model artifacts from
 HuggingFace.
```
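
Once the token is stored by `huggingface-cli login`, the same artifacts can also be fetched programmatically. Below is a sketch using the public `huggingface_hub` API with the GGUF file from the doc changes further down; torchchat's own download path may differ:

```python
from huggingface_hub import hf_hub_download

# Assumes `huggingface-cli login` has already stored a valid token
# (this particular repo is public, so the token is optional here).
path = hf_hub_download(
    repo_id="TheBloke/TinyLlama-1.1B-1T-OpenOrca-GGUF",
    filename="tinyllama-1.1b-1t-openorca.Q4_0.gguf",
)
print(path)  # local cache path of the downloaded GGUF file
```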

build/gguf_loader.py

Lines changed: 1 addition & 3 deletions
```diff
@@ -15,11 +15,9 @@
 
 import torch
 
-wd = Path(__file__).parent.resolve()
-sys.path.append(str(wd))
 
 from gguf import GGUFValueType
-from model import ModelArgs, Transformer
+from .model import ModelArgs, Transformer
 from quantize import pack_scales_and_zeros, WeightOnlyInt4Linear
 
 from build.gguf_util import Q4_0, to_float
```
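
For context on that last import: `Q4_0` and `to_float` handle dequantization of GGUF tensors. The following is a minimal sketch of what standard GGML Q4_0 dequantization involves (an 18-byte block holding an fp16 scale plus 32 packed 4-bit values); it is illustrative only, not the actual `build/gguf_util.py` implementation:

```python
import struct

import numpy as np

QK4_0 = 32  # values per Q4_0 block

def dequantize_q4_0(block: bytes) -> np.ndarray:
    """Dequantize one GGML Q4_0 block: fp16 scale + 16 bytes of nibbles."""
    assert len(block) == 2 + QK4_0 // 2
    (d,) = struct.unpack("<e", block[:2])          # fp16 scale factor
    qs = np.frombuffer(block[2:], dtype=np.uint8)  # packed 4-bit quants
    lo = (qs & 0x0F).astype(np.int8) - 8           # elements 0..15
    hi = (qs >> 4).astype(np.int8) - 8             # elements 16..31
    return np.concatenate([lo, hi]).astype(np.float32) * d
```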

docs/GGUF.md

Lines changed: 27 additions & 8 deletions
````diff
@@ -1,15 +1,19 @@
 # Using GGUF Models
-We currently support the following models
+We support parsing [GGUF](https://github.com/ggerganov/ggml/blob/master/docs/gguf.md) files with the following tensor types:
 - F16
 - F32
 - Q4_0
 - Q6_K
 
+If an unsupported type is encountered while parsing a GGUF file, an exception is raised.
 
-### Download
-First download a GGUF model and tokenizer. In this example, we use GGUF Q4_0 format.
+We now go over an example of using GGUF files in the torchchat flow.
+
+### Download resources
+First download a GGUF model and tokenizer. In this example, we use a Q4_0 GGUF file. (Note that Q4_0 is only the dominant tensor type in the file, but the file also contains GGUF tensors of types Q6_K, F16, and F32.)
 
 ```
+# Download resources
 mkdir -p ggufs/open_orca
 cd ggufs/open_orca
 wget -O open_orca.Q4_0.gguf "https://huggingface.co/TheBloke/TinyLlama-1.1B-1T-OpenOrca-GGUF/resolve/main/tinyllama-1.1b-1t-openorca.Q4_0.gguf?download=true"
@@ -19,20 +23,35 @@ cd ../..
 
 export GGUF_MODEL_PATH=ggufs/open_orca/open_orca.Q4_0.gguf
 export GGUF_TOKENIZER_PATH=ggufs/open_orca/tokenizer.model
+
+# Define export paths for examples below
+export GGUF_SO_PATH=/tmp/gguf_model.so
 export GGUF_PTE_PATH=/tmp/gguf_model.pte
 ```
 
-### Generate eager
+### Eager generate
+We can generate text in eager mode as we did before, but we now pass a GGUF file path.
 ```
-python torchchat.py generate --gguf-path ${GGUF_MODEL_PATH} --tokenizer-path ${GGUF_TOKENIZER_PATH} --temperature 0 --prompt "In a faraway land" --max-new-tokens 20
+python3 torchchat.py generate --gguf-path ${GGUF_MODEL_PATH} --tokenizer-path ${GGUF_TOKENIZER_PATH} --temperature 0 --prompt "Once upon a time" --max-new-tokens 15
 ```
 
-### ExecuTorch export + generate
+### AOTI export + generate
 ```
 # Convert the model for use
-python torchchat.py export --gguf-path ${GGUF_MODEL_PATH} --output-pte-path ${GGUF_PTE_PATH}
+python3 torchchat.py export --gguf-path ${GGUF_MODEL_PATH} --output-dso-path ${GGUF_SO_PATH}
 
 # Generate using the PTE model that was created by the export command
-python torchchat.py generate --gguf-path ${GGUF_MODEL_PATH} --pte-path ${GGUF_PTE_PATH} --tokenizer-path ${GGUF_TOKENIZER_PATH} --temperature 0 --prompt "In a faraway land" --max-new-tokens 20
+python3 torchchat.py generate --gguf-path ${GGUF_MODEL_PATH} --dso-path ${GGUF_SO_PATH} --tokenizer-path ${GGUF_TOKENIZER_PATH} --temperature 0 --prompt "Once upon a time" --max-new-tokens 15
+
+```
+
 
+### ExecuTorch export + generate
+Before running this example, you must first [Set-up ExecuTorch](executorch_setup.md).
+```
+# Convert the model for use
+python3 torchchat.py export --gguf-path ${GGUF_MODEL_PATH} --output-pte-path ${GGUF_PTE_PATH}
+
+# Generate using the PTE model that was created by the export command
+python3 torchchat.py generate --gguf-path ${GGUF_MODEL_PATH} --pte-path ${GGUF_PTE_PATH} --tokenizer-path ${GGUF_TOKENIZER_PATH} --temperature 0 --prompt "Once upon a time" --max-new-tokens 15
 ```
````
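
The behavior the new doc describes (raising on unsupported tensor types) amounts to a strict allow-list at parse time. A hypothetical sketch of that check, not torchchat's actual loader code:

```python
# Hypothetical sketch: only a fixed set of GGUF tensor types is
# convertible; anything else fails loudly rather than producing a
# silently wrong model.
SUPPORTED_GGML_TYPES = {"F16", "F32", "Q4_0", "Q6_K"}

def check_gguf_tensor_type(ggml_type: str) -> None:
    """Raise if a GGUF tensor uses a type the loader cannot parse."""
    if ggml_type not in SUPPORTED_GGML_TYPES:
        raise ValueError(f"Unsupported GGUF tensor type: {ggml_type}")
```

Failing loudly here is preferable to skipping tensors, since a partially loaded checkpoint would generate garbage.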
