
Commit 43a69ba

jackzhxng authored and pytorchbot committed
Update Llama README.md for Stories110M tokenizer (#5960)
Summary:
The tokenizer from `wget "https://raw.githubusercontent.com/karpathy/llama2.c/master/tokenizer.model"` is TikToken, so we do not need to generate a `tokenizer.bin` and instead can just use the `tokenizer.model` as is.

Pull Request resolved: #5960

Reviewed By: tarun292

Differential Revision: D64014160

Pulled By: dvorjackz

fbshipit-source-id: 16474a73ed77192f58a5bb9e07426ba58216351e

(cherry picked from commit 12cb9ca)
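Concretely, this means the Stories110M flow drops the conversion step entirely. Below is a minimal sketch of the resulting workflow, assuming the commands run from an ExecuTorch checkout with `stories110M.pt` and `params.json` already downloaded; the `<model pte file>` placeholder and the prompt are illustrative, and the individual commands are taken from the README as changed in this diff:

```
# Fetch the tokenizer; per this commit it can be passed to the runner as-is,
# with no tokenizer.bin conversion step.
wget "https://raw.githubusercontent.com/karpathy/llama2.c/master/tokenizer.model"

# Export the Stories110M checkpoint to a .pte file.
python -m examples.models.llama2.export_llama -c stories110M.pt -p params.json -X -kv

# Run the exported model, pointing --tokenizer_path at tokenizer.model directly.
cmake-out/examples/models/llama2/llama_main \
  --model_path=<model pte file> \
  --tokenizer_path=tokenizer.model \
  --prompt="Once upon a time"
```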
1 parent 7b93aa2 commit 43a69ba

File tree

1 file changed: +7, -8 lines changed


examples/models/llama2/README.md

Lines changed: 7 additions & 8 deletions
@@ -113,11 +113,6 @@ If you want to deploy and run a smaller model for educational purposes. From `ex
 ```
 python -m examples.models.llama2.export_llama -c stories110M.pt -p params.json -X -kv
 ```
-4. Create tokenizer.bin.
-
-```
-python -m extension.llm.tokenizer.tokenizer -t <tokenizer.model> -o tokenizer.bin
-```
 
 ### Option C: Download and export Llama 3 8B instruct model
 
@@ -127,7 +122,11 @@ You can export and run the original Llama 3 8B instruct model.
 
 2. Export model and generate `.pte` file
 ```
-python -m examples.models.llama2.export_llama --checkpoint <consolidated.00.pth> -p <params.json> -kv --use_sdpa_with_kv_cache -X -qmode 8da4w --group_size 128 -d fp32 --metadata '{"get_bos_id":128000, "get_eos_ids":[128009, 128001]}' --embedding-quantize 4,32 --output_name="llama3_kv_sdpa_xnn_qe_4_32.pte"
+python -m examples.models.llama2.export_llama --checkpoint <checkpoint.pth> --params <params.json> -kv --use_sdpa_with_kv_cache -X -qmode 8da4w --group_size 128 -d fp32
+```
+4. Create tokenizer.bin.
+```
+python -m extension.llm.tokenizer.tokenizer -t <tokenizer.model> -o tokenizer.bin
 ```
 
 Due to the larger vocabulary size of Llama 3, we recommend quantizing the embeddings with `--embedding-quantize 4,32` as shown above to further reduce the model size.
@@ -187,7 +186,7 @@ tokenizer.path=<path_to_checkpoint_folder>/tokenizer.model
 
 Using the same arguments from above
 ```
-python -m examples.models.llama2.eval_llama -c <checkpoint.pth> -p <params.json> -t <tokenizer.model> -d fp32 --max_seq_len <max sequence length> --limit <number of samples>
+python -m examples.models.llama2.eval_llama -c <checkpoint.pth> -p <params.json> -t <tokenizer.model/bin> -d fp32 --max_seq_len <max sequence length> --limit <number of samples>
 ```
 
 The Wikitext results generated above used: `{max_seq_len: 2048, limit: 1000}`
@@ -233,7 +232,7 @@ Note for Mac users: There's a known linking issue with Xcode 15.1. Refer to the
 cmake-out/examples/models/llama2/llama_main --model_path=<model pte file> --tokenizer_path=<tokenizer.bin> --prompt=<prompt>
 ```
 
-For Llama3, you can pass the original `tokenizer.model` (without converting to `.bin` file).
+For Llama2 models, pass the converted `tokenizer.bin` file instead of `tokenizer.model`.
 
 To build for CoreML backend and validate on Mac, replace `-DEXECUTORCH_BUILD_XNNPACK=ON` with `-DEXECUTORCH_BUILD_COREML=ON`

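For comparison, the Llama 2 path described by the final hunk still goes through the conversion step. A short sketch under the same assumptions, using the README's own placeholders:

```
# Convert tokenizer.model to tokenizer.bin (now needed for Llama 2 models only).
python -m extension.llm.tokenizer.tokenizer -t <tokenizer.model> -o tokenizer.bin

# Pass the converted tokenizer.bin to the runner.
cmake-out/examples/models/llama2/llama_main \
  --model_path=<model pte file> \
  --tokenizer_path=tokenizer.bin \
  --prompt=<prompt>
```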