Commit c256940

mikekgfb authored and malfet committed
Update ADVANCED-USERS.md (#529)
Update Advanced Users description to reflect changes in the repo since the description was initially created.
1 parent 035b05f · commit c256940

File tree: 1 file changed (+2, −68 lines)

docs/ADVANCED-USERS.md

Lines changed: 2 additions & 68 deletions
@@ -190,41 +190,6 @@ We use `[ optional input ]` to indicate optional inputs, and `[ choice
 1 | choice 2 | ... ]` to indicate a choice
 
 
-### A note on tokenizers
-
-There are two different formats for tokenizers, and both are used in this repo.
-
-1 - for generate.py and Python bindings, we use the Google
-sentencepiece Python operator and the TikToken tokenizer (for
-llama3). This operator consumes a tokenization model in the
-`tokenizer.model` format.
-
-2 - for C/C++ inference, we use @Andrej Karpathy's C tokenizer
-function, as well as a C++ TikToken tokenizer (for llama3). This
-tokenizer consumes a tokenization model in the 'tokenizer.bin'
-format.
-
-You can convert a SentencePiece tokenizer.model into tokenizer.bin
-format using Andrej's tokenizer.py utility:
-
-```
-python3 utils/tokenizer.py --tokenizer-model=${MODEL_DIR}tokenizer.model
-```
-
-We will later discuss how to use this model, as described under
-*STANDALONE EXECUTION*, in a Python-free environment:
-```
-runner-{et,aoti}/build/run ${MODEL_OUT}/model.{so,pte} -z ${MODEL_OUT}/tokenizer.bin
-```
-
-### Llama 3 tokenizer
-
-Add the option to load the tiktoken tokenizer:
-```
---tiktoken
-```
-
 ## Generate
 
 Model definition in model.py, generation code in generate.py. The
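
As background for the removed tokenizer note: the `tokenizer.model` file consumed by generate.py is an ordinary SentencePiece model, so it can be sanity-checked with the `sentencepiece` Python package before converting it to `tokenizer.bin`. A minimal sketch, assuming `sentencepiece` is installed and that `MODEL_DIR` points at a downloaded checkpoint (both are assumptions for illustration, not part of this commit):

```
# Round-trip a prompt through the SentencePiece model used by generate.py.
# MODEL_DIR is a placeholder for your checkpoint directory.
python3 -c "
import sentencepiece as spm
sp = spm.SentencePieceProcessor(model_file='${MODEL_DIR}tokenizer.model')
ids = sp.encode('hello world', out_type=int)
print(ids)
print(sp.decode(ids))
"
```

If the round-trip prints the original prompt back, the model file is intact and the conversion step above should succeed.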
@@ -246,7 +211,7 @@ which are not available for exported DSO and PTE models.
 
 ## Eval
 
-To be added. For basic eval instructions, please see the introductory
+For an introduction to the model evaluation tool `eval`, please see the introductory
 README.
 
 In addition to running eval on models in eager mode (optionally
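
The replacement line points readers at the `eval` tool itself rather than promising instructions to come. To explore the tool's options directly, the built-in help is a safe starting point; this sketch assumes `eval` is exposed as a top-level eval.py script alongside generate.py, which this docs-only commit does not confirm:

```
# List the evaluation tool's options. The eval.py entry point is an
# assumption based on the generate.py/model.py layout described above.
python3 eval.py --help
```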
@@ -406,38 +371,7 @@ you can, for example, convert a quantized model to f16 format:
 ${GGUF}/quantize --allow-requantize your_quantized_model.gguf fake_unquantized_model.gguf f16
 ```
 
-# Standalone Execution
-
-In addition to running the exported and compiled models for server,
-desktop/laptop and mobile/edge devices by loading them in a PyTorch
-environment under the Python interpreter, these models can also be
-executed directly.
-
-## Desktop and Server Execution
-
-This has been tested with Linux and x86 (using CPU ~and GPU~), and
-macOS and ARM/Apple Silicon.
-
-The runner-* directories show how to integrate AOTI- and ET-exported
-models in a C/C++ application when no Python environment is available.
-Integrate the runners with your own applications and adapt them to
-your application and model needs! Each runner directory comes with a
-cmake build script; please refer to it for detailed build
-instructions, and adapt as appropriate for your system.
-
-Build the runner like this:
-```
-cd ./runner-aoti
-cmake -Bbuild -DCMAKE_PREFIX_PATH=`python3 -c 'import torch;print(torch.utils.cmake_prefix_path)'`
-cmake --build build
-```
-
-To run, use the following command (assuming you already generated the
-tokenizer.bin tokenizer model):
-
-```
-LD_LIBRARY_PATH=$CONDA_PREFIX/lib ./build/run ../${MODEL_NAME}.so -z ../${MODEL_NAME}.bin
-```
+# Mobile Execution
 
 ## Mobile and Edge Execution Test (x86)
 
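The removed Desktop and Server Execution section located libtorch for cmake by splicing `torch.utils.cmake_prefix_path` into `CMAKE_PREFIX_PATH`. That attribute can also be inspected on its own, which helps diagnose a configure step that cannot find Torch; a minimal sketch, assuming torch is installed in the active environment:

```
# Print where the installed torch package keeps its cmake config files.
# If this prints an unexpected path, the runner's cmake configure step
# will fail to locate Torch.
python3 -c 'import torch; print(torch.utils.cmake_prefix_path)'
```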
0 commit comments