@@ -190,41 +190,6 @@ We use `[ optional input ]` to indicate optional inputs, and `[ choice
1 | choice 2 | ... ]` to indicate a choice


- ### A note on tokenizers
-
- There are two different formats for tokenizers, and both are used in this repo.
-
- 1 - for generate.py and Python bindings, we use the Google
- sentencepiece Python operator and the TikToken tokenizer (for
- llama3). This operator consumes a tokenization model in the
- `tokenizer.model` format (see the sketch after this list).
-
- 2 - for C/C++ inference, we use Andrej Karpathy's C tokenizer
- function, as well as a C++ TikToken tokenizer (for llama3). This
- tokenizer consumes a tokenization model in the `tokenizer.bin`
- format.
-
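As a concrete illustration of format 1, here is a minimal sketch (not code from this repo) of consuming a SentencePiece `tokenizer.model` from Python; the model path is a placeholder:
```
# Minimal sketch: round-trip a prompt through a SentencePiece model.
# Assumes `pip install sentencepiece`; the tokenizer.model path is a placeholder.
import sentencepiece as spm

sp = spm.SentencePieceProcessor(model_file="tokenizer.model")
ids = sp.encode("Hello, world!")   # list of token ids
print(ids, sp.decode(ids))         # decode back to the original string
```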
- You can convert a SentencePiece tokenizer.model into the
- tokenizer.bin format using Andrej's tokenizer.py utility:
-
- ```
- python3 utils/tokenizer.py --tokenizer-model=${MODEL_DIR}tokenizer.model
- ```
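For context, the `tokenizer.bin` format written by llama2.c-style converters is a simple binary table: a small header followed by one (score, length, bytes) record per token. The sketch below is an assumption based on llama2.c, not a specification of `utils/tokenizer.py`:
```
# Rough sketch of a llama2.c-style tokenizer.bin writer (assumed layout:
# int32 max_token_length header, then float32 score + int32 length + raw
# bytes per token). Consult utils/tokenizer.py for the authoritative format.
import struct

def write_tokenizer_bin(path, tokens, scores):
    encoded = [t.encode("utf-8") for t in tokens]
    with open(path, "wb") as f:
        f.write(struct.pack("<i", max(len(b) for b in encoded)))
        for b, score in zip(encoded, scores):
            f.write(struct.pack("<fi", score, len(b)))
            f.write(b)
```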
-
- We will later discuss how to use this model in a Python-free
- environment, as described under *STANDALONE EXECUTION*:
- ```
- runner-{et,aoti}/build/run ${MODEL_OUT}/model.{so,pte} -z ${MODEL_OUT}/tokenizer.bin
- ```
-
- ### Llama 3 tokenizer
-
- Add this option to load the tiktoken tokenizer (illustrated in the sketch below):
- ```
- --tiktoken
- ```
-
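For illustration, a minimal sketch of what tokenizing with the tiktoken library looks like on the Python side; the encoding name is a placeholder, not llama3's actual vocabulary:
```
# Minimal sketch: encode/decode with tiktoken. The encoding name is a
# placeholder; llama3 ships its own BPE ranks.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")
ids = enc.encode("Hello, world!")
print(ids, enc.decode(ids))
```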
## Generate

Model definition in model.py, generation code in generate.py. The
@@ -246,7 +211,7 @@ which are not available for exported DSO and PTE models.

## Eval

- To be added. For basic eval instructions, please see the introductory
+ For an introduction to the model evaluation tool `eval`, please see the introductory
README.

In addition to running eval on models in eager mode (optionally
@@ -406,38 +371,7 @@ you can, for example, convert a quantized model to f16 format:
${GGUF}/quantize --allow-requantize your_quantized_model.gguf fake_unquantized_model.gguf f16
```
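To sanity-check the converted file from Python, here is a minimal sketch using the `gguf` package from the llama.cpp project (package and reader API assumed; the path is a placeholder):
```
# Minimal sketch: list tensor names and dtypes in a GGUF file to confirm
# the f16 conversion. Assumes `pip install gguf`; path is a placeholder.
from gguf import GGUFReader

reader = GGUFReader("fake_unquantized_model.gguf")
for t in reader.tensors[:5]:
    print(t.name, t.tensor_type, t.shape)
```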

- # Standalone Execution
-
- In addition to running the exported and compiled models for server,
- desktop/laptop and mobile/edge devices by loading them in a PyTorch
- environment under the Python interpreter, these models can also be
- executed directly.
-
- ## Desktop and Server Execution
-
- This has been tested with Linux and x86 (using CPU ~and GPU~), and
- macOS and ARM/Apple Silicon.
-
- The runner-* directories show how to integrate AOTI- and ET-exported
- models in a C/C++ application when no Python environment is available.
- Integrate them with your own application and adapt them to your
- application and model needs! Each runner directory comes with a cmake
- build script. Please refer to this file for detailed build
- instructions, and adapt it as appropriate for your system.
-
- Build the runner like this:
- ```
- cd ./runner-aoti
- cmake -Bbuild -DCMAKE_PREFIX_PATH=`python3 -c 'import torch;print(torch.utils.cmake_prefix_path)'`
- cmake --build build
- ```
-
- To run, use the following command (assuming you already generated the
- tokenizer.bin tokenizer model):
-
- ```
- LD_LIBRARY_PATH=$CONDA_PREFIX/lib ./build/run ../${MODEL_NAME}.so -z ../tokenizer.bin
- ```
-
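For quick testing, a small sketch of invoking the built runner from Python; the binary, model, and tokenizer paths are placeholders:
```
# Minimal sketch: shell out to the compiled runner with the conda lib
# directory on LD_LIBRARY_PATH. All paths are placeholders.
import os
import subprocess

env = dict(os.environ)
env["LD_LIBRARY_PATH"] = os.path.join(env.get("CONDA_PREFIX", ""), "lib")
subprocess.run(
    ["./build/run", "../model.so", "-z", "../tokenizer.bin"],
    env=env,
    check=True,
)
```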
+ # Mobile Execution

## Mobile and Edge Execution Test (x86)