Commit 09e695c

Switch all python command references to python3
1 parent a5af214 commit 09e695c
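
The change is mechanical: every documented shell command that invoked `python` now invokes `python3`. A sweep like this can be audited or reproduced locally with standard tools; the snippet below is only a sketch (the `docs/` scope, the Markdown-only filter, and the GNU `sed` flags are assumptions, not necessarily how the commit was produced):

```
# List bare `python` references in the docs (word match excludes `python3`).
grep -rnw --include='*.md' -e python docs

# Rewrite them in place; assumes GNU sed (on BSD/macOS use `sed -i ''` instead).
grep -rlw --include='*.md' -e python docs | xargs sed -i 's/\bpython\b/python3/g'
```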

File tree: 3 files changed, +45 -45 lines changed


docs/GGUF.md

Lines changed: 3 additions & 3 deletions
@@ -24,15 +24,15 @@ export GGUF_PTE_PATH=/tmp/gguf_model.pte
 
 ### Generate eager
 ```
-python torchchat.py generate --gguf-path ${GGUF_MODEL_PATH} --tokenizer-path ${GGUF_TOKENIZER_PATH} --temperature 0 --prompt "In a faraway land" --max-new-tokens 20
+python3 torchchat.py generate --gguf-path ${GGUF_MODEL_PATH} --tokenizer-path ${GGUF_TOKENIZER_PATH} --temperature 0 --prompt "In a faraway land" --max-new-tokens 20
 ```
 
 ### ExecuTorch export + generate
 ```
 # Convert the model for use
-python torchchat.py export --gguf-path ${GGUF_MODEL_PATH} --output-pte-path ${GGUF_PTE_PATH}
+python3 torchchat.py export --gguf-path ${GGUF_MODEL_PATH} --output-pte-path ${GGUF_PTE_PATH}
 
 # Generate using the PTE model that was created by the export command
-python torchchat.py generate --gguf-path ${GGUF_MODEL_PATH} --pte-path ${GGUF_PTE_PATH} --tokenizer-path ${GGUF_TOKENIZER_PATH} --temperature 0 --prompt "In a faraway land" --max-new-tokens 20
+python3 torchchat.py generate --gguf-path ${GGUF_MODEL_PATH} --pte-path ${GGUF_PTE_PATH} --tokenizer-path ${GGUF_TOKENIZER_PATH} --temperature 0 --prompt "In a faraway land" --max-new-tokens 20
 
 ```
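
For reference, the generate and export commands in this hunk rely on three environment variables; the hunk header shows `GGUF_PTE_PATH=/tmp/gguf_model.pte`, and the other two are presumably exported a few lines earlier in the same doc. A minimal sketch of that setup, with placeholder paths that are not taken from the commit:

```
export GGUF_MODEL_PATH=/path/to/model.gguf
export GGUF_TOKENIZER_PATH=/path/to/tokenizer.model
export GGUF_PTE_PATH=/tmp/gguf_model.pte
```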

docs/MISC.md

Lines changed: 36 additions & 36 deletions
@@ -202,7 +202,7 @@ tokenizer.py utility to convert the tokenizer.model to tokenizer.bin
 format:
 
 ```
-python utils/tokenizer.py --tokenizer-model=${MODEL_DIR}tokenizer.model
+python3 utils/tokenizer.py --tokenizer-model=${MODEL_DIR}tokenizer.model
 ```
 
 We will later discuss how to use this model, as described under *STANDALONE EXECUTION* in a Python-free
@@ -226,7 +226,7 @@ At present, we always use the torchchat model for export and import the checkpoi
 because we have tested that model with the export descriptions described herein.
 
 ```
-python generate.py --compile --checkpoint-path ${MODEL_PATH} --prompt "Hello, my name is" --device [ cuda | cpu | mps]
+python3 generate.py --compile --checkpoint-path ${MODEL_PATH} --prompt "Hello, my name is" --device [ cuda | cpu | mps]
 ```
 
 To squeeze out a little bit more performance, you can also compile the
@@ -240,12 +240,12 @@ though.
 Let's start by exporting and running a small model like stories15M.
 
 ```
-python export.py --checkpoint-path ${MODEL_PATH} -d fp32 --output-pte-path ${MODEL_OUT}/model.pte
+python3 export.py --checkpoint-path ${MODEL_PATH} -d fp32 --output-pte-path ${MODEL_OUT}/model.pte
 ```
 
 ### AOT Inductor compilation and execution
 ```
-python export.py --checkpoint-path ${MODEL_PATH} --device {cuda,cpu} --output-dso-path ${MODEL_OUT}/${MODEL_NAME}.so
+python3 export.py --checkpoint-path ${MODEL_PATH} --device {cuda,cpu} --output-dso-path ${MODEL_OUT}/${MODEL_NAME}.so
 ```
 
 When you have exported the model, you can test the model with the
@@ -256,7 +256,7 @@ exported model with the same interface, and support additional
 experiments to confirm model quality and speed.
 
 ```
-python generate.py --device {cuda,cpu} --dso-path ${MODEL_OUT}/${MODEL_NAME}.so --prompt "Hello my name is"
+python3 generate.py --device {cuda,cpu} --dso-path ${MODEL_OUT}/${MODEL_NAME}.so --prompt "Hello my name is"
 ```
 
 While we have shown the export and execution of a small model on CPU
@@ -278,7 +278,7 @@ delegates such as Vulkan, CoreML, MPS, HTP in addition to Xnnpack as they are re
 With the model exported, you can now generate text with the executorch runtime pybindings. Feel free to play around with the prompt.
 
 ```
-python generate.py --checkpoint-path ${MODEL_PATH} --pte ${MODEL_OUT}/model.pte --device cpu --prompt "Once upon a time"
+python3 generate.py --checkpoint-path ${MODEL_PATH} --pte ${MODEL_OUT}/model.pte --device cpu --prompt "Once upon a time"
 ```
 
 You can also run the model with the runner-et. See below under "Standalone Execution".
@@ -322,8 +322,8 @@ linear operator (asymmetric) with HQQ | n/a | work in progress | n/a |
 You can generate models (for both export and generate, with eager, torch.compile, AOTI, ET, for all backends - mobile at present will primarily support fp32, with all options)
 and specify the precision of the model with
 ```
-python generate.py --dtype [bf16 | fp16 | fp32] ...
-python export.py --dtype [bf16 | fp16 | fp32] ...
+python3 generate.py --dtype [bf16 | fp16 | fp32] ...
+python3 export.py --dtype [bf16 | fp16 | fp32] ...
 ```
 
 Unlike gpt-fast, which uses bfloat16 as the default, torchchat uses float32 as the default. As a consequence you will have to set `--dtype bf16` or `--dtype fp16` on server / desktop for best performance.
@@ -366,35 +366,35 @@ We can do this in eager mode (optionally with torch.compile), we use the `embedd
 groupsize set to 0 which uses channelwise quantization:
 
 ```
-python generate.py [--compile] --checkpoint-path ${MODEL_PATH} --prompt "Hello, my name is" --quant '{"embedding" : {"bitwidth": 8, "groupsize": 0}}' --device cpu
+python3 generate.py [--compile] --checkpoint-path ${MODEL_PATH} --prompt "Hello, my name is" --quant '{"embedding" : {"bitwidth": 8, "groupsize": 0}}' --device cpu
 ```
 
 Then, export as follows:
 ```
-python export.py --checkpoint-path ${MODEL_PATH} -d fp32 --quant '{"embedding": {"bitwidth": 8, "groupsize": 0} }' --output-pte-path ${MODEL_OUT}/${MODEL_NAME}_emb8b-gw256.pte
+python3 export.py --checkpoint-path ${MODEL_PATH} -d fp32 --quant '{"embedding": {"bitwidth": 8, "groupsize": 0} }' --output-pte-path ${MODEL_OUT}/${MODEL_NAME}_emb8b-gw256.pte
 ```
 
 Now you can run your model with the same command as before:
 ```
-python generate.py --pte-path ${MODEL_OUT}/${MODEL_NAME}_int8.pte --prompt "Hello my name is"
+python3 generate.py --pte-path ${MODEL_OUT}/${MODEL_NAME}_int8.pte --prompt "Hello my name is"
 ```
 
 *Groupwise quantization*:
 
 We can do this in eager mode (optionally with `torch.compile`) by using the `embedding` quantizer and specifying the group size:
 
 ```
-python generate.py [--compile] --checkpoint-path ${MODEL_PATH} --prompt "Hello, my name is" --quant '{"embedding" : {"bitwidth": 8, "groupsize": 8}}' --device cpu
+python3 generate.py [--compile] --checkpoint-path ${MODEL_PATH} --prompt "Hello, my name is" --quant '{"embedding" : {"bitwidth": 8, "groupsize": 8}}' --device cpu
 ```
 
 Then, export as follows:
 ```
-python export.py --checkpoint-path ${MODEL_PATH} -d fp32 --quant '{"embedding": {"bitwidth": 8, "groupsize": 8} }' --output-pte-path ${MODEL_OUT}/${MODEL_NAME}_emb8b-gw256.pte
+python3 export.py --checkpoint-path ${MODEL_PATH} -d fp32 --quant '{"embedding": {"bitwidth": 8, "groupsize": 8} }' --output-pte-path ${MODEL_OUT}/${MODEL_NAME}_emb8b-gw256.pte
 ```
 
 Now you can run your model with the same command as before:
 ```
-python generate.py --pte-path ${MODEL_OUT}/${MODEL_NAME}_emb8b-gw256.pte --prompt "Hello my name is"
+python3 generate.py --pte-path ${MODEL_OUT}/${MODEL_NAME}_emb8b-gw256.pte --prompt "Hello my name is"
 ```
 
 #### Embedding quantization (4 bit integer, channelwise & groupwise)
@@ -410,35 +410,35 @@ We can do this in eager mode (optionally with torch.compile), we use the `embedd
 groupsize set to 0 which uses channelwise quantization:
 
 ```
-python generate.py [--compile] --checkpoint-path ${MODEL_PATH} --prompt "Hello, my name is" --quant '{"embedding" : {"bitwidth": 4, "groupsize": 0}}' --device cpu
+python3 generate.py [--compile] --checkpoint-path ${MODEL_PATH} --prompt "Hello, my name is" --quant '{"embedding" : {"bitwidth": 4, "groupsize": 0}}' --device cpu
 ```
 
 Then, export as follows:
 ```
-python export.py --checkpoint-path ${MODEL_PATH} -d fp32 --quant '{"embedding": {"bitwidth": 4, "groupsize": 0} }' --output-pte-path ${MODEL_OUT}/${MODEL_NAME}_emb8b-gw256.pte
+python3 export.py --checkpoint-path ${MODEL_PATH} -d fp32 --quant '{"embedding": {"bitwidth": 4, "groupsize": 0} }' --output-pte-path ${MODEL_OUT}/${MODEL_NAME}_emb8b-gw256.pte
 ```
 
 Now you can run your model with the same command as before:
 ```
-python generate.py --pte-path ${MODEL_OUT}/${MODEL_NAME}_int8.pte --prompt "Hello my name is"
+python3 generate.py --pte-path ${MODEL_OUT}/${MODEL_NAME}_int8.pte --prompt "Hello my name is"
 ```
 
 *Groupwise quantization*:
 
 We can do this in eager mode (optionally with `torch.compile`) by using the `embedding` quantizer and specifying the group size:
 
 ```
-python generate.py [--compile] --checkpoint-path ${MODEL_PATH} --prompt "Hello, my name is" --quant '{"embedding" : {"bitwidth": 4, "groupsize": 8}}' --device cpu
+python3 generate.py [--compile] --checkpoint-path ${MODEL_PATH} --prompt "Hello, my name is" --quant '{"embedding" : {"bitwidth": 4, "groupsize": 8}}' --device cpu
 ```
 
 Then, export as follows:
 ```
-python export.py --checkpoint-path ${MODEL_PATH} -d fp32 --quant '{"embedding": {"bitwidth": 4, "groupsize": 0} }' --output-pte-path ${MODEL_OUT}/${MODEL_NAME}_emb8b-gw256.pte
+python3 export.py --checkpoint-path ${MODEL_PATH} -d fp32 --quant '{"embedding": {"bitwidth": 4, "groupsize": 0} }' --output-pte-path ${MODEL_OUT}/${MODEL_NAME}_emb8b-gw256.pte
 ```
 
 Now you can run your model with the same command as before:
 ```
-python generate.py --pte-path ${MODEL_OUT}/${MODEL_NAME}_emb8b-gw256.pte --prompt "Hello my name is"
+python3 generate.py --pte-path ${MODEL_OUT}/${MODEL_NAME}_emb8b-gw256.pte --prompt "Hello my name is"
 ```
 
 #### Linear 8 bit integer quantization (channel-wise and groupwise)
@@ -455,55 +455,55 @@ We can do this in eager mode (optionally with torch.compile), we use the `linear
 groupsize set to 0 which uses channelwise quantization:
 
 ```
-python generate.py [--compile] --checkpoint-path ${MODEL_PATH} --prompt "Hello, my name is" --quant '{"linear:int8" : {"bitwidth": 8, "groupsize": 0}}' --device cpu
+python3 generate.py [--compile] --checkpoint-path ${MODEL_PATH} --prompt "Hello, my name is" --quant '{"linear:int8" : {"bitwidth": 8, "groupsize": 0}}' --device cpu
 ```
 
 Then, export as follows using ExecuTorch for mobile backends:
 ```
-python export.py --checkpoint-path ${MODEL_PATH} -d fp32 --quant '{"linear:int8": {"bitwidth": 8, "groupsize": 0} }' --output-pte-path ${MODEL_OUT}/${MODEL_NAME}_int8.pte
+python3 export.py --checkpoint-path ${MODEL_PATH} -d fp32 --quant '{"linear:int8": {"bitwidth": 8, "groupsize": 0} }' --output-pte-path ${MODEL_OUT}/${MODEL_NAME}_int8.pte
 ```
 
 Now you can run your model with the same command as before:
 ```
-python generate.py --pte-path ${MODEL_OUT}/${MODEL_NAME}_int8.pte --checkpoint-path ${MODEL_PATH} --prompt "Hello my name is"
+python3 generate.py --pte-path ${MODEL_OUT}/${MODEL_NAME}_int8.pte --checkpoint-path ${MODEL_PATH} --prompt "Hello my name is"
 ```
 
 Or, export as follows for server/desktop deployments:
 ```
-python export.py --checkpoint-path ${MODEL_PATH} -d fp32 --quant '{"linear:int8": {"bitwidth": 8, "groupsize": 0} }' --output-pte-path ${MODEL_OUT}/${MODEL_NAME}_int8.so
+python3 export.py --checkpoint-path ${MODEL_PATH} -d fp32 --quant '{"linear:int8": {"bitwidth": 8, "groupsize": 0} }' --output-pte-path ${MODEL_OUT}/${MODEL_NAME}_int8.so
 ```
 
 Now you can run your model with the same command as before:
 ```
-python generate.py --dso-path ${MODEL_OUT}/${MODEL_NAME}_int8.so --checkpoint-path ${MODEL_PATH} --prompt "Hello my name is"
+python3 generate.py --dso-path ${MODEL_OUT}/${MODEL_NAME}_int8.so --checkpoint-path ${MODEL_PATH} --prompt "Hello my name is"
 ```
 
 *Groupwise quantization*:
 
 We can do this in eager mode (optionally with `torch.compile`) by using the `linear:int8` quantizer and specifying the group size:
 
 ```
-python generate.py [--compile] --checkpoint-path ${MODEL_PATH} --prompt "Hello, my name is" --quant '{"linear:int8" : {"bitwidth": 8, "groupsize": 8}}' --device cpu
+python3 generate.py [--compile] --checkpoint-path ${MODEL_PATH} --prompt "Hello, my name is" --quant '{"linear:int8" : {"bitwidth": 8, "groupsize": 8}}' --device cpu
 ```
 
 Then, export as follows using ExecuTorch:
 ```
-python export.py --checkpoint-path ${MODEL_PATH} -d fp32 --quant '{"linear:int8": {"bitwidth": 8, "groupsize": 0} }' --output-pte-path ${MODEL_OUT}/${MODEL_NAME}_int8-gw256.pte
+python3 export.py --checkpoint-path ${MODEL_PATH} -d fp32 --quant '{"linear:int8": {"bitwidth": 8, "groupsize": 0} }' --output-pte-path ${MODEL_OUT}/${MODEL_NAME}_int8-gw256.pte
 ```
 
 Now you can run your model with the same command as before:
 ```
-python generate.py --pte-path ${MODEL_OUT}/${MODEL_NAME}_int8-gw256.pte --checkpoint-path ${MODEL_PATH} --prompt "Hello my name is"
+python3 generate.py --pte-path ${MODEL_OUT}/${MODEL_NAME}_int8-gw256.pte --checkpoint-path ${MODEL_PATH} --prompt "Hello my name is"
 ```
 
 Or, export as follows for server/desktop deployments:
 ```
-python export.py --checkpoint-path ${MODEL_PATH} -d fp32 --quant '{"linear:int8": {"bitwidth": 8, "groupsize": 0} }' --output-dso-path ${MODEL_OUT}/${MODEL_NAME}_int8-gw256.so
+python3 export.py --checkpoint-path ${MODEL_PATH} -d fp32 --quant '{"linear:int8": {"bitwidth": 8, "groupsize": 0} }' --output-dso-path ${MODEL_OUT}/${MODEL_NAME}_int8-gw256.so
 ```
 
 Now you can run your model with the same command as before:
 ```
-python generate.py --pte-path ${MODEL_OUT}/${MODEL_NAME}_int8-gw256.so --checkpoint-path ${MODEL_PATH} -d fp32 --prompt "Hello my name is"
+python3 generate.py --pte-path ${MODEL_OUT}/${MODEL_NAME}_int8-gw256.so --checkpoint-path ${MODEL_PATH} -d fp32 --prompt "Hello my name is"
 ```
 
 Please note that group-wise quantization works functionally, but has
@@ -515,36 +515,36 @@ operator.
 To compress your model even more, 4-bit integer quantization may be used. To achieve good accuracy, we recommend the use
 of groupwise quantization where (small to mid-sized) groups of int4 weights share a scale.
 ```
-python export.py --checkpoint-path ${MODEL_PATH} -d fp32 --quant "{'linear:int4': {'groupsize' : 32} }" [ --output-pte-path ${MODEL_OUT}/${MODEL_NAME}_int4-gw32.pte | --output-dso-path ${MODEL_OUT}/${MODEL_NAME}_int4-gw32.dso]
+python3 export.py --checkpoint-path ${MODEL_PATH} -d fp32 --quant "{'linear:int4': {'groupsize' : 32} }" [ --output-pte-path ${MODEL_OUT}/${MODEL_NAME}_int4-gw32.pte | --output-dso-path ${MODEL_OUT}/${MODEL_NAME}_int4-gw32.dso]
 ```
 
 Now you can run your model with the same command as before:
 ```
-python generate.py [ --pte-path ${MODEL_OUT}/${MODEL_NAME}_int4-gw32.pte | --dso-path ${MODEL_OUT}/${MODEL_NAME}_int4-gw32.dso] --prompt "Hello my name is"
+python3 generate.py [ --pte-path ${MODEL_OUT}/${MODEL_NAME}_int4-gw32.pte | --dso-path ${MODEL_OUT}/${MODEL_NAME}_int4-gw32.dso] --prompt "Hello my name is"
 ```
 
 #### 4-bit integer quantization (8da4w)
 To compress your model even more, 4-bit integer quantization may be used. To achieve good accuracy, we recommend the use
 of groupwise quantization where (small to mid-sized) groups of int4 weights share a scale. We also quantize activations to 8-bit, giving
 this scheme its name (8da4w = 8b dynamically quantized activations with 4b weights), and boost performance.
 ```
-python export.py --checkpoint-path ${MODEL_PATH} -d fp32 --quant "{'linear:8da4w': {'groupsize' : 7} }" [ --output-pte-path ${MODEL_OUT}/${MODEL_NAME}_8da4w.pte | ...dso... ]
+python3 export.py --checkpoint-path ${MODEL_PATH} -d fp32 --quant "{'linear:8da4w': {'groupsize' : 7} }" [ --output-pte-path ${MODEL_OUT}/${MODEL_NAME}_8da4w.pte | ...dso... ]
 ```
 
 Now you can run your model with the same command as before:
 ```
-python generate.py [ --pte-path ${MODEL_OUT}/${MODEL_NAME}_8da4w.pte | ...dso...] --prompt "Hello my name is"
+python3 generate.py [ --pte-path ${MODEL_OUT}/${MODEL_NAME}_8da4w.pte | ...dso...] --prompt "Hello my name is"
 ```
 
 #### Quantization with GPTQ (gptq)
 
 ```
-python export.py --checkpoint-path ${MODEL_PATH} -d fp32 --quant "{'linear:gptq': {'groupsize' : 32} }" [ --output-pte-path ${MODEL_OUT}/${MODEL_NAME}_gptq.pte | ...dso... ] # may require additional options, check with AO team
+python3 export.py --checkpoint-path ${MODEL_PATH} -d fp32 --quant "{'linear:gptq': {'groupsize' : 32} }" [ --output-pte-path ${MODEL_OUT}/${MODEL_NAME}_gptq.pte | ...dso... ] # may require additional options, check with AO team
 ```
 
 Now you can run your model with the same command as before:
 ```
-python generate.py [ --pte-path ${MODEL_OUT}/${MODEL_NAME}_gptq.pte | ...dso...] --prompt "Hello my name is"
+python3 generate.py [ --pte-path ${MODEL_OUT}/${MODEL_NAME}_gptq.pte | ...dso...] --prompt "Hello my name is"
 ```
 
 #### Adding additional quantization schemes (hqq)

docs/runner_build.md

Lines changed: 6 additions & 6 deletions
@@ -19,7 +19,7 @@ Options:
 To build runner-aoti, run the following commands *from the torchchat root directory*
 
 ```
-cmake -S ./runner-aoti -B ./runner-aoti/cmake-out -G Ninja -DCMAKE_PREFIX_PATH=`python -c 'import torch;print(torch.utils.cmake_prefix_path)'`
+cmake -S ./runner-aoti -B ./runner-aoti/cmake-out -G Ninja -DCMAKE_PREFIX_PATH=`python3 -c 'import torch;print(torch.utils.cmake_prefix_path)'`
 cmake --build ./runner-aoti/cmake-out
 ```
 
@@ -29,8 +29,8 @@ Let us try using it with an example.
 We first download stories15M and export it to AOTI.
 
 ```
-python torchchat.py download stories15M
-python torchchat.py export stories15M --output-dso-path ./model.so
+python3 torchchat.py download stories15M
+python3 torchchat.py export stories15M --output-dso-path ./model.so
 ```
 
 We can now execute the runner with:
@@ -41,7 +41,7 @@ wget -O ./tokenizer.bin https://github.com/karpathy/llama2.c/raw/master/tokenize
 ```
 
 ## Building and running runner-et
-Before building runner-et, you must first set-up ExecuTorch by following [Set-up Executorch](executorch_setup.md).
+Before building runner-et, you must first setup ExecuTorch by following [setup ExecuTorch steps](executorch_setup.md).
 
 
 To build runner-et, run the following commands *from the torchchat root directory*
@@ -58,8 +58,8 @@ Let us try using it with an example.
 We first download stories15M and export it to ExecuTorch.
 
 ```
-python torchchat.py download stories15M
-python torchchat.py export stories15M --output-pte-path ./model.pte
+python3 torchchat.py download stories15M
+python3 torchchat.py export stories15M --output-pte-path ./model.pte
 ```
 
 We can now execute the runner with:

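All of the rewritten commands assume that `python3` resolves to a suitable interpreter on the PATH; some systems ship only `python`, or point `python3` at an environment without PyTorch. A quick sanity check before running the examples above, offered as a sketch rather than as part of the commit:

```
# Confirm which interpreter `python3` resolves to and that it can import torch.
command -v python3
python3 --version
python3 -c 'import torch; print(torch.__version__)'
```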