@@ -18,10 +18,10 @@ Torchchat is currently in a pre-release state and under extensive development.
[ shell default] : TORCHCHAT_ROOT=${PWD} ./torchchat/utils/scripts/install_et.sh


- This is the advanced users guide, if you're looking to get started
+ This is the advanced users' guide, if you're looking to get started
with LLMs, please refer to the README at the root directory of the
torchchat distro. This is an advanced user guide, so we will have
- many more concepts and options to discuss and taking advantage of them
+ many more concepts and options to discuss and take advantage of them
may take some effort.

We welcome community contributions of all kinds. If you find
@@ -41,7 +41,7 @@ While we strive to support a broad range of models, we can't test them
all. We classify supported models as tested ✅, work in progress 🚧 or
some restrictions ❹.

- We invite community contributions of new model suport and test results!
+ We invite community contributions of new model support and test results!

| Model | Tested | Eager | torch.compile | AOT Inductor | ExecuTorch | Fits on Mobile |
| -----| --------| -------| -----| -----| -----| -----|
@@ -86,7 +86,7 @@ Server C++ runtime | n/a | run.cpp model.pte | ✅ |
Mobile C++ runtime | n/a | app model.pte | ✅ |
Mobile C++ runtime | n/a | app + AOTI | 🚧 |

- ** Getting help:** Each command implements the --help option to give addititonal information about available options:
+ ** Getting help:** Each command implements the --help option to give additional information about available options:

[ skip default ] : begin
```
@@ -96,8 +96,8 @@ python3 torchchat.py [ export | generate | chat | eval | ... ] --help

Exported models can be loaded back into torchchat for chat or text
generation, letting you experiment with the exported model and validate
- model quality. The python interface is the same in all cases and is
- used for testing nad test harnesses too.
+ model quality. The Python interface is the same in all cases and is
+ used for testing and test harnesses, too.

Torchchat comes with server C++ runtimes to execute AOT Inductor and
ExecuTorch models. Mobile C++ runtimes allow you to deploy
@@ -115,7 +115,7 @@ Some common models are recognized by torchchat based on their filename
through ` Model.from_name() ` to perform a fuzzy match against a
table of known model architectures. Alternatively, you can specify the
index into that table with the option ` --params-table ${INDEX} ` where
- the index is the lookup key key in the [ the list of known
+ the index is the lookup key in the [ the list of known
configurations] ( https://github.com/pytorch/torchchat/tree/main/torchchat/model_params )
For example, for the stories15M model, this would be expressed as
` --params-table stories15M ` . (We use the model constructor
@@ -237,7 +237,7 @@ which chooses the best 16-bit floating point type.

The virtual device fast and virtual floating point data types fast and
fast16 are best used for eager/torch.compiled execution. For export,
- specify the your device choice for the target system with --device for
+ specify your device choice for the target system with --device for
AOTI-exported DSO models, and use ExecuTorch delegate selection for
ExecuTorch-exported PTE models.

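For example, a target-specific AOTI export for a CUDA-equipped server might look like the following sketch; the ` --output-dso-path ` flag name is an assumption here and may differ from the current torchchat CLI:

```
# Sketch: export a DSO artifact pinned to a CUDA target (--output-dso-path is an assumed flag name)
python3 torchchat.py export --checkpoint-path ${MODEL_PATH} --device cuda --output-dso-path ${MODEL_NAME}.so
```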
@@ -250,8 +250,7 @@ python3 torchchat.py generate [--compile] --checkpoint-path ${MODEL_PATH} --prom
To improve performance, you can compile the model with ` --compile `
trading off the time to first token processed with time per token. To
improve performance further, you may also compile the prefill with
- ` --compile_prefill ` . This will increase further compilation times though. The
- ` --compile-prefill ` option is not compatible with ` --prefill-prefill ` .
+ ` --compile-prefill ` . This will increase further compilation times though.

Parallel prefill is not yet supported by exported models, and may be
supported in a future release.
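For illustration, both compilation options can be combined in a single eager/compiled run; this is a sketch reusing the ` ${MODEL_PATH} ` checkpoint variable from the examples above:

```
# Sketch: compile both decode and prefill; longer warm-up, faster steady-state generation
python3 torchchat.py generate --compile --compile-prefill --checkpoint-path ${MODEL_PATH} --prompt "Once upon a time"
```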
@@ -265,7 +264,7 @@ the introductory README.
In addition to running eval on models in eager mode and JIT-compiled
mode with ` torch.compile() ` , you can also load dso and pte models back
into PyTorch to evaluate the accuracy of exported model objects
- (e.g., after applying quantization or other traqnsformations to
+ (e.g., after applying quantization or other transformations to
improve speed or reduce model size).

Loading exported models back into Python-based PyTorch allows you to
@@ -297,14 +296,14 @@ for ExecuTorch.)

We export the stories15M model with the following command for
execution with the ExecuTorch runtime (and enabling execution on a
- wide range of community and vendor supported backends):
+ wide range of community and vendor-supported backends):

```
python3 torchchat.py export --checkpoint-path ${MODEL_PATH} --output-pte-path ${MODEL_NAME}.pte
```

Alternatively, we may generate a native instruction stream binary
- using AOT Inductor for CPU oor GPUs (the latter using Triton for
+ using AOT Inductor for CPU or GPUs (the latter using Triton for
optimizations such as operator fusion):

```
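# Sketch of an AOT Inductor (DSO) export for a server/desktop target;
# the --output-dso-path flag name is an assumption and is not confirmed by the text above.
python3 torchchat.py export --checkpoint-path ${MODEL_PATH} --output-dso-path ${MODEL_NAME}.so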
@@ -319,10 +318,10 @@ the exported model artifact back into a model container with a
compatible API surface for the ` model.forward() ` function. This
enables users to test, evaluate and exercise the exported model
artifact with familiar interfaces, and in conjunction with
- pre-exiisting Python model unit tests and common environments such as
+ pre-existing Python model unit tests and common environments such as
Jupyter notebooks and/or Google colab.

- Here is how to load an exported model into the python environment on the example of using an exported model with ` generate.oy ` .
+ Here is how to load an exported model into the Python environment using an exported model with the ` generate ` command.

```
python3 torchchat.py generate --checkpoint-path ${MODEL_PATH} --pte-path ${MODEL_NAME}.pte --device cpu --prompt "Once upon a time"
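# An analogous run against an AOT Inductor artifact might look like this sketch;
# --dso-path is an assumed flag name for loading the DSO and is not shown above.
python3 torchchat.py generate --checkpoint-path ${MODEL_PATH} --dso-path ${MODEL_NAME}.so --device cpu --prompt "Once upon a time"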
@@ -452,7 +451,7 @@ strategies:
You can find instructions for quantizing models in
[ docs/quantization.md] ( file:///./quantization.md ) . Advantageously,
quantization is available in eager mode as well as during export,
- enabling you to do an early exploration of your quantization setttings
+ enabling you to do an early exploration of your quantization settings
in eager mode. However, final accuracy should always be confirmed on
the actual execution target, since all targets have different build
processes, compilers, and kernel implementations with potentially
@@ -464,9 +463,8 @@ significant impact on accuracy.

## Native (Stand-Alone) Execution of Exported Models

- Refer to the [ README] (README.md] for an introduction toNative
- execution on servers, desktops and laptops is described under
- [ runner-build.md] . Mobile and Edge executipon for Android and iOS are
+ Refer to the [ README] (README.md) for an introduction to native
+ execution on servers, desktops, and laptops. Mobile and Edge execution for Android and iOS are
described under [ torchchat/edge/docs/Android.md] and [ torchchat/edge/docs/iOS.md] , respectively.


@@ -475,7 +473,7 @@ described under [torchchat/edge/docs/Android.md] and [torchchat/edge/docs/iOS.md

PyTorch and ExecuTorch support a broad range of devices for running
PyTorch with python (using either eager or eager + ` torch.compile ` ) or
- in a python-free environment with AOT Inductor and ExecuTorch.
+ in a Python-free environment with AOT Inductor and ExecuTorch.


| Hardware | OS | Eager | Eager + Compile | AOT Compile | ET Runtime |
@@ -499,58 +497,6 @@ in a python-free environment with AOT Inductor and ExecuTorch.
* Key* : n/t -- not tested


- ## Runtime performance with Llama 7B, in tokens per second (4b quantization)
-
- | Hardware | OS | eager | eager + compile | AOT compile | ET Runtime |
- | -----| ------| -----| -----| -----| -----|
- | x86 | Linux | ? | ? | ? | ? |
- | x86 | macOS | ? | ? | ? | ? |
- | aarch64 | Linux | ? | ? | ? | ? |
- | aarch64 | macOS | ? | ? | ? | ? |
- | AMD GPU | Linux | ? | ? | ? | ? |
- | Nvidia GPU | Linux | ? | ? | ? | ? |
- | MPS | macOS | ? | ? | ? | ? |
- | MPS | iOS | ? | ? | ? | ? |
- | aarch64 | Android | ? | ? | ? | ? |
- | Mobile GPU (Vulkan) | Android | ? | ? | ? | ? |
- | CoreML | iOS | | ? | ? | ? | ? |
- | Hexagon DSP | Android | | ? | ? | ? | ? |
- | Raspberry Pi 4/5 | Raspbian | ? | ? | ? | ? |
- | Raspberry Pi 4/5 | Android | ? | ? | ? | ? |
- | ARM 32b (up to v7) | any | | ? | ? | ? | ? |
-
-
- ## Runtime performance with Llama3, in tokens per second (4b quantization)
-
- | Hardware | OS | eager | eager + compile | AOT compile | ET Runtime |
- | -----| ------| -----| -----| -----| -----|
- | x86 | Linux | ? | ? | ? | ? |
- | x86 | macOS | ? | ? | ? | ? |
- | aarch64 | Linux | ? | ? | ? | ? |
- | aarch64 | macOS | ? | ? | ? | ? |
- | AMD GPU | Linux | ? | ? | ? | ? |
- | Nvidia GPU | Linux | ? | ? | ? | ? |
- | MPS | macOS | ? | ? | ? | ? |
- | MPS | iOS | ? | ? | ? | ? |
- | aarch64 | Android | ? | ? | ? | ? |
- | Mobile GPU (Vulkan) | Android | ? | ? | ? | ? |
- | CoreML | iOS | | ? | ? | ? | ? |
- | Hexagon DSP | Android | | ? | ? | ? | ? |
- | Raspberry Pi 4/5 | Raspbian | ? | ? | ? | ? |
- | Raspberry Pi 4/5 | Android | ? | ? | ? | ? |
- | ARM 32b (up to v7) | any | | ? | ? | ? | ? |
-
-
-
-
- # CONTRIBUTING to torchchat
-
- We welcome any feature requests, bug reports, or pull requests from
- the community. See the [ CONTRIBUTING] ( CONTRIBUTING.md ) for
- instructions how to contribute to torchchat.
-
-
-

# LICENSE

Torchchat is released under the [ BSD 3 license] ( ./LICENSE ) . However