
Commit 3ff18bd

Authored by Ali Khosh (ali-khosh), committed by mikekgfb
Doc fixes (#371)
* testing a small fix on the readme
* various fixes of README.md - ignoring sections that clearly look WIP

Co-authored-by: Ali Khosh <[email protected]>
Co-authored-by: Michael Gschwind <[email protected]>
1 parent 365cf56 · commit 3ff18bd

File tree

1 file changed (+8, -4 lines)


README.md

Lines changed: 8 additions & 4 deletions
````diff
@@ -14,8 +14,10 @@ Torchchat is a small codebase to showcase running large language models (LLMs) w
 - Multiple quantization schemes
 - Multiple execution modes including: Python (Eager, Compile) or Native (AOT Inductor (AOTI), ExecuTorch)
 
+
 ## Installation
 
+
 The following steps require that you have [Python 3.10](https://www.python.org/downloads/release/python-3100/) installed.
 
 ```
````
````diff
@@ -136,7 +138,7 @@ python3 torchchat.py export stories15M --output-pte-path stories15M.pte
 ```
 
 ### Browser
-Run a chatbot in your browser that’s supported by the model you specify in the command
+Run a chatbot in your browser that’s supported by the model you specify in the command.
 
 **Examples**
 
````
````diff
@@ -146,7 +148,7 @@ python3 torchchat.py browser stories15M --temperature 0 --num-samples 10
 
 *Running on http://127.0.0.1:5000* should be printed out on the terminal. Click the link or go to [http://127.0.0.1:5000](http://127.0.0.1:5000) on your browser to start interacting with it.
 
-Enter some text in the input box, then hit the enter key or click the “SEND” button. After 1 second or 2, the text you entered together with the generated text will be displayed. Repeat to have a conversation.
+Enter some text in the input box, then hit the enter key or click the “SEND” button. After a second or two, the text you entered together with the generated text will be displayed. Repeat to have a conversation.
 
 ### Eval
 Uses lm_eval library to evaluate model accuracy on a variety of tasks. Defaults to wikitext and can be manually controlled using the tasks and limit args.
````
````diff
@@ -160,14 +162,14 @@ Eager mode:
 python3 torchchat.py eval stories15M -d fp32 --limit 5
 ```
 
-To test the perplexity for lowered or quantized model, pass it in the same way you would to generate:
+To test the perplexity for a lowered or quantized model, pass it in the same way you would to generate:
 
 ```
 python3 torchchat.py eval stories15M --pte-path stories15M.pte --limit 5
 ```
 
 ## Models
-These are the supported models
+The following models are the supported by torchchat:
 | Model | Mobile Friendly | Notes |
 |------------------|---|---------------------|
 |[meta-llama/Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct)|||
````
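The Eval section touched by this hunk reports perplexity via the lm_eval library. As background only (this is the standard definition of the metric, not torchchat or lm_eval code), perplexity is the exponential of the average negative log-likelihood per token:

```python
import math

def perplexity(token_logprobs):
    """Perplexity = exp(mean negative log-likelihood over the tokens)."""
    nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(nll)

# If the model assigns each token probability 1/4, perplexity is 4:
# the model is "as uncertain as" a uniform choice among 4 tokens.
ppl = perplexity([math.log(0.25)] * 10)
```

Lower perplexity on a held-out corpus (e.g. wikitext, the default task above) indicates a better-calibrated model, which is why the diff uses it to sanity-check lowered and quantized exports.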
````diff
@@ -223,6 +225,7 @@ python3 torchchat.py generate --dso-path stories15M.so --prompt "Hello my name i
 NOTE: The exported model will be large. We suggest you quantize the model, explained further down, before deploying the model on device.
 
 ### ExecuTorch
+
 ExecuTorch enables you to optimize your model for execution on a mobile or embedded device, but can also be used on desktop for testing.
 Before running ExecuTorch commands, you must first set-up ExecuTorch in torchchat, see [Set-up Executorch](docs/executorch_setup.md).
 
````
````diff
@@ -238,6 +241,7 @@ python3 torchchat.py generate --device cpu --pte-path stories15M.pte --prompt "H
 
 See below under Mobile Execution if you want to deploy and execute a model in your iOS or Android app.
 
+
 ## Quantization
 Quantization focuses on reducing the precision of model parameters and computations from floating-point to lower-bit integers, such as 8-bit and 4-bit integers. This approach aims to minimize memory requirements, accelerate inference speeds, and decrease power consumption, making models more feasible for deployment on edge devices with limited computational resources. While quantization can potentially degrade the model's performance, the methods supported by torchchat are designed to mitigate this effect, maintaining a balance between efficiency and accuracy.
 
````
0 commit comments
