
Commit 4e8d31e

mikekgfb authored and malfet committed
Update README.md (#454)
Edits
1 parent 1982133 commit 4e8d31e

File tree

1 file changed

+29 -36 lines changed


README.md

Lines changed: 29 additions & 36 deletions
@@ -20,7 +20,7 @@ Torchchat is a compact codebase to showcase the capability of running large lang
 
 The following steps require that you have [Python 3.10](https://www.python.org/downloads/release/python-3100/) installed.
 
-```
+```bash
 # get the code
 git clone https://github.com/pytorch/torchchat.git
 cd torchchat
@@ -37,8 +37,7 @@ python3 torchchat.py --help
 ```
 
 ### Download Weights
-Most models use HuggingFace as the distribution channel, so you will need to create a HuggingFace
-account.
+Most models use HuggingFace as the distribution channel, so you will need to create a HuggingFace account.
 
 Create a HuggingFace user access token [as documented here](https://huggingface.co/docs/hub/en/security-tokens).
 Run `huggingface-cli login`, which will prompt for the newly created token.
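
For readers curious what the login-and-download flow does under the hood, here is a minimal Python sketch using the `huggingface_hub` API directly. This is an illustration, not torchchat's own download path; it assumes `pip install huggingface_hub` and that your account has accepted the license for the gated meta-llama repository.

```python
# Sketch of the weight-download flow via the huggingface_hub API directly;
# torchchat's own commands normally handle this for you.
from huggingface_hub import login, snapshot_download

login()  # prompts for your access token, same as `huggingface-cli login`
path = snapshot_download(repo_id="meta-llama/Meta-Llama-3-8B-Instruct")
print("weights downloaded to", path)
```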
@@ -59,7 +58,7 @@ with `python3 torchchat.py remove llama3`.
 * [Chat](#chat)
 * [Generate](#generate)
 * [Run via Browser](#browser)
-* [Quantize your models (suggested for mobile)](#quantization)
+* [Quantizing your model (suggested for mobile)](#quantizing-your-model-suggested-for-mobile)
 * Export and run models in native environments (C++, your own app, mobile, etc.)
 * [Export for desktop/servers via AOTInductor](#export-server)
 * [Run exported .so file via your own C++ application](#run-server)
@@ -70,53 +69,45 @@ with `python3 torchchat.py remove llama3`.
 * in Generate mode
 * [Run exported ExecuTorch file on iOS or Android](#run-mobile)
 
-## Models
-These are the supported models
-| Model | Mobile Friendly | Notes |
-|------------------|---|---------------------|
-|[meta-llama/Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct)|||
-|[meta-llama/Meta-Llama-3-8B](https://huggingface.co/meta-llama/Meta-Llama-3-8B)|||
-|[meta-llama/Llama-2-7b-chat-hf](https://huggingface.co/meta-llama/Llama-2-7b-chat-hf)|||
-|[meta-llama/Llama-2-13b-chat-hf](https://huggingface.co/meta-llama/Llama-2-7b-chat-hf)|||
-|[meta-llama/Llama-2-70b-chat-hf](https://huggingface.co/meta-llama/Llama-2-70b-chat-hf)|||
-|[meta-llama/Llama-2-7b-hf](https://huggingface.co/meta-llama/Llama-2-7b-hf)|||
-|[meta-llama/CodeLlama-7b-Python-hf](https://huggingface.co/meta-llama/CodeLlama-7b-Python-hf)|||
-|[meta-llama/CodeLlama-34b-Python-hf](https://huggingface.co/meta-llama/CodeLlama-34b-Python-hf)|||
-|[mistralai/Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1)|||
-|[mistralai/Mistral-7B-Instruct-v0.1](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.1)|||
-|[mistralai/Mistral-7B-Instruct-v0.2](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2)|||
-|[tinyllamas/stories15M](https://huggingface.co/karpathy/tinyllamas/tree/main)|||
-|[tinyllamas/stories42M](https://huggingface.co/karpathy/tinyllamas/tree/main)|||
-|[tinyllamas/stories110M](https://huggingface.co/karpathy/tinyllamas/tree/main)|||
-|[openlm-research/open_llama_7b](https://huggingface.co/karpathy/tinyllamas/tree/main)|||
-
-See the [documentation on GGUF](docs/GGUF.md) to learn how to use GGUF files.
-
 
 ## Running via PyTorch / Python
 
 ### Chat
 Designed for interactive and conversational use.
 In chat mode, the LLM engages in a back-and-forth dialogue with the user. It responds to queries, participates in discussions, provides explanations, and can adapt to the flow of conversation.
 
-For more information run `python3 torchchat.py chat --help`
-
 **Examples**
-```
+```bash
 python3 torchchat.py chat llama3
 ```
 
+For more information run `python3 torchchat.py chat --help`
+
 ### Generate
 Aimed at producing content based on specific prompts or instructions.
 In generate mode, the LLM focuses on creating text based on a detailed prompt or instruction. This mode is often used for generating written content like articles, stories, reports, or even creative writing like poetry.
 
-For more information run `python3 torchchat.py generate --help`
 
 **Examples**
-```
-python3 torchchat.py generate llama3 --dtype=fp16
+```bash
+python3 torchchat.py generate llama3
 ```
 
+For more information run `python3 torchchat.py generate --help`
+
+### Browser
+
+Designed for interactive graphical conversations using the familiar web browser GUI. The browser command provides a GUI-based experience for engaging in a back-and-forth dialogue with the LLM. It responds to queries, participates in discussions, provides explanations, and can adapt to the flow of conversation.
+
+## Quantizing your model (suggested for mobile)
+
+Quantization is the process of converting a model into a more memory-efficient representation. It is particularly important for accelerators (to take advantage of the available memory bandwidth and fit within their often limited high-speed memory) and for mobile devices (to fit within their typically very limited memory).
+
+With quantization, 32-bit floating-point numbers can be represented with as few as 8 or even 4 bits, plus a scale shared by a group of these weights. This transformation is lossy and modifies the behavior of models. While research is being conducted on how to efficiently quantize large language models for use on mobile devices, this transformation invariably results in both quality loss and reduced control over the output of the models, leading to an increased risk of undesirable responses, hallucinations and stuttering.
+
+In effect, a developer quantizing a model has much control over, and even more responsibility for, quantifying and reducing these effects.
+
+
 ## Exporting your model
 Compiles a model and saves it to run later.
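
To make the "scale shared by a group of weights" idea in the hunk above concrete, here is a minimal PyTorch sketch of symmetric group-wise quantization. It is an illustration only, with hypothetical helper names, not torchchat's actual quantization code.

```python
# Illustration only: symmetric group-wise quantization, one shared
# floating-point scale per group of weights. NOT torchchat's real code.
import torch

def quantize_groupwise(w: torch.Tensor, n_bits: int = 4, group_size: int = 32):
    qmax = 2 ** (n_bits - 1) - 1                        # 7 for signed 4-bit codes
    groups = w.reshape(-1, group_size)                  # one row per group
    # one scale shared by each group of `group_size` weights
    scales = groups.abs().amax(dim=1, keepdim=True).clamp(min=1e-8) / qmax
    q = torch.clamp(torch.round(groups / scales), -qmax - 1, qmax)
    return q.to(torch.int8), scales                     # int8 carrier for the 4-bit codes

def dequantize_groupwise(q: torch.Tensor, scales: torch.Tensor, shape) -> torch.Tensor:
    return (q.float() * scales).reshape(shape)          # lossy round trip

w = torch.randn(128, 64)                                # a toy weight matrix
q, scales = quantize_groupwise(w)
w_hat = dequantize_groupwise(q, scales, w.shape)
print("max abs error:", (w - w_hat).abs().max().item())
```

The printed error is the "lossy" part the section warns about: memory drops roughly 8x (4-bit codes plus a small per-group scale versus 32-bit floats), paid for in fidelity.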

@@ -146,10 +137,10 @@ Run a chatbot in your browser that’s supported by the model you specify in the
 **Examples**
 
 ```
-python3 torchchat.py browser stories15M --temperature 0 --num-samples 100
+python3 torchchat.py browser stories15M --temperature 0 --num-samples 10
 ```
 
-*Running on http://127.0.0.1:5000* should be printed out on the terminal. Click the link or go to [http://127.0.0.1:5000](http://127.0.0.1:5000) on your browser to start interacting with it. If port 5000 has already been taken, run the command again with `--port`, e.g. `--port 5001`.
+*Running on http://127.0.0.1:5000* should be printed out on the terminal. Click the link or go to [http://127.0.0.1:5000](http://127.0.0.1:5000) on your browser to start interacting with it.
 
 Enter some text in the input box, then hit the enter key or click the “SEND” button. After a second or two, the text you entered together with the generated text will be displayed. Repeat to have a conversation.

@@ -171,8 +162,9 @@ To test the perplexity for a lowered or quantized model, pass it in the same way
 python3 torchchat.py eval stories15M --pte-path stories15M.pte --limit 5
 ```
 
+
 ## Models
-The following models are the supported by torchchat:
+The following models are supported by torchchat:
 | Model | Mobile Friendly | Notes |
 |------------------|---|---------------------|
 |[meta-llama/Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct)|||
@@ -191,7 +183,7 @@ The following models are the supported by torchchat:
 |[tinyllamas/stories110M](https://huggingface.co/karpathy/tinyllamas/tree/main)|||
 |[openlm-research/open_llama_7b](https://huggingface.co/karpathy/tinyllamas/tree/main)|||
 
-See the [documentation on GGUF](docs/GGUF.md) to learn how to use GGUF files.
+Torchchat also supports loading of many models in the GGUF format. See the [documentation on GGUF](docs/GGUF.md) to learn how to use GGUF files.
 
 **Examples**
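
As a reminder of what the perplexity metric mentioned in the eval hunk above measures: perplexity is the exponential of the average negative log-likelihood per token, so lower values mean the model finds the text less surprising. A toy, self-contained illustration with hypothetical numbers, not torchchat's eval implementation:

```python
# Toy illustration of the perplexity metric: exp of the average
# negative log-likelihood per token. Not torchchat's eval code.
import math

def perplexity(token_log_probs):
    avg_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(avg_nll)

# log-probabilities a model might assign to three observed tokens
print(perplexity([-0.105, -2.303, -0.693]))  # ~2.81
```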

@@ -305,3 +297,4 @@ you've built around local LLM inference.
 ## License
 Torchchat is released under the [BSD 3 license](LICENSE). However you may have other legal obligations
 that govern your use of content, such as the terms of service for third-party models.
+![image](https://github.com/pytorch/torchchat/assets/61328285/1cfccb53-c025-43d7-8475-94b34cf92339)
