
Commit 6da36ee

orionr authored and malfet committed
Add model details to the README and remove extra model section (#410)
1 parent 4eeba97 commit 6da36ee

2 files changed: 37 additions, 18 deletions

README.md

Lines changed: 20 additions & 18 deletions
@@ -101,9 +101,9 @@ Designed for interactive graphical conversations using the familiar web browser
 
 ## Quantizing your model (suggested for mobile)
 
-Quantization is the process of converting a model into a more memory-efficient representation. Quantization is particularly important for accelerators -- to take advantage of the available memory bandwidth, and fit in the often limited high-speed memory in accelerators – and mobile devices – to fit in the typically very limited memory of mobile devices.
+Quantization is the process of converting a model into a more memory-efficient representation. It is particularly important for accelerators -- to take advantage of the available memory bandwidth and fit in their often limited high-speed memory -- and for mobile devices, to fit in their typically very limited memory.
 
-With quantization, 32-bit floating numbers can be represented with as few as 8 or even 4 bits, and a scale shared by a group of these weights. This transformation is lossy and modifies the behavior of models. While research is being conducted on how to efficiently quantize large language models for use in mobile devices, this transformation invariable results in both quality loss and a reduced amount of control over the output of the models, leading to an increased risk of undesirable responses, hallucinations and stuttering.
+With quantization, 32-bit floating-point numbers can be represented with as few as 8 or even 4 bits, plus a scale shared by each group of weights. This transformation is lossy and modifies model behavior. While research continues on how to efficiently quantize large language models for mobile use, quantization invariably costs some quality and reduces control over model output, raising the risk of undesirable responses, hallucinations, and stuttering.
 
-In effect an a developer quantizing a model, has much control and even more responsibility to quantize a model to quantify and reduce these effects.
+In effect, a developer quantizing a model has much control over -- and even more responsibility for -- quantifying and reducing these effects.
 
@@ -164,24 +164,26 @@ python3 torchchat.py eval stories15M --pte-path stories15M.pte --limit 5
 
 
 ## Models
-The following models are supported by torchchat:
+
+The following models are supported by torchchat and have associated aliases. Other models, including models in GGUF format, can be run by specifying a URL directly.
+
 | Model | Mobile Friendly | Notes |
 |------------------|---|---------------------|
-|[meta-llama/Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct)|||
-|[meta-llama/Meta-Llama-3-8B](https://huggingface.co/meta-llama/Meta-Llama-3-8B)|||
-|[meta-llama/Llama-2-7b-chat-hf](https://huggingface.co/meta-llama/Llama-2-7b-chat-hf)|||
-|[meta-llama/Llama-2-13b-chat-hf](https://huggingface.co/meta-llama/Llama-2-7b-chat-hf)|||
-|[meta-llama/Llama-2-70b-chat-hf](https://huggingface.co/meta-llama/Llama-2-70b-chat-hf)|||
-|[meta-llama/Llama-2-7b-hf](https://huggingface.co/meta-llama/Llama-2-7b-hf)|||
-|[meta-llama/CodeLlama-7b-Python-hf](https://huggingface.co/meta-llama/CodeLlama-7b-Python-hf)|||
-|[meta-llama/CodeLlama-34b-Python-hf](https://huggingface.co/meta-llama/CodeLlama-34b-Python-hf)|||
-|[mistralai/Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1)|||
-|[mistralai/Mistral-7B-Instruct-v0.1](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.1)|||
-|[mistralai/Mistral-7B-Instruct-v0.2](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2)|||
-|[tinyllamas/stories15M](https://huggingface.co/karpathy/tinyllamas/tree/main)|||
-|[tinyllamas/stories42M](https://huggingface.co/karpathy/tinyllamas/tree/main)|||
-|[tinyllamas/stories110M](https://huggingface.co/karpathy/tinyllamas/tree/main)|||
-|[openlm-research/open_llama_7b](https://huggingface.co/karpathy/tinyllamas/tree/main)|||
+|[meta-llama/Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct)||Tuned for `chat`. Alias to `llama3`.|
+|[meta-llama/Meta-Llama-3-8B](https://huggingface.co/meta-llama/Meta-Llama-3-8B)||Best for `generate`. Alias to `llama3-base`.|
+|[meta-llama/Llama-2-7b-chat-hf](https://huggingface.co/meta-llama/Llama-2-7b-chat-hf)||Tuned for `chat`. Alias to `llama2`.|
+|[meta-llama/Llama-2-13b-chat-hf](https://huggingface.co/meta-llama/Llama-2-13b-chat-hf)||Tuned for `chat`. Alias to `llama2-13b-chat`.|
+|[meta-llama/Llama-2-70b-chat-hf](https://huggingface.co/meta-llama/Llama-2-70b-chat-hf)||Tuned for `chat`. Alias to `llama2-70b-chat`.|
+|[meta-llama/Llama-2-7b-hf](https://huggingface.co/meta-llama/Llama-2-7b-hf)||Best for `generate`. Alias to `llama2-base`.|
+|[meta-llama/CodeLlama-7b-Python-hf](https://huggingface.co/meta-llama/CodeLlama-7b-Python-hf)||Tuned for Python and `generate`. Alias to `codellama`.|
+|[meta-llama/CodeLlama-34b-Python-hf](https://huggingface.co/meta-llama/CodeLlama-34b-Python-hf)||Tuned for Python and `generate`. Alias to `codellama-34b`.|
+|[mistralai/Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1)||Best for `generate`. Alias to `mistral-7b-v01-base`.|
+|[mistralai/Mistral-7B-Instruct-v0.1](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.1)||Tuned for `chat`. Alias to `mistral-7b-v01-instruct`.|
+|[mistralai/Mistral-7B-Instruct-v0.2](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2)||Tuned for `chat`. Alias to `mistral`.|
+|[tinyllamas/stories15M](https://huggingface.co/karpathy/tinyllamas/tree/main)||Toy model for `generate`. Alias to `stories15M`.|
+|[tinyllamas/stories42M](https://huggingface.co/karpathy/tinyllamas/tree/main)||Toy model for `generate`. Alias to `stories42M`.|
+|[tinyllamas/stories110M](https://huggingface.co/karpathy/tinyllamas/tree/main)||Toy model for `generate`. Alias to `stories110M`.|
+|[openlm-research/open_llama_7b](https://huggingface.co/openlm-research/open_llama_7b)||Best for `generate`. Alias to `open-llama`.|
 
 Torchchat also supports loading of many models in the GGUF format. See the [documentation on GGUF](docs/GGUF.md) to learn how to use GGUF files.
 
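
The aliases in the Notes column are the names passed to torchchat's subcommands, following the command style the README already uses (the prompt text below is just an example):

```
# Chat with the instruction-tuned Llama 3 via its alias
python3 torchchat.py chat llama3

# One-shot generation with a base-model alias
python3 torchchat.py generate llama2-base --prompt "Once upon a time"
```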

config/data/models.json

Lines changed: 17 additions & 0 deletions
@@ -39,6 +39,23 @@
     "distribution_channel": "HuggingFaceSnapshot",
     "distribution_path": "meta-llama/CodeLlama-7b-Python-hf"
   },
+  "meta-llama/CodeLlama-34b-Python-hf": {
+    "aliases": ["codellama-34b"],
+    "distribution_channel": "HuggingFaceSnapshot",
+    "distribution_path": "meta-llama/CodeLlama-34b-Python-hf"
+  },
+  "mistralai/Mistral-7B-v0.1": {
+    "aliases": ["mistral-7b-v01-base"],
+    "distribution_channel": "HuggingFaceSnapshot",
+    "distribution_path": "mistralai/Mistral-7B-v0.1",
+    "transformer_params_key": "Mistral-7B"
+  },
+  "mistralai/Mistral-7B-Instruct-v0.1": {
+    "aliases": ["mistral-7b-v01-instruct"],
+    "distribution_channel": "HuggingFaceSnapshot",
+    "distribution_path": "mistralai/Mistral-7B-Instruct-v0.1",
+    "transformer_params_key": "Mistral-7B"
+  },
   "mistralai/Mistral-7B-Instruct-v0.2": {
     "aliases": ["mistral", "mistral-7b", "mistral-7b-instruct"],
     "distribution_channel": "HuggingFaceSnapshot",
