
Commit efea63a

mikekgfb authored and malfet committed
Update README.md (#502)
* Update README.md: Add Disclaimer to README
* Update README.md

1 parent 3bf0abb · commit efea63a

File tree

1 file changed (+7, -6 lines)

README.md

Lines changed: 7 additions & 6 deletions
@@ -1,5 +1,5 @@
 # Chat with LLMs Everywhere
-Torchchat is a compact codebase to showcase the capability of running large language models (LLMs) seamlessly across diverse platforms. With Torchchat, you can run LLMs from Python, from your own (C/C++) application on mobile (iOS/Android), desktop, or servers.
+torchchat is a compact codebase to showcase the capability of running large language models (LLMs) seamlessly across diverse platforms. With torchchat, you can run LLMs from Python, from your own (C/C++) application on mobile (iOS/Android), desktop, or servers.
 
 ## Highlights
 - Command line interaction with popular LLMs such as Llama 3, Llama 2, Stories, Mistral and more
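As a quick illustration of the command-line interaction highlighted above (a minimal example; the `chat` subcommand and the `llama3` alias both appear later in this README):

```
python3 torchchat.py chat llama3
```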
@@ -14,6 +14,8 @@ Torchchat is a compact codebase to showcase the capability of running large lang
 - Multiple quantization schemes
 - Multiple execution modes including: Python (Eager, Compile) or Native (AOT Inductor (AOTI), ExecuTorch)
 
+*Disclaimer:* The torchchat Repository Content is provided without any guarantees about performance or compatibility. In particular, torchchat makes available model architectures written in Python for PyTorch that may not perform in the same manner or meet the same standards as the original versions of those models. When using the torchchat Repository Content, including any model architectures, you are solely responsible for determining the appropriateness of using or redistributing the torchchat Repository Content and assume any risks associated with your use of the torchchat Repository Content or any models, outputs, or results, both alone and in combination with any other technologies. Additionally, you may have other legal obligations that govern your use of other content, such as the terms of service for third-party models, weights, data, or other technologies, and you are solely responsible for complying with all such obligations.
+
 
 ## Installation
 
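To make the "Python (Eager, Compile)" execution modes from the highlights concrete, here is a minimal, hedged sketch in plain PyTorch. The toy module and tolerances are assumptions for illustration, not torchchat's own code:

```
# Sketch of the two Python execution modes: eager runs the module as
# ordinary Python, while torch.compile JIT-compiles and optimizes it.
# The Linear layer is a toy stand-in for an LLM.
import torch

model = torch.nn.Linear(16, 16)
x = torch.randn(1, 16)

eager_out = model(x)                    # Eager: executes op by op in Python

compiled_model = torch.compile(model)   # Compile: traced and optimized
compiled_out = compiled_model(x)

# Both modes compute the same function, up to small numerical differences.
torch.testing.assert_close(eager_out, compiled_out, rtol=1e-4, atol=1e-4)
```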
@@ -132,17 +134,16 @@ Enter some text in the input box, then hit the enter key or click the “SEND”
 
 Quantization is the process of converting a model into a more memory-efficient representation. Quantization is particularly important for accelerators -- to take advantage of the available memory bandwidth and to fit in the often limited high-speed memory of accelerators -- and for mobile devices, to fit in their typically very limited memory.
 
-Depending on the model and the target device, different quantization recipes may be applied. Torchchat contains two example configurations to optimize performance for GPU-based systems (`config/data/qconfig_gpu.json`) and mobile systems (`config/data/qconfig_mobile.json`). The GPU configuration is targeted towards optimizing for memory bandwidth, which is a scarce resource even in powerful GPUs (and, to a lesser degree, memory footprint, to fit large models into a device's memory). The mobile configuration is targeted towards optimizing for memory footprint, because in many devices a single application is limited to as little as a GB or less of memory.
+Depending on the model and the target device, different quantization recipes may be applied. torchchat contains two example configurations to optimize performance for GPU-based systems (`config/data/qconfig_gpu.json`) and mobile systems (`config/data/qconfig_mobile.json`). The GPU configuration is targeted towards optimizing for memory bandwidth, which is a scarce resource even in powerful GPUs (and, to a lesser degree, memory footprint, to fit large models into a device's memory). The mobile configuration is targeted towards optimizing for memory footprint, because in many devices a single application is limited to as little as a GB or less of memory.
 
 You can use the quantization recipes in conjunction with any of the `chat`, `generate` and `browser` commands to test their impact and accelerate model execution. You will apply these recipes to the `export` commands below, to optimize the exported models. For example:
 ```
 python3 torchchat.py chat llama3 --quantize config/data/qconfig_gpu.json
 ```
 To adapt these recipes or write your own, please refer to the [quantization overview](docs/quantization.md).
 
-*TO BE REPLACED BY SUITABLE WORDING PROVIDED BY LEGAL:*
 
-With quantization, 32-bit floating-point numbers can be represented with as few as 8 or even 4 bits plus a scale shared by a group of these weights. This transformation is lossy and modifies the behavior of models. While research is being conducted on how to efficiently quantize large language models for use in mobile devices, this transformation invariably results in both quality loss and a reduced amount of control over the output of the models, leading to an increased risk of undesirable responses, hallucinations and stuttering. In effect, a developer quantizing a model has much control and even more responsibility to quantify and reduce these effects.
+With quantization, 32-bit floating-point numbers can be represented with as few as 8 or even 4 bits plus a scale shared by a group of these weights. This transformation is lossy and modifies the behavior of models. While research is being conducted on how to efficiently quantize large language models for use in mobile devices, this transformation invariably results in both quality loss and a reduced amount of control over the output of the models, leading to an increased risk of undesirable responses, hallucination and stuttering. In effect, a developer quantizing a model has a responsibility to understand and reduce these effects.
 
 ## Desktop Execution
 
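To make the group-shared-scale idea in the paragraph above concrete, here is a hedged sketch of symmetric 4-bit group-wise quantization in plain PyTorch. The group size (32) and the symmetric scheme are assumptions for illustration; torchchat's actual recipes live in the qconfig files and [quantization overview](docs/quantization.md):

```
# Minimal sketch: symmetric 4-bit group-wise quantization of a weight
# tensor, with one fp32 scale shared per group of 32 weights.
# Illustrative only; not torchchat's actual recipe.
import torch

def quantize_groupwise(w: torch.Tensor, group_size: int = 32):
    groups = w.reshape(-1, group_size)                           # one row per group
    scale = groups.abs().max(dim=1, keepdim=True).values / 7.0   # int4 range [-8, 7]
    scale = scale.clamp(min=1e-8)                                # guard all-zero groups
    q = torch.clamp(torch.round(groups / scale), -8, 7).to(torch.int8)
    return q, scale

def dequantize_groupwise(q: torch.Tensor, scale: torch.Tensor, shape):
    return (q.float() * scale).reshape(shape)

w = torch.randn(128, 64)
q, scale = quantize_groupwise(w)
w_hat = dequantize_groupwise(q, scale, w.shape)
print("max abs error:", (w - w_hat).abs().max().item())  # lossy, as noted above
```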
@@ -293,7 +294,7 @@ The following models are supported by torchchat and have associated aliases. Oth
 |[tinyllamas/stories110M](https://huggingface.co/karpathy/tinyllamas/tree/main)||Toy model for `generate`. Alias to `stories110M`.|
 |[openlm-research/open_llama_7b](https://huggingface.co/openlm-research/open_llama_7b)||Best for `generate`. Alias to `open-llama`.|
 
-Torchchat also supports loading of many models in the GGUF format. See the [documentation on GGUF](docs/GGUF.md) to learn how to use GGUF files.
+torchchat also supports loading of many models in the GGUF format. See the [documentation on GGUF](docs/GGUF.md) to learn how to use GGUF files.
 
 While we describe how to use torchchat using the popular llama3 model, you can perform the example commands with any of these models.
 
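As a hedged illustration of the GGUF loading mentioned above (the `--gguf-path` and `--prompt` flags here are assumptions for the example; see [docs/GGUF.md](docs/GGUF.md) for the actual usage):

```
python3 torchchat.py generate --gguf-path path/to/model.gguf --prompt "Hello, world"
```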
@@ -335,7 +336,7 @@ you've built around local LLM inference.
 
 
 ## License
-Torchchat is released under the [BSD 3 license](LICENSE). (Additional code in this
+torchchat is released under the [BSD 3 license](LICENSE). (Additional code in this
 distribution is covered by the MIT and Apache Open Source licenses.) However, you may have other legal obligations
 that govern your use of content, such as the terms of service for third-party models.
 ![image](https://github.com/pytorch/torchchat/assets/61328285/1cfccb53-c025-43d7-8475-94b34cf92339)
