Doc fixes #453

Merged 1 commit on Apr 24, 2024
16 changes: 8 additions & 8 deletions README.md
@@ -1,9 +1,9 @@
# Chat with LLMs Everywhere
Torchchat is a small codebase to showcase running large language models (LLMs) within Python OR within your own (C/C++) application on mobile (iOS/Android), desktop and servers.
Torchchat is a compact codebase that showcases running large language models (LLMs) seamlessly across diverse platforms. With Torchchat, you can run LLMs from Python, from your own (C/C++) application on mobile (iOS/Android), or on desktops and servers.
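
As a quick illustrative sketch of the Python-side workflow (the `llama3` alias and exact subcommand forms here are assumptions; the verified commands appear later in this README):

```bash
# Illustrative only: fetch a model, then talk to it from the command line
python3 torchchat.py download llama3
python3 torchchat.py chat llama3
```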

## Highlights
- Command line interaction with popular LLMs such as Llama 3, Llama 2, Stories, Mistral and more
- Supporting [some GGUF files](docs/GGUF.md) and the Hugging Face checkpoint format
- Supports [common GGUF formats](docs/GGUF.md) and the Hugging Face checkpoint format
- PyTorch-native execution with performance
- Supports popular hardware and OS
- Linux (x86)
@@ -59,16 +59,16 @@ with `python3 torchchat.py remove llama3`.
* [Chat](#chat)
* [Generate](#generate)
* [Run via Browser](#browser)
* [Quantizing your model (suggested for mobile)](#quantization)
* [Quantize your models (suggested for mobile)](#quantization)
* Export and run models in native environments (C++, your own app, mobile, etc.)
* [Exporting for desktop/servers via AOTInductor](#export-server)
* [Running exported .so file via your own C++ application](#run-server)
* [Export for desktop/servers via AOTInductor](#export-server)
* [Run exported .so file via your own C++ application](#run-server)
* in Chat mode
* in Generate mode
* [Exporting for mobile via ExecuTorch](#export-executorch)
* [Export for mobile via ExecuTorch](#export-executorch)
* in Chat mode
* in Generate mode
* [Running exported executorch file on iOS or Android](#run-mobile)
* [Run exported ExecuTorch file on iOS or Android](#run-mobile)

## Models
These are the supported models
@@ -242,7 +242,7 @@ python3 torchchat.py export stories15M --output-pte-path stories15M.pte
python3 torchchat.py generate --device cpu --pte-path stories15M.pte --prompt "Hello my name is"
```

See below under Mobile Execution if you want to deploy and execute a model in your iOS or Android app.
See below under [Mobile Execution](#run-mobile) if you want to deploy and execute a model in your iOS or Android app.


## Quantization
2 changes: 1 addition & 1 deletion docs/Android.md
@@ -3,7 +3,7 @@
Check out the [tutorial on how to build an Android app running your
PyTorch models with
ExecuTorch](https://pytorch.org/executorch/main/llm/llama-demo-android.html),
and give your torchat models a spin.
and give your torchchat models a spin.

![Screenshot](https://pytorch.org/executorch/main/_static/img/android_llama_app.png "Android app running Llama model")

2 changes: 1 addition & 1 deletion docs/executorch_setup.md
@@ -1,4 +1,4 @@
# Set-up executorch
# Set-up ExecuTorch

Before running any commands in torchchat that require ExecuTorch, you must first install ExecuTorch.
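
A minimal sketch of what that setup typically involves (the repository layout and script name below are assumptions based on the upstream ExecuTorch project; treat the ExecuTorch documentation as the authoritative source):

```bash
# Illustrative only: clone ExecuTorch and install its Python dependencies
git clone https://github.com/pytorch/executorch.git
cd executorch
git submodule sync && git submodule update --init
./install_requirements.sh
```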

2 changes: 1 addition & 1 deletion docs/quantization.md
@@ -284,4 +284,4 @@ We invite contributors to submit established quantization schemes, with accuracy
- Describe how to choose a quantization scheme. Which factors should they take into account? Concrete recommendations for use cases, esp. mobile.
- Quantization reference, describe options for the --quantize parameter (see the sketch after this list)
- Show a table with performance/accuracy metrics
- Quantization support matrix? torchat Quantization Support Matrix
- Quantization support matrix? torchchat Quantization Support Matrix
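
Until that reference exists, here is a hedged example of what a `--quantize` invocation can look like (the scheme name and group size are assumptions chosen for illustration; consult the supported options documented above):

```bash
# Illustrative only: quantize linear layers to int8 during generation (scheme/groupsize assumed)
python3 torchchat.py generate stories15M \
  --quantize '{"linear:int8": {"groupsize": 0}}' \
  --prompt "Hello my name is"
```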