Update iOS Llama demo app readme docs #5359
examples/demo-apps/apple_ios/LLaMA/README.md (updated)
# ExecuTorch Llama iOS Demo App

This app demonstrates an LLM chat use case with local on-device inference via ExecuTorch, using [Llama 3.1](https://github.com/meta-llama/llama-models) for text-only chat and [Llava](https://github.com/haotian-liu/LLaVA) for combined image and text chat.

We’re excited to share that the newly revamped iOS demo app is live and includes many new updates to provide a more intuitive and smoother user experience with a chat use case! The primary goal of this app is to showcase how easily ExecuTorch can be integrated into an iOS demo app and how to exercise the many features ExecuTorch and Llama models have to offer.
This app serves as a valuable resource to inspire your creativity and provide foundational code that you can customize and adapt for your particular use case.

Please dive in and start exploring our demo app today! We look forward to any feedback and are excited to see your innovative ideas.
## Key Concepts
From this demo app, you will learn many key concepts such as:
* How to prepare Llama models, build the ExecuTorch library, and perform model inference across delegates
* How to expose the ExecuTorch library via Swift Package Manager
* The current ExecuTorch app-facing capabilities

The goal is for you to see the type of support ExecuTorch provides and feel comfortable leveraging it for your use cases.
## Supported Models

The models that this app supports (availability varies by delegate) are:
* Llama 3.1 8B
* Llama 3 8B
* Llama 2 7B
* Llava 1.5 (XNNPACK only)
## Building the Application
ExecuTorch currently provides support across several delegates. Once you identify the delegate of your choice, select the corresponding README link for complete end-to-end instructions, from environment setup through exporting the models and building the ExecuTorch libraries and app to run on device:

| Delegate                       | Resource                                 |
| ------------------------------ | ---------------------------------------- |
| XNNPACK (CPU-based library)    | [link](docs/delegates/xnnpack_README.md) |
| MPS (Metal Performance Shader) | [link](docs/delegates/mps_README.md)     |
## How to Use the App
This section covers the main steps to use the app, along with a code snippet of the ExecuTorch API.

```{note}
The ExecuTorch runtime is distributed as a Swift package providing some .xcframework bundles as prebuilt binary targets.
Xcode will download and cache the package on the first run, which will take some time.
```
* Open Xcode and select "Open an existing project" to open `examples/demo-apps/apple_ios/LLaMA`.
* Ensure that the ExecuTorch package dependencies are installed correctly.
* Run the app (Cmd+R). This builds and launches the app on the phone.
* In the app UI, pick a model and tokenizer to use, type a prompt, and tap the arrow button (see the API sketch below).
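For reference, the core ExecuTorch API call behind this flow looks roughly like the sketch below. This is a simplified, hypothetical sketch, not the app's exact code: the `Runner` type, its initializer, and the `generate` callback signature are assumptions modeled on the demo's LLaMARunner module, so check the sources under `examples/demo-apps/apple_ios/LLaMA` for the exact API.

```swift
import LLaMARunner  // assumed module name exposed by the demo app

// Paths point at the model and tokenizer files copied onto the
// device or Simulator in the steps below.
let runner = Runner(
    modelPath: "/path/to/llama3_1.pte",
    tokenizerPath: "/path/to/tokenizer.model"
)

do {
    // Generation is callback-based: each decoded token is streamed
    // back so the UI can append it to the chat as it is produced.
    try runner.generate("Tell me a story", sequenceLength: 128) { token in
        print(token, terminator: "")
    }
} catch {
    print("Generation failed: \(error)")
}
```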
## Copy the model to Simulator

* Drag and drop the model and tokenizer files onto the Simulator window and save them somewhere inside the iLLaMA folder.
* Pick the files in the app dialog, type a prompt, and click the arrow-up button.
## Copy the model to Device

* Wire-connect the device and open its contents in Finder.
* Navigate to the Files tab and drag and drop the model and tokenizer files onto the iLLaMA folder.
* Wait until the files are copied.
If the app runs successfully on your device, you should see something like this:

<p align="center">
<img src="./docs/screenshots/ios_demo_app.jpg" alt="iOS LLaMA App" width="300">
</p>

For Llava 1.5 models, you can select an image (via the image/camera selector button) before typing the prompt and tapping the send button.

<p align="center">
<img src="./docs/screenshots/ios_demo_app_llava.jpg" alt="iOS LLaMA App with Llava" width="300">
</p>

## Reporting Issues
If you encounter any bugs or issues following this tutorial, please file a bug/issue here on [GitHub](https://github.com/pytorch/executorch/issues/new).
examples/demo-apps/apple_ios/LLaMA/docs/delegates/mps_README.md (new file, 98 additions)
# Building Llama iOS Demo for MPS Backend

This tutorial covers the end-to-end workflow for building an iOS demo app that runs on device using the MPS backend. More specifically, it covers:
1. Exporting and quantizing Llama models for the MPS backend.
2. Building and linking the libraries required for on-device inference on iOS using MPS.
3. Building the iOS demo app itself.

## Prerequisites
* [Xcode 15](https://developer.apple.com/xcode)
* [iOS 18 SDK](https://developer.apple.com/ios)
* Set up your ExecuTorch repo and dev environment, if you haven’t done so already, by following [Setting up ExecuTorch](https://pytorch.org/executorch/stable/getting-started-setup)
## Set Up ExecuTorch
In this section, we set up the ExecuTorch repo with Conda environment management. Make sure you have Conda available on your system (or follow the instructions [here](https://conda.io/projects/conda/en/latest/user-guide/install/index.html) to install it). The commands below were run on Linux (CentOS).

Create a Conda environment:

```bash
conda create -n et_mps python=3.10.0
conda activate et_mps
```

Check out the ExecuTorch repo and sync submodules:

```bash
git clone https://github.com/pytorch/executorch.git
cd executorch
git submodule sync
git submodule update --init
```

Install dependencies:

```bash
./install_requirements.sh
```
## Prepare Models
In this demo app, we support text-only inference with Llama 3.1, Llama 3, and Llama 2 models.

Install the required packages:

```bash
executorch/examples/models/llama2/install_requirements.sh
```

Export the model:

```bash
# MODEL_DIR is the directory containing the downloaded Llama checkpoint
# (consolidated.00.pth) and params.json.
# Flags: -kv enables the KV cache, --use_sdpa_with_kv_cache uses the fused
# SDPA custom op, --mps delegates to the MPS backend, -qmode 8da4w quantizes
# to 8-bit dynamic activations with 4-bit weights, and -G 32 sets the
# quantization group size.
python -m examples.models.llama2.export_llama --checkpoint "${MODEL_DIR}/consolidated.00.pth" --params "${MODEL_DIR}/params.json" -kv --use_sdpa_with_kv_cache --mps -d fp32 --disable_dynamic_shape -qmode 8da4w -G 32
```
## Pushing Model and Tokenizer

### Copy the model to Simulator
* Drag and drop the model and tokenizer files onto the Simulator window and save them somewhere inside the iLLaMA folder.
* Pick the files in the app dialog, type a prompt, and click the arrow-up button.

### Copy the model to Device
* Wire-connect the device and open its contents in Finder.
* Navigate to the Files tab and drag and drop the model and tokenizer files onto the iLLaMA folder.
* Wait until the files are copied.
## Configure the Xcode Project

### Install CMake
Download and open the macOS .dmg installer from https://cmake.org/download and move the CMake app to the /Applications folder. Then install the CMake command line tools:

```bash
sudo /Applications/CMake.app/Contents/bin/cmake-gui --install
```
### Swift Package Manager
The prebuilt ExecuTorch runtime, backend, and kernels are available as a Swift PM package.
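If you prefer declaring the dependency in a `Package.swift` manifest instead of through the Xcode UI described next, a minimal sketch could look like the following. The product names (`executorch`, `backend_mps`) are assumptions for illustration; verify the frameworks actually exported by the package version you resolve.

```swift
// swift-tools-version:5.9
import PackageDescription

// Minimal sketch of declaring the prebuilt ExecuTorch Swift package as a
// dependency. Product names below are assumptions; check the package for
// the real ones.
let package = Package(
    name: "LLaMADemo",
    platforms: [.iOS(.v17)],
    dependencies: [
        // Pin a version branch such as "0.3.0" for reproducible builds,
        // or use "latest" for the latest stable prebuilt frameworks.
        .package(url: "https://github.com/pytorch/executorch.git", branch: "latest")
    ],
    targets: [
        .target(
            name: "LLaMADemo",
            dependencies: [
                .product(name: "executorch", package: "executorch"),
                .product(name: "backend_mps", package: "executorch")
            ]
        )
    ]
)
```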
### Xcode
Open the project in Xcode. Go to `File > Add Package Dependencies`, paste the URL of the ExecuTorch repo into the search bar, and select it. Make sure to change the branch name to the desired ExecuTorch version, e.g., “0.3.0”, or use the “latest” branch name for the latest stable build.

<p align="center">
<img src="../screenshots/ios_demo_app_swift_pm.png" alt="iOS LLaMA App Swift PM" width="600">
</p>
Then select which ExecuTorch framework should link against which target.

<p align="center">
<img src="../screenshots/ios_demo_app_choosing_package.png" alt="iOS LLaMA App Choosing package" width="600">
</p>

Click “Run” to build the app and run it on your iPhone. If the app runs successfully on your device, you should see something like this:

<p align="center">
<img src="../screenshots/ios_demo_app_mps.jpg" alt="iOS LLaMA App MPS" width="300">
</p>

## Reporting Issues
If you encounter any bugs or issues following this tutorial, please file a bug/issue here on [GitHub](https://github.com/pytorch/executorch/issues/new).
examples/demo-apps/apple_ios/LLaMA/docs/delegates/xnnpack_README.md (new file, 104 additions)
# Building Llama iOS Demo for XNNPACK Backend

This tutorial covers the end-to-end workflow for building an iOS demo app that runs on device using the XNNPACK backend. More specifically, it covers:
1. Exporting and quantizing Llama models for the XNNPACK backend.
2. Building and linking the libraries required for on-device inference on iOS using XNNPACK.
3. Building the iOS demo app itself.

## Prerequisites
* [Xcode 15](https://developer.apple.com/xcode)
* [iOS 17 SDK](https://developer.apple.com/ios)
* Set up your ExecuTorch repo and dev environment, if you haven’t done so already, by following [Setting up ExecuTorch](https://pytorch.org/executorch/stable/getting-started-setup)
## Set Up ExecuTorch
In this section, we set up the ExecuTorch repo with Conda environment management. Make sure you have Conda available on your system (or follow the instructions [here](https://conda.io/projects/conda/en/latest/user-guide/install/index.html) to install it). The commands below were run on Linux (CentOS).

Create a Conda environment:

```bash
conda create -n et_xnnpack python=3.10.0
conda activate et_xnnpack
```

Check out the ExecuTorch repo and sync submodules:

```bash
git clone https://github.com/pytorch/executorch.git
cd executorch
git submodule sync
git submodule update --init
```

Install dependencies:

```bash
./install_requirements.sh
```
## Prepare Models
In this demo app, we support text-only inference with up-to-date Llama models.

Install the required packages:

```bash
executorch/examples/models/llama2/install_requirements.sh
```

Export the model:

```bash
# Replace <consolidated.00.pth> and <params.json> with the paths to your
# downloaded Llama checkpoint and parameter file.
# Flags: -kv enables the KV cache, --use_sdpa_with_kv_cache uses the fused
# SDPA custom op, -X delegates to the XNNPACK backend, -qmode 8da4w with
# --group_size 128 quantizes to 8-bit dynamic activations and 4-bit grouped
# weights, --metadata records the BOS/EOS token ids, and
# --embedding-quantize 4,32 quantizes the embedding table.
python -m examples.models.llama2.export_llama --checkpoint <consolidated.00.pth> -p <params.json> -kv --use_sdpa_with_kv_cache -X -qmode 8da4w --group_size 128 -d fp32 --metadata '{"get_bos_id":128000, "get_eos_ids":[128009, 128001]}' --embedding-quantize 4,32 --output_name="llama3_kv_sdpa_xnn_qe_4_32.pte"
```
## Pushing Model and Tokenizer

### Copy the model to Simulator
* Drag and drop the model and tokenizer files onto the Simulator window and save them somewhere inside the iLLaMA folder.
* Pick the files in the app dialog, type a prompt, and click the arrow-up button.

### Copy the model to Device
* Wire-connect the device and open its contents in Finder.
* Navigate to the Files tab and drag and drop the model and tokenizer files onto the iLLaMA folder.
* Wait until the files are copied.
## Configure the Xcode Project

### Install CMake
Download and open the macOS .dmg installer from https://cmake.org/download and move the CMake app to the /Applications folder. Then install the CMake command line tools:

```bash
sudo /Applications/CMake.app/Contents/bin/cmake-gui --install
```
### Swift Package Manager
The prebuilt ExecuTorch runtime, backend, and kernels are available as a Swift PM package.
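If you prefer declaring the dependency in a `Package.swift` manifest instead of through the Xcode UI described next, a minimal sketch could look like the following. The product names (`executorch`, `backend_xnnpack`) are assumptions for illustration; verify the frameworks actually exported by the package version you resolve.

```swift
// swift-tools-version:5.9
import PackageDescription

// Minimal sketch of declaring the prebuilt ExecuTorch Swift package as a
// dependency. Product names below are assumptions; check the package for
// the real ones.
let package = Package(
    name: "LLaMADemo",
    platforms: [.iOS(.v17)],
    dependencies: [
        // Pin a version branch such as "0.3.0" for reproducible builds,
        // or use "latest" for the latest stable prebuilt frameworks.
        .package(url: "https://github.com/pytorch/executorch.git", branch: "latest")
    ],
    targets: [
        .target(
            name: "LLaMADemo",
            dependencies: [
                .product(name: "executorch", package: "executorch"),
                .product(name: "backend_xnnpack", package: "executorch")
            ]
        )
    ]
)
```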
### Xcode
Open the project in Xcode. Go to `File > Add Package Dependencies`, paste the URL of the ExecuTorch repo into the search bar, and select it. Make sure to change the branch name to the desired ExecuTorch version, e.g., “0.3.0”, or use the “latest” branch name for the latest stable build.

<p align="center">
<img src="../screenshots/ios_demo_app_swift_pm.png" alt="iOS LLaMA App Swift PM" width="600">
</p>
Then select which ExecuTorch framework should link against which target.

<p align="center">
<img src="../screenshots/ios_demo_app_choosing_package.png" alt="iOS LLaMA App Choosing package" width="600">
</p>

Click “Run” to build the app and run it on your iPhone. If the app runs successfully on your device, you should see something like this:

<p align="center">
<img src="../screenshots/ios_demo_app.jpg" alt="iOS LLaMA App" width="300">
</p>

For Llava 1.5 models, you can select an image (via the image/camera selector button) before typing the prompt and tapping the send button.

<p align="center">
<img src="../screenshots/ios_demo_app_llava.jpg" alt="iOS LLaMA App with Llava" width="300">
</p>

## Reporting Issues
If you encounter any bugs or issues following this tutorial, please file a bug/issue here on [GitHub](https://github.com/pytorch/executorch/issues/new).
Binary files added:
* ...es/demo-apps/apple_ios/LLaMA/docs/screenshots/ios_demo_app_choosing_package.png (+159 KB)
* examples/demo-apps/apple_ios/LLaMA/docs/screenshots/ios_demo_app_llava.jpg (+281 KB)
* examples/demo-apps/apple_ios/LLaMA/docs/screenshots/ios_demo_app_mps.jpg (+252 KB)
* examples/demo-apps/apple_ios/LLaMA/docs/screenshots/ios_demo_app_swift_pm.png (+89.2 KB)
Review comment: Do we have instructions on how to export Llava?