Update Android and iOS demo app readme for Spinquant and QAT+LoRA model support (#6485)
Summary:
1. Added export commands for SpinQuant and QAT+LoRA using the prequantized models to be released on 10/24
2. Cleaned up duplicated commands
3. Renamed paths in the export commands to avoid confusion
4. Removed outdated information
Reviewed By: cmodi-meta
Differential Revision: D64784695
Co-authored-by: Chester Hu <[email protected]>
examples/demo-apps/android/LlamaDemo/README.md (4 additions, 6 deletions)
@@ -1,5 +1,7 @@
 # ExecuTorch Llama Android Demo App
 
+**[UPDATE - 10/24]** We have added support for running quantized Llama 3.2 1B/3B models in demo apps on the [XNNPACK backend](https://github.com/pytorch/executorch/blob/main/examples/demo-apps/android/LlamaDemo/docs/delegates/xnnpack_README.md). We currently support inference with SpinQuant and QAT+LoRA quantization methods.
+
 We’re excited to share that the newly revamped Android demo app is live and includes many new updates to provide a more intuitive and smoother user experience with a chat use case! The primary goal of this app is to showcase how easily ExecuTorch can be integrated into an Android demo app and how to exercise the many features ExecuTorch and Llama models have to offer.
 
 This app serves as a valuable resource to inspire your creativity and provide foundational code that you can customize and adapt for your particular use case.
@@ -17,7 +19,8 @@ The goal is for you to see the type of support ExecuTorch provides and feel comf
 
 ## Supporting Models
 As a whole, the models that this app supports are (varies by delegate):
-* Llama 3.2 1B/3B
+* Llama 3.2 Quantized 1B/3B
+* Llama 3.2 1B/3B in BF16
 * Llama Guard 3 1B
 * Llama 3.1 8B
 * Llama 3 8B
@@ -34,11 +37,6 @@ First it’s important to note that currently ExecuTorch provides support across
 | QNN (Qualcomm AI Accelerators) |[link](https://github.com/pytorch/executorch/blob/main/examples/demo-apps/android/LlamaDemo/docs/delegates/qualcomm_README.md)|
 | MediaTek (MediaTek AI Accelerators) |[link](https://github.com/pytorch/executorch/blob/main/examples/demo-apps/android/LlamaDemo/docs/delegates/mediatek_README.md)|
 
-**WARNING** NDK r27 will cause issues like:
-```
-java.lang.UnsatisfiedLinkError: dlopen failed: cannot locate symbol "_ZTVNSt6__ndk114basic_ifstreamIcNS_11char_traitsIcEEEE" referenced by "/data/app/~~F5IwquaXUZPdLpSEYA-JGA==/com.example.executorchllamademo-FSyx80gEhsQCsxz7hvS2Ew==/lib/arm64/libexecutorch.so"...
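The removed warning above boils down to "make sure the right NDK is installed before building". A quick, hedged way to check is to list the NDK versions under the SDK root; the default SDK location below is an assumption, so adjust `ANDROID_HOME` for your machine:

```shell
# Sketch: list installed NDK versions so you can confirm 26.3.11579264 (or r27b)
# is present. The fallback SDK path is an assumption, not a guarantee.
NDK_DIR="${ANDROID_HOME:-$HOME/Android/Sdk}/ndk"
ls "$NDK_DIR" 2>/dev/null || echo "no NDK installed under $NDK_DIR"
```

If the validated version is missing, install it before building the demo app.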
examples/demo-apps/android/LlamaDemo/docs/delegates/xnnpack_README.md (26 additions, 31 deletions)
@@ -1,7 +1,4 @@
 # Building ExecuTorch Android Demo App for Llama/Llava running XNNPACK
-
-**[UPDATE - 09/25]** We have added support for running [Llama 3.2 models](#for-llama-32-1b-and-3b-models) on the XNNPACK backend. We currently support inference on their original data type (BFloat16). We have also added instructions to run [Llama Guard 1B models](#for-llama-guard-1b-models) on-device.
-
 This tutorial covers the end to end workflow for building an android demo app using CPU on device via XNNPACK framework.
 More specifically, it covers:
 1. Export and quantization of Llama and Llava models against the XNNPACK backend.
@@ -19,17 +12,10 @@
-* Install the [Android SDK API Level 34](https://developer.android.com/about/versions/15/setup-sdk) and [Android NDK 26.3.11579264](https://developer.android.com/studio/projects/install-ndk). **WARNING** NDK r27 will cause issues like:
-```
-java.lang.UnsatisfiedLinkError: dlopen failed: cannot locate symbol "_ZTVNSt6__ndk114basic_ifstreamIcNS_11char_traitsIcEEEE" referenced by "/data/app/~~F5IwquaXUZPdLpSEYA-JGA==/com.example.executorchllamademo-FSyx80gEhsQCsxz7hvS2Ew==/lib/arm64/libexecutorch.so"...
-```
-Please downgrade to version 26.3.11579264.
+* Install the [Android SDK API Level 34](https://developer.android.com/about/versions/15/setup-sdk) and [Android NDK r27b](https://github.com/android/ndk/releases/tag/r27b).
+* Note: This demo app and tutorial have only been validated with the arm64-v8a [ABI](https://developer.android.com/ndk/guides/abis), with NDK 26.3.11579264 and r27b.
 * If you have Android Studio set up, you can install them with
 * Android Studio Settings -> Language & Frameworks -> Android SDK -> SDK Platforms -> Check the row with API Level 34.
 * Android Studio Settings -> Language & Frameworks -> Android SDK -> SDK Tools -> Check NDK (Side by side) row.
 * Alternatively, you can follow [this guide](https://github.com/pytorch/executorch/blob/856e085b9344c8b0bf220a97976140a5b76356aa/examples/demo-apps/android/LlamaDemo/SDK.md) to set up Java/SDK/NDK with CLI.
-Supported Host OS: CentOS, macOS Sonoma on Apple Silicon.
-
-
-Note: This demo app and tutorial has only been validated with arm64-v8a [ABI](https://developer.android.com/ndk/guides/abis), with NDK 26.3.11579264.
-
+* Supported Host OS: CentOS, macOS Sonoma on Apple Silicon.
 
 
 ## Setup ExecuTorch
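For readers who prefer the CLI path mentioned above, the same components can be installed with `sdkmanager` from the Android cmdline-tools. A minimal sketch (package names follow sdkmanager's `<component>;<version>` convention; this assumes cmdline-tools are already on `PATH`):

```shell
# Sketch: install the API level 34 platform and the validated NDK from the CLI.
# Falls back to a hint instead of failing when sdkmanager is not installed.
if command -v sdkmanager >/dev/null 2>&1; then
  sdkmanager "platforms;android-34" "ndk;26.3.11579264"
else
  echo "sdkmanager not found; install Android cmdline-tools first"
fi
```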
@@ -61,20 +47,33 @@ Optional: Use the --pybind flag to install with pybindings.
 
 ## Prepare Models
 In this demo app, we support text-only inference with up-to-date Llama models and image reasoning inference with LLaVA 1.5.
-
-### For Llama 3.2 1B and 3B models
-We have supported BFloat16 as a data type on the XNNPACK backend for Llama 3.2 1B/3B models.
 * You can request and download model weights for Llama through Meta official [website](https://llama.meta.com/).
 * For chat use-cases, download the instruct models instead of pretrained.
 * Run `examples/models/llama/install_requirements.sh` to install dependencies.
-* The 1B model in BFloat16 format can run on mobile devices with 8GB RAM. The 3B model will require 12GB+ RAM.
+* Rename tokenizer for Llama 3.x with command: `mv tokenizer.model tokenizer.bin`. We are updating the demo app to support tokenizer in original format directly.
+
+### For Llama 3.2 1B and 3B SpinQuant models
+Meta has released prequantized INT4 SpinQuant Llama 3.2 models that ExecuTorch supports on the XNNPACK backend.
 * Export Llama model and generate .pte file as below:
 * Rename tokenizer for Llama 3.2 with command: `mv tokenizer.model tokenizer.bin`. We are updating the demo app to support tokenizer in original format directly.
+### For Llama 3.2 1B and 3B BF16 models
+We have supported BF16 as a data type on the XNNPACK backend for Llama 3.2 1B/3B models.
+* The 1B model in BF16 format can run on mobile devices with 8GB RAM. The 3B model will require 12GB+ RAM.
+* Export Llama model and generate .pte file as below:
 For more detail using Llama 3.2 lightweight models including prompt template, please go to our official [website](https://www.llama.com/docs/model-cards-and-prompt-formats/llama3_2#-llama-3.2-lightweight-models-(1b/3b)-).
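The preparation steps above (install dependencies, then rename the tokenizer) can be sketched as a short shell session. The model directory here is a hypothetical placeholder standing in for wherever you downloaded the checkpoint; a temp dir with a stand-in tokenizer file is used so the sketch is self-contained:

```shell
# Sketch: stage a downloaded Llama 3.2 checkpoint for the demo app.
# MODEL_DIR is a placeholder; in practice point it at your real checkpoint dir
# and run examples/models/llama/install_requirements.sh first.
MODEL_DIR="$(mktemp -d)"
touch "$MODEL_DIR/tokenizer.model"   # stands in for the downloaded tokenizer
mv "$MODEL_DIR/tokenizer.model" "$MODEL_DIR/tokenizer.bin"
ls "$MODEL_DIR"                      # prints: tokenizer.bin
```

The rename is needed only until the demo app supports the tokenizer in its original format, as the README notes.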
@@ -88,19 +87,17 @@ To safeguard your application, you can use our Llama Guard models for prompt cla
 * We prepared this model using the following command
 You may wonder what the ‘--metadata’ flag is doing. This flag helps export the model with the proper special tokens added, so that the runner can easily detect EOS tokens.
@@ -109,8 +106,6 @@ You may wonder what the ‘--metadata’ flag is doing. This flag helps export t
 * Rename tokenizer for Llama 3.1 with command: `mv tokenizer.model tokenizer.bin`. We are updating the demo app to support tokenizer in original format directly.
-
 
 ### For LLaVA model
 * For the Llava 1.5 model, you can get it from Huggingface [here](https://huggingface.co/llava-hf/llava-1.5-7b-hf).
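As a rough illustration of the `--metadata` flag discussed above: its value is a small JSON object naming special-token IDs for the runner. The keys and token IDs below are illustrative assumptions, not authoritative values; take the real ones from your tokenizer's configuration and the ExecuTorch export docs:

```shell
# Sketch: build the JSON passed via --metadata so the runner can detect EOS tokens.
# Key names and token IDs are illustrative assumptions for this sketch.
METADATA='{"get_bos_id":128000,"get_eos_ids":[128009,128001]}'
echo "--metadata '$METADATA'"
```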
examples/demo-apps/apple_ios/LLaMA/README.md (4 additions, 1 deletion)
@@ -1,5 +1,7 @@
 # ExecuTorch Llama iOS Demo App
 
+**[UPDATE - 10/24]** We have added support for running quantized Llama 3.2 1B/3B models in demo apps on the [XNNPACK backend](https://github.com/pytorch/executorch/blob/main/examples/demo-apps/apple_ios/LLaMA/docs/delegates/xnnpack_README.md). We currently support inference with SpinQuant and QAT+LoRA quantization methods.
+
 We’re excited to share that the newly revamped iOS demo app is live and includes many new updates to provide a more intuitive and smoother user experience with a chat use case! The primary goal of this app is to showcase how easily ExecuTorch can be integrated into an iOS demo app and how to exercise the many features ExecuTorch and Llama models have to offer.
 
 This app serves as a valuable resource to inspire your creativity and provide foundational code that you can customize and adapt for your particular use case.
@@ -17,7 +19,8 @@ The goal is for you to see the type of support ExecuTorch provides and feel comf
 ## Supported Models
 
 As a whole, the models that this app supports are (varies by delegate):
examples/demo-apps/apple_ios/LLaMA/docs/delegates/xnnpack_README.md (22 additions, 13 deletions)
@@ -1,7 +1,5 @@
 # Building Llama iOS Demo for XNNPACK Backend
 
-**[UPDATE - 09/25]** We have added support for running [Llama 3.2 models](#for-llama-32-1b-and-3b-models) on the XNNPACK backend. We currently support inference on their original data type (BFloat16).
-
 This tutorial covers the end to end workflow for building an iOS demo app using XNNPACK backend on device.
 More specifically, it covers:
 1. Export and quantization of Llama models against the XNNPACK backend.
@@ -38,24 +36,35 @@ Install dependencies
 ```
 
 ## Prepare Models
-In this demo app, we support text-only inference with up-to-date Llama models.
-
-Install the required packages to export the model
+In this demo app, we support text-only inference with up-to-date Llama models and image reasoning inference with LLaVA 1.5.
+* You can request and download model weights for Llama through Meta official [website](https://llama.meta.com/).
+* For chat use-cases, download the instruct models instead of pretrained.
+* Install the required packages to export the model:
 
 ```
 sh examples/models/llama/install_requirements.sh
 ```
+### For Llama 3.2 1B and 3B SpinQuant models
+Meta has released prequantized INT4 SpinQuant Llama 3.2 models that ExecuTorch supports on the XNNPACK backend.
+* Export Llama model and generate .pte file as below:
-We have supported BFloat16 as a data type on the XNNPACK backend for Llama 3.2 1B/3B models.
-* You can download original model weights for Llama through Meta official [website](https://llama.meta.com/).
-* For chat use-cases, download the instruct models instead of pretrained.
-* Run “examples/models/llama/install_requirements.sh” to install dependencies.
-* The 1B model in BFloat16 format can run on mobile devices with 8GB RAM (iPhone 15 Pro and later). The 3B model will require 12GB+ RAM and hence will not fit on 8GB RAM phones.
+### For Llama 3.2 1B and 3B QAT+LoRA models
+Meta has released prequantized INT4 QAT+LoRA Llama 3.2 models that ExecuTorch supports on the XNNPACK backend.
+* Export Llama model and generate .pte file as below:
+We have supported BF16 as a data type on the XNNPACK backend for Llama 3.2 1B/3B models.
+* The 1B model in BF16 format can run on mobile devices with 8GB RAM (iPhone 15 Pro and later). The 3B model will require 12GB+ RAM and hence will not fit on 8GB RAM phones.
 * Export Llama model and generate .pte file as below:
 For more detail using Llama 3.2 lightweight models including prompt template, please go to our official [website](https://www.llama.com/docs/model-cards-and-prompt-formats/llama3_2#-llama-3.2-lightweight-models-(1b/3b)-).
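The RAM guidance above (1B BF16 fits 8GB devices, 3B needs 12GB+) can be sanity-checked with back-of-envelope arithmetic over weights only, ignoring KV cache and runtime overhead. Parameter counts are rounded assumptions for the sketch:

```shell
# Sketch: approximate weight footprint of a 3B-parameter model,
# comparing BF16 (2 bytes/param) with roughly INT4 (about half a byte/param).
PARAMS=3000000000
echo "BF16 : approx $(( PARAMS * 2 / 1073741824 )) GiB of weights"   # approx 5 GiB
echo "INT4 : approx $(( PARAMS / 2 / 1073741824 )) GiB of weights"   # approx 1 GiB
```

This is why the quantized INT4 releases, unlike 3B BF16, are practical on 8GB phones.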
@@ -64,7 +73,7 @@ For more detail using Llama 3.2 lightweight models including prompt template, pl