Here are the supported models:

- Llama 3.2 1B and 3B
- Llama 3.1 8B
- Llama 3 8B
- Llama 2 7B
## Step 2: Prepare model

### Option A: Download and export Llama 3.2 1B/3B model

1. Download `consolidated.00.pth`, `params.json` and `tokenizer.model` from the [Llama website](https://www.llama.com/llama-downloads/) or [Hugging Face](https://huggingface.co/meta-llama/Llama-3.2-1B). For chat use cases, download the instruct models.

2. Export the model and generate a `.pte` file. Use the original bfloat16 version, without any quantization.
```
# Set these paths to point to the downloaded files
```

### Option B: Download and export Llama 3 8B instruct model

You can export and run the original Llama 3 8B instruct model.

3. SpinQuant [Optional]. If you want to improve accuracy, you can use [SpinQuant](https://github.com/facebookresearch/SpinQuant):
   1. Generate a new checkpoint with the `31_optimize_rotation_executorch.sh` and `32_eval_ptq_executorch.sh` commands in the [SpinQuant repo](https://github.com/facebookresearch/SpinQuant/tree/main?tab=readme-ov-file#3-export-to-executorch).
   2. Pass an extra `--use_spin_quant native` argument to the `export_llama` script above.

### Option C: Download and export stories110M model

If you want to deploy and run a smaller model for educational purposes, run the following from the `executorch` root:

### Option E: Download models from Hugging Face and convert from safetensor format to state dict

You can also download the above models from [Hugging Face](https://huggingface.co/). Since ExecuTorch starts from a PyTorch model, a script like the one below can be used to convert the Hugging Face safetensors format to PyTorch's state dict. It leverages the utils provided by [TorchTune](https://github.com/pytorch/torchtune).

3. Run the model. Run options are available [here](https://github.com/pytorch/executorch/blob/main/examples/models/llama2/main.cpp#L18-L40).

```
adb shell "cd /data/local/tmp/llama && ./llama_main --model_path <model.pte> --tokenizer_path <tokenizer.model> --prompt \"Once upon a time\" --seq_len 120"
```
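The device run can be wrapped in a small shell helper so paths and quoting are handled in one place. This is a minimal sketch under several assumptions:
- `llama_main` has already been built for Android.
- The model `.pte` and `tokenizer.model` are on the host.
- `/data/local/tmp/llama` is the device directory, as in the command above.

Copying the files with `adb push` is an assumption of this sketch, not a step taken from this README. `llama_on_device` and `run_adb` are hypothetical helper names. Only the `llama_main` flags shown above are used.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of ExecuTorch) wrapping the adb run command.
# Set DRY_RUN=1 to print each adb command instead of executing it.

DEVICE_DIR=/data/local/tmp/llama   # device directory used in the adb command

run_adb() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    # Arguments are joined with spaces for display only; not shell-quoted.
    printf '%s\n' "adb $*"
  else
    adb "$@"
  fi
}

# Usage: llama_on_device <llama_main> <model.pte> <tokenizer.model> <prompt> [seq_len]
llama_on_device() {
  local bin="$1" model="$2" tokenizer="$3" prompt="$4" seq_len="${5:-120}"

  # Create the device directory and copy the binary, model and tokenizer into it.
  run_adb shell mkdir -p "$DEVICE_DIR"
  run_adb push "$bin" "$model" "$tokenizer" "$DEVICE_DIR/"

  # The remote command cds into DEVICE_DIR first, so only base names are passed.
  # Assumes the prompt itself contains no double quotes.
  local cmd="cd $DEVICE_DIR && ./$(basename "$bin")"
  cmd+=" --model_path $(basename "$model")"
  cmd+=" --tokenizer_path $(basename "$tokenizer")"
  cmd+=" --prompt \"$prompt\" --seq_len $seq_len"
  run_adb shell "$cmd"
}
```

With `DRY_RUN=1` exported, `llama_on_device <path/to/llama_main> model.pte tokenizer.model "Once upon a time"` only prints the three `adb` commands. This is a cheap way to check paths before touching a device.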
## Step 6: Build Mobile apps

### iOS

Please refer to [this tutorial](https://pytorch.org/executorch/main/llm/llama-demo-ios.html) for full instructions on building the iOS LLAMA Demo App. Rename the `tokenizer.model` file to `tokenizer.bin`, because the demo app looks for a tokenizer file with the `.bin` extension.
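The iOS tokenizer step can be scripted as shown below. This is a minimal sketch, assuming the tokenizer was downloaded as `tokenizer.model`; `prepare_ios_tokenizer` is a hypothetical helper name. The sketch copies the file instead of renaming it. That way the original `tokenizer.model` stays available for the `llama_main` command, which takes `tokenizer.model`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: produce the .bin-named tokenizer the iOS demo app expects.
# Copies instead of moving, so tokenizer.model remains for llama_main runs.
prepare_ios_tokenizer() {
  local src="$1"                     # path to the downloaded tokenizer.model
  local dst="${src%.model}.bin"      # same directory, .bin extension
  if [ ! -f "$src" ]; then
    echo "tokenizer not found: $src" >&2
    return 1
  fi
  cp "$src" "$dst"
  echo "$dst"                        # print the path to add to the app
}
```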
### Android

Please refer to [this tutorial](https://pytorch.org/executorch/main/llm/llama-demo-android.html) for full instructions on building the Android LLAMA Demo App.