Commit b9dadee

Add llama3.2 1B and 3B instructions (#5647)

mergennachin authored and facebook-github-bot committed

Summary:
Pull Request resolved: #5647

Reviewed By: helunwencser, dbort

Differential Revision: D63401079

fbshipit-source-id: 35fad5e31d867570f22f0c3aa9c48cfd17b09afc

1 parent cd46721 commit b9dadee

File tree

1 file changed (+30, -9 lines)

examples/models/llama2/README.md

Lines changed: 30 additions & 9 deletions
@@ -3,6 +3,7 @@ This example demonstrates how to run a [llama models](https://www.llama.com/) on
 
 Here are supported models:
 
+- Llama 3.2 1B and 3B
 - Llama 3.1 8B
 - Llama 3 8B
 - Llama 2 7B
@@ -93,7 +94,27 @@ Llama 2 7B performance was measured on the Samsung Galaxy S22, S24, and OnePlus
 
 ## Step 2: Prepare model
 
-### Option A: Download and export Llama 3 8B instruct model
+### Option A: Download and export Llama 3.2 1B/3B model
+
+1. Download `consolidated.00.pth`, `params.json`, and `tokenizer.model` from the [Llama website](https://www.llama.com/llama-downloads/) or [Hugging Face](https://huggingface.co/meta-llama/Llama-3.2-1B). For chat use cases, download the instruct models.
+
+2. Export the model and generate a `.pte` file. Use the original bfloat16 version, without any quantization.
+
+```
+# Set these paths to point to the downloaded files
+LLAMA_CHECKPOINT=path/to/checkpoint.pth
+LLAMA_PARAMS=path/to/params.json
+
+python -m examples.models.llama2.export_llama \
+  --checkpoint "${LLAMA_CHECKPOINT:?}" \
+  --params "${LLAMA_PARAMS:?}" \
+  -kv -X \
+  -d bf16 \
+  --metadata '{"append_eos_to_prompt": 0, "get_bos_id":128000, "get_eos_ids":[128009, 128001], "get_n_bos": 0, "get_n_eos": 0}' \
+  --output_name="llama3_2.pte"
+```
+
+### Option B: Download and export Llama 3 8B instruct model
 
 You can export and run the original Llama 3 8B instruct model.
 
@@ -108,7 +129,7 @@ You can export and run the original Llama 3 8B instruct model.
 
 3. SpinQuant [Optional]. If you want to improve accuracy, you can use [SpinQuant](https://github.com/facebookresearch/SpinQuant): (1) generate a new checkpoint via the `31_optimize_rotation_executorch.sh` and `32_eval_ptq_executorch.sh` commands in the [SpinQuant repo](https://github.com/facebookresearch/SpinQuant/tree/main?tab=readme-ov-file#3-export-to-executorch), and (2) pass an extra `--use_spin_quant native` argument to the `export_llama` script above.
 
-### Option B: Download and export stories110M model
+### Option C: Download and export stories110M model
 
 If you want to deploy and run a smaller model for educational purposes, run the following from the `executorch` root:
 
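To make the SpinQuant step in this hunk concrete: the change amounts to appending a single flag to whatever `export_llama` invocation you already use. A minimal sketch, where the checkpoint path and output name are placeholders and the remaining flags are simply copied from the Option A example above for illustration:

```
# Sketch only: point LLAMA_CHECKPOINT at the checkpoint produced by the SpinQuant scripts,
# keep the export flags you already use, and append --use_spin_quant native.
LLAMA_CHECKPOINT=path/to/spinquant/consolidated.00.pth
LLAMA_PARAMS=path/to/params.json

python -m examples.models.llama2.export_llama \
  --checkpoint "${LLAMA_CHECKPOINT:?}" \
  --params "${LLAMA_PARAMS:?}" \
  -kv -X \
  -d bf16 \
  --use_spin_quant native \
  --output_name="llama3_spinquant.pte"
```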
@@ -131,7 +152,7 @@ If you want to deploy and run a smaller model for educational purposes. From `ex
 python -m extension.llm.tokenizer.tokenizer -t <tokenizer.model> -o tokenizer.bin
 ```
 
-### Option C: Download and export Llama 2 7B model
+### Option D: Download and export Llama 2 7B model
 
 You can export and run the original Llama 2 7B model.
 
@@ -149,7 +170,7 @@ You can export and run the original Llama 2 7B model.
 python -m extension.llm.tokenizer.tokenizer -t <tokenizer.model> -o tokenizer.bin
 ```
 
-### Option D: Download models from Hugging Face and convert from safetensor format to state dict
+### Option E: Download models from Hugging Face and convert from safetensor format to state dict
 
 
 You can also download the above models from [Hugging Face](https://huggingface.co/). Since ExecuTorch starts from a PyTorch model, a script like the one below can be used to convert the Hugging Face safetensors format to PyTorch's state dict. It leverages the utils provided by [TorchTune](https://github.com/pytorch/torchtune).
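The conversion script referenced in that paragraph is not part of the hunks shown in this diff. As a rough sketch of what such a script can look like, assuming TorchTune's `FullModelHFCheckpointer` and `convert_weights` utilities (the import paths, file names, and `model_type` value below are assumptions; consult the TorchTune docs for your installed version):

```
# Sketch only: directory names and checkpoint file names are placeholders.
python - <<'PY'
import torch
from torchtune.utils import FullModelHFCheckpointer   # may live under torchtune.training in newer releases
from torchtune.models import convert_weights

# Load the safetensors checkpoint downloaded from Hugging Face.
checkpointer = FullModelHFCheckpointer(
    checkpoint_dir="path/to/hf/model/dir",
    checkpoint_files=["model.safetensors"],            # list the shard files actually present
    output_dir="path/to/output/dir",
    model_type="LLAMA3",                               # assumption: pick the type matching your model
)
state_dict = checkpointer.load_checkpoint()

# Convert TorchTune's parameter naming back to Meta's original layout,
# i.e. the state dict format that export_llama expects as checkpoint.pth.
state_dict = convert_weights.tune_to_meta(state_dict["model"])
torch.save(state_dict, "path/to/output/dir/checkpoint.pth")
PY
```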
@@ -249,10 +270,10 @@ Note for Mac users: There's a known linking issue with Xcode 15.1. Refer to the
 
 3. Run model. Run options available [here](https://github.com/pytorch/executorch/blob/main/examples/models/llama2/main.cpp#L18-L40).
 ```
-cmake-out/examples/models/llama2/llama_main --model_path=<model pte file> --tokenizer_path=<tokenizer.bin> --prompt=<prompt>
+cmake-out/examples/models/llama2/llama_main --model_path=<model pte file> --tokenizer_path=<tokenizer.model> --prompt=<prompt>
 ```
 
-For Llama3, you can pass the original `tokenizer.model` (without converting to `.bin` file).
+For Llama 2 and the stories models, pass the converted `tokenizer.bin` file instead of `tokenizer.model`.
 
 ## Step 5: Run benchmark on Android phone
 
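As a concrete illustration of the run command changed above, using the `llama3_2.pte` produced in Option A and its matching `tokenizer.model` (the paths are placeholders, and the prompt is just an example):

```
# Sketch only: substitute the real locations of your exported model and downloaded tokenizer.
cmake-out/examples/models/llama2/llama_main \
  --model_path=llama3_2.pte \
  --tokenizer_path=path/to/tokenizer.model \
  --prompt="Once upon a time"
```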
@@ -313,19 +334,19 @@ cmake --build cmake-out-android/examples/models/llama2 -j16 --config Release
 ```
 adb shell mkdir -p /data/local/tmp/llama
 adb push <model.pte> /data/local/tmp/llama/
-adb push <tokenizer.bin> /data/local/tmp/llama/
+adb push <tokenizer.model> /data/local/tmp/llama/
 adb push cmake-out-android/examples/models/llama2/llama_main /data/local/tmp/llama/
 ```
 
 **2.3 Run model**
 ```
-adb shell "cd /data/local/tmp/llama && ./llama_main --model_path <model.pte> --tokenizer_path <tokenizer.bin> --prompt \"Once upon a time\" --seq_len 120"
+adb shell "cd /data/local/tmp/llama && ./llama_main --model_path <model.pte> --tokenizer_path <tokenizer.model> --prompt \"Once upon a time\" --seq_len 120"
 ```
 ## Step 6: Build Mobile apps
 
 ### iOS
 
-Please refer to [this tutorial](https://pytorch.org/executorch/main/llm/llama-demo-ios.html) to for full instructions on building the iOS LLAMA Demo App. Note that to use Llama 3 8B instruct in the iOS demo app, you don't need to convert the downloaded `tokenizer.model` to `tokenizer.bin`, required for Llama 2 (shown in Step 2 - Option A - 4 above), but you need to rename `tokenizer.model` file to `tokenizer.bin` because the demo app looks for the tokenizer file with .bin extension.
+Please refer to [this tutorial](https://pytorch.org/executorch/main/llm/llama-demo-ios.html) for full instructions on building the iOS LLAMA Demo App. Rename the `tokenizer.model` file to `tokenizer.bin`, because the demo app looks for a tokenizer file with the .bin extension.
 
 ### Android
 Please refer to [this tutorial](https://pytorch.org/executorch/main/llm/llama-demo-android.html) for full instructions on building the Android LLAMA Demo App.
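A side note on the iOS instructions above: the rename they ask for is an ordinary file copy or move, performed wherever the downloaded tokenizer lives, for example:

```
# Assumption: run from the directory containing the downloaded Llama tokenizer.
cp tokenizer.model tokenizer.bin   # the iOS demo app looks for a .bin tokenizer file
```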
