Summary:
Cleaning up old content from Llama2. This is purely a skeleton; follow-up diffs will fix the individual steps.
Differential Revision: D55703398
File changed: `examples/models/llama2/README.md` (32 additions, 20 deletions)
@@ -1,5 +1,7 @@
 # Summary
-This example demonstrates how to Export a [Llama 2](https://ai.meta.com/llama/) model in ExecuTorch such that it can be used in a mobile environment.
+This example demonstrates how to run a [Llama 2](https://ai.meta.com/llama/) model on mobile via ExecuTorch. We use XNNPACK to accelerate the performance and 4-bit groupwise PTQ quantization to fit the model on a phone.
+
+
 For Llama2, please refer to [the llama's github page](https://github.com/facebookresearch/llama) for details.
 Pretrained parameters are not included in this repo. Users are suggested to download them through [the llama's download page](https://ai.meta.com/resources/models-and-libraries/llama-downloads/).
 
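The "4-bit groupwise PTQ quantization" in the new description can be illustrated with a toy sketch. This is a hedged illustration of the general idea only, assuming a simple symmetric scheme with one scale per group; ExecuTorch's actual quantizer differs in details:

```python
def quantize_groupwise_4bit(weights, group_size=32):
    """Toy 4-bit groupwise post-training quantization.

    Each group of `group_size` weights shares one float scale; values are
    stored as integers in [-8, 7], i.e. 4 bits per weight plus one scale
    per group.
    """
    assert len(weights) % group_size == 0
    q_groups, scales = [], []
    for i in range(0, len(weights), group_size):
        group = weights[i:i + group_size]
        scale = max(abs(v) for v in group) / 7.0
        if scale == 0.0:
            scale = 1.0  # all-zero group: any scale works
        q_groups.append([max(-8, min(7, round(v / scale))) for v in group])
        scales.append(scale)
    return q_groups, scales

def dequantize_groupwise_4bit(q_groups, scales):
    return [v * s for group, s in zip(q_groups, scales) for v in group]

weights = [0.05 * (i % 17) - 0.4 for i in range(64)]
q_groups, scales = quantize_groupwise_4bit(weights)
restored = dequantize_groupwise_4bit(q_groups, scales)
max_err = max(abs(a - b) for a, b in zip(weights, restored))
```

Smaller groups give tighter scales (lower error) at the cost of more per-group metadata, which is the core trade-off behind the `group_size` choice.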
@@ -12,31 +14,28 @@ Overall, Llama models are powerful and versatile language models that can be use
 
 Please note that the models are subject to the [acceptable use policy](https://github.com/facebookresearch/llama/blob/main/USE_POLICY.md) and the provided [responsible use guide](https://ai.meta.com/static-resource/responsible-use-guide/).
 
-# Notes
-1. This example is to show the feasibility of exporting a Llama2 model in ExecuTorch. There is no guarantee for performance.
-2. The provided checkpoint, demo_rand_params.pth is a dummy checkpoint with random parameters. It does not provide meaningful results. It's only for the purpose of demonstration and fast iterations. Use the options `--checkpoint <checkpoint>` and `--params <params>` for custom checkpoints.
-
 
-# Limitations
-This example tries to reuse the Python code, with modifications to make it compatible with current ExecuTorch:
-1. Since ExecuTorch does not support complex Tensor data type, use the customized functions to have rotary embedding with real numbers. Please see [GitHub issue: Support complex data type in ExecuTorch](https://github.com/pytorch/executorch/issues/886).
-2. No KV cache. The current cache implementation in the original Llama2 repo is not supported by ExecuTorch, because ExecuTorch runtime assumes model data attributes being static. Please see [GitHub issue: Add support of mutable buffers in ExecuTorch](https://github.com/pytorch/executorch/issues/897).
-3. No CUDA. ExecuTorch is focused on Edge use cases where CUDA is not available on most of the edge devices.
-4. No dependencies on fairscale. The ColumnParallelLinear, ParallelEmbedding and training are not needed and supported in ExecuTorch.
+# Results
 
+TODO - Will fill in table of results.
 
 # Instructions:
-### Setup
-1. Follow the [tutorial](https://pytorch.org/executorch/stable/getting-started-setup) to set up ExecuTorch
-2. `cd examples/third-party/llama`
-3. `pip install -e .`
-4. Go back to `executorch` root, run `bash examples/models/llama2/install_requirements.sh`.
+### Step 1: Setup
+1. Follow the [tutorial](https://pytorch.org/executorch/main/getting-started-setup) to set up ExecuTorch
+2. Run `examples/models/llama2/install_requirements.sh`.
+
+### Step 2: Prepare model
+
+#### Option A: Download and export llama2 model
+
+You can export and run the original Llama2 model.
 
-### Export llama2 models
-2. From `executorch` root, run `python3 -m examples.models.llama2.export_llama`. The exported program, llama2.pte would be saved in current directory using the dummy checkpoint.
-3. Llama2 pretrained parameters can be downloaded [here](https://ai.meta.com/resources/models-and-libraries/llama-downloads/) and run with `python3 -m examples.models.llama2.export_llama --checkpoint <checkpoint.pth> --params <params.json>`.
+1. From `executorch` root, run `python3 -m examples.models.llama2.export_llama`. The exported program, llama2.pte would be saved in current directory using the dummy checkpoint.
+2. Llama2 pretrained parameters can be downloaded [here](https://ai.meta.com/resources/models-and-libraries/llama-downloads/) and run with `python3 -m examples.models.llama2.export_llama --checkpoint <checkpoint.pth> --params <params.json>`.
 
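As context for the `--params <params.json>` flag in the hunk above: the file holds the model's architecture hyperparameters. A minimal sketch of reading it follows; the field names track the Llama2 release, but the values below are placeholders, not an official checkpoint's config:

```python
import json

# Hypothetical params.json contents; field names follow the Llama2 release,
# but the values here are illustrative placeholders.
params_text = """
{
  "dim": 4096,
  "n_layers": 32,
  "n_heads": 32,
  "norm_eps": 1e-05,
  "vocab_size": 32000
}
"""

params = json.loads(params_text)
# Export-time code can derive shapes from these fields, e.g. per-head width:
head_dim = params["dim"] // params["n_heads"]
```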
-### Export and run stories110M model
+#### Option B: Export stories110M model
+
+If you want to deploy and run a smaller model for education purposes
 
 1. Download `stories110M.pt` and `tokenizer.model` from Github.
 ```
@@ -59,6 +58,8 @@ This example tries to reuse the Python code, with modifications to make it compa
 ```
 Build with cmake: todo
 
+### Step 3: Run on your computer to validate
+
 5. Run model. Run options available [here](https://github.com/pytorch/executorch/blob/main/examples/models/llama2/main.cpp#L13).
 Build with buck2:
 ```
@@ -67,3 +68,14 @@ This example tries to reuse the Python code, with modifications to make it compa
 Build with cmake: todo
 
 See test script [here](https://github.com/pytorch/executorch/blob/main/.ci/scripts/test_llama.sh).
+
+### Step 4: Run benchmark on a phone via adb shell.
+
+### Step 5: Build iOS and Android apps
+
+
+# Notes
+This example tries to reuse the Python code, with minimal modifications to make it compatible with current ExecuTorch:
+1. Since ExecuTorch does not support complex Tensor data type, use the customized functions to have rotary embedding with real numbers. Please see [GitHub issue: Support complex data type in ExecuTorch](https://github.com/pytorch/executorch/issues/886).
+2. No CUDA. ExecuTorch is focused on Edge use cases where CUDA is not available on most of the edge devices.
+3. No dependencies on fairscale. The ColumnParallelLinear, ParallelEmbedding and training are not needed and supported in ExecuTorch.
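The first added note (rotary embedding with real numbers) can be sketched in plain Python. This is an illustrative reimplementation of the standard real-valued trick, not the repo's actual customized functions:

```python
import math

def apply_rotary(pairs, position, theta_base=10000.0):
    """Rotate (a, b) pairs of a query/key vector by position-dependent angles.

    This is the real-valued formulation of rotary embeddings: instead of
    multiplying complex numbers a + ib by e^(i * position * freq), each pair
    is rotated with an explicit 2x2 rotation built from cos/sin, so no
    complex Tensor data type is needed.
    """
    dim = 2 * len(pairs)
    rotated = []
    for k, (a, b) in enumerate(pairs):
        freq = theta_base ** (-2.0 * k / dim)
        angle = position * freq
        c, s = math.cos(angle), math.sin(angle)
        rotated.append((a * c - b * s, a * s + b * c))
    return rotated

pairs = [(1.0, 0.0), (0.5, 2.0)]
at_zero = apply_rotary(pairs, position=0)   # identity rotation
at_three = apply_rotary(pairs, position=3)  # rotations preserve pair norms
```

Since each step is a pure rotation, vector norms are unchanged; only the relative angles between positions carry positional information.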