Summary:
Cleaning up old content from the Llama2 example. This diff is purely a skeleton; follow-up diffs will fix the individual steps.
Differential Revision: D55703398
File changed: `examples/models/llama2/README.md` (31 additions, 17 deletions)
````diff
@@ -1,5 +1,7 @@
 # Summary
-This example demonstrates how to Export a [Llama 2](https://ai.meta.com/llama/) model in ExecuTorch such that it can be used in a mobile environment.
+This example demonstrates how to run a [Llama 2](https://ai.meta.com/llama/) model on mobile via ExecuTorch. We use XNNPACK to accelerate the performance and 4-bit groupwise PTQ quantization to fit the model on a phone.
+
+
 For Llama2, please refer to [the llama's github page](https://github.com/facebookresearch/llama) for details.
 Pretrained parameters are not included in this repo. Users are suggested to download them through [the llama's download page](https://ai.meta.com/resources/models-and-libraries/llama-downloads/).
 
````
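The new summary mentions 4-bit groupwise PTQ quantization. As a rough illustration of what groupwise 4-bit quantization does to a weight tensor (a toy sketch in plain Python, not ExecuTorch's or XNNPACK's actual implementation):

```python
# Sketch of symmetric 4-bit groupwise quantization (illustrative only;
# not the actual ExecuTorch/XNNPACK kernel). Each group of weights gets
# its own scale, so outliers in one group don't hurt the others.

def quantize_groupwise_4bit(weights, group_size=32):
    """Quantize a flat list of floats to int4 values, one scale per group."""
    qvals, scales = [], []
    for start in range(0, len(weights), group_size):
        group = weights[start:start + group_size]
        # Symmetric scheme: map the largest magnitude in the group to 7,
        # the top of the signed int4 range [-8, 7].
        scale = max(abs(w) for w in group) / 7 or 1.0
        scales.append(scale)
        qvals.append([max(-8, min(7, round(w / scale))) for w in group])
    return qvals, scales

def dequantize(qvals, scales):
    """Recover approximate floats from int4 values and per-group scales."""
    return [q * s for group, s in zip(qvals, scales) for q in group]

if __name__ == "__main__":
    w = [0.1 * i for i in range(-16, 16)]
    q, s = quantize_groupwise_4bit(w, group_size=16)
    w_hat = dequantize(q, s)
    print(max(abs(a - b) for a, b in zip(w, w_hat)))
```

The per-group scale is why the reconstruction error stays bounded by roughly half a quantization step per group, which is what makes 4-bit weights viable for on-device inference.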
````diff
@@ -12,31 +14,30 @@ Overall, Llama models are powerful and versatile language models that can be use
 
 Please note that the models are subject to the [acceptable use policy](https://github.com/facebookresearch/llama/blob/main/USE_POLICY.md) and the provided [responsible use guide](https://ai.meta.com/static-resource/responsible-use-guide/).
 
-# Notes
-1. This example is to show the feasibility of exporting a Llama2 model in ExecuTorch. There is no guarantee for performance.
-2. The provided checkpoint, demo_rand_params.pth is a dummy checkpoint with random parameters. It does not provide meaningful results. It's only for the purpose of demonstration and fast iterations. Use the options `--checkpoint <checkpoint>` and `--params <params>` for custom checkpoints.
-
 
-# Limitations
-This example tries to reuse the Python code, with modifications to make it compatible with current ExecuTorch:
-1. Since ExecuTorch does not support complex Tensor data type, use the customized functions to have rotary embedding with real numbers. Please see [GitHub issue: Support complex data type in ExecuTorch](https://github.com/pytorch/executorch/issues/886).
-2. No KV cache. The current cache implementation in the original Llama2 repo is not supported by ExecuTorch, because ExecuTorch runtime assumes model data attributes being static. Please see [GitHub issue: Add support of mutable buffers in ExecuTorch](https://github.com/pytorch/executorch/issues/897).
-3. No CUDA. ExecuTorch is focused on Edge use cases where CUDA is not available on most of the edge devices.
-4. No dependencies on fairscale. The ColumnParallelLinear, ParallelEmbedding and training are not needed and supported in ExecuTorch.
+# Results
 
+TODO - Will fill in table of results.
 
 # Instructions:
-### Setup
-1. Follow the [tutorial](https://pytorch.org/executorch/stable/getting-started-setup) to set up ExecuTorch
+### Step 1: Setup
+1. Follow the [tutorial](https://pytorch.org/executorch/main/getting-started-setup) to set up ExecuTorch
 2.`cd examples/third-party/llama`
 3.`pip install -e .`
 4. Go back to `executorch` root, run `bash examples/models/llama2/install_requirements.sh`.
 
-### Export llama2 models
-2. From `executorch` root, run `python3 -m examples.models.llama2.export_llama`. The exported program, llama2.pte would be saved in current directory using the dummy checkpoint.
-3. Llama2 pretrained parameters can be downloaded [here](https://ai.meta.com/resources/models-and-libraries/llama-downloads/) and run with `python3 -m examples.models.llama2.export_llama --checkpoint <checkpoint.pth> --params <params.json>`.
+### Step 2: Prepare model
+
+#### Option A: Download and export llama2 model
+
+You can export and run the original Llama2 model.
+
+1. From `executorch` root, run `python3 -m examples.models.llama2.export_llama`. The exported program, llama2.pte would be saved in current directory using the dummy checkpoint.
+2. Llama2 pretrained parameters can be downloaded [here](https://ai.meta.com/resources/models-and-libraries/llama-downloads/) and run with `python3 -m examples.models.llama2.export_llama --checkpoint <checkpoint.pth> --params <params.json>`.
 
-### Export and run stories110M model
+#### Option B: Export stories110M model
+
+If you want to deploy and run a smaller model for education purposes
 
 1. Download `stories110M.pt` and `tokenizer.model` from Github.
 ```
````
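The new Step 1 and Step 2 (Option A) commands above can be collected into a single script. This is only a convenience sketch: it repeats the commands exactly as the README lists them, assumes you start from the `executorch` repo root, and assumes the prerequisites from the linked tutorial are already installed.

```shell
#!/usr/bin/env bash
# Sketch: consolidated Step 1 (setup) and Step 2, Option A (export),
# run from the executorch repo root. Commands are taken from the README.
set -euo pipefail

# Step 1: install the bundled llama package and example requirements.
cd examples/third-party/llama
pip install -e .
cd - >/dev/null   # back to the executorch root
bash examples/models/llama2/install_requirements.sh

# Step 2, Option A: export with the dummy checkpoint; llama2.pte is
# written to the current directory.
python3 -m examples.models.llama2.export_llama

# Or, with real pretrained weights downloaded from the Llama page:
# python3 -m examples.models.llama2.export_llama \
#     --checkpoint <checkpoint.pth> --params <params.json>
```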
````diff
@@ -59,6 +60,8 @@ This example tries to reuse the Python code, with modifications to make it compa
 ```
 Build with cmake: todo
 
+### Step 3: Run on your computer to validate
+
 5. Run model. Run options available [here](https://github.com/pytorch/executorch/blob/main/examples/models/llama2/main.cpp#L13).
 Build with buck2:
 ```
````
````diff
@@ -67,3 +70,14 @@ This example tries to reuse the Python code, with modifications to make it compa
 Build with cmake: todo
 
 See test script [here](https://github.com/pytorch/executorch/blob/main/.ci/scripts/test_llama.sh).
+
+### Step 4: Run benchmark on a phone via adb shell.
+
+### Step 5: Build iOS and Android apps
+
+
+# Notes
+This example tries to reuse the Python code, with minimal modifications to make it compatible with current ExecuTorch:
+1. Since ExecuTorch does not support complex Tensor data type, use the customized functions to have rotary embedding with real numbers. Please see [GitHub issue: Support complex data type in ExecuTorch](https://github.com/pytorch/executorch/issues/886).
+2. No CUDA. ExecuTorch is focused on Edge use cases where CUDA is not available on most of the edge devices.
+3. No dependencies on fairscale. The ColumnParallelLinear, ParallelEmbedding and training are not needed and supported in ExecuTorch.
````
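Note 1 in the rewritten README keeps the point that rotary embeddings must avoid complex tensors. The underlying identity is simple: multiplying a feature pair, viewed as the complex number `x0 + i*x1`, by `e^(i*theta)` is the same rotation as the explicit cos/sin arithmetic. A minimal sketch of that equivalence (plain Python scalars, not the example's actual tensor code):

```python
import cmath
import math

# Sketch: rotary embedding rotates each (even, odd) feature pair by a
# position-dependent angle. With complex dtype support this is a multiply
# by e^(i*theta); without it (as in ExecuTorch), the same rotation is
# written out with real cos/sin arithmetic.

def rotate_pair_real(x0, x1, theta):
    """Rotation using only real arithmetic, as ExecuTorch requires."""
    c, s = math.cos(theta), math.sin(theta)
    return x0 * c - x1 * s, x0 * s + x1 * c

def rotate_pair_complex(x0, x1, theta):
    """Reference rotation via complex multiplication."""
    z = complex(x0, x1) * cmath.exp(1j * theta)
    return z.real, z.imag

if __name__ == "__main__":
    r = rotate_pair_real(1.0, 2.0, 0.3)
    c = rotate_pair_complex(1.0, 2.0, 0.3)
    print(all(abs(a - b) < 1e-12 for a, b in zip(r, c)))  # prints True
```

Expanding `(x0 + i*x1)(cos θ + i sin θ)` gives `(x0 cos θ − x1 sin θ) + i(x0 sin θ + x1 cos θ)`, which is exactly what the real-valued version computes.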
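Step 4 in the new README is still a stub. For reference, a typical adb workflow for trying an exported `.pte` on a connected Android device looks like the sketch below. The runner binary name and output path are assumptions (use whatever binary Step 3's build actually produces), and the flags shown are placeholders to be checked against the run options linked in Step 3.

```shell
# Sketch only: file names, binary name, and flags are assumptions, not
# the README's confirmed instructions.
adb push llama2.pte /data/local/tmp/
adb push tokenizer.model /data/local/tmp/
adb push <path-to-built-runner-binary> /data/local/tmp/llama_main

# Run on the device; adjust flags to match the runner's actual options.
adb shell "cd /data/local/tmp && ./llama_main \
    --model_path llama2.pte --tokenizer_path tokenizer.model \
    --prompt 'Once upon a time'"
```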