Commit c032005

lucylq authored and facebook-github-bot committed

Cmake llama instructions (#2853)

Summary: Add cmake llama instructions
Reviewed By: mergennachin
Differential Revision: D55768596

1 parent bacc0c8

File tree

2 files changed: +51 -6 lines changed


docs/source/getting-started-setup.md

Lines changed: 5 additions & 0 deletions

````diff
@@ -117,6 +117,11 @@ Follow these steps:
 ./install_requirements.sh
 ```
 
+To install with pybindings and dependencies for other backends, see the options [here](https://github.com/pytorch/executorch/blob/main/install_requirements.sh#L26-L29):
+```bash
+./install_requirements.sh --pybind <coreml | mps | xnnpack>
+```
+
 You have successfully set up your environment to work with ExecuTorch. The next
 step is to generate a sample ExecuTorch program.
 
````
examples/models/llama2/README.md

Lines changed: 46 additions & 6 deletions

````diff
@@ -44,6 +44,11 @@ Performance was measured on Samsung Galaxy S22, S23, S24 and One Plus 12. Measur
 
 # Instructions
 
+## Tested on
+
+- macOS (M1/M2), Linux.
+- For Llama 7B, your device may require at least 32GB RAM. If this is a constraint for you, please try the smaller stories model.
+
 ## Step 1: Setup
 1. Follow the [tutorial](https://pytorch.org/executorch/main/getting-started-setup) to set up ExecuTorch
 2. Run `examples/models/llama2/install_requirements.sh` to install a few dependencies.
````
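The 32GB figure in the "Tested on" note above can be checked before committing to a long build. A minimal sketch for Linux, reading `/proc/meminfo` (on macOS, `sysctl -n hw.memsize` reports total memory in bytes instead):

```shell
# Print total RAM in GiB (Linux; /proc/meminfo reports MemTotal in KiB).
# On macOS use: sysctl -n hw.memsize   (reports bytes)
awk '/^MemTotal:/ {printf "%d GiB\n", $2 / 1048576}' /proc/meminfo
```

If the result is well under 32, the smaller stories model mentioned above is the safer starting point.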
````diff
@@ -82,21 +87,45 @@ If you want to deploy and run a smaller model for educational purposes. From `ex
 ```
 4. Create tokenizer.bin.
 
-Build with buck2:
 ```
 python -m examples.models.llama2.tokenizer.tokenizer -t tokenizer.model -o tokenizer.bin
 ```
 
 ## Step 3: Run on your computer to validate
 
-1. Build llama runner. TODO
+1. Build ExecuTorch with XNNPACK enabled. Build options are available [here](https://github.com/pytorch/executorch/blob/main/CMakeLists.txt#L59).
+```
+cmake -DBUCK2=/tmp/buck2 \
+  -DPYTHON_EXECUTABLE=python \
+  -DCMAKE_INSTALL_PREFIX=cmake-out \
+  -DEXECUTORCH_ENABLE_LOGGING=1 \
+  -DCMAKE_BUILD_TYPE=Release \
+  -DEXECUTORCH_BUILD_EXTENSION_MODULE=ON \
+  -DEXECUTORCH_BUILD_EXTENSION_DATA_LOADER=ON \
+  -DEXECUTORCH_BUILD_XNNPACK=ON \
+  -DEXECUTORCH_BUILD_OPTIMIZED=ON \
+  -Bcmake-out .
+
+cmake --build cmake-out -j16 --target install --config Release
+```
+
+2. Build the llama runner.
+```
+cmake -DBUCK2=/tmp/buck2 \
+  -DPYTHON_EXECUTABLE=python \
+  -DCMAKE_INSTALL_PREFIX=cmake-out \
+  -DCMAKE_BUILD_TYPE=Release \
+  -DEXECUTORCH_BUILD_OPTIMIZED=ON \
+  -Bcmake-out/examples/models/llama2 \
+  examples/models/llama2
+
+cmake --build cmake-out/examples/models/llama2 -j16 --config Release
+```
 
-2. Run model. Run options available [here](https://github.com/pytorch/executorch/blob/main/examples/models/llama2/main.cpp#L13).
-Build with buck2:
+3. Run the model. Run options are available [here](https://github.com/pytorch/executorch/blob/main/examples/models/llama2/main.cpp#L18-L40).
 ```
-buck2 run examples/models/llama2:main -- --model_path=llama2.pte --tokenizer_path=tokenizer.bin --prompt="Once"
+cmake-out/examples/models/llama2/llama_main --model_path=<model pte file> --tokenizer_path=<tokenizer.bin> --prompt=<prompt>
 ```
-Build with cmake: TODO
 
 ## Step 4: Run benchmark on Android phone
 
````
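Between steps 2 and 3 of the hunk above, a quick existence check avoids a confusing "command not found" at run time. A minimal sketch, assuming the `cmake-out` output paths used in the build commands (the binary name `llama_main` comes from the run command in step 3):

```shell
# Check that the llama runner from step 2 was actually produced
# before invoking it in step 3.
BIN=cmake-out/examples/models/llama2/llama_main
if [ -x "$BIN" ]; then
  echo "runner ready: $BIN"
else
  echo "runner missing; re-run the step 2 cmake build"
fi
```

If the binary is missing, the usual cause is a failed `cmake --build` in step 2 rather than a wrong path.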
````diff
@@ -117,3 +146,14 @@ This example tries to reuse the Python code, with minimal modifications to make
 1. Since ExecuTorch does not support complex Tensor data type, use the customized functions to have rotary embedding with real numbers. Please see [GitHub issue: Support complex data type in ExecuTorch](https://github.com/pytorch/executorch/issues/886).
 2. No CUDA. ExecuTorch is focused on Edge use cases where CUDA is not available on most of the edge devices.
 3. No dependencies on fairscale. The ColumnParallelLinear, ParallelEmbedding and training are not needed and supported in ExecuTorch.
+
+
+# Clean
+To clean your build:
+```
+git clean -xfd
+pip uninstall executorch
+./install_requirements.sh <options>
+
+rm -rf cmake-out
+```
````

0 commit comments