
Commit 0dd1706

kimishpatel authored and facebook-github-bot committed
Update android and what is coming next section (#2866)
Summary:
- Instructions on building android binary and benchmarking it via phone
- What is coming next

Created from CodeHub with https://fburl.com/edit-in-codehub

Reviewed By: cccclai, kirklandsign

Differential Revision: D55784384
1 parent e5a8de0 · commit 0dd1706

File tree: 1 file changed (+66, -2 lines)


examples/models/llama2/README.md

Lines changed: 66 additions & 2 deletions
@@ -140,15 +140,79 @@ The Uncyclotext results generated above used: `{max_seq_len: 2048, limit: 1000}`

## Step 5: Run benchmark on Android phone

**1. Build llama runner binary for Android**

*Pre-requisite*: Android NDK (tested with r26c), which can be downloaded from [here](https://developer.android.com/ndk/downloads). Note that the Mac package can be unpacked and the NDK folder located inside it.

**1.1 Set Android NDK**
```
export ANDROID_NDK=<path-to-android-ndk>
```
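
To sanity-check the path before configuring (a hedged extra step, not part of the original instructions), confirm that the toolchain file referenced by the cmake commands below actually exists:
```
ls $ANDROID_NDK/build/cmake/android.toolchain.cmake
```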

**1.2 Build executorch and associated libraries for android.**
```
cmake -DCMAKE_TOOLCHAIN_FILE=$ANDROID_NDK/build/cmake/android.toolchain.cmake \
    -DANDROID_ABI=arm64-v8a \
    -DANDROID_PLATFORM=android-23 \
    -DCMAKE_INSTALL_PREFIX=cmake-out-android \
    -DCMAKE_BUILD_TYPE=Release \
    -DEXECUTORCH_BUILD_EXTENSION_MODULE=ON \
    -DEXECUTORCH_BUILD_EXTENSION_DATA_LOADER=ON \
    -DEXECUTORCH_ENABLE_LOGGING=1 \
    -DEXECUTORCH_BUILD_XNNPACK=ON \
    -DPYTHON_EXECUTABLE=python \
    -DEXECUTORCH_BUILD_OPTIMIZED=ON \
    -Bcmake-out-android .

cmake --build cmake-out-android -j16 --target install --config Release
```

**1.3 Build llama runner for android**
```
cmake -DCMAKE_TOOLCHAIN_FILE=$ANDROID_NDK/build/cmake/android.toolchain.cmake \
    -DANDROID_ABI=arm64-v8a \
    -DANDROID_PLATFORM=android-23 \
    -DCMAKE_INSTALL_PREFIX=cmake-out-android \
    -DCMAKE_BUILD_TYPE=Release \
    -DPYTHON_EXECUTABLE=python \
    -DEXECUTORCH_BUILD_OPTIMIZED=ON \
    -Bcmake-out-android/examples/models/llama2 \
    examples/models/llama2
```
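
The command above only configures the runner build. As a hedged sketch (the build directory layout is assumed from the `-B` argument above), the matching build step would be:
```
cmake --build cmake-out-android/examples/models/llama2 -j16 --config Release
```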

**2. Run on Android via adb shell**

*Pre-requisite*: Make sure you enable USB debugging via developer options on your phone.

**2.1 Connect your android phone**
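
To confirm the phone is visible over adb (a standard check, not part of the original instructions), list attached devices and make sure yours shows the `device` state rather than `unauthorized`:
```
adb devices
```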

**2.2 Upload model, tokenizer and llama runner binary to phone**
```
adb push <model.pte> /data/local/tmp/
adb push <tokenizer.bin> /data/local/tmp/
adb push cmake-out-android/examples/models/llama2/llama_main /data/local/tmp/
```
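
Depending on the device and how the binary was pushed, you may also need to mark it executable; a hedged extra step:
```
adb shell chmod +x /data/local/tmp/llama_main
```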

**2.3 Run model**
```
adb shell "cd /data/local/tmp && ./llama_main --model_path <model.pte> --tokenizer_path <tokenizer.bin> --prompt \"Once upon a time\" --seq_len 120"
```
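
If you want a crude end-to-end latency number in addition to whatever the runner itself reports, you can wrap the invocation with the shell's `time` keyword (a sketch; assumes the device shell, typically mksh on Android, provides `time`):
```
adb shell "cd /data/local/tmp && time ./llama_main --model_path <model.pte> --tokenizer_path <tokenizer.bin> --prompt \"Once upon a time\" --seq_len 120"
```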

## Step 6: Build iOS and/or Android apps

TODO

# What is coming next?
## Quantization
- Enabling FP16 model to leverage smaller group size for 4-bit quantization.
- Enabling GPTQ for 4-bit groupwise quantization
- Enabling custom quantization
- Lower-bit quantization
## Models
- Enabling more generative AI models and architectures.
- Enabling support for multi-modal models like LLaVA.
## Performance
- Performance improvement via techniques such as speculative decoding
- Enabling Llama2 7B and other architectures via Vulkan
- Enabling performant execution of widely used quantization schemes.

TODO
