
Commit a2f1f14

Add MediaTek Llama Runner in Android App Readme
Differential Revision: D65149445
Pull Request resolved: #6548
1 parent 461d61d commit a2f1f14


examples/demo-apps/android/LlamaDemo/docs/delegates/mediatek_README.md

Lines changed: 49 additions & 42 deletions
@@ -3,7 +3,7 @@ This tutorial covers the end to end workflow for running Llama 3-8B-instruct inf
More specifically, it covers:
1. Export and quantization of Llama models against the MediaTek backend.
2. Building and linking the libraries required to run inference on-device for the Android platform using MediaTek AI accelerators.
3. Loading the needed model files onto the device and using the Android demo app to run inference.

Verified on macOS, Linux CentOS (model export), Python 3.10, Android NDK 26.3.11579264.
Phone verified: MediaTek Dimensity 9300 (D9300) chip.
@@ -51,19 +51,10 @@ zstd -cdq "<downloaded_buck2_file>.zst" > "<path_to_store_buck2>/buck2" && chmod
```
export BUCK2=path_to_buck/buck2 # Download BUCK2 and create BUCK2 executable
export ANDROID_NDK=path_to_android_ndk
export NEURON_BUFFER_ALLOCATOR_LIB=path_to_buffer_allocator/libneuron_buffer_allocator.so
export NEURON_USDK_ADAPTER_LIB=path_to_usdk_adapter/libneuronusdk_adapter.mtk.so
export ANDROID_ABIS=arm64-v8a
```
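
Before building anything, it can save a round trip to confirm these variables are actually set. A minimal check, assuming a bash shell (it relies on bash's `${!v}` indirect expansion):
```
# Print each required variable, flagging any that are unset (bash-only).
for v in BUCK2 ANDROID_NDK NEURON_BUFFER_ALLOCATOR_LIB NEURON_USDK_ADAPTER_LIB ANDROID_ABIS; do
  echo "$v=${!v:-<UNSET>}"
done
```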

## Export Llama Model
MTK currently supports Llama 3 exporting.

@@ -104,52 +95,68 @@ Note: Exporting model flow can take 2.5 hours (114GB RAM for num_chunks=4) to co

Before continuing forward, make sure to modify the tokenizer, token embedding, and model paths in examples/mediatek/executor_runner/run_llama3_sample.sh.

### Deploy
First, make sure your Android phone's chipset is compatible with this demo (MediaTek Dimensity 9300 (D9300)). Once you have the model, tokenizer, and runner ready, push them and the .so files to the device before running inference via the shell.

```
adb shell mkdir -p /data/local/tmp/et-mtk/ # or any other directory name
adb push embedding_<model_name>_fp32.bin /data/local/tmp/et-mtk
adb push tokenizer.model /data/local/tmp/et-mtk
adb push <exported_prompt_model_0>.pte /data/local/tmp/et-mtk
adb push <exported_prompt_model_1>.pte /data/local/tmp/et-mtk
...
adb push <exported_prompt_model_n>.pte /data/local/tmp/et-mtk
adb push <exported_gen_model_0>.pte /data/local/tmp/et-mtk
adb push <exported_gen_model_1>.pte /data/local/tmp/et-mtk
...
adb push <exported_gen_model_n>.pte /data/local/tmp/et-mtk
```
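
As a quick sanity check that everything landed on the device (the directory name below assumes the /data/local/tmp/et-mtk path used above):
```
# List the pushed models, tokenizer, and embedding file with their sizes.
adb shell ls -lh /data/local/tmp/et-mtk/
```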

## Populate Model Paths in Runner

The MediaTek runner (`examples/mediatek/executor_runner/mtk_llama_runner.cpp`) contains the logic that implements the function calls coming from the Android app.

**Important!** Currently the model paths are set at the runner level. Modify the values in `examples/mediatek/executor_runner/llama_runner/llm_helper/include/llama_runner_values.h` to set the model paths, tokenizer path, embedding file path, and other metadata.
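
Before editing, it can help to see which values are currently hardcoded. A sketch of one way to do that from the repository root (the grep pattern is only an illustration and assumes the header mentions device paths, tokenizer, and embedding entries):
```
# Show hardcoded path/metadata values in the runner header before editing them.
grep -n "/data/local/tmp\|tokenizer\|embedding" \
  examples/mediatek/executor_runner/llama_runner/llm_helper/include/llama_runner_values.h
```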

## Build AAR Library

Next we need to build and compile the MediaTek backend and the MediaTek Llama runner. When `NEURON_BUFFER_ALLOCATOR_LIB` is set, the build script also builds the MediaTek backend.
```
sh build/build_android_llm_demo.sh
```

**Output**: This will generate an .aar file that is already imported into the expected directory for the Android app. It will live in `examples/demo-apps/android/LlamaDemo/app/libs`.

If you unzip the .aar file or open it in Android Studio, verify that it contains the following files related to the MediaTek backend:
* libneuron_buffer_allocator.so
* libneuronusdk_adapter.mtk.so
* libneuron_backend.so (generated during build)

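You can also verify this from the command line. The sketch below assumes the archive is named executorch.aar; the actual file name in your checkout may differ:
```
# Confirm the MediaTek libraries are packaged in the AAR (archive name is an assumption).
unzip -l examples/demo-apps/android/LlamaDemo/app/libs/executorch.aar | grep libneuron
```
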
## Run Demo

### Alternative 1: Android Studio (Recommended)
1. Open Android Studio and select “Open an existing Android Studio project” to open examples/demo-apps/android/LlamaDemo.
2. Run the app (^R). This builds and launches the app on the phone.

### Alternative 2: Command line
Without the Android Studio UI, we can run gradle directly to build the app. Set up the Android SDK path and invoke gradle as follows.
```
export ANDROID_HOME=<path_to_android_sdk_home>
pushd examples/demo-apps/android/LlamaDemo
./gradlew :app:installDebug
popd
```
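
After installing, the app can also be launched from the shell. The component name below is an assumption based on the demo's package naming; check the app's AndroidManifest.xml if it differs:
```
# Launch the demo activity directly (package/activity names are assumptions).
adb shell am start -n com.example.executorchllamademo/.MainActivity
```
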
If the app runs successfully on your device, you should see something like the following:

<p align="center">
<img src="https://raw.githubusercontent.com/pytorch/executorch/refs/heads/main/docs/source/_static/img/opening_the_app_details.png" style="width:800px">
</p>

Once you've loaded the app on the device:
1. Click on Settings in the app.
2. Select MediaTek from the Backend dropdown.
3. Click the "Load Model" button. This will load the models from the runner.

## Reporting Issues
If you encounter any bugs or issues while following this tutorial, please file an issue on [GitHub](https://github.com/pytorch/executorch/issues/new).
