
Commit 5d27df1

Merge branch 'release/0.4' into cherry-pick-5453-by-pytorch_bot_bot_
2 parents: 7ae391a + dabf082


46 files changed, +442 -419 lines

build/build_apple_frameworks.sh

Lines changed: 1 addition & 1 deletion

@@ -57,7 +57,7 @@ libcustom_ops.a,\

 FRAMEWORK_KERNELS_OPTIMIZED="kernels_optimized:\
 liboptimized_kernels.a,\
-liboptimized_ops_lib.a,\
+liboptimized_native_cpu_ops_lib.a,\
 :"

 FRAMEWORK_KERNELS_PORTABLE="kernels_portable:\

docs/source/getting-started-setup.md

Lines changed: 35 additions & 1 deletion

@@ -110,6 +110,23 @@ Alternatively, if you would like to experiment with ExecuTorch quickly and easil
 ```
 After setting up your environment, you are ready to convert your PyTorch programs
 to ExecuTorch.
+
+> **_NOTE:_** Cleaning the build system
+>
+> When fetching a new version of the upstream repo (via `git fetch` or `git
+> pull`) it is a good idea to clean the old build artifacts. The build system
+> does not currently adapt well to changes in build dependencies.
+>
+> You should also update and pull the submodules again, in case their versions
+> have changed.
+>
+> ```bash
+> # From the root of the executorch repo:
+> rm -rf cmake-out pip-out
+> git submodule sync
+> git submodule update --init
+> ```
+
 ## Create an ExecuTorch program

 After setting up your environment, you are ready to convert your PyTorch programs
@@ -169,13 +186,30 @@ For now, let's use [`executor_runner`](https://github.com/pytorch/executorch/blo
 ### Build Tooling Setup
 The ExecuTorch repo uses CMake to build its C++ code. Here, we'll configure it to build the `executor_runner` tool to run it on our desktop OS.
 ```bash
-# Clean and configure the CMake build system. Compiled programs will appear in the executorch/cmake-out directory we create here.
+# Clean and configure the CMake build system. Compiled programs will
+# appear in the executorch/cmake-out directory we create here.
 (rm -rf cmake-out && mkdir cmake-out && cd cmake-out && cmake ..)

 # Build the executor_runner target
 cmake --build cmake-out --target executor_runner -j9
 ```

+> **_NOTE:_** Cleaning the build system
+>
+> When fetching a new version of the upstream repo (via `git fetch` or `git
+> pull`) it is a good idea to clean the old build artifacts. The build system
+> does not currently adapt well to changes in build dependencies.
+>
+> You should also update and pull the submodules again, in case their versions
+> have changed.
+>
+> ```bash
+> # From the root of the executorch repo:
+> rm -rf cmake-out pip-out
+> git submodule sync
+> git submodule update --init
+> ```
+
 ### Run Your Program

 Now that we've exported a program and built the runtime, let's execute it!

docs/source/index.rst

Lines changed: 3 additions & 0 deletions

@@ -117,6 +117,9 @@ Topics in this section will help you get started with ExecuTorch.
    :hidden:

    llm/getting-started
+   llm/llama-demo-android
+   llm/build-run-llama3-qualcomm-ai-engine-direct-backend
+   llm/llama-demo-ios

 .. toctree::
    :glob:

docs/source/llm/llama-demo-android.md

Lines changed: 1 addition & 140 deletions

@@ -1,141 +1,2 @@
-# ExecuTorch Llama Android Demo App
-
-We’re excited to share that the newly revamped Android demo app is live and includes many new updates to provide a more intuitive and smoother user experience with a chat use case! The primary goal of this app is to showcase how easily ExecuTorch can be integrated into an Android demo app and how to exercise the many features ExecuTorch and Llama models have to offer.
-
-This app serves as a valuable resource to inspire your creativity and provide foundational code that you can customize and adapt for your particular use case.
-
-Please dive in and start exploring our demo app today! We look forward to any feedback and are excited to see your innovative ideas.
-
-
-## Key Concepts
-From this demo app, you will learn many key concepts such as:
-* How to prepare Llama models, build the ExecuTorch library, and model inferencing across delegates
-* Expose the ExecuTorch library via JNI layer
-* Familiarity with current ExecuTorch app-facing capabilities
-
-The goal is for you to see the type of support ExecuTorch provides and feel comfortable with leveraging it for your use cases.
-
-## Supporting Models
-As a whole, the models that this app supports are (varies by delegate):
-* Llama 3.1 8B
-* Llama 3 8B
-* Llama 2 7B
-* LLaVA-1.5 vision model (only XNNPACK)
-
-
-## Building the APK
-First it’s important to note that currently ExecuTorch provides support across 3 delegates. Once you identify the delegate of your choice, select the README link to get a complete end-to-end instructions for environment set-up to exporting the models to build ExecuTorch libraries and apps to run on device:
-
-| Delegate | Resource |
-| ------------- | ------------- |
-| XNNPACK (CPU-based library) | [link](docs/delegates/xnnpack_README.md) |
-| QNN (Qualcomm AI Accelerators) | [link](docs/delegates/qualcomm_README.md) |
-| MediaTek (MediaTek AI Accelerators) | [link](docs/delegates/mediatek_README.md) |
-
-## How to Use the App
-
-This section will provide the main steps to use the app, along with a code snippet of the ExecuTorch API.
-
-For loading the app, development, and running on device we recommend Android Studio:
-1. Open Android Studio and select "Open an existing Android Studio project" to open examples/demo-apps/android/LlamaDemo.
-2. Run the app (^R). This builds and launches the app on the phone.
-
-### Opening the App
-
-Below are the UI features for the app.
-
-Select the settings widget to get started with picking a model, its parameters and any prompts.
-<p align="center">
-<img src="../_static/img/opening_the_app_details.png" width=800>
-</p>
-
-
-
-### Select Models and Parameters
-
-Once you've selected the model, tokenizer, and model type you are ready to click on "Load Model" to have the app load the model and go back to the main Chat activity.
-<p align="center">
-<img src="../_static/img/settings_menu.png" width=300>
-</p>
-
-
-
-Optional Parameters:
-* Temperature: Defaulted to 0, you can adjust the temperature for the model as well. The model will reload upon any adjustments.
-* System Prompt: Without any formatting, you can enter in a system prompt. For example, "you are a travel assistant" or "give me a response in a few sentences".
-* User Prompt: More for the advanced user, if you would like to manually input a prompt then you can do so by modifying the `{{user prompt}}`. You can also modify the special tokens as well. Once changed then go back to the main Chat activity to send.
-
-> [!TIP]
-> Helpful ExecuTorch API in app
-
-```java
-// Upon returning to the Main Chat Activity
-mModule = new LlamaModule(
-            ModelUtils.getModelCategory(mCurrentSettingsFields.getModelType()),
-            modelPath,
-            tokenizerPath,
-            temperature);
-int loadResult = mModule.load();
+```{include} ../../../examples/demo-apps/android/LlamaDemo/README.md
 ```
-
-* `modelCategory`: Indicate whether it’s a text-only or vision model
-* `modePath`: path to the .pte file
-* `tokenizerPath`: path to the tokenizer .bin file
-* `temperature`: model parameter to adjust the randomness of the model’s output
-
-
-### User Prompt
-Once model is successfully loaded then enter any prompt and click the send (i.e. generate) button to send it to the model.
-<p align="center">
-<img src="../_static/img/load_complete_and_start_prompt.png" width=300>
-</p>
-
-You can provide it more follow-up questions as well.
-<p align="center">
-<img src="../_static/img/chat.png" width=300>
-</p>
-
-> [!TIP]
-> Helpful ExecuTorch API in app
-```java
-mModule.generate(prompt,sequence_length, MainActivity.this);
-```
-* `prompt`: User formatted prompt
-* `sequence_length`: Number of tokens to generate in response to a prompt
-* `MainActivity.this`: Indicate that the callback functions (OnResult(), OnStats()) are present in this class.
-
-[*LLaVA-1.5: Only for XNNPACK delegate*]
-
-For LLaVA-1.5 implementation, select the exported LLaVA .pte and tokenizer file in the Settings menu and load the model. After this you can send an image from your gallery or take a live picture along with a text prompt to the model.
-
-<p align="center">
-<img src="../_static/img/llava_example.png" width=300>
-</p>
-
-
-### Output Generated
-To show completion of the follow-up question, here is the complete detailed response from the model.
-<p align="center">
-<img src="../_static/img/chat_response.png" width=300>
-</p>
-
-> [!TIP]
-> Helpful ExecuTorch API in app
-
-Ensure you have the following functions in your callback class that you provided in the `mModule.generate()`. For this example, it is `MainActivity.this`.
-```java
-@Override
-public void onResult(String result) {
-  //...result contains token from response
-  //.. onResult will continue to be invoked until response is complete
-}
-
-@Override
-public void onStats(float tps) {
-  //...tps (tokens per second) stats is provided by framework
-}
-
-```
-
-## Reporting Issues
-If you encountered any bugs or issues following this tutorial please file a bug/issue here on [Github](https://github.com/pytorch/executorch/issues/new).

examples/apple/coreml/executor_runner/main.mm

Lines changed: 21 additions & 4 deletions

@@ -24,8 +24,25 @@ static inline id check_class(id obj, Class cls) {

 #define SAFE_CAST(Object, Type) ((Type *)check_class(Object, [Type class]))

-using namespace torch::executor;
-using torch::executor::util::FileDataLoader;
+using executorch::etdump::ETDumpGen;
+using executorch::etdump::ETDumpResult;
+using executorch::extension::FileDataLoader;
+using executorch::runtime::DataLoader;
+using executorch::runtime::EValue;
+using executorch::runtime::Error;
+using executorch::runtime::EventTracer;
+using executorch::runtime::EventTracerDebugLogLevel;
+using executorch::runtime::FreeableBuffer;
+using executorch::runtime::HierarchicalAllocator;
+using executorch::runtime::MemoryAllocator;
+using executorch::runtime::MemoryManager;
+using executorch::runtime::Method;
+using executorch::runtime::MethodMeta;
+using executorch::runtime::Program;
+using executorch::runtime::Result;
+using executorch::runtime::Span;
+using executorch::runtime::TensorInfo;
+using torch::executor::CoreMLBackendDelegate;

 static constexpr size_t kRuntimeMemorySize = 16 * 1024U * 1024U; // 16 MB

@@ -294,7 +311,7 @@ bool is_model_analysis_enabled(const Args& args) {
 }

 void dump_etdump_gen(ETDumpGen *etdump_gen, const Buffer& debug_buffer, const Args& args) {
-    etdump_result result = (etdump_gen != nullptr) ? etdump_gen->get_etdump_data() : etdump_result{.buf = nullptr, .size = 0};
+    ETDumpResult result = (etdump_gen != nullptr) ? etdump_gen->get_etdump_data() : ETDumpResult{.buf = nullptr, .size = 0};
     if (result.size == 0) {
         return;
     }
@@ -316,7 +333,7 @@ void dump_etdump_gen(ETDumpGen *etdump_gen, const Buffer& debug_buffer, const Ar

 int main(int argc, char * argv[]) {
     @autoreleasepool {
-        runtime_init();
+        executorch::runtime::runtime_init();

         auto args = parse_command_line_args([[NSProcessInfo processInfo] arguments]);
         if (args.purge_models_cache) {
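
The change above retires the flat `torch::executor` namespace in favor of the split `executorch::runtime` / `executorch::extension` / `executorch::etdump` namespaces. Below is a minimal sketch of what calling code looks like after that migration; it is not part of this commit, and the header paths, the file name `model.pte`, and the standalone `main` are illustrative assumptions.

```cpp
// Sketch only: old unqualified names (via `using namespace torch::executor`)
// become explicitly qualified, as in the runner above.
#include <executorch/extension/data_loader/file_data_loader.h>
#include <executorch/runtime/executor/program.h>
#include <executorch/runtime/platform/runtime.h>

using executorch::extension::FileDataLoader; // was torch::executor::util::FileDataLoader
using executorch::runtime::Program;          // was torch::executor::Program
using executorch::runtime::Result;           // was torch::executor::Result

int main() {
  // was: runtime_init();
  executorch::runtime::runtime_init();

  // Load a program through the renamed loader and runtime entry points.
  Result<FileDataLoader> loader = FileDataLoader::from("model.pte");
  if (!loader.ok()) {
    return 1;
  }
  Result<Program> program = Program::load(&loader.get());
  return program.ok() ? 0 : 1;
}
```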

examples/apple/mps/executor_runner/mps_executor_runner.mm

Lines changed: 32 additions & 14 deletions

@@ -97,8 +97,26 @@
     262144, // 256 KB
     "Size of the debug buffer in bytes to allocate for intermediate outputs and program outputs logging.");

-using namespace torch::executor;
-using torch::executor::util::FileDataLoader;
+using executorch::etdump::ETDumpGen;
+using executorch::etdump::ETDumpResult;
+using executorch::extension::BufferCleanup;
+using executorch::extension::BufferDataLoader;
+using executorch::extension::FileDataLoader;
+using executorch::runtime::DataLoader;
+using executorch::runtime::EValue;
+using executorch::runtime::Error;
+using executorch::runtime::EventTracerDebugLogLevel;
+using executorch::runtime::FreeableBuffer;
+using executorch::runtime::HierarchicalAllocator;
+using executorch::runtime::MemoryAllocator;
+using executorch::runtime::MemoryManager;
+using executorch::runtime::Method;
+using executorch::runtime::MethodMeta;
+using executorch::runtime::Program;
+using executorch::runtime::Result;
+using executorch::runtime::Span;
+
+namespace bundled_program = executorch::bundled_program;

 int main(int argc, char** argv) {
   {
@@ -113,7 +131,7 @@ int main(int argc, char** argv) {
     return 1;
   }

-  runtime_init();
+  executorch::runtime::runtime_init();

   gflags::ParseCommandLineFlags(&argc, &argv, true);
   if (argc != 1) {
@@ -144,20 +162,20 @@
   // Find the offset to the embedded Program.
   const void* program_data;
   size_t program_data_len;
-  Error status = torch::executor::bundled_program::GetProgramData(
+  Error status = bundled_program::get_program_data(
       const_cast<void*>(file_data->data()),
       file_data->size(),
      &program_data,
      &program_data_len);
   ET_CHECK_MSG(
      status == Error::Ok,
-      "GetProgramData() failed on file '%s': 0x%x",
+      "get_program_data() failed on file '%s': 0x%x",
      model_path,
      (unsigned int)status);

   // Wrap the buffer in a DataLoader.
   auto buffer_data_loader =
-      util::BufferDataLoader(program_data, program_data_len);
+      BufferDataLoader(program_data, program_data_len);

   // Parse the program file. This is immutable, and can also be reused between
   // multiple execution invocations across multiple threads.
@@ -239,7 +257,7 @@ HierarchicalAllocator planned_memory(
   // be used by a single thread at at time, but it can be reused.
   //

-  torch::executor::ETDumpGen etdump_gen = torch::executor::ETDumpGen();
+  ETDumpGen etdump_gen;
   Result<Method> method =
       program->load_method(method_name, &memory_manager, &etdump_gen);
   ET_CHECK_MSG(
@@ -263,11 +281,11 @@ HierarchicalAllocator planned_memory(
   }

   // Prepare the inputs.
-  std::unique_ptr<util::BufferCleanup> inputs;
+  std::unique_ptr<BufferCleanup> inputs;
   if (FLAGS_bundled_program) {
     ET_LOG(Info, "Loading bundled program...");
     // Use the inputs embedded in the bundled program.
-    status = torch::executor::bundled_program::LoadBundledInput(
+    status = bundled_program::load_bundled_input(
         *method,
         file_data->data(),
         FLAGS_testset_idx);
@@ -278,11 +296,11 @@ HierarchicalAllocator planned_memory(
   } else {
     ET_LOG(Info, "Loading non-bundled program...\n");
     // Use ones-initialized inputs.
-    auto inputs_result = torch::executor::util::prepare_input_tensors(*method);
+    auto inputs_result = executorch::extension::prepare_input_tensors(*method);
     if (inputs_result.ok()) {
       // Will free the inputs when destroyed.
       inputs =
-          std::make_unique<util::BufferCleanup>(std::move(inputs_result.get()));
+          std::make_unique<BufferCleanup>(std::move(inputs_result.get()));
     }
   }
   ET_LOG(Info, "Inputs prepared.");
@@ -322,14 +340,14 @@ HierarchicalAllocator planned_memory(
   status = method->get_outputs(outputs.data(), outputs.size());
   ET_CHECK(status == Error::Ok);
   // Print the first and last 100 elements of long lists of scalars.
-  std::cout << torch::executor::util::evalue_edge_items(100);
+  std::cout << executorch::extension::evalue_edge_items(100);
   for (int i = 0; i < outputs.size(); ++i) {
     std::cout << "Output " << i << ": " << outputs[i] << std::endl;
   }

   // Dump the etdump data containing profiling/debugging data to the specified
   // file.
-  etdump_result result = etdump_gen.get_etdump_data();
+  ETDumpResult result = etdump_gen.get_etdump_data();
   if (result.buf != nullptr && result.size > 0) {
     FILE* f = fopen(FLAGS_etdump_path.c_str(), "w+");
     fwrite((uint8_t*)result.buf, 1, result.size, f);
@@ -362,7 +380,7 @@ HierarchicalAllocator planned_memory(
     atol = 1e-01;
     rtol = 1e-01;
   }
-  status = torch::executor::bundled_program::VerifyResultWithBundledExpectedOutput(
+  status = bundled_program::verify_method_outputs(
       *method,
       file_data->data(),
       FLAGS_testset_idx,
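
The bundled-program helpers are renamed here as well (`GetProgramData` → `get_program_data`, `LoadBundledInput` → `load_bundled_input`, `VerifyResultWithBundledExpectedOutput` → `verify_method_outputs`). Below is a minimal sketch of the new call pattern; it is not part of this commit, and the header location and the `run_bundled_testset` helper are assumptions for illustration.

```cpp
// Sketch only: exercising one bundled test set through the renamed API.
#include <executorch/devtools/bundled_program/bundled_program.h> // assumed path
#include <executorch/runtime/core/error.h>
#include <executorch/runtime/executor/method.h>

#include <cstddef>

namespace bundled_program = executorch::bundled_program;
using executorch::runtime::Error;
using executorch::runtime::Method;

// Hypothetical helper: feed bundled inputs, run the method, then compare the
// outputs against the expected outputs stored in the same bundle.
Error run_bundled_testset(Method& method, void* bundle, size_t testset_idx,
                          double rtol, double atol) {
  Error status = bundled_program::load_bundled_input(method, bundle, testset_idx);
  if (status != Error::Ok) {
    return status;
  }
  status = method.execute();
  if (status != Error::Ok) {
    return status;
  }
  return bundled_program::verify_method_outputs(method, bundle, testset_idx, rtol, atol);
}
```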
