
Commit 5d27df1

Merge branch 'release/0.4' into cherry-pick-5453-by-pytorch_bot_bot_
2 parents: 7ae391a + dabf082


46 files changed, +442 -419 lines

build/build_apple_frameworks.sh

Lines changed: 1 addition & 1 deletion

@@ -57,7 +57,7 @@ libcustom_ops.a,\

 FRAMEWORK_KERNELS_OPTIMIZED="kernels_optimized:\
 liboptimized_kernels.a,\
-liboptimized_ops_lib.a,\
+liboptimized_native_cpu_ops_lib.a,\
 :"

 FRAMEWORK_KERNELS_PORTABLE="kernels_portable:\

docs/source/getting-started-setup.md

Lines changed: 35 additions & 1 deletion

@@ -110,6 +110,23 @@ Alternatively, if you would like to experiment with ExecuTorch quickly and easil
 ```
 After setting up your environment, you are ready to convert your PyTorch programs
 to ExecuTorch.
+
+> **_NOTE:_** Cleaning the build system
+>
+> When fetching a new version of the upstream repo (via `git fetch` or `git
+> pull`) it is a good idea to clean the old build artifacts. The build system
+> does not currently adapt well to changes in build dependencies.
+>
+> You should also update and pull the submodules again, in case their versions
+> have changed.
+>
+> ```bash
+> # From the root of the executorch repo:
+> rm -rf cmake-out pip-out
+> git submodule sync
+> git submodule update --init
+> ```
+
 ## Create an ExecuTorch program

 After setting up your environment, you are ready to convert your PyTorch programs
@@ -169,13 +186,30 @@ For now, let's use [`executor_runner`](https://github.com/pytorch/executorch/blo
 ### Build Tooling Setup
 The ExecuTorch repo uses CMake to build its C++ code. Here, we'll configure it to build the `executor_runner` tool to run it on our desktop OS.
 ```bash
-# Clean and configure the CMake build system. Compiled programs will appear in the executorch/cmake-out directory we create here.
+# Clean and configure the CMake build system. Compiled programs will
+# appear in the executorch/cmake-out directory we create here.
 (rm -rf cmake-out && mkdir cmake-out && cd cmake-out && cmake ..)

 # Build the executor_runner target
 cmake --build cmake-out --target executor_runner -j9
 ```

+> **_NOTE:_** Cleaning the build system
+>
+> When fetching a new version of the upstream repo (via `git fetch` or `git
+> pull`) it is a good idea to clean the old build artifacts. The build system
+> does not currently adapt well to changes in build dependencies.
+>
+> You should also update and pull the submodules again, in case their versions
+> have changed.
+>
+> ```bash
+> # From the root of the executorch repo:
+> rm -rf cmake-out pip-out
+> git submodule sync
+> git submodule update --init
+> ```
+
 ### Run Your Program

 Now that we've exported a program and built the runtime, let's execute it!

docs/source/index.rst

Lines changed: 3 additions & 0 deletions

@@ -117,6 +117,9 @@ Topics in this section will help you get started with ExecuTorch.
    :hidden:

    llm/getting-started
+   llm/llama-demo-android
+   llm/build-run-llama3-qualcomm-ai-engine-direct-backend
+   llm/llama-demo-ios

 .. toctree::
    :glob:

docs/source/llm/llama-demo-android.md

Lines changed: 1 addition & 140 deletions

@@ -1,141 +1,2 @@
-# ExecuTorch Llama Android Demo App
-
-We’re excited to share that the newly revamped Android demo app is live and includes many new updates to provide a more intuitive and smoother user experience with a chat use case! The primary goal of this app is to showcase how easily ExecuTorch can be integrated into an Android demo app and how to exercise the many features ExecuTorch and Llama models have to offer.
-
-This app serves as a valuable resource to inspire your creativity and provide foundational code that you can customize and adapt for your particular use case.
-
-Please dive in and start exploring our demo app today! We look forward to any feedback and are excited to see your innovative ideas.
-
-
-## Key Concepts
-From this demo app, you will learn many key concepts such as:
-* How to prepare Llama models, build the ExecuTorch library, and model inferencing across delegates
-* Expose the ExecuTorch library via JNI layer
-* Familiarity with current ExecuTorch app-facing capabilities
-
-The goal is for you to see the type of support ExecuTorch provides and feel comfortable with leveraging it for your use cases.
-
-## Supporting Models
-As a whole, the models that this app supports are (varies by delegate):
-* Llama 3.1 8B
-* Llama 3 8B
-* Llama 2 7B
-* LLaVA-1.5 vision model (only XNNPACK)
-
-
-## Building the APK
-First it’s important to note that currently ExecuTorch provides support across 3 delegates. Once you identify the delegate of your choice, select the README link to get a complete end-to-end instructions for environment set-up to exporting the models to build ExecuTorch libraries and apps to run on device:
-
-| Delegate | Resource |
-| ------------- | ------------- |
-| XNNPACK (CPU-based library) | [link](docs/delegates/xnnpack_README.md) |
-| QNN (Qualcomm AI Accelerators) | [link](docs/delegates/qualcomm_README.md) |
-| MediaTek (MediaTek AI Accelerators) | [link](docs/delegates/mediatek_README.md) |
-
-## How to Use the App
-
-This section will provide the main steps to use the app, along with a code snippet of the ExecuTorch API.
-
-For loading the app, development, and running on device we recommend Android Studio:
-1. Open Android Studio and select "Open an existing Android Studio project" to open examples/demo-apps/android/LlamaDemo.
-2. Run the app (^R). This builds and launches the app on the phone.
-
-### Opening the App
-
-Below are the UI features for the app.
-
-Select the settings widget to get started with picking a model, its parameters and any prompts.
-<p align="center">
-<img src="../_static/img/opening_the_app_details.png" width=800>
-</p>
-
-
-
-### Select Models and Parameters
-
-Once you've selected the model, tokenizer, and model type you are ready to click on "Load Model" to have the app load the model and go back to the main Chat activity.
-<p align="center">
-<img src="../_static/img/settings_menu.png" width=300>
-</p>
-
-
-
-Optional Parameters:
-* Temperature: Defaulted to 0, you can adjust the temperature for the model as well. The model will reload upon any adjustments.
-* System Prompt: Without any formatting, you can enter in a system prompt. For example, "you are a travel assistant" or "give me a response in a few sentences".
-* User Prompt: More for the advanced user, if you would like to manually input a prompt then you can do so by modifying the `{{user prompt}}`. You can also modify the special tokens as well. Once changed then go back to the main Chat activity to send.
-
-> [!TIP]
-> Helpful ExecuTorch API in app
-
-```java
-// Upon returning to the Main Chat Activity
-mModule = new LlamaModule(
-            ModelUtils.getModelCategory(mCurrentSettingsFields.getModelType()),
-            modelPath,
-            tokenizerPath,
-            temperature);
-int loadResult = mModule.load();
+```{include} ../../../examples/demo-apps/android/LlamaDemo/README.md
 ```
-
-* `modelCategory`: Indicate whether it’s a text-only or vision model
-* `modePath`: path to the .pte file
-* `tokenizerPath`: path to the tokenizer .bin file
-* `temperature`: model parameter to adjust the randomness of the model’s output
-
-
-### User Prompt
-Once model is successfully loaded then enter any prompt and click the send (i.e. generate) button to send it to the model.
-<p align="center">
-<img src="../_static/img/load_complete_and_start_prompt.png" width=300>
-</p>
-
-You can provide it more follow-up questions as well.
-<p align="center">
-<img src="../_static/img/chat.png" width=300>
-</p>
-
-> [!TIP]
-> Helpful ExecuTorch API in app
-```java
-mModule.generate(prompt,sequence_length, MainActivity.this);
-```
-* `prompt`: User formatted prompt
-* `sequence_length`: Number of tokens to generate in response to a prompt
-* `MainActivity.this`: Indicate that the callback functions (OnResult(), OnStats()) are present in this class.
-
-[*LLaVA-1.5: Only for XNNPACK delegate*]
-
-For LLaVA-1.5 implementation, select the exported LLaVA .pte and tokenizer file in the Settings menu and load the model. After this you can send an image from your gallery or take a live picture along with a text prompt to the model.
-
-<p align="center">
-<img src="../_static/img/llava_example.png" width=300>
-</p>
-
-
-### Output Generated
-To show completion of the follow-up question, here is the complete detailed response from the model.
-<p align="center">
-<img src="../_static/img/chat_response.png" width=300>
-</p>
-
-> [!TIP]
-> Helpful ExecuTorch API in app
-
-Ensure you have the following functions in your callback class that you provided in the `mModule.generate()`. For this example, it is `MainActivity.this`.
-```java
-@Override
-public void onResult(String result) {
-  //...result contains token from response
-  //.. onResult will continue to be invoked until response is complete
-}
-
-@Override
-public void onStats(float tps) {
-  //...tps (tokens per second) stats is provided by framework
-}
-
-```
-
-## Reporting Issues
-If you encountered any bugs or issues following this tutorial please file a bug/issue here on [Github](https://github.com/pytorch/executorch/issues/new).

examples/apple/coreml/executor_runner/main.mm

Lines changed: 21 additions & 4 deletions

@@ -24,8 +24,25 @@ static inline id check_class(id obj, Class cls) {

 #define SAFE_CAST(Object, Type) ((Type *)check_class(Object, [Type class]))

-using namespace torch::executor;
-using torch::executor::util::FileDataLoader;
+using executorch::etdump::ETDumpGen;
+using executorch::etdump::ETDumpResult;
+using executorch::extension::FileDataLoader;
+using executorch::runtime::DataLoader;
+using executorch::runtime::EValue;
+using executorch::runtime::Error;
+using executorch::runtime::EventTracer;
+using executorch::runtime::EventTracerDebugLogLevel;
+using executorch::runtime::FreeableBuffer;
+using executorch::runtime::HierarchicalAllocator;
+using executorch::runtime::MemoryAllocator;
+using executorch::runtime::MemoryManager;
+using executorch::runtime::Method;
+using executorch::runtime::MethodMeta;
+using executorch::runtime::Program;
+using executorch::runtime::Result;
+using executorch::runtime::Span;
+using executorch::runtime::TensorInfo;
+using torch::executor::CoreMLBackendDelegate;

 static constexpr size_t kRuntimeMemorySize = 16 * 1024U * 1024U; // 16 MB

@@ -294,7 +311,7 @@ bool is_model_analysis_enabled(const Args& args) {
 }

 void dump_etdump_gen(ETDumpGen *etdump_gen, const Buffer& debug_buffer, const Args& args) {
-    etdump_result result = (etdump_gen != nullptr) ? etdump_gen->get_etdump_data() : etdump_result{.buf = nullptr, .size = 0};
+    ETDumpResult result = (etdump_gen != nullptr) ? etdump_gen->get_etdump_data() : ETDumpResult{.buf = nullptr, .size = 0};
     if (result.size == 0) {
         return;
     }
@@ -316,7 +333,7 @@ void dump_etdump_gen(ETDumpGen *etdump_gen, const Buffer& debug_buffer, const Ar

 int main(int argc, char * argv[]) {
     @autoreleasepool {
-        runtime_init();
+        executorch::runtime::runtime_init();

         auto args = parse_command_line_args([[NSProcessInfo processInfo] arguments]);
         if (args.purge_models_cache) {
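
The change above retires the flat `torch::executor` namespace in favor of the split `executorch::runtime` / `executorch::extension` / `executorch::etdump` namespaces. Below is a minimal sketch of what calling code looks like after that migration; it is not part of this commit, and the header paths, the file name `model.pte`, and the standalone `main` are illustrative assumptions.

```cpp
// Sketch only: old unqualified names (via `using namespace torch::executor`)
// become explicitly qualified, as in the runner above.
#include <executorch/extension/data_loader/file_data_loader.h>
#include <executorch/runtime/executor/program.h>
#include <executorch/runtime/platform/runtime.h>

using executorch::extension::FileDataLoader; // was torch::executor::util::FileDataLoader
using executorch::runtime::Program;          // was torch::executor::Program
using executorch::runtime::Result;           // was torch::executor::Result

int main() {
  // was: runtime_init();
  executorch::runtime::runtime_init();

  // Load a program through the renamed loader and runtime entry points.
  Result<FileDataLoader> loader = FileDataLoader::from("model.pte");
  if (!loader.ok()) {
    return 1;
  }
  Result<Program> program = Program::load(&loader.get());
  return program.ok() ? 0 : 1;
}
```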

examples/apple/mps/executor_runner/mps_executor_runner.mm

Lines changed: 32 additions & 14 deletions

@@ -97,8 +97,26 @@
     262144, // 256 KB
     "Size of the debug buffer in bytes to allocate for intermediate outputs and program outputs logging.");

-using namespace torch::executor;
-using torch::executor::util::FileDataLoader;
+using executorch::etdump::ETDumpGen;
+using executorch::etdump::ETDumpResult;
+using executorch::extension::BufferCleanup;
+using executorch::extension::BufferDataLoader;
+using executorch::extension::FileDataLoader;
+using executorch::runtime::DataLoader;
+using executorch::runtime::EValue;
+using executorch::runtime::Error;
+using executorch::runtime::EventTracerDebugLogLevel;
+using executorch::runtime::FreeableBuffer;
+using executorch::runtime::HierarchicalAllocator;
+using executorch::runtime::MemoryAllocator;
+using executorch::runtime::MemoryManager;
+using executorch::runtime::Method;
+using executorch::runtime::MethodMeta;
+using executorch::runtime::Program;
+using executorch::runtime::Result;
+using executorch::runtime::Span;
+
+namespace bundled_program = executorch::bundled_program;

 int main(int argc, char** argv) {
   {
@@ -113,7 +131,7 @@ int main(int argc, char** argv) {
     return 1;
   }

-  runtime_init();
+  executorch::runtime::runtime_init();

   gflags::ParseCommandLineFlags(&argc, &argv, true);
   if (argc != 1) {
@@ -144,20 +162,20 @@
   // Find the offset to the embedded Program.
   const void* program_data;
   size_t program_data_len;
-  Error status = torch::executor::bundled_program::GetProgramData(
+  Error status = bundled_program::get_program_data(
       const_cast<void*>(file_data->data()),
       file_data->size(),
      &program_data,
      &program_data_len);
   ET_CHECK_MSG(
      status == Error::Ok,
-      "GetProgramData() failed on file '%s': 0x%x",
+      "get_program_data() failed on file '%s': 0x%x",
      model_path,
      (unsigned int)status);

   // Wrap the buffer in a DataLoader.
   auto buffer_data_loader =
-      util::BufferDataLoader(program_data, program_data_len);
+      BufferDataLoader(program_data, program_data_len);

   // Parse the program file. This is immutable, and can also be reused between
   // multiple execution invocations across multiple threads.
@@ -239,7 +257,7 @@ HierarchicalAllocator planned_memory(
   // be used by a single thread at at time, but it can be reused.
   //

-  torch::executor::ETDumpGen etdump_gen = torch::executor::ETDumpGen();
+  ETDumpGen etdump_gen;
   Result<Method> method =
       program->load_method(method_name, &memory_manager, &etdump_gen);
   ET_CHECK_MSG(
@@ -263,11 +281,11 @@ HierarchicalAllocator planned_memory(
   }

   // Prepare the inputs.
-  std::unique_ptr<util::BufferCleanup> inputs;
+  std::unique_ptr<BufferCleanup> inputs;
   if (FLAGS_bundled_program) {
     ET_LOG(Info, "Loading bundled program...");
     // Use the inputs embedded in the bundled program.
-    status = torch::executor::bundled_program::LoadBundledInput(
+    status = bundled_program::load_bundled_input(
         *method,
         file_data->data(),
         FLAGS_testset_idx);
@@ -278,11 +296,11 @@ HierarchicalAllocator planned_memory(
   } else {
     ET_LOG(Info, "Loading non-bundled program...\n");
     // Use ones-initialized inputs.
-    auto inputs_result = torch::executor::util::prepare_input_tensors(*method);
+    auto inputs_result = executorch::extension::prepare_input_tensors(*method);
     if (inputs_result.ok()) {
       // Will free the inputs when destroyed.
       inputs =
-          std::make_unique<util::BufferCleanup>(std::move(inputs_result.get()));
+          std::make_unique<BufferCleanup>(std::move(inputs_result.get()));
     }
   }
   ET_LOG(Info, "Inputs prepared.");
@@ -322,14 +340,14 @@ HierarchicalAllocator planned_memory(
   status = method->get_outputs(outputs.data(), outputs.size());
   ET_CHECK(status == Error::Ok);
   // Print the first and last 100 elements of long lists of scalars.
-  std::cout << torch::executor::util::evalue_edge_items(100);
+  std::cout << executorch::extension::evalue_edge_items(100);
   for (int i = 0; i < outputs.size(); ++i) {
     std::cout << "Output " << i << ": " << outputs[i] << std::endl;
   }

   // Dump the etdump data containing profiling/debugging data to the specified
   // file.
-  etdump_result result = etdump_gen.get_etdump_data();
+  ETDumpResult result = etdump_gen.get_etdump_data();
   if (result.buf != nullptr && result.size > 0) {
     FILE* f = fopen(FLAGS_etdump_path.c_str(), "w+");
     fwrite((uint8_t*)result.buf, 1, result.size, f);
@@ -362,7 +380,7 @@ HierarchicalAllocator planned_memory(
     atol = 1e-01;
     rtol = 1e-01;
   }
-  status = torch::executor::bundled_program::VerifyResultWithBundledExpectedOutput(
+  status = bundled_program::verify_method_outputs(
       *method,
       file_data->data(),
       FLAGS_testset_idx,
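
The bundled-program helpers are renamed here as well (`GetProgramData` → `get_program_data`, `LoadBundledInput` → `load_bundled_input`, `VerifyResultWithBundledExpectedOutput` → `verify_method_outputs`). Below is a minimal sketch of the new call pattern; it is not part of this commit, and the header location and the `run_bundled_testset` helper are assumptions for illustration.

```cpp
// Sketch only: exercising one bundled test set through the renamed API.
#include <executorch/devtools/bundled_program/bundled_program.h> // assumed path
#include <executorch/runtime/core/error.h>
#include <executorch/runtime/executor/method.h>

#include <cstddef>

namespace bundled_program = executorch::bundled_program;
using executorch::runtime::Error;
using executorch::runtime::Method;

// Hypothetical helper: feed bundled inputs, run the method, then compare the
// outputs against the expected outputs stored in the same bundle.
Error run_bundled_testset(Method& method, void* bundle, size_t testset_idx,
                          double rtol, double atol) {
  Error status = bundled_program::load_bundled_input(method, bundle, testset_idx);
  if (status != Error::Ok) {
    return status;
  }
  status = method.execute();
  if (status != Error::Ok) {
    return status;
  }
  return bundled_program::verify_method_outputs(method, bundle, testset_idx, rtol, atol);
}
```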
