examples/models/llava/README.md
model) for general-purpose visual and language understanding, achieving
impressive chat capabilities that mimic the spirit of cutting-edge multimodal
models and set a high bar for accuracy on Science QA.

## Instructions to run Llava on Android/iOS

First you need to generate a .PTE file for the model, along with an input image
and other artifacts. Then you need either a C++ runner or an Android/iOS
application to test things out on device.

### Host machine requirements

The biggest requirement is a host machine with at least 32GiB of memory, preferably 64GiB.

The model weights are 15GiB, and the export stage (`export_llava`) uses roughly another 10GiB of memory, so you need at least 25GiB of free memory to run the export script.
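
If you are unsure how much memory your machine has, a quick check like the sketch below can help. This is an optional convenience, and it assumes the third-party `psutil` package, which is not part of this example's requirements:

```python
# Optional host-memory sanity check. Assumes `psutil` is installed
# (pip install psutil); it is not a dependency of this example.
import psutil

total_gib = psutil.virtual_memory().total / (1024**3)
print(f"Total host memory: {total_gib:.1f} GiB")
if total_gib < 32:
    print("Warning: exporting Llava may exhaust memory on this machine.")
```
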
### Generate ExecuTorch .PTE and other artifacts
Run the following command to generate `llava.pte`, `tokenizer.bin` and an image
tensor (serialized in TorchScript) `image.pt`.
> **Warning**: The C++ runner `llava_main` binary cannot process raw image inputs such as JPEG, PNG, or BMP files directly. You must convert these images to a `.pt` file format using the `examples/models/llava/image_util.py` script before using them with `llava_main`.
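
For reference, the `.pt` file is just the image tensor serialized as a TorchScript module so the C++ runner can load it. The sketch below is an illustrative approximation, not the actual implementation of `image_util.py`: the `ImageContainer` class name and the `my_photo.jpg` path are made up, and the real script may apply model-specific preprocessing, so prefer the provided script in practice.

```python
# Illustrative sketch: serialize an image tensor as a TorchScript module.
# `ImageContainer` and "my_photo.jpg" are placeholder names; use
# examples/models/llava/image_util.py for the real conversion.
import torch
from PIL import Image
from torchvision.transforms.functional import pil_to_tensor


class ImageContainer(torch.nn.Module):
    """Holds an image tensor so it can be saved and loaded as TorchScript."""

    def __init__(self, image: torch.Tensor):
        super().__init__()
        self.register_buffer("image", image)

    def forward(self) -> torch.Tensor:
        return self.image


image = pil_to_tensor(Image.open("my_photo.jpg").convert("RGB"))  # uint8, [3, H, W]
torch.jit.script(ImageContainer(image)).save("image.pt")
```
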
Prerequisite: run `install_executorch.sh` to install ExecuTorch and run
`examples/models/llava/install_requirements.sh` to install dependencies.