| 1 | +# ExecuTorch Benchmark App for Apple Platforms |
| 2 | + |
| 3 | +## Introduction |
| 4 | + |
| 5 | +The **Benchmark App** is a tool designed to help developers measure the performance of PyTorch models on Apple devices using the ExecuTorch runtime. |
| 6 | +It provides a flexible framework for dynamically generating and running performance tests on your models, allowing you to assess metrics such as load times, inference speeds, memory usage, and more. |
| 7 | + |
| 8 | +<p align="center"> |
| 9 | +<img src="https://raw.githubusercontent.com/pytorch/executorch/refs/heads/main/docs/source/_static/img/ios_benchmark_app.png" alt="Benchmark App" style="width:800px"> |
| 10 | +</p> |
| 11 | + |
| 12 | +## Prerequisites |
| 13 | + |
| 14 | +- [Xcode](https://apps.apple.com/us/app/xcode/id497799835?mt=12/) 15.0 or later, with the command-line tools installed (run `xcode-select --install` if they are missing). |
| 15 | +- [CMake](https://cmake.org/download/) 3.19 or later |
| 16 | + - Download and open the macOS `.dmg` installer and move the CMake app to the `/Applications` folder. |
| 17 | + - Install CMake command line tools: `sudo /Applications/CMake.app/Contents/bin/cmake-gui --install` |
| 18 | +- A development provisioning profile with the [`increased-memory-limit`](https://developer.apple.com/documentation/bundleresources/entitlements/com_apple_developer_kernel_increased-memory-limit) entitlement if targeting iOS devices. |
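The version requirements above can be checked from a terminal. The following is a minimal sketch, not part of the repository: `version_ge`, `check_tool`, and `first_version` are hypothetical helper names, and the parsing assumes that the first line of `xcodebuild -version` and `cmake --version` contains the version number.

```bash
#!/usr/bin/env bash
# Hypothetical helpers (not part of the ExecuTorch repository).

# version_ge A B: succeeds when dotted numeric version A is >= B.
version_ge() {
  local IFS=.
  local -a a=($1) b=($2)   # split both versions on dots
  local i x y
  for ((i = 0; i < ${#a[@]} || i < ${#b[@]}; i++)); do
    x=${a[i]:-0}           # missing fields count as 0
    y=${b[i]:-0}
    if ((10#$x > 10#$y)); then return 0; fi
    if ((10#$x < 10#$y)); then return 1; fi
  done
  return 0
}

# check_tool NAME INSTALLED MINIMUM: prints a one-line verdict.
check_tool() {
  if version_ge "$2" "$3"; then
    echo "$1 $2: ok"
  else
    echo "$1 ${2:-not found}: need $3 or later"
    return 1
  fi
}

# first_version: extracts the first dotted number from stdin.
first_version() {
  grep -oE '[0-9]+(\.[0-9]+)+' | head -n 1
}

status=0
if command -v xcodebuild >/dev/null 2>&1; then
  check_tool Xcode "$(xcodebuild -version 2>/dev/null | first_version)" 15.0 || status=1
fi
if command -v cmake >/dev/null 2>&1; then
  check_tool CMake "$(cmake --version | first_version)" 3.19 || status=1
fi
```

After the script runs, `status` is non-zero if an installed tool is older than the minimum listed above. Tools that are not on the `PATH` are skipped rather than reported.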
| 19 | + |
| 20 | +## Setting Up the App |
| 21 | + |
| 22 | +### Get the Code |
| 23 | + |
| 24 | +To get started, clone the ExecuTorch repository and `cd` into the source directory: |
| 25 | + |
| 26 | +```bash |
| 27 | +git clone https://github.com/pytorch/executorch.git --depth 1 --recurse-submodules --shallow-submodules |
| 28 | +cd executorch |
| 29 | +``` |
| 30 | + |
| 31 | +The `--depth 1` and `--shallow-submodules` flags make this a shallow clone, which skips the commit history and speeds up the download. |
| 32 | + |
| 33 | +### Set Up the Frameworks |
| 34 | + |
| 35 | +The Benchmark App relies on prebuilt ExecuTorch frameworks. |
| 36 | +You have two options: |
| 37 | + |
| 38 | +<details> |
| 39 | +<summary>Option 1: Download Prebuilt Frameworks</summary> |
| 40 | +<br/> |
| 41 | + |
| 42 | +Run the provided script to download the prebuilt frameworks: |
| 43 | + |
| 44 | +```bash |
| 45 | +./extension/apple/Benchmark/Frameworks/download_frameworks.sh |
| 46 | +``` |
| 47 | +</details> |
| 48 | + |
| 49 | +<details> |
| 50 | +<summary>Option 2: Build Frameworks Locally</summary> |
| 51 | +<br/> |
| 52 | + |
| 53 | +Alternatively, you can build the frameworks yourself by following the [local build guide](https://pytorch.org/executorch/main/apple-runtime.html#local-build). |
| 54 | +</details> |
| 55 | + |
| 56 | +Once the frameworks are downloaded or built, verify that the `Frameworks` directory contains the necessary `.xcframework` files: |
| 57 | + |
| 58 | +```bash |
| 59 | +ls extension/apple/Benchmark/Frameworks |
| 60 | +``` |
| 61 | + |
| 62 | +You should see: |
| 63 | + |
| 64 | +``` |
| 65 | +backend_coreml.xcframework |
| 66 | +backend_mps.xcframework |
| 67 | +backend_xnnpack.xcframework |
| 68 | +executorch.xcframework |
| 69 | +kernels_custom.xcframework |
| 70 | +kernels_optimized.xcframework |
| 71 | +kernels_portable.xcframework |
| 72 | +kernels_quantized.xcframework |
| 73 | +``` |
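To avoid comparing the listing by eye, the check can be scripted. This is a small sketch under the assumption that the eight frameworks listed above are the complete required set; `missing_frameworks` is a hypothetical helper name, not a script shipped with the repository.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of the ExecuTorch repository).
# missing_frameworks DIR: prints each expected framework that DIR lacks.
missing_frameworks() {
  local dir=$1 name
  for name in backend_coreml backend_mps backend_xnnpack executorch \
              kernels_custom kernels_optimized kernels_portable kernels_quantized; do
    # An .xcframework is a directory bundle, so test with -d.
    [ -d "$dir/$name.xcframework" ] || echo "$name.xcframework"
  done
}
```

Running `missing_frameworks extension/apple/Benchmark/Frameworks` from the repository root prints nothing when every framework in the list is present.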
| 74 | + |
| 75 | +## Adding Models and Resources |
| 76 | + |
| 77 | +Place your exported model files (`.pte`) and any other resources (e.g., `tokenizer.bin`) into the `extension/apple/Benchmark/Resources` directory: |
| 78 | + |
| 79 | +```bash |
| 80 | +cp <path/to/my_model.pte> <path/to/llama3.pte> <path/to/tokenizer.bin> extension/apple/Benchmark/Resources |
| 81 | +``` |
| 82 | + |
| 83 | +Optionally, check that the files are there: |
| 84 | + |
| 85 | +```bash |
| 86 | +ls extension/apple/Benchmark/Resources |
| 87 | +``` |
| 88 | + |
| 89 | +For this example you should see: |
| 90 | + |
| 91 | +``` |
| 92 | +llama3.pte |
| 93 | +my_model.pte |
| 94 | +tokenizer.bin |
| 95 | +``` |
| 96 | + |
| 97 | +The app automatically bundles these resources and makes them available to the test suite. |
| 98 | + |
| 99 | +## Running the Tests |
| 100 | + |
| 101 | +### Build and Run the Tests |
| 102 | + |
| 103 | +Open the Benchmark Xcode project: |
| 104 | + |
| 105 | +```bash |
| 106 | +open extension/apple/Benchmark/Benchmark.xcodeproj |
| 107 | +``` |
| 108 | + |
| 109 | +Select the destination device or simulator and press `Command+U`, or click `Product` > `Test` in the menu to run the test suite. |
| 110 | + |
| 111 | +<p align="center"> |
| 112 | +<img src="https://raw.githubusercontent.com/pytorch/executorch/refs/heads/main/docs/source/_static/img/ios_benchmark_app_tests.png" alt="Benchmark App Tests" style="width:800px"> |
| 113 | +</p> |
| 114 | + |
| 115 | +### Configure Signing (if necessary) |
| 116 | + |
| 117 | +If you plan to run the app on a physical device, you may need to set up code signing: |
| 118 | + |
| 119 | +1. Open the **Project Navigator** by pressing `Command+1` and click on the `Benchmark` root of the file tree. |
| 120 | +2. Under the **Targets** section, open the **Signing & Capabilities** tab for both the `App` and `Tests` targets. |
| 121 | +3. Select your development team. Alternatively, manually pick a provisioning profile that supports the increased memory limit entitlement and modify the bundle identifier if needed. |
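When picking a provisioning profile manually, it can help to confirm that the profile carries the entitlement before selecting it. The sketch below assumes the profile has already been decoded to an XML property list (on macOS, `security cms -D -i` does this) and that the `<true/>` value sits on the line right after the key, as in pretty-printed plists. `has_increased_memory_limit` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of the ExecuTorch repository).
# has_increased_memory_limit PLIST: succeeds when the decoded profile in PLIST
# sets the increased-memory-limit entitlement to true.
has_increased_memory_limit() {
  grep -A1 -F '<key>com.apple.developer.kernel.increased-memory-limit</key>' "$1" \
    | grep -q '<true/>'
}

# Usage on macOS (the profile path is a placeholder):
#   security cms -D -i <path/to/profile.mobileprovision> > profile.plist
#   has_increased_memory_limit profile.plist && echo "entitlement present"
```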
| 122 | + |
| 123 | +<p align="center"> |
| 124 | +<img src="https://raw.githubusercontent.com/pytorch/executorch/refs/heads/main/docs/source/_static/img/ios_benchmark_app_signing.png" alt="Benchmark App Signing" style="width:800px"> |
| 125 | +</p> |
| 126 | + |
| 127 | +## Viewing Test Results and Metrics |
| 128 | + |
| 129 | +After running the tests, you can view the results in Xcode: |
| 130 | + |
| 131 | +1. Open the **Test Report Navigator** by pressing `Command+9`. |
| 132 | +2. Select the most recent test run. |
| 133 | +3. You'll see a list of tests that ran, along with their status (passed or failed). |
| 134 | +4. To view metrics for a specific test: |
| 135 | + - Double-click on the test in the list. |
| 136 | + - Switch to the **Metrics** tab to see detailed performance data. |
| 137 | + |
| 138 | +**Note**: The tests use `XCTMeasureOptions` to run each test multiple times (usually five) to obtain average performance metrics. |
| 139 | + |
| 140 | +<p align="center"> |
| 141 | +<img src="https://raw.githubusercontent.com/pytorch/executorch/refs/heads/main/docs/source/_static/img/ios_benchmark_app_test_load.png" alt="Benchmark App Test Load" style="width:800px"> |
| 142 | +</p> |
| 143 | +<p align="center"> |
| 144 | +<img src="https://raw.githubusercontent.com/pytorch/executorch/refs/heads/main/docs/source/_static/img/ios_benchmark_app_test_forward.png" alt="Benchmark App Test Forward" style="width:800px"> |
| 145 | +</p> |
| 146 | +<p align="center"> |
| 147 | +<img src="https://raw.githubusercontent.com/pytorch/executorch/refs/heads/main/docs/source/_static/img/ios_benchmark_app_test_generate.png" alt="Benchmark App Test Generate" style="width:800px"> |
| 148 | +</p> |
| 149 | + |
| 150 | +## Understanding the Test Suite |
| 151 | + |
| 152 | +The Benchmark App uses a dynamic test generation framework to create tests based on the resources you provide. |
| 153 | + |
| 154 | +### Dynamic Test Generation |
| 155 | + |
| 156 | +The key components are: |
| 157 | + |
| 158 | +- **`DynamicTestCase`**: A subclass of `XCTestCase` that allows for the dynamic creation of test methods. |
| 159 | +- **`ResourceTestCase`**: Builds upon `DynamicTestCase` to generate tests based on resources that match specified criteria. |
| 160 | + |
| 161 | +### How It Works |
| 162 | + |
| 163 | +1. **Define Directories and Predicates**: Override the `directories` and `predicates` methods to specify where to look for resources and how to match them. |
| 164 | + |
| 165 | +2. **Generate Resource Combinations**: The framework searches the specified `directories` for files matching the `predicates`, generating all possible combinations. |
| 166 | + |
| 167 | +3. **Create Dynamic Tests**: For each combination of resources, it calls `dynamicTestsForResources`, where you define the tests to run. |
| 168 | + |
| 169 | +4. **Test Naming**: Test names are dynamically formed using the format: |
| 170 | + |
| 171 | + ``` |
| 172 | + test_<TestName>_<Resource1>_<Resource2>_..._<OS>_<Version>_<DeviceModel> |
| 173 | + ``` |
| 174 | + |
| 175 | + This ensures that each test is uniquely identifiable based on the resources and device. |
| 176 | + |
| 177 | +### Example: Generic Model Tests |
| 178 | + |
| 179 | +Here's how you might create a test to measure model load and inference times: |
| 180 | + |
| 181 | +```objective-c |
| 182 | +@interface GenericTests : ResourceTestCase |
| 183 | +@end |
| 184 | + |
| 185 | +@implementation GenericTests |
| 186 | + |
| 187 | ++ (NSArray<NSString *> *)directories { |
| 188 | + return @[@"Resources"]; |
| 189 | +} |
| 190 | + |
| 191 | ++ (NSDictionary<NSString *, BOOL (^)(NSString *)> *)predicates { |
| 192 | + return @{ |
| 193 | + @"model" : ^BOOL(NSString *filename) { |
| 194 | + return [filename hasSuffix:@".pte"]; |
| 195 | + }, |
| 196 | + }; |
| 197 | +} |
| 198 | + |
| 199 | ++ (NSDictionary<NSString *, void (^)(XCTestCase *)> *)dynamicTestsForResources:(NSDictionary<NSString *, NSString *> *)resources { |
| 200 | + NSString *modelPath = resources[@"model"]; |
| 201 | + return @{ |
| 202 | + @"load" : ^(XCTestCase *testCase) { |
| 203 | + [testCase measureWithMetrics:@[[XCTClockMetric new], [XCTMemoryMetric new]] block:^{ |
| 204 | + XCTAssertEqual(Module(modelPath.UTF8String).load_forward(), Error::Ok); |
| 205 | + }]; |
| 206 | + }, |
| 207 | + @"forward" : ^(XCTestCase *testCase) { |
| 208 | + // Set up and measure the forward pass... |
| 209 | + }, |
| 210 | + }; |
| 211 | +} |
| 212 | + |
| 213 | +@end |
| 214 | +``` |
| 215 | + |
| 216 | +In this example: |
| 217 | + |
| 218 | +- We look for `.pte` files in the `Resources` directory. |
| 219 | +- For each model found, we create two tests: `load` and `forward`. |
| 220 | +- The tests measure the time and memory usage of loading and running the model. |
| 221 | + |
| 222 | +## Extending the Test Suite |
| 223 | + |
| 224 | +You can create custom tests by subclassing `ResourceTestCase` and overriding the necessary methods. |
| 225 | + |
| 226 | +### Steps to Create Custom Tests |
| 227 | + |
| 228 | +1. **Subclass `ResourceTestCase`**: |
| 229 | + |
| 230 | + ```objective-c |
| 231 | + @interface MyCustomTests : ResourceTestCase |
| 232 | + @end |
| 233 | + ``` |
| 234 | + |
| 235 | +2. **Override `directories` and `predicates`**: |
| 236 | + |
| 237 | + Specify where to look for resources and how to match them. |
| 238 | + |
| 239 | + ```objective-c |
| 240 | + + (NSArray<NSString *> *)directories { |
| 241 | + return @[@"Resources"]; |
| 242 | + } |
| 243 | + |
| 244 | + + (NSDictionary<NSString *, BOOL (^)(NSString *)> *)predicates { |
| 245 | + return @{ |
| 246 | + @"model" : ^BOOL(NSString *filename) { |
| 247 | + return [filename hasSuffix:@".pte"]; |
| 248 | + }, |
| 249 | + @"config" : ^BOOL(NSString *filename) { |
| 250 | + return [filename isEqualToString:@"config.json"]; |
| 251 | + }, |
| 252 | + }; |
| 253 | + } |
| 254 | + ``` |
| 255 | + |
| 256 | +3. **Implement `dynamicTestsForResources`**: |
| 257 | + |
| 258 | + Define the tests to run for each combination of resources. |
| 259 | + |
| 260 | + ```objective-c |
| 261 | + + (NSDictionary<NSString *, void (^)(XCTestCase *)> *)dynamicTestsForResources:(NSDictionary<NSString *, NSString *> *)resources { |
| 262 | + NSString *modelPath = resources[@"model"]; |
| 263 | + NSString *configPath = resources[@"config"]; |
| 264 | + return @{ |
| 265 | + @"customTest" : ^(XCTestCase *testCase) { |
| 266 | + // Implement your test logic here. |
| 267 | + }, |
| 268 | + }; |
| 269 | + } |
| 270 | + ``` |
| 271 | + |
| 272 | +4. **Add the Test Class to the Test Target**: |
| 273 | + |
| 274 | + Ensure your new test class is included in the test target in Xcode. |
| 275 | + |
| 276 | +### Example: LLaMA Token Generation Test |
| 277 | + |
| 278 | +An example of a more advanced test is measuring the tokens per second during text generation with the LLaMA model. |
| 279 | + |
| 280 | +```objective-c |
| 281 | +@interface LLaMATests : ResourceTestCase |
| 282 | +@end |
| 283 | + |
| 284 | +@implementation LLaMATests |
| 285 | + |
| 286 | ++ (NSArray<NSString *> *)directories { |
| 287 | + return @[@"Resources"]; |
| 288 | +} |
| 289 | + |
| 290 | ++ (NSDictionary<NSString *, BOOL (^)(NSString *)> *)predicates { |
| 291 | + return @{ |
| 292 | + @"model" : ^BOOL(NSString *filename) { |
| 293 | + return [filename hasSuffix:@".pte"] && [filename containsString:@"llama"]; |
| 294 | + }, |
| 295 | + @"tokenizer" : ^BOOL(NSString *filename) { |
| 296 | + return [filename isEqualToString:@"tokenizer.bin"]; |
| 297 | + }, |
| 298 | + }; |
| 299 | +} |
| 300 | + |
| 301 | ++ (NSDictionary<NSString *, void (^)(XCTestCase *)> *)dynamicTestsForResources:(NSDictionary<NSString *, NSString *> *)resources { |
| 302 | + NSString *modelPath = resources[@"model"]; |
| 303 | + NSString *tokenizerPath = resources[@"tokenizer"]; |
| 304 | + return @{ |
| 305 | + @"generate" : ^(XCTestCase *testCase) { |
| 306 | + // Implement the token generation test... |
| 307 | + }, |
| 308 | + }; |
| 309 | +} |
| 310 | + |
| 311 | +@end |
| 312 | +``` |
| 313 | + |
| 314 | +In this test: |
| 315 | + |
| 316 | +- We look for LLaMA model files and a `tokenizer.bin`. |
| 317 | +- We measure tokens per second and memory usage during text generation. |
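To preview which resource combinations these two predicates would accept before opening Xcode, the matching rules can be mirrored in a few lines of shell. This is only an illustration of the predicate logic shown above (a `.pte` file whose name contains `llama`, paired with a file named exactly `tokenizer.bin`); it does not reproduce how the framework itself enumerates resources or names tests, and `llama_combinations` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Hypothetical preview helper (not part of the ExecuTorch repository).
# llama_combinations DIR: prints "<model> <tokenizer>" for every pair in DIR
# that the two predicates above would accept.
llama_combinations() {
  local dir=$1 path model
  [ -e "$dir/tokenizer.bin" ] || return 0   # no tokenizer: no combinations
  for path in "$dir"/*.pte; do
    [ -e "$path" ] || continue              # the glob matched nothing
    model=${path##*/}                       # strip the directory part
    case "$model" in
      *llama*) echo "$model tokenizer.bin" ;;  # case-sensitive, like containsString:
    esac
  done
}
```

With the three example files from the "Adding Models and Resources" section, `llama_combinations extension/apple/Benchmark/Resources` prints `llama3.pte tokenizer.bin`; `my_model.pte` is skipped because its name does not contain `llama`.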
| 318 | + |
| 319 | +## Measuring Performance |
| 320 | + |
| 321 | +The Benchmark App leverages Apple's performance testing APIs to measure metrics such as execution time and memory usage. |
| 322 | + |
| 323 | +- **Measurement Options**: By default, each test is run five times to calculate average metrics. |
| 324 | +- **Custom Metrics**: You can define custom metrics by implementing the `XCTMetric` protocol. |
| 325 | +- **Available Metrics**: |
| 326 | + - `XCTClockMetric`: Measures wall-clock time. |
| 327 | + - `XCTMemoryMetric`: Measures memory usage. |
| 328 | + - **Custom Metrics**: For example, the LLaMA test includes a `TokensPerSecondMetric`. |
| 329 | + |
| 330 | +## Running Tests from the Command Line |
| 331 | + |
| 332 | +You can also run the tests using `xcodebuild`: |
| 333 | + |
| 334 | +```bash |
| 335 | +# Run on an iOS Simulator |
| 336 | +xcodebuild test -project extension/apple/Benchmark/Benchmark.xcodeproj \ |
| 337 | +-scheme Benchmark \ |
| 338 | +-destination 'platform=iOS Simulator,name=<SimulatorName>' \ |
| 339 | +-testPlan Tests |
| 340 | + |
| 341 | +# Run on a physical iOS device |
| 342 | +xcodebuild test -project extension/apple/Benchmark/Benchmark.xcodeproj \ |
| 343 | +-scheme Benchmark \ |
| 344 | +-destination 'platform=iOS,name=<DeviceName>' \ |
| 345 | +-testPlan Tests \ |
| 346 | +-allowProvisioningUpdates DEVELOPMENT_TEAM=<YourTeamID> |
| 347 | +``` |
| 348 | + |
| 349 | +Replace `<SimulatorName>`, `<DeviceName>`, and `<YourTeamID>` with your simulator/device name and Apple development team ID. |
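If you run both variants regularly, the two invocations can be folded into one wrapper. The sketch below is an assumption-laden convenience, not a script from the repository: `destination_for` and `run_benchmark` are hypothetical names, and the added `-resultBundlePath Benchmark.xcresult` option (a standard `xcodebuild` flag) is not part of the commands above. It saves the run as a result bundle that can be opened in Xcode later.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper (not part of the ExecuTorch repository).

# destination_for KIND NAME: prints an xcodebuild destination specifier.
destination_for() {
  case "$1" in
    simulator) echo "platform=iOS Simulator,name=$2" ;;
    device)    echo "platform=iOS,name=$2" ;;
    *)         echo "unknown destination kind: $1" >&2; return 2 ;;
  esac
}

# run_benchmark KIND NAME [TEAM_ID]: runs the test plan on the chosen
# destination and saves the results to Benchmark.xcresult.
run_benchmark() {
  local destination
  destination=$(destination_for "$1" "$2") || return
  local -a args=(
    test
    -project extension/apple/Benchmark/Benchmark.xcodeproj
    -scheme Benchmark
    -destination "$destination"
    -testPlan Tests
    -resultBundlePath Benchmark.xcresult
  )
  if [ "$1" = device ]; then
    # Physical devices additionally need signing information.
    args+=(-allowProvisioningUpdates "DEVELOPMENT_TEAM=$3")
  fi
  xcodebuild "${args[@]}"
}
```

For example, `run_benchmark simulator "<SimulatorName>"` or `run_benchmark device "<DeviceName>" "<YourTeamID>"`, run from the repository root. Remove any `Benchmark.xcresult` left by a previous run first, since `xcodebuild` typically refuses to overwrite an existing result bundle.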
| 350 | + |
| 351 | +## macOS |
| 352 | + |
| 353 | +The app can also be built and run on macOS: add macOS as the destination platform. |
| 354 | + |
| 355 | +<p align="center"> |
| 356 | +<img src="https://raw.githubusercontent.com/pytorch/executorch/refs/heads/main/docs/source/_static/img/ios_benchmark_app_macos.png" alt="Benchmark App macOS" style="width:700px"> |
| 357 | +</p> |
| 358 | + |
| 359 | +You also need to set up app signing to run the app locally. |
| 360 | + |
| 361 | +<p align="center"> |
| 362 | +<img src="https://raw.githubusercontent.com/pytorch/executorch/refs/heads/main/docs/source/_static/img/ios_benchmark_app_macos_signing.png" alt="Benchmark App macOS Signing" style="width:800px"> |
| 363 | +</p> |
| 364 | + |
| 365 | +## Conclusion |
| 366 | + |
| 367 | +The ExecuTorch Benchmark App provides a flexible and powerful framework for testing and measuring the performance of PyTorch models on Apple devices. By leveraging dynamic test generation, you can easily add your models and resources to assess their performance metrics. Whether you're optimizing existing models or developing new ones, this tool can help you gain valuable insights into their runtime behavior. |