Testing Android and iOS apps on OSS CI using Nova reusable mobile workflow
With the advent of new tools like ExecuTorch, it's now possible to run LLM inference locally on mobile devices using different models such as llama2. While it isn't hard to experiment with this new capability, test it out on your own devices, and see some results, it takes more effort to automate this process and make it a part of the CI on various PyTorch-family repositories. To solve this challenge, the PyTorch Dev Infra team is launching a new Nova reusable mobile workflow to do the heavy lifting for you when it comes to testing your mobile apps.
With this new reusable workflow, developers can now:
- Utilize our mobile infrastructure built on top of AWS Device Farm. It offers a wide variety of popular Android and iOS devices from phones to tablets.
- Write and run tests remotely on those devices like how you run them locally with your connected phones.
- Go beyond the emulator to stress test and benchmark your local LLM inference solutions on actual devices. This helps accurately answer the questions of how many tokens the solution can process per second and how much memory and power it needs.
- Debug hard-to-reproduce issues on devices that you don't have.
- Gather the results and share them with others via the familiar GitHub CI UX.
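To make the benchmarking point concrete, tokens per second is simply the number of generated tokens divided by the wall-clock generation time. Here is a minimal sketch of that measurement; the `generate` callable is a hypothetical stand-in for your actual inference call, not part of any real API:

```python
import time

def measure_tps(generate, prompt: str) -> float:
    """Tokens per second = number of generated tokens / wall-clock time.
    `generate` is a hypothetical stand-in for your inference call."""
    start = time.perf_counter()
    tokens = generate(prompt)
    elapsed = time.perf_counter() - start
    return len(tokens) / elapsed

# Toy generator used purely for illustration
tps = measure_tps(lambda prompt: ["token"] * 100, "hello")
```

Running the same measurement on an emulator and on a physical device is exactly where the numbers tend to diverge, which is why testing on real hardware matters.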
Let's say you are integrating a new ExecuTorch backend which improves llama2 inference performance. You have already run some prompts to confirm that the tokens per second (TPS) is higher than what's reported in https://github.com/pytorch/executorch/tree/main/examples/models/llama2#performance. The result looks good on your phones, so the next step is to confirm the value on CI. To do that, you will need a few things:
- Decide on a group of devices to run the test on. Taking Android as an example, you might want to run it on the recent Samsung Galaxy S2x. Such a group of devices has already been created in our infra under the ARN `arn:aws:devicefarm:us-west-2:308535385114:devicepool:02a2cf0f-6d9b-45ee-ba1a-a086587469e6/e59f866a-30aa-4aa1-87b7-4510e5820dfa`.
- Build the app that you want to test. It would be in the `.apk` format for Android and the `.ipa` format for iOS.
- Prepare the test to run. We support two types of tests at the moment:
  - Instrumented tests on Android: https://developer.android.com/training/testing/instrumented-tests
  - XCTest on iOS: https://developer.apple.com/documentation/xctest
- Prepare an optional zip archive of any data files you want to copy to the remote devices. This usually contains the exported models themselves.
  - On Android, the archive will be extracted to the `/sdcard/` directory.
  - On iOS, the files will be placed in the application sandbox.
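The optional data archive is just a plain zip file. As a quick sketch of assembling one with Python's `zipfile` module (the model and tokenizer file names here are the hypothetical ones from the Llama example, and the placeholder files only stand in for artifacts you would already have):

```python
import zipfile
from pathlib import Path

# Placeholder files standing in for the real exported model and tokenizer
# (hypothetical names from the Llama example); in practice these already exist.
for name in ("xnnpack_llama2.pte", "tokenizer.bin"):
    Path(name).touch()

# Device Farm extracts the archive contents to /sdcard/ on Android
# and into the application sandbox on iOS.
with zipfile.ZipFile("extra-data.zip", "w", zipfile.ZIP_DEFLATED) as zf:
    for name in ("xnnpack_llama2.pte", "tokenizer.bin"):
        zf.write(name)
```

Any tool that produces a standard zip works equally well; the only requirement is that the archive contains the files your test expects to find on the device.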
After having these items ready, the next step is to take a minute to look at the test specification, which codifies how the test is run. You could probably just use the default test spec that we provide, but knowing the steps will come in handy if you need to customize it. Here are some examples:
- The Android test spec for the ExecuTorch Llama app can be found at https://ossci-assets.s3.amazonaws.com/android-llama2-device-farm-test-spec.yml. It prepares the required folder `/data/local/tmp/llama/` and copies the exported model `xnnpack_llama2.pte` together with the tokenizer `tokenizer.bin` there before running the test. `$DEVICEFARM_DEVICE_UDID` is set by AWS Device Farm to the target device, and the output will be available in `$DEVICEFARM_LOG_DIR/instrument.log`.
```yaml
...
test:
  commands:
    # By default, the following ADB command is used by Device Farm to run your Instrumentation test.
    # Please refer to Android's documentation for more options on running instrumentation tests with adb:
    # https://developer.android.com/studio/test/command-line#run-tests-with-adb
    - echo "Starting the Instrumentation test"
    - |
      adb -s $DEVICEFARM_DEVICE_UDID shell "am instrument -r -w --no-window-animation \
        $DEVICEFARM_TEST_PACKAGE_NAME/$DEVICEFARM_TEST_PACKAGE_RUNNER 2>&1 || echo \": -1\"" |
      tee $DEVICEFARM_LOG_DIR/instrument.log
...
```
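Note the `|| echo ": -1"` fallback in the adb command: it appends a bare `: -1` line to `instrument.log` when `am instrument` exits with a nonzero status, giving a later step a sentinel to distinguish a failed run from a successful one. A minimal sketch of such a check (the log excerpts below are made up for illustration, and how the real workflow consumes the log is an assumption here):

```python
def run_failed(log_text: str) -> bool:
    # The `|| echo ": -1"` fallback in the test spec writes a bare ': -1'
    # line only when `am instrument` exits with a nonzero status; the
    # INSTRUMENTATION_* lines are the normal output of `am instrument -r`.
    return any(line.strip() == ": -1" for line in log_text.splitlines())

# Hypothetical log excerpts for illustration
ok_log = "INSTRUMENTATION_RESULT: stream=...\nINSTRUMENTATION_CODE: -1\n"
bad_log = "error: no devices/emulators found\n: -1\n"
```

The exact-match comparison matters: a successful instrumentation run still ends with an `INSTRUMENTATION_CODE: -1` line, which must not be confused with the bare sentinel.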
- The generic iOS test spec used by the ExecuTorch iOS demo app, available at https://ossci-assets.s3.amazonaws.com/default-ios-device-farm-appium-test-spec.yml, just invokes `xcodebuild test-without-building` on the target device.
```yaml
test:
  commands:
    - xcodebuild test-without-building -destination id=$DEVICEFARM_DEVICE_UDID -xctestrun $DEVICEFARM_TEST_PACKAGE_PATH/*.xctestrun -derivedDataPath $DEVICEFARM_LOG_DIR
```
If you have a custom test spec, you'll need to upload it somewhere the workflow can download it from.
Let's bring everything together and go through an actual example of https://github.com/pytorch/executorch/blob/main/.github/workflows/android.yml.
```yaml
name: Android

on:
  ...

jobs:
  # Build all the demo apps
  test-demo-android:
    name: test-demo-android
    uses: pytorch/test-infra/.github/workflows/linux_job.yml@main
    strategy:
      matrix:
        include:
          - build-tool: buck2
    with:
      runner: linux.12xlarge
      docker-image: executorch-ubuntu-22.04-clang12-android
      submodules: 'true'
      ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }}
      timeout: 90
      # The apps are built using the Nova reusable GH action, so we set the upload-artifact parameter here to make them available as artifacts on GitHub
      upload-artifact: android-apps
      script: |
        set -eux

        # ... Building the apps ...

        # In the Nova workflow, all the files under the artifacts-to-be-uploaded folder will be uploaded
        mkdir -p artifacts-to-be-uploaded
        # Copy the app and its test suite
        cp examples/demo-apps/android/LlamaDemo/app/build/outputs/apk/debug/*.apk artifacts-to-be-uploaded/
        cp examples/demo-apps/android/LlamaDemo/app/build/outputs/apk/androidTest/debug/*.apk artifacts-to-be-uploaded/
        # Also copy the shared libraries
        cp cmake-out-android/lib/*.a artifacts-to-be-uploaded/

  # Upload the app and its test suite to S3 so that they can be downloaded by the test job
  upload-artifacts:
    needs: test-demo-android
    runs-on: linux.2xlarge
    steps:
      - name: Download the artifacts
        uses: actions/download-artifact@v3
        with:
          # The name here needs to match the name of the upload-artifact parameter
          name: android-apps
          path: ${{ runner.temp }}/artifacts/
      - name: Verify the artifacts
        shell: bash
        working-directory: ${{ runner.temp }}/artifacts/
        run: |
          ls -lah ./
      - name: Upload the artifacts to S3
        uses: seemethere/upload-artifact-s3@v5
        with:
          s3-bucket: gha-artifacts
          s3-prefix: |
            ${{ github.repository }}/${{ github.run_id }}/artifact
          retention-days: 14
          if-no-files-found: ignore
          path: ${{ runner.temp }}/artifacts/

  # Run the test on remote Android devices
  test-llama-app:
    needs: upload-artifacts
    permissions:
      id-token: write
      contents: read
    uses: pytorch/test-infra/.github/workflows/mobile_job.yml@main
    with:
      device-type: android
      runner: ubuntu-latest
      test-infra-ref: ''
      # This is the ARN of the ExecuTorch project on AWS
      project-arn: arn:aws:devicefarm:us-west-2:308535385114:project:02a2cf0f-6d9b-45ee-ba1a-a086587469e6
      # This is the custom Android device pool that only includes Samsung Galaxy S2x
      device-pool-arn: arn:aws:devicefarm:us-west-2:308535385114:devicepool:02a2cf0f-6d9b-45ee-ba1a-a086587469e6/e59f866a-30aa-4aa1-87b7-4510e5820dfa
      # Uploaded to S3 by the previous job; the name of the app comes from the project itself
      android-app-archive: https://gha-artifacts.s3.amazonaws.com/${{ github.repository }}/${{ github.run_id }}/artifact/app-debug.apk
      android-test-archive: https://gha-artifacts.s3.amazonaws.com/${{ github.repository }}/${{ github.run_id }}/artifact/app-debug-androidTest.apk
      # The test spec can be downloaded from https://ossci-assets.s3.amazonaws.com/android-llama2-device-farm-test-spec.yml. A link to download the spec also works here.
      test-spec: arn:aws:devicefarm:us-west-2:308535385114:upload:02a2cf0f-6d9b-45ee-ba1a-a086587469e6/abd86868-fa63-467e-a5c7-218194665a77
      # The exported llama2 model and its tokenizer can be downloaded from https://ossci-assets.s3.amazonaws.com/executorch-android-llama2-7b.zip. A link to download the archive also works here, but keep in mind that some exported models like llama2 7B are a few GB in size, so it is faster to upload them to AWS beforehand and reuse the existing resource if possible.
      extra-data: arn:aws:devicefarm:us-west-2:308535385114:upload:02a2cf0f-6d9b-45ee-ba1a-a086587469e6/bd15825b-ddab-4e47-9fef-a9c8935778dd
```
`pytorch/test-infra/.github/workflows/mobile_job.yml` is the one doing the heavy lifting here. It can be tweaked with the following parameters:
- `device-type`: either `android` or `ios`.
- `project-arn`: this value is fixed for each project; please reach out to PyTorch Dev Infra if you need to get one. There are two available projects at the moment:
  - `arn:aws:devicefarm:us-west-2:308535385114:project:b531574a-fb82-40ae-b687-8f0b81341ae0` for PyTorch core.
  - `arn:aws:devicefarm:us-west-2:308535385114:project:02a2cf0f-6d9b-45ee-ba1a-a086587469e6` for ExecuTorch.
- `device-pool-arn`: this is the pool of remote devices to run the test on. By default, it selects 5 random popular devices for the test. Please reach out to PyTorch Dev Infra if you need something more specific. Please note that the app itself can limit which devices it can use; for example, having `IPHONEOS_DEPLOYMENT_TARGET` set to 17 will exclude all devices with a lower iOS version.
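That last point can be illustrated with a small sketch of how a deployment target narrows a device pool; the device names and OS versions below are made up for the example and do not reflect any real pool:

```python
# Illustration of how an app's minimum OS version narrows a device pool.
# Device names and OS versions here are made up for the example.
devices = [
    ("iPhone 14", 16),
    ("iPhone 15", 17),
    ("iPad Pro", 17),
]

DEPLOYMENT_TARGET = 17  # the app's minimum supported OS version

eligible = [name for name, os_version in devices
            if os_version >= DEPLOYMENT_TARGET]
```

If your test job reports fewer devices than the pool contains, a too-high deployment target in the app build is one of the first things worth checking.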