Switch Apple benchmark workflow to use the generic ET benchmark iOS app #5212

huydhn · 2024-09-10T00:34:21Z

This requires @shoumikhin change in #5208, so I will rebase and test it out again after #5208 lands

Testing

https://github.com/pytorch/executorch/actions/runs/10787058020

pytorch-bot · 2024-09-10T00:34:25Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/5212

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 31cf557 with merge base f9da675 ():
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

facebook-github-bot · 2024-09-10T00:43:36Z

@huydhn has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

guangy10 · 2024-09-10T03:21:43Z

build/build_apple_llm_demo.sh

-    CODE_SIGNING_ALLOWED=No
-fi
+ARTIFACTS_DIR_NAME="$1"
+APP_PATH="extension/apple/Benchmark/Benchmark"


@shoumikhin what tokenizer is the app built with? w/ both bpe and tiktoken?

guangy10 · 2024-09-10T03:25:00Z

.github/workflows/apple-perf.yml

        ${CONDA_RUN} --no-capture-output \
-          build/build_apple_llm_demo.sh ${{ matrix.tokenizer }} ${ARTIFACTS_DIR_NAME}
+          build/build_apple_llm_demo.sh ${ARTIFACTS_DIR_NAME}

  upload-ios-apps:
    needs: build-llm-demo


Should we rename previous step to build-benchmark-app? Also should we update ios-ipa-archive and ios-xctestrun-zip to point to the new benchmark app

guangy10

Left few comments, otherwise lgtm

facebook-github-bot · 2024-09-11T01:25:35Z

@huydhn has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

facebook-github-bot · 2024-09-11T16:06:49Z

@huydhn has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

huydhn · 2024-09-11T16:09:53Z

The benchmark is running on device now, but it crashes once before being retrying successfully (weird) https://github.com/pytorch/executorch/actions/runs/10804559720/job/29970780023:

Here is the test output:

[DEVICEFARM] ########### Entering phase test ###########
 
[DeviceFarm] xcodebuild test-without-building -destination id=$DEVICEFARM_DEVICE_UDID -xctestrun $DEVICEFARM_TEST_PACKAGE_PATH/*.xctestrun -derivedDataPath $DEVICEFARM_LOG_DIR
Command line invocation:
    /Applications/Xcode_15.app/Contents/Developer/usr/bin/xcodebuild test-without-building -destination id=00008120-000424DC1122201E -xctestrun /tmp/devicefarm-workspace/execution-tvdc_j_e/test-package-ruiiiicu/Benchmark_Tests_iphoneos17.5-arm64.xctestrun -derivedDataPath /tmp/devicefarm-workspace/execution-tvdc_j_e/logs-3gvb1hak

User defaults from command line:
    IDEDerivedDataPathOverride = /tmp/devicefarm-workspace/execution-tvdc_j_e/logs-3gvb1hak
    IDEPackageSupportUseBuiltinSCM = YES

2024-09-10 21:07:33.547 xcodebuild[1200:8254]  DVTDevice: Error locating DeviceSupport directory using Optional("arm64e") or Optional("arm64e"): nilError
Test Suite 'All tests' started at 2024-09-10 21:07:34.514.
Test Suite 'Tests.xctest' started at 2024-09-10 21:07:34.514.
Test Suite 'Tests' started at 2024-09-10 21:07:34.514.
Test Case '-[Tests test_forward_aatp_data_llama2]' started.
2024-09-10 21:07:35.608 xcodebuild[1200:8342]  DVTDevice: Error locating DeviceSupport directory using Optional("arm64e") or Optional("arm64e"): nilError

Restarting after unexpected exit, crash, or test timeout in -[Tests test_forward_aatp_data_llama2]; summary will include totals from previous launches.

Test Suite 'Selected tests' started at 2024-09-10 21:07:36.076.
Test Suite 'Tests.xctest' started at 2024-09-10 21:07:36.077.
Test Suite 'Tests' started at 2024-09-10 21:07:36.077.
Test Case '-[Tests test_load_aatp_data_llama2]' started.
/Users/runner/work/executorch/executorch/pytorch/executorch/extension/apple/Benchmark/Tests/Tests.mm:58: Test Case '-[Tests test_load_aatp_data_llama2]' measured [Memory Peak Physical, kB] average: 122225.936, relative standard deviation: 0.000%, values: [122225.936000, 122225.936000, 122225.936000, 122225.936000, 122225.936000], performanceMetricID:com.apple.dt.XCTMetric_Memory.physical_peak, baselineName: "", baselineAverage: , polarity: prefers smaller, maxPercentRegression: 10.000%, maxPercentRelativeStandardDeviation: 10.000%, maxRegression: 0.000, maxStandardDeviation: 0.000
/Users/runner/work/executorch/executorch/pytorch/executorch/extension/apple/Benchmark/Tests/Tests.mm:58: Test Case '-[Tests test_load_aatp_data_llama2]' measured [Memory Physical, kB] average: 0.000, relative standard deviation: 0.000%, values: [0.000000, 0.000000, 0.000000, 0.000000, 0.000000], performanceMetricID:com.apple.dt.XCTMetric_Memory.physical, baselineName: "", baselineAverage: , polarity: prefers smaller, maxPercentRegression: 10.000%, maxPercentRelativeStandardDeviation: 10.000%, maxRegression: 0.000, maxStandardDeviation: 0.000
/Users/runner/work/executorch/executorch/pytorch/executorch/extension/apple/Benchmark/Tests/Tests.mm:58: Test Case '-[Tests test_load_aatp_data_llama2]' measured [Clock Monotonic Time, s] average: 0.000, relative standard deviation: 26.339%, values: [0.000001, 0.000001, 0.000001, 0.000001, 0.000001], performanceMetricID:com.apple.dt.XCTMetric_Clock.time.monotonic, baselineName: "", baselineAverage: , polarity: prefers smaller, maxPercentRegression: 10.000%, maxPercentRelativeStandardDeviation: 10.000%, maxRegression: 0.000, maxStandardDeviation: 0.000
Test Case '-[Tests test_load_aatp_data_llama2]' passed (0.152 seconds).
Test Suite 'Tests' failed at 2024-09-10 21:07:36.229.
	 Executed 2 tests, with 1 failure (0 unexpected) in 0.152 (0.152) seconds
Test Suite 'Tests.xctest' failed at 2024-09-10 21:07:36.229.
	 Executed 2 tests, with 1 failure (0 unexpected) in 0.152 (0.152) seconds
Test Suite 'Selected tests' failed at 2024-09-10 21:07:36.229.
	 Executed 2 tests, with 1 failure (0 unexpected) in 0.152 (0.153) seconds
2024-09-10 21:07:42.661 xcodebuild[1200:8040] [MT] IDETestOperationsObserverDebug: 17.357 elapsed -- Testing started completed.
2024-09-10 21:07:42.661 xcodebuild[1200:8040] [MT] IDETestOperationsObserverDebug: 0.000 sec, +0.000 sec -- start
2024-09-10 21:07:42.661 xcodebuild[1200:8040] [MT] IDETestOperationsObserverDebug: 17.357 sec, +17.357 sec -- end

Test session results, code coverage, and logs:
	/tmp/devicefarm-workspace/execution-tvdc_j_e/logs-3gvb1hak/Logs/Test/Test-Benchmark-2024.09.10_21-07-25--0700.xcresult

Failing tests:
	-[Tests test_forward_aatp_data_llama2]

** TEST EXECUTE FAILED **

@shoumikhin has some leads on the cause of the crash:

et_view.cpp:88] In function et_view(), assert failed: self.numel() == out.numel()

So, I guess we need to figure this out and fix the issue before this can be landed.

guangy10 · 2024-09-11T17:40:55Z

Is this PR ready to test other models w/ different delegates. If the crash is llama specific, we can verify the app with nongenai models in parallel.

Scheduled one: https://github.com/pytorch/executorch/actions/runs/10817524101

facebook-github-bot · 2024-09-11T19:54:30Z

@huydhn has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

huydhn · 2024-09-11T20:42:16Z

D62511672 works https://github.com/pytorch/executorch/actions/runs/10818420721/job/30015121518 :)

…pp (#5212) Summary: This requires shoumikhin change in #5208, so I will rebase and test it out again after #5208 lands ### Testing https://github.com/pytorch/executorch/actions/runs/10787058020 Reviewed By: shoumikhin, guangy10 Differential Revision: D62415898 Pulled By: huydhn

facebook-github-bot · 2024-09-11T21:50:05Z

This pull request was exported from Phabricator. Differential Revision: D62415898

…pp (#5212) Summary: This requires shoumikhin change in #5208, so I will rebase and test it out again after #5208 lands ### Testing https://github.com/pytorch/executorch/actions/runs/10787058020 Reviewed By: shoumikhin, guangy10 Differential Revision: D62415898 Pulled By: huydhn

facebook-github-bot · 2024-09-11T23:46:52Z

This pull request was exported from Phabricator. Differential Revision: D62415898

facebook-github-bot · 2024-09-12T01:15:45Z

@huydhn merged this pull request in a4be79f.

facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Sep 10, 2024

huydhn requested review from shoumikhin and guangy10 September 10, 2024 00:42

huydhn marked this pull request as ready for review September 10, 2024 00:43

guangy10 reviewed Sep 10, 2024

View reviewed changes

guangy10 approved these changes Sep 10, 2024

View reviewed changes

shoumikhin approved these changes Sep 11, 2024

View reviewed changes

facebook-github-bot force-pushed the switch-to-generic-apple-benchmark-app branch from ecf2ede to ef50595 Compare September 11, 2024 21:49

facebook-github-bot added the fb-exported label Sep 11, 2024

facebook-github-bot force-pushed the switch-to-generic-apple-benchmark-app branch from ef50595 to 31cf557 Compare September 11, 2024 23:46

facebook-github-bot closed this in a4be79f Sep 12, 2024

facebook-github-bot added the Merged label Sep 12, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Switch Apple benchmark workflow to use the generic ET benchmark iOS app #5212

Switch Apple benchmark workflow to use the generic ET benchmark iOS app #5212

Uh oh!

huydhn commented Sep 10, 2024 •

edited

Loading

Uh oh!

pytorch-bot bot commented Sep 10, 2024 •

edited

Loading

Uh oh!

facebook-github-bot commented Sep 10, 2024

Uh oh!

guangy10 Sep 10, 2024

Uh oh!

guangy10 Sep 10, 2024

Uh oh!

guangy10 left a comment

Uh oh!

facebook-github-bot commented Sep 11, 2024

Uh oh!

facebook-github-bot commented Sep 11, 2024

Uh oh!

huydhn commented Sep 11, 2024

Uh oh!

guangy10 commented Sep 11, 2024 •

edited

Loading

Uh oh!

facebook-github-bot commented Sep 11, 2024

Uh oh!

huydhn commented Sep 11, 2024

Uh oh!

facebook-github-bot commented Sep 11, 2024

Uh oh!

facebook-github-bot commented Sep 11, 2024

Uh oh!

facebook-github-bot commented Sep 12, 2024

Uh oh!

Uh oh!

Switch Apple benchmark workflow to use the generic ET benchmark iOS app #5212

Switch Apple benchmark workflow to use the generic ET benchmark iOS app #5212

Uh oh!

Conversation

huydhn commented Sep 10, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Testing

Uh oh!

pytorch-bot bot commented Sep 10, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/5212

✅ No Failures

Uh oh!

facebook-github-bot commented Sep 10, 2024

Uh oh!

guangy10 Sep 10, 2024

Choose a reason for hiding this comment

Uh oh!

guangy10 Sep 10, 2024

Choose a reason for hiding this comment

Uh oh!

guangy10 left a comment

Choose a reason for hiding this comment

Uh oh!

facebook-github-bot commented Sep 11, 2024

Uh oh!

facebook-github-bot commented Sep 11, 2024

Uh oh!

huydhn commented Sep 11, 2024

Uh oh!

guangy10 commented Sep 11, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

facebook-github-bot commented Sep 11, 2024

Uh oh!

huydhn commented Sep 11, 2024

Uh oh!

facebook-github-bot commented Sep 11, 2024

Uh oh!

facebook-github-bot commented Sep 11, 2024

Uh oh!

facebook-github-bot commented Sep 12, 2024

Uh oh!

Uh oh!

huydhn commented Sep 10, 2024 •

edited

Loading

pytorch-bot bot commented Sep 10, 2024 •

edited

Loading

guangy10 commented Sep 11, 2024 •

edited

Loading