Qualcomm AI Engine Direct - Implement sdk profiler and integrate with Qnn profiler #2227

Closed

shewu-quic
Collaborator

@shewu-quic shewu-quic commented Mar 4, 2024

Summary:

  • Implement the QNN profiler for the HTP backend.
    For now, only kProfileDetailed is supported; it profiles the performance of each operator in cycles.
    Follow-up item: add more QNN profile items.
  • Integrate with the SDK profiler.
  • Add the etdump_path argument to qnn_executorch_runner to dump an etdump, whose contents can be analyzed with the Inspector.
  • Add a unit test for profiling.
  • Add an export example that generates an etrecord.

Reproduce commands:

python3 backends/qualcomm/tests/test_qnn_delegate.py TestQNNFloatingPointOperator.test_qnn_backend_conv2d -b /local3/mnt/workspace/shewu/executorch/executorch_shewu/executorch/build_android -s $LANAI1 -H $TWL1 -m SM8650 -r /local3/mnt/workspace/shewu/executorch/executorch_shewu/executorch -a /local3/mnt/workspace/shewu/executorch/unit_test -i /local3/mnt/workspace/shewu/executorch/models/data/ImageNet-Mini/images --enable_profile
# Pull the EtDump
adb pull /data/local/tmp/qnn_executorch_test/etdump.etdp .
# Run inspector to produce the following table
python3 -m sdk.inspector.inspector_cli --etdump_path etdump.etdp --etrecord_path etrecord.bin
╒════╤════════════════════╤══════════════════════════════════════════════╤═════════════╤═════════════╤═════════════╤═════════════╤═════════════╤═════════════╤════════════╤═══════════════════╤═════════════════════════╕
│    │ event_block_name   │ event_name                                   │    p10 (ms) │    p50 (ms) │    p90 (ms) │    avg (ms) │    min (ms) │    max (ms) │ op_types   │ is_delegated_op   │ delegate_backend_name   │
╞════╪════════════════════╪══════════════════════════════════════════════╪═════════════╪═════════════╪═════════════╪═════════════╪═════════════╪═════════════╪════════════╪═══════════════════╪═════════════════════════╡
│  0 │ Default            │ Method::init                                 │   123.898   │   123.898   │   123.898   │   123.898   │   123.898   │   123.898   │ []         │ False             │                         │
├────┼────────────────────┼──────────────────────────────────────────────┼─────────────┼─────────────┼─────────────┼─────────────┼─────────────┼─────────────┼────────────┼───────────────────┼─────────────────────────┤
│  1 │ Default            │ Program::load_method                         │   123.926   │   123.926   │   123.926   │   123.926   │   123.926   │   123.926   │ []         │ False             │                         │
├────┼────────────────────┼──────────────────────────────────────────────┼─────────────┼─────────────┼─────────────┼─────────────┼─────────────┼─────────────┼────────────┼───────────────────┼─────────────────────────┤
│  2 │ Execute            │ Input OpId_2 (cycles)                        │  4018       │  4018       │  4018       │  4018       │  4018       │  4018       │ []         │ True              │ QnnBackend              │
├────┼────────────────────┼──────────────────────────────────────────────┼─────────────┼─────────────┼─────────────┼─────────────┼─────────────┼─────────────┼────────────┼───────────────────┼─────────────────────────┤
│  3 │ Execute            │ aten_permute_copy_default:OpId_17 (cycles)   │ 16765       │ 16765       │ 16765       │ 16765       │ 16765       │ 16765       │ []         │ True              │ QnnBackend              │
├────┼────────────────────┼──────────────────────────────────────────────┼─────────────┼─────────────┼─────────────┼─────────────┼─────────────┼─────────────┼────────────┼───────────────────┼─────────────────────────┤
│  4 │ Execute            │ aten_convolution_default:OpId_23 (cycles)    │ 12768       │ 12768       │ 12768       │ 12768       │ 12768       │ 12768       │ []         │ True              │ QnnBackend              │
├────┼────────────────────┼──────────────────────────────────────────────┼─────────────┼─────────────┼─────────────┼─────────────┼─────────────┼─────────────┼────────────┼───────────────────┼─────────────────────────┤
│  5 │ Execute            │ aten_convolution_default_1:OpId_30 (cycles)  │  9439       │  9439       │  9439       │  9439       │  9439       │  9439       │ []         │ True              │ QnnBackend              │
├────┼────────────────────┼──────────────────────────────────────────────┼─────────────┼─────────────┼─────────────┼─────────────┼─────────────┼─────────────┼────────────┼───────────────────┼─────────────────────────┤
│  6 │ Execute            │ aten_permute_copy_default_1:OpId_33 (cycles) │  2551       │  2551       │  2551       │  2551       │  2551       │  2551       │ []         │ True              │ QnnBackend              │
├────┼────────────────────┼──────────────────────────────────────────────┼─────────────┼─────────────┼─────────────┼─────────────┼─────────────┼─────────────┼────────────┼───────────────────┼─────────────────────────┤
│  7 │ Execute            │ OpId_0 (cycles)                              │     0       │     0       │     0       │     0       │     0       │     0       │ []         │ True              │ QnnBackend              │
├────┼────────────────────┼──────────────────────────────────────────────┼─────────────┼─────────────┼─────────────┼─────────────┼─────────────┼─────────────┼────────────┼───────────────────┼─────────────────────────┤
│  8 │ Execute            │ Output OpId_3 (cycles)                       │  3054       │  3054       │  3054       │  3054       │  3054       │  3054       │ []         │ True              │ QnnBackend              │
├────┼────────────────────┼──────────────────────────────────────────────┼─────────────┼─────────────┼─────────────┼─────────────┼─────────────┼─────────────┼────────────┼───────────────────┼─────────────────────────┤
│  9 │ Execute            │ DELEGATE_CALL                                │     1.17151 │     1.17151 │     1.17151 │     1.17151 │     1.17151 │     1.17151 │ []         │ False             │ QnnBackend              │
├────┼────────────────────┼──────────────────────────────────────────────┼─────────────┼─────────────┼─────────────┼─────────────┼─────────────┼─────────────┼────────────┼───────────────────┼─────────────────────────┤
│ 10 │ Execute            │ Method::execute                              │     1.18318 │     1.18318 │     1.18318 │     1.18318 │     1.18318 │     1.18318 │ []         │ False             │                         │
╘════╧════════════════════╧══════════════════════════════════════════════╧═════════════╧═════════════╧═════════════╧═════════════╧═════════════╧═════════════╧════════════╧═══════════════════╧═════════════════════════╛
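
Note that the table headers read "(ms)" for every row, while the delegated QNN rows carry cycle counts, as their "(cycles)" suffix indicates. As a rough illustration of how the delegated rows can be read, here is a minimal sketch that sums their cycle counts and finds the most expensive operator. The rows are transcribed by hand from the table above; `ROWS` and `summarize_delegated` are hypothetical names for this sketch and are not part of the inspector.

```python
# Rows copied from the inspector table above: (event_name, avg, is_delegated_op).
# The "avg (ms)" header is generic; rows tagged "(cycles)" hold HTP cycle counts.
ROWS = [
    ("Method::init", 123.898, False),
    ("Program::load_method", 123.926, False),
    ("Input OpId_2 (cycles)", 4018, True),
    ("aten_permute_copy_default:OpId_17 (cycles)", 16765, True),
    ("aten_convolution_default:OpId_23 (cycles)", 12768, True),
    ("aten_convolution_default_1:OpId_30 (cycles)", 9439, True),
    ("aten_permute_copy_default_1:OpId_33 (cycles)", 2551, True),
    ("OpId_0 (cycles)", 0, True),
    ("Output OpId_3 (cycles)", 3054, True),
    ("DELEGATE_CALL", 1.17151, False),
    ("Method::execute", 1.18318, False),
]


def summarize_delegated(rows):
    """Return (total_cycles, heaviest_event_name) over the delegated-op rows."""
    delegated = [(name, value) for name, value, is_delegated in rows if is_delegated]
    total = sum(value for _, value in delegated)
    heaviest = max(delegated, key=lambda item: item[1])[0]
    return total, heaviest


total_cycles, heaviest_event = summarize_delegated(ROWS)
print(f"delegated ops: {total_cycles} cycles, heaviest: {heaviest_event}")
```

The millisecond rows (Method::init, DELEGATE_CALL, and so on) are deliberately left out of the sum, since adding cycles to milliseconds would be meaningless.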


pytorch-bot bot commented Mar 4, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/2227

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 New Failure

As of commit 2db56a1 with merge base 588c391 (image):

NEW FAILURE - The following job has failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Mar 4, 2024
@Olivia-liu
Contributor

Thanks for making and sharing the draft diff! This is great progress. I'll get back to you soon with answers to the questions you asked.

@shewu-quic
Collaborator Author

A short note on the two questions:

  1. Cross-compiling
    The current SDK build flow is somewhat challenging for cross-compilation. As shown at the link below, it requires some workarounds to build successfully.
    https://github.com/pytorch/executorch/blob/3e67470ffb948e47d784deb84452d905c5f36728/sdk/CMakeLists.txt#L111

With ExternalProject_Add, flatcc is built with the x86 toolchain but is not installed to the _host_build folder.
Adding third-party/flatcc as a subdirectory, linking flatcc, and installing the flatcc library then builds it with the Android toolchain and overwrites lib and bin in third-party/flatcc.
Because the x86 and Android builds share the same lib and bin directories under third-party/flatcc, CMake gets confused and the build fails.

  2. The unit of profile items
     Some HTP profiling items are measured in cycles. Is it possible to set different units for profiling items with this API?

@shewu-quic shewu-quic force-pushed the dev/hutton/hook_up_official_profiler branch from 3e67470 to f1bc714 Compare March 11, 2024 01:18
@shewu-quic shewu-quic marked this pull request as ready for review March 11, 2024 01:19
@Olivia-liu
Contributor

Sorry for taking a while to get back to you.

  1. Cross-Compiling

Thanks for finding a workaround for it. @tarun292 will verify whether the workaround breaks anything else in the SDK; if not, I think we can merge this PR.

  2. The unit of profile item

This is on our TODO list, but unfortunately there's no way to set different units at the moment.
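
Until per-event units are supported, the event names in the table from the PR description carry the unit as a suffix, for example "aten_convolution_default:OpId_23 (cycles)". Below is a minimal sketch of how a consumer of the results could split that suffix back out. It assumes the unit is always the final parenthesized word of the name and that untagged events are in milliseconds; `split_unit` is a hypothetical helper and not an SDK API.

```python
import re

# Assumed convention: the unit is appended to the event name in parentheses,
# as in "aten_convolution_default:OpId_23 (cycles)" from the table in this PR.
_UNIT_SUFFIX = re.compile(r"^(?P<name>.*?)\s*\((?P<unit>[A-Za-z]+)\)$")


def split_unit(event_name, default_unit="ms"):
    """Split 'name (unit)' into (name, unit); untagged names get default_unit."""
    match = _UNIT_SUFFIX.match(event_name)
    if match is None:
        # No "(unit)" suffix: assume the inspector's default of milliseconds.
        return event_name, default_unit
    return match.group("name"), match.group("unit")


for raw in ("aten_convolution_default:OpId_23 (cycles)", "Method::execute"):
    name, unit = split_unit(raw)
    print(f"{name}: unit={unit}")
```

Encoding the unit in the name keeps the profiler interface unchanged, at the cost of requiring every reader of the data to know the naming convention.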

…th Qnn profiler

Summary:
- Implement the QNN profiler for the HTP backend.
    For now, only kProfileDetailed is supported; it profiles the performance of each operator in cycles.
    Follow-up item: add more QNN profile items.
- Integrate with the SDK profiler.
- Add the etdump_path argument to qnn_executorch_runner to dump an etdump, whose contents can be analyzed with the Inspector.
- Add a unit test for profiling.
- Add an export example that generates an etrecord.
@shewu-quic shewu-quic force-pushed the dev/hutton/hook_up_official_profiler branch from f1bc714 to 9980080 Compare March 18, 2024 02:17
@shewu-quic
Collaborator Author

Thanks for your response. I am looking forward to the feature for setting the unit of profile items.
I have rebased my branch.
Could you please take a look at the PR? Thank you :)

Contributor

@Olivia-liu Olivia-liu left a comment

Hey, could you please make sure you add the comment Dave suggested in the CMake file? Thanks for finding the workaround, by the way; we think it's OK to land, but it should be well documented. And thanks a lot for doing the integration, and sorry for taking a while to review it.

@facebook-github-bot
Contributor

@Olivia-liu has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

It is a short-term workaround for flatcc.

Co-authored-by: Dave Bort <[email protected]>
@shewu-quic
Collaborator Author

> Hey, could you please make sure you add the comment Dave suggested in the cmake file? Thanks for finding the workaround by the way, we think it's OK to land, but should be well documented. And thanks a lot for doing the integration and sorry for taking a while to review it

I added it to the CMake file.
Thanks for your effort. It is a very nice feature for profiling more detailed information.

@facebook-github-bot
Contributor

@Olivia-liu has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@facebook-github-bot
Contributor

@Olivia-liu merged this pull request in 0b12daf.
