
Qualcomm AI Engine Direct - dynamic shape support #7780


Merged: 1 commit merged into pytorch:main on Feb 11, 2025

Conversation

haowhsu-quic
Collaborator

@haowhsu-quic haowhsu-quic commented Jan 21, 2025

Summary

  • dynamic shape related change for QC backend
  • breakage fix
  • test cases

Test plan

python backends/qualcomm/tests/test_qnn_delegate.py -k TestQNNQuantizedUtils.test_qnn_backend_dynamic_shape -s $DEVICE_SERIAL -m SM8650 -b build-android/

cc @cccclai @winskuo-quic @shewu-quic


pytorch-bot bot commented Jan 21, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/7780

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEV

There is 1 currently active SEV. If your PR is affected, please view it below:

✅ No Failures

As of commit ba46b4f with merge base 62e49ce:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Jan 21, 2025
@haowhsu-quic haowhsu-quic force-pushed the dev_dynamic_shape branch 2 times, most recently from 87e5b41 to 404cdef on January 21, 2025
@haowhsu-quic
Collaborator Author

Hi @cccclai, this PR helps adopt dynamic shapes in the QC backend. Qnn_Tensor_t.v1 is now replaced with Qnn_Tensor_t.v2, so all context binaries generated via the QC backend will use v2. Previously generated PTEs and the QNN context binary direct path will remain compatible.

Thank you.
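The v1 → v2 migration described above can be sketched roughly as follows. This is a hypothetical, simplified Python model (not the real QNN SDK C structs) of how a versioned tensor record keeps legacy v1 payloads readable while new v2 payloads carry per-dimension dynamic flags:

```python
# Hypothetical, simplified sketch -- not the actual QNN SDK definitions.
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Tensor:
    version: int                                # 1 = legacy static, 2 = dynamic-capable
    dims: List[int]                             # maximum extent per dimension
    dynamic_dims: Optional[List[bool]] = None   # v2 only: True where a dim may vary

def has_dynamic_dims(t: Tensor) -> bool:
    # v1 tensors are always static; v2 tensors are dynamic if any dim is flagged
    if t.version == 1 or t.dynamic_dims is None:
        return False
    return any(t.dynamic_dims)

# A previously generated (v1) tensor and a newly generated (v2) one
old_style = Tensor(version=1, dims=[1, 128, 768])
new_style = Tensor(version=2, dims=[1, 128, 768],
                   dynamic_dims=[False, True, False])  # sequence length varies
```

Because the version tag is checked before the v2-only field is touched, readers of old payloads never see the new field, which is the compatibility property the comment above describes.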

@cccclai
Contributor

cccclai commented Jan 21, 2025

Thank you! This is definitely a big feature, thank you for adding it.

One question: I remember only some ops support dynamic shape, perhaps only along a specific dimension. How do you error out accordingly ahead of time (AoT) instead of at runtime?

@haowhsu-quic
Collaborator Author

Yes, there will be a follow-up PR using QnnProperty_hasCapability to query whether a certain feature is available.

@cccclai cccclai added partner: qualcomm For backend delegation, kernels, demo, etc. from the 3rd-party partner, Qualcomm release notes: qualcomm Changes to the Qualcomm backend delegate labels Jan 22, 2025
@facebook-github-bot
Contributor

@cccclai has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@cccclai
Contributor

cccclai commented Jan 30, 2025

This PR also needs a rebase.

@facebook-github-bot
Contributor

@cccclai has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@digantdesai digantdesai added the triaged This issue has been looked at by a team member, and triaged and prioritized into an appropriate module label Feb 3, 2025
summary:
- dynamic shape related change for QC backend
- breakage fix
- test cases
@facebook-github-bot
Contributor

@cccclai has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@@ -77,6 +77,7 @@ table QuantizeParam {
table Tensor {
name: string;
shape: [uint];
dynamic_dims: [ubyte];
Contributor

As a heads up, this year we'd need to figure out the story for BC/FC, and afterwards we'd need to make the flatbuffers bc compatible (like adding to the end instead of inserting)
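The append-vs-insert concern can be illustrated with a plain-Python model (not real flatbuffers, field names made up for illustration): an old reader resolves fields by their slot position in the schema it was built against, so appending a new field leaves old slots intact, while inserting one in the middle silently shifts every later field.

```python
# Illustrative sketch of flatbuffer-style positional field resolution.
OLD_SCHEMA = ["name", "shape"]                   # what an old reader was built with
APPENDED   = ["name", "shape", "dynamic_dims"]   # BC-safe: old slots unchanged
INSERTED   = ["name", "dynamic_dims", "shape"]   # breaking: "shape" slot moved

def read_old_field(record, field_name):
    # An old reader only knows OLD_SCHEMA and reads by slot index.
    slot = OLD_SCHEMA.index(field_name)
    return record[slot]

record_appended = ["tensor0", [1, 2, 3], [0, 1, 0]]  # laid out per APPENDED
record_inserted = ["tensor0", [0, 1, 0], [1, 2, 3]]  # laid out per INSERTED

read_old_field(record_appended, "shape")  # correct shape data
read_old_field(record_inserted, "shape")  # stale slot now holds dynamic_dims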

Collaborator Author

I see, will pay attention to this. I think we also plan to phase out qcir by replacing online_prepare with QNN DLC, which will reduce the maintenance effort. It could be fully deprecated once the multi-method RFC lands.

Contributor

@cccclai cccclai left a comment

Thank you for adding the dynamic shape support!

@@ -134,6 +148,32 @@ def push(self, inputs=None, input_list=None, files=None):
        for file_name in input_files:
            self._adb(["push", file_name, self.workspace])

        # dynamic shape related
        with tempfile.TemporaryDirectory() as tmp_dir:
            if self.expected_input_shape and self.expected_output_shape:
Contributor

Hmm, could you elaborate a bit more on what we compare expected_input_shape and expected_output_shape against?

Collaborator Author

The output tensor shape extracted from the method at runtime will have the maximum value at each dynamic dimension; expected_io_shape is used to get the expected size actually calculated by QNN.
As for expected_io_dtype: since we're using quantized graph IO via the BuildQuantIo pass in to_executorch, the output node meta from the delegated LoweredModule could not be propagated, so I have to introduce data type information to compute the correct number of bytes.
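The bookkeeping described here can be sketched as below. The names (max_shape, expected_output_shape, expected_output_dtype) are illustrative rather than the PR's exact symbols: the runtime reports a dynamic output at its maximum shape, so the test harness needs the QNN-computed expected shape plus an explicit dtype to pick out the valid bytes.

```python
import numpy as np

max_shape = (1, 128)               # method metadata: dynamic dim at its maximum
expected_output_shape = (1, 57)    # actual extent computed by QNN for this input
expected_output_dtype = np.uint16  # quantized IO, so the dtype isn't float32

# Runtime output buffer is allocated at the maximum shape...
raw = np.zeros(max_shape, dtype=expected_output_dtype)

# ...but only this many bytes are meaningful for the current input.
valid_bytes = int(np.prod(expected_output_shape)) * np.dtype(expected_output_dtype).itemsize
valid = raw.reshape(-1)[: int(np.prod(expected_output_shape))].reshape(expected_output_shape)
```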

Contributor

Ohh, can you remind me of the reason to use BuildQuantIo? It has been too long and I might be missing some context.

Collaborator Author

In the traditional ET QNN path, QDQ nodes are inserted at graph IO so the IO stays in floating point. However, Q / DQ can only work with static shapes for now.
I think the HTP team is extending the coverage of dynamic ops; I will clean up the current workaround once more ops are supported.
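The IO convention under discussion can be sketched as follows. In the usual path, the partition boundary carries Q/DQ nodes so callers exchange floats; with Q/DQ stripped (as BuildQuantIo does conceptually), callers exchange quantized integers directly. Scale and zero-point values here are made up for illustration:

```python
import numpy as np

scale, zero_point = 0.05, 128  # illustrative affine quantization parameters

def quantize(x):
    """float32 -> uint8 (the Q node the pass strips from graph inputs)."""
    return np.clip(np.round(x / scale) + zero_point, 0, 255).astype(np.uint8)

def dequantize(q):
    """uint8 -> float32 (the DQ node the pass strips from graph outputs)."""
    return (q.astype(np.float32) - zero_point) * scale

x = np.array([-1.0, 0.0, 1.0], dtype=np.float32)
q = quantize(x)            # with Q/DQ stripped, this is what crosses the IO
roundtrip = dequantize(q)  # what a caller would recover on the way out
```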

Contributor

> Q / DQ can only work in static shape now.

Hmm, does this mean dynamic ops only support fp ops, or do I misunderstand?

Collaborator Author

Currently only a few ops in 16-bit fixed point support dynamic shape. Since Q & DQ are not supported, I have to strip them using the BuildQuantIo pass.

Contributor

In this case, you may consider this pass so we don't hack around it too much.

Collaborator Author

Thank you for the information, will try to have a follow-up PR to clean it up.

@cccclai cccclai merged commit 2e204b9 into pytorch:main Feb 11, 2025
45 of 47 checks passed