Qualcomm AI Engine Direct - dynamic shape support #7780
Conversation
CI: ✅ No failures as of commit ba46b4f with merge base 62e49ce.
Force-pushed from 87e5b41 to 404cdef
Hi @cccclai, this PR helps adopt dynamism in the QC backend. Thank you.
Thank you! This is definitely a big feature, thank you for adding it. One question: I remember only some ops enable dynamic shape, maybe only along a specific dimension. How do you error out accordingly at AoT time instead of at runtime?
Yes, there will be a follow-up PR using QnnProperty_hasCapability to query whether a certain feature is available or not.
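A hedged sketch of how such an AoT-time check could look. The capability set, op names, and helper function are illustrative placeholders for what a real QnnProperty_hasCapability-backed query would populate; none of them are actual ExecuTorch APIs:

```python
# Hypothetical AoT-side check: reject unsupported dynamic-shape ops at
# partition/export time instead of failing later on device. In a real
# implementation this set would be filled from a backend capability query
# (e.g. QnnProperty_hasCapability through the QNN SDK bindings).
DYNAMIC_SHAPE_CAPABLE_OPS = {"aten.add.Tensor", "aten.mul.Tensor"}  # illustrative

def check_dynamic_shape_support(node_target: str, has_dynamic_dim: bool) -> None:
    """Raise at AoT time if a node with a dynamic dimension is unsupported."""
    if has_dynamic_dim and node_target not in DYNAMIC_SHAPE_CAPABLE_OPS:
        raise RuntimeError(
            f"{node_target} does not support dynamic shapes on this backend"
        )

check_dynamic_shape_support("aten.add.Tensor", has_dynamic_dim=True)  # passes
try:
    check_dynamic_shape_support("aten.conv2d.default", has_dynamic_dim=True)
except RuntimeError as e:
    print(e)
```

Failing during lowering like this gives the user an actionable error with the op name, rather than a runtime failure on device.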
@cccclai has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
This PR also needs a rebase.
Force-pushed from 404cdef to 5588113
@cccclai has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
Summary:
- dynamic-shape-related changes for the QC backend
- breakage fix
- test cases
Force-pushed from 5588113 to ba46b4f
@cccclai has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
```diff
@@ -77,6 +77,7 @@ table QuantizeParam {
 table Tensor {
   name: string;
   shape: [uint];
+  dynamic_dims: [ubyte];
```
As a heads up, this year we'd need to figure out the story for BC/FC, and afterwards we'd need to keep the flatbuffers backward compatible (e.g. adding new fields to the end instead of inserting them in the middle).
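For reference, a sketch of what a BC-compatible evolution of the `Tensor` table could look like. The comment placeholder stands in for whatever other fields the real schema has; only `name`, `shape`, and `dynamic_dims` appear in the diff above:

```
table Tensor {
  name: string;
  shape: [uint];
  // ... any other existing fields keep their implicit field ids ...
  dynamic_dims: [ubyte]; // new field appended at the end, so old binaries still parse
}
```

FlatBuffers assigns field ids in declaration order by default, so appending keeps existing ids stable; alternatively, explicit `(id: n)` attributes on every field allow new fields to be declared anywhere without breaking compatibility.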
I see, will pay attention to this. I think we also have a plan to phase out qcir by replacing online_prepare with QNN DLC, which will mitigate the maintenance effort. And it could be fully deprecated once the multi-method RFC comes out.
Thank you for adding the dynamic shape support!
```diff
@@ -134,6 +148,32 @@ def push(self, inputs=None, input_list=None, files=None):
         for file_name in input_files:
             self._adb(["push", file_name, self.workspace])

+        # dynamic shape related
+        with tempfile.TemporaryDirectory() as tmp_dir:
+            if self.expected_input_shape and self.expected_output_shape:
```
Hmm, could you elaborate a bit more on what we compare the `expected_input_shape` and `expected_output_shape` against?
I think the output tensor shape extracted from the method at runtime will have the maximum value at each dynamic dimension. `expected_io_shape` is used to get the expected size actually calculated by QNN.
As for `expected_io_dtype`: since we're using quantized graph IO with the `BuildQuantIo` pass in `to_executorch`, it looks like the output node meta from the delegated `LoweredModule` cannot be propagated, so I had to introduce data type information to compute the correct number of bytes.
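A minimal sketch of the byte-size computation being described, assuming a hypothetical `expected_nbytes` helper and illustrative shapes/dtypes (the real code lives in the adb push utility shown in the diff above):

```python
import math

# Illustrative subset of dtype sizes; a real table would cover all QNN dtypes.
DTYPE_ITEMSIZE = {"uint16": 2, "float32": 4}

def expected_nbytes(expected_shape, dtype):
    """Bytes QNN is expected to actually produce for this output."""
    return math.prod(expected_shape) * DTYPE_ITEMSIZE[dtype]

# The runtime method metadata reports the upper-bound shape for a dynamic
# dimension, while QNN actually computed a smaller one (shapes illustrative):
runtime_max_shape = (1, 128, 768)
actual_shape = (1, 32, 768)

print(expected_nbytes(actual_shape, "uint16"))       # 49152
print(expected_nbytes(runtime_max_shape, "uint16"))  # 196608
```

This is why the expected shape and dtype both have to be carried separately: the runtime's max-value shape alone would over-count the bytes actually written.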
Ohh, can you remind me of the reason to use `BuildQuantIo`? It has been too long and I might be missing some context.
In the traditional ET QNN path, Q/DQ ops are inserted at the graph IO so that the graph IO stays in floating point. However, Q/DQ can only work with static shapes for now.
I think the HTP team is extending the coverage of dynamic ops; I will clean up the current workaround in the future once more ops are supported.
> Q / DQ can only work in static shape now.

Hmm, does this mean dynamic ops only support fp ops, or do I misunderstand?
Currently only a few ops in 16-bit fixed point can support dynamic shapes. Since Q & DQ are not supported, I have to strip them using the `BuildQuantIo` pass.
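As an illustration of the idea only (not the real `BuildQuantIo` pass): stripping the Q/DQ ops that wrap the graph IO, so the delegated graph consumes and produces fixed-point tensors directly, could be sketched on a linearized graph like this. The op names and list-based graph model are hypothetical simplifications:

```python
# Simplified sketch of stripping Q/DQ at graph IO. A real pass would walk an
# FX/ExecuTorch graph and rewire users; here the graph is just an ordered list
# of op names for illustration.
def strip_io_qdq(ops):
    """Drop a leading quantize (input side) and a trailing dequantize (output side)."""
    if ops and ops[0] == "quantize_per_tensor":
        ops = ops[1:]
    if ops and ops[-1] == "dequantize_per_tensor":
        ops = ops[:-1]
    return ops

graph = ["quantize_per_tensor", "conv2d", "relu", "dequantize_per_tensor"]
print(strip_io_qdq(graph))  # ['conv2d', 'relu']
```

After such a pass the graph IO is quantized (e.g. 16-bit fixed point), which is why the dtype information discussed earlier has to travel alongside the expected shapes.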
In this case, you may consider this pass so we don't hack too much
Thank you for the information; will try to have a follow-up PR to clean it up.
Summary
Test plan

```shell
python backends/qualcomm/tests/test_qnn_delegate.py -k TestQNNQuantizedUtils.test_qnn_backend_dynamic_shape -s $DEVICE_SERIAL -m SM8650 -b build-android/
```
cc @cccclai @winskuo-quic @shewu-quic