
Qualcomm AI Engine Direct - Requantization Mechanism Implementation #2823


Closed

Conversation

winskuo-quic
Collaborator

Summary:

  • Implement requantization so that mixed quantization (e.g., 8-bit quant + 16-bit quant) can be properly delegated to QNN.
  • Reuse the test_qnn_backend_view_permute_matmul unit test to check that mixed quantization works as expected.
  • Add back the etdump logic in qnn_executor_runner that was deleted unintentionally in this PR: a531ca5#diff-f3647de74042ac9a417e2d4000a6f2db00c22c89fd028e9433d3c79ffb7d56f6
  • Refactor common arguments in VIT.


pytorch-bot bot commented Apr 3, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/2823

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 337fc1c with merge base 81a7e88:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Apr 3, 2024
@winskuo-quic
Collaborator Author

Hi @cccclai,
This PR supports mixed quantization and fixes minor issues that were causing UT failures.
Please have a look at this PR.
Thanks

@cccclai
Contributor

cccclai commented Apr 3, 2024

Hey, would you mind explaining a bit what requantization means and how it differs from mixed-precision quantization? Does it mean we'll do 8-bit quantization first and then 16-bit (maybe skipping the 8-bit quantized ops)?

@winskuo-quic
Collaborator Author

winskuo-quic commented Apr 3, 2024

Hi Chen,
Thanks for the response.
Yes, requantization should be the same as mixed-precision quantization.
There are cases where one op produces an 8-bit output, but the op that consumes it only accepts 16-bit input. In that case, we requantize so that the consuming op can properly accept a 16-bit input from its producer. This scenario, where different quant configs coexist in one graph, is mixed quantization.
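For readers unfamiliar with the mechanism, the core idea can be sketched in a few lines of plain Python. This is a hypothetical illustration only, not the actual ExecuTorch/QNN pass; the function names, scales, and zero points are made up. A requantize step dequantizes the producer's 8-bit value back to a real number, then quantizes it again under the consumer's 16-bit config:

```python
def quantize(x, scale, zero_point, qmin, qmax):
    """Map a real value to an integer under the given quant config."""
    q = round(x / scale) + zero_point
    return max(qmin, min(qmax, q))  # clamp to the target integer range

def dequantize(q, scale, zero_point):
    """Recover the (approximate) real value from a quantized integer."""
    return (q - zero_point) * scale

def requantize_8_to_16(q8, s8, zp8, s16, zp16):
    """Re-express an int8-quantized value under an int16 quant config,
    so a consumer that only accepts 16-bit input can use it."""
    real = dequantize(q8, s8, zp8)
    return quantize(real, s16, zp16, -32768, 32767)

# Example with power-of-two scales so the arithmetic is exact:
# q8 = 8 at scale 0.25 represents the real value 2.0;
# under a 16-bit config with scale 1/1024 that becomes 2048.
print(requantize_8_to_16(8, 0.25, 0, 1 / 1024, 0))  # -> 2048
```

Conceptually, the delegation pass materializes this kind of conversion as an extra node between the mismatched producer and consumer, rather than computing it eagerly as above.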

@facebook-github-bot
Contributor

@cccclai has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@cccclai
Contributor

cccclai commented Apr 3, 2024


Thank you for the explanation! Is there an example somewhere to show how to do it, maybe a test case?

@winskuo-quic
Collaborator Author

winskuo-quic commented Apr 3, 2024


Yes, please refer to the unit test TestQNNQuantizedModel.test_qnn_backend_view_permute_matmul in test_qnn_delegate.py:
https://github.com/pytorch/executorch/pull/2823/files#diff-3a76d2f6f72394bf64270a625d31c16560eb7d3b855297352aae423a27f6f59fR984
It demonstrates how mixed quantization works.

@facebook-github-bot
Contributor

@cccclai has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@cccclai cccclai added the partner: qualcomm For backend delegation, kernels, demo, etc. from the 3rd-party partner, Qualcomm label Apr 4, 2024
@cccclai
Contributor

cccclai commented Apr 5, 2024

Hey, do you mind rebasing the change? The CI was broken on the base commit and should be fixed now.

@winskuo-quic winskuo-quic force-pushed the dev1/winskuo/refactor_requantize branch from 19f4bcb to 337fc1c Compare April 6, 2024 12:16
@winskuo-quic
Collaborator Author

Thanks for the friendly reminder. I have just rebased.

@facebook-github-bot
Contributor

@cccclai has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@facebook-github-bot
Contributor

@cccclai merged this pull request in 61ad48d.
