
Qualcomm AI Engine Direct - Requantization Mechanism Implementation #2823


Closed

Conversation

winskuo-quic
Collaborator

Summary:

  • Implement requantization so that mixed quantization (e.g., 8-bit quant + 16-bit quant) can be properly delegated to QNN.
  • Reuse the test_qnn_backend_view_permute_matmul unit test to check that mixed quantization works as expected.
  • Add back the etdump logic in qnn_executor_runner that was deleted unintentionally in this PR: a531ca5#diff-f3647de74042ac9a417e2d4000a6f2db00c22c89fd028e9433d3c79ffb7d56f6
  • Refactor common arguments in VIT.


pytorch-bot bot commented Apr 3, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/2823

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 337fc1c with merge base 81a7e88:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Apr 3, 2024
@winskuo-quic
Collaborator Author

Hi @cccclai,
This PR supports mixed quantization and fixes minor issues that were causing UT failures.
Please have a look at this PR.
Thanks

@cccclai
Contributor

cccclai commented Apr 3, 2024

Hey, would you mind explaining a bit what requantization means and how it differs from mixed-precision quantization? Does it mean we'll do 8-bit quantization first and then 16-bit (maybe skipping the 8-bit quantized ops)?

@winskuo-quic
Collaborator Author

winskuo-quic commented Apr 3, 2024

Hi Chen,
Thanks for the response.
Yes, requantization should be the same as mixed-precision quantization.
There are cases where one op produces an 8-bit output, but the op that consumes it only accepts 16-bit input. In that case, we requantize so that the consuming op can properly accept a 16-bit input from its producer. This scenario, where different quant configs coexist in one graph, is mixed quantization.
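For readers unfamiliar with the mechanism, the core idea can be sketched in a few lines of plain Python. This is a hypothetical illustration only, not the actual ExecuTorch/QNN pass; the function names, scales, and zero points are made up. A requantize step dequantizes the producer's 8-bit value back to a real number, then quantizes it again under the consumer's 16-bit config:

```python
def quantize(x, scale, zero_point, qmin, qmax):
    """Map a real value to an integer under the given quant config."""
    q = round(x / scale) + zero_point
    return max(qmin, min(qmax, q))  # clamp to the target integer range

def dequantize(q, scale, zero_point):
    """Recover the (approximate) real value from a quantized integer."""
    return (q - zero_point) * scale

def requantize_8_to_16(q8, s8, zp8, s16, zp16):
    """Re-express an int8-quantized value under an int16 quant config,
    so a consumer that only accepts 16-bit input can use it."""
    real = dequantize(q8, s8, zp8)
    return quantize(real, s16, zp16, -32768, 32767)

# Example with power-of-two scales so the arithmetic is exact:
# q8 = 8 at scale 0.25 represents the real value 2.0;
# under a 16-bit config with scale 1/1024 that becomes 2048.
print(requantize_8_to_16(8, 0.25, 0, 1 / 1024, 0))  # -> 2048
```

Conceptually, the delegation pass materializes this kind of conversion as an extra node between the mismatched producer and consumer, rather than computing it eagerly as above.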

@facebook-github-bot
Contributor

@cccclai has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@cccclai
Contributor

cccclai commented Apr 3, 2024


Thank you for the explanation! Is there an example somewhere to show how to do it, maybe a test case?

@winskuo-quic
Collaborator Author

winskuo-quic commented Apr 3, 2024


Yes, please refer to the unit test TestQNNQuantizedModel.test_qnn_backend_view_permute_matmul in test_qnn_delegate.py:
https://github.com/pytorch/executorch/pull/2823/files#diff-3a76d2f6f72394bf64270a625d31c16560eb7d3b855297352aae423a27f6f59fR984
It demonstrates how mixed quantization works.

@facebook-github-bot
Contributor

@cccclai has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@cccclai cccclai added the partner: qualcomm For backend delegation, kernels, demo, etc. from the 3rd-party partner, Qualcomm label Apr 4, 2024
@cccclai
Contributor

cccclai commented Apr 5, 2024

Hey, do you mind rebasing the change? The CI was broken on the base commit and should be fixed now.

@winskuo-quic winskuo-quic force-pushed the dev1/winskuo/refactor_requantize branch from 19f4bcb to 337fc1c Compare April 6, 2024 12:16
@winskuo-quic
Collaborator Author

Thanks for the friendly reminder. I have just rebased.

@facebook-github-bot
Contributor

@cccclai has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@facebook-github-bot
Contributor

@cccclai merged this pull request in 61ad48d.
