Qualcomm AI Engine Direct - Quantizer refine for qat #6513


Merged
4 commits merged into pytorch:main on Nov 6, 2024

Conversation

chunit-quic
Collaborator

  • Reorganize qualcomm/quantizer
  • Split quantizer/utils.py into
    -- qconfig
    -- annotators
    -- observers directory
  • Change corresponding callees
  • Rename get_default_Nbit_qnn_ptq_config to get_NaNw_qnn_ptq_config (a usage sketch follows below)
  • Add 16a4w conv test* (it is not compared with the original model)
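
For context, this is roughly how the renamed helpers read at a call site. This is a hedged sketch only: the import path (the new qconfig module described above) and the exact helper list are assumptions, not confirmed by this PR.

# Hedged sketch: the import path and helper names follow the new
# qconfig module described above and may differ from the final code.
from executorch.backends.qualcomm.quantizer.qconfig import (
    get_8a8w_qnn_ptq_config,   # was get_default_8bit_qnn_ptq_config
    get_16a4w_qnn_ptq_config,  # "NaMw" = N-bit activations, M-bit weights
)

act8_weight8_config = get_8a8w_qnn_ptq_config()
act16_weight4_config = get_16a4w_qnn_ptq_config()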

pytorch-bot bot commented Oct 28, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/6513

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 086cca4 with merge base 41a57e6:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Oct 28, 2024
@chunit-quic chunit-quic marked this pull request as draft October 28, 2024 03:58
@chunit-quic
Collaborator Author

Hi @cccclai,

We've implemented this draft PR to:

  1. Refine the API of QnnQuantizer
  2. Add more QAT quant configs
  3. Add a 16a4w QAT conv test case

We have another local branch testing QAT MobileNetV2 on top of it, but we've run into some problems.
During the QAT process, the accuracy/loss seems reasonable if we only quantize conv ops. Yet if we quantize any other op, accuracy/loss drops significantly. Is there any known issue for this? Or maybe it just results from some misconfiguration on our part. (We can make a patch or update the local branch here if you are interested in it.)

Thank you. :)

@cccclai
Contributor

cccclai commented Oct 28, 2024

@navsud can you review this PR?

Contributor

@navsud navsud left a comment

Overall LGTM.
cc: @cccclai for final approval.

from torch.ao.quantization.observer import UniformQuantizationObserverBase


class ParamObserver(UniformQuantizationObserverBase):
Contributor

This looks great and is generic enough to be added into torch/ao/quantization/observer.py. If possible, please move it there as a follow-up.

How about renaming it to PerChannelParamObserver() or PerChannelWeightObserver?
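
For illustration, a minimal sketch of what such a per-channel observer could look like on top of UniformQuantizationObserverBase. The class body below is hypothetical, not the PR's actual implementation.

# Hypothetical sketch, not the PR's implementation: a per-channel
# min/max observer built on UniformQuantizationObserverBase.
import torch
from torch.ao.quantization.observer import UniformQuantizationObserverBase


class PerChannelParamObserver(UniformQuantizationObserverBase):
    def __init__(self, ch_axis=0, **kwargs):
        super().__init__(**kwargs)
        self.ch_axis = ch_axis
        self.register_buffer("min_val", torch.tensor(float("inf")))
        self.register_buffer("max_val", torch.tensor(float("-inf")))

    def forward(self, x):
        # Track running extremes, reducing over every dimension
        # except the channel axis.
        dims = [d for d in range(x.dim()) if d != self.ch_axis]
        detached = x.detach().float()
        self.min_val = torch.minimum(self.min_val, torch.amin(detached, dim=dims))
        self.max_val = torch.maximum(self.max_val, torch.amax(detached, dim=dims))
        return x

    def calculate_qparams(self):
        # _calculate_qparams comes from the base class and respects
        # the configured qscheme (e.g. torch.per_channel_symmetric).
        return self._calculate_qparams(self.min_val, self.max_val)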

Collaborator Author

Glad you like it! Changed the name and added a TODO for it.

qscheme=torch.per_tensor_symmetric,
ch_axis=0,
reduce_range=True,
observer=MovingAverageMinMaxObserver,
Contributor

This should be PerChannelMovingAverageMinMaxObserver and qscheme=torch.per_channel_symmetric?

Collaborator Author

Sorry for the misunderstanding. This quant config is specifically per-tensor. The per-channel one is here, and we use a condition in a quantizer member function here to assign the per-channel quant config.
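
To make the distinction concrete, here is a hedged sketch contrasting a per-tensor and a per-channel weight spec using the PT2E QuantizationSpec API; the field values are illustrative, not the PR's exact configs.

# Illustrative contrast only; values are not the PR's exact configs.
import torch
from torch.ao.quantization.observer import (
    MovingAverageMinMaxObserver,
    MovingAveragePerChannelMinMaxObserver,
)
from torch.ao.quantization.quantizer import QuantizationSpec

# Per-tensor weight spec: one scale/zero-point for the whole tensor.
per_tensor_weight = QuantizationSpec(
    dtype=torch.int8,
    quant_min=-127,
    quant_max=127,
    qscheme=torch.per_tensor_symmetric,
    observer_or_fake_quant_ctr=MovingAverageMinMaxObserver.with_args(
        reduce_range=True
    ),
)

# Per-channel counterpart: one scale/zero-point per output channel,
# selected by the quantizer member-function condition mentioned above.
per_channel_weight = QuantizationSpec(
    dtype=torch.int8,
    quant_min=-127,
    quant_max=127,
    qscheme=torch.per_channel_symmetric,
    ch_axis=0,
    observer_or_fake_quant_ctr=MovingAveragePerChannelMinMaxObserver.with_args(),
)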

get_default_8bit_qat_proto,
get_default_8bit_qnn_ptq_config,
get_8a8w_qnn_ptq_config,
get_8a8w_qnn_qat_config,
Contributor

As a follow-up PR, can we unify/simplify get_8a8w_qnn_ptq_config() and get_8a8w_qnn_qat_config() into a single get_8a8w_qnn_config() with an is_train=True/False argument that decides whether we are doing PTQ or QAT?
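
In other words, something like the following sketch; get_8a8w_qnn_config and is_train are the reviewer's proposal, not an existing API.

# Sketch of the proposed unification; get_8a8w_qnn_config and is_train
# are the reviewer's suggestion, not an existing API.
# get_8a8w_qnn_ptq_config / get_8a8w_qnn_qat_config are the existing
# helpers shown in the import list above.
def get_8a8w_qnn_config(is_train: bool = False):
    """Return the QAT config when training, else the PTQ config."""
    if is_train:
        return get_8a8w_qnn_qat_config()
    return get_8a8w_qnn_ptq_config()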

Collaborator Author

@chunit-quic chunit-quic Oct 29, 2024

No problem. We would like to keep them separate at first; once things seem stable, we will raise a PR to merge them.

@chunit-quic
Collaborator Author

Overall LGTM. cc: @cccclai for final approval.

Hi @navsud

Thank you very much for reviewing! We just uploaded a patch based on your comments. There are just two questions we would like to ask:

  1. Does QAT only work for ops with weights at the moment? I tried to run QAT on a single add op and got a complaint that there were no parameters to be trained. Is that expected?
  2. Similar to point 1, we are not quite sure whether we quantize the ops of MobileNetV2 properly. Accuracy drops significantly during QAT if we add a quant config to any op other than convolution. More details can be found in the first comment of this PR.

Thank you for reading! If anything is unclear, please feel free to let me know and I will try to describe it in more detail. :D

Joey Tsai added 3 commits November 4, 2024 17:46
- Reorganize qualcomm/quantizer
- Split quantizer/utils.py into
-- qconfig
-- annotators
-- observers directory
- Change corresponding callees
- Rename get_default_Nbit_qnn_ptq_config to get_NaNw_qnn_ptq_config
- Add 16a4w conv test* (it is not compared with the original model)
- Move and rename param_observer.py to per_channel_param_observer.py
- Add todo to merge qconfig
- Add todo for per_channel_param_observer.py
@chunit-quic chunit-quic force-pushed the dev1/chunit/qat_quantizer_refine branch from d542309 to 0e57c97 Compare November 4, 2024 09:47
@chunit-quic chunit-quic changed the title [Qualcomm AI Engine Direct - Quantizer refine for qat] Qualcomm AI Engine Direct - Quantizer refine for qat Nov 4, 2024
Contributor

@navsud navsud left a comment

LGTM

@chunit-quic
Collaborator Author

Hi @cccclai, could you take a glance at this PR when you are free, and perhaps merge it if it looks good to you? Thanks :D

Contributor

@cccclai cccclai left a comment

Looks good to me - thank you for adding the feature!

@cccclai cccclai merged commit 068f43c into pytorch:main Nov 6, 2024
39 checks passed
@cccclai
Contributor

cccclai commented Nov 6, 2024

I noticed that I didn't import to internal... in the worst case, if this diff breaks some tests, we may need to revert and reland it...

@chunit-quic
Collaborator Author

I noticed that I didn't import to internal... in the worst case, if this diff breaks some tests, we may need to revert and reland it...

It's fine. If so, just let me know what I should fix and I will do it ASAP. :)

@cccclai
Contributor

cccclai commented Nov 8, 2024

@chunit-quic Unfortunately it's reverted because it breaks an internal test... can you submit a PR again?

@chunit-quic
Copy link
Collaborator Author

@chunit-quic Unfortunately it's reverted because it breaks an internal test... can you submit a PR again?

No problem. The new PR is PR6747
