Qualcomm AI Engine Direct - Apply spin quant R1 and R2 #5175
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/5175
✅ No failures as of commit ee3a21c with merge base c5a385e.
Force-pushed from 236c84d to 241604f.
Hi @cccclai, this PR adds an argument (optimized_rotation_path) and a transform to apply R1 and R2 of SpinQuant.
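For context, SpinQuant's R1/R2 rotations multiply the weights of adjacent linear layers by an orthogonal matrix so that activations become easier to quantize while the end-to-end function is unchanged. A minimal sketch of the idea (the helper names here are illustrative, not the PR's actual functions):

```python
import torch

def random_orthogonal(n: int) -> torch.Tensor:
    # QR decomposition of a random Gaussian matrix yields an orthogonal Q.
    q, _ = torch.linalg.qr(torch.randn(n, n))
    return q

def rotate_linear_pair(w_a: torch.Tensor, w_b: torch.Tensor, r: torch.Tensor):
    # For nn.Linear weights (y = x @ W.T), fold R into the producer and
    # R.T into the consumer; since R is orthogonal, R @ R.T = I, so
    # x @ (R.T @ W_a).T @ (W_b @ R).T == x @ W_a.T @ W_b.T.
    return r.T @ w_a, w_b @ r
```

The point is that the rotation is absorbed into the checkpoint offline; at inference time the network topology and output are unchanged, only the weight values differ.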
@cccclai has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
A few questions: do we only use R1/R2? And is the hardmard_utils.py used anywhere?
    if args.optimized_rotation_path:
        transforms.append(fuse_layer_norms)
        transforms.append(get_rotate_model(args.optimized_rotation_path))
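The snippet above appends callables to a transform list that is applied to the model before quantization. A hedged sketch of that pattern, with a hypothetical rotation-folding transform (the real get_rotate_model loads precomputed R1/R2 matrices from the optimized_rotation_path checkpoint; this toy version just takes a matrix directly):

```python
import torch

def make_rotation_transform(rotation: torch.Tensor):
    # Hypothetical transform factory: returns a callable that folds the
    # orthogonal matrix `rotation` into every nn.Linear weight it visits.
    def transform(model: torch.nn.Module) -> torch.nn.Module:
        with torch.no_grad():
            for module in model.modules():
                if isinstance(module, torch.nn.Linear):
                    module.weight.copy_(module.weight @ rotation)
        return model
    return transform

# Transforms compose the same way the PR's list does:
transforms = [make_rotation_transform(torch.eye(4))]
model = torch.nn.Sequential(torch.nn.Linear(4, 4, bias=False))
for t in transforms:
    model = t(model)
```

Returning the transform from a factory (rather than passing the path everywhere) keeps each entry in the list a uniform model -> model callable.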
I'm trying to follow: why do we need to do get_rotate_model as a transform?
In my understanding, we need to rotate the weights before running the PTQ flow. I'm referring to this function in the SpinQuant repo.
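Rotating before PTQ only works if the rotation can pass through the layer norms, which is why fuse_layer_norms runs first in the transform list: an RMSNorm's elementwise scale gamma does not commute with an orthogonal rotation, so it is folded into the following linear weight, leaving a scale-free normalization. A small sketch under that assumption (fuse_rms_scale is an illustrative name, not the PR's function):

```python
import torch

def fuse_rms_scale(gamma: torch.Tensor, w_next: torch.Tensor) -> torch.Tensor:
    # Fold the RMSNorm elementwise scale gamma into the following linear
    # weight (y = x @ W.T convention, so gamma broadcasts over the input
    # dimension). The norm is left with an all-ones scale, i.e. a pure
    # normalization that commutes with an orthogonal rotation.
    return w_next * gamma
```

After this fusion, (x * gamma) @ W.T equals x @ fuse_rms_scale(gamma, W).T, so the network output is unchanged and R1/R2 can be applied across the norm.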
Oh I see, so you just got R1/R2 from SpinQuant directly, instead of getting a new weight checkpoint?
Yes, that's what I do. Due to hardware constraints ....... as you know.
As for R3, according to my experiments it does not seem to have a big impact on accuracy.
Summary:
- Add an argument optimized_rotation_path to specify the optimized rotation file
- Refer to https://github.com/facebookresearch/SpinQuant?tab=readme-ov-file to apply R1 and R2
Force-pushed from 20ffb98 to c5e09d7.
    @@ -51,6 +51,7 @@
     )
     from .source_transformation.rms_norm import replace_rms_norm_with_native_rms_norm
     from .source_transformation.rope import materialze_broadcast_of_rope_freq_cis
    +from .source_transformation.rotation import fuse_layer_norms, get_rotate_model
Just a really minor comment: can we rename the rotation file to a longer, more specific name, like apply_spin_quant_r1_r2? The reason is that the CPU backend is also trying to apply SpinQuant, but using a new checkpoint instead. I'm just trying to make it a bit less confusing.
Yes, I found the other PR that tries SpinQuant. I think that is a good method to apply SpinQuant.
Sure, let me rename it.
Is it possible to merge with the other PR? Or will it be too much work before the branch cut?
Do you mean this PR #4962?
Hmm, I guess that specific PR only applies R3 or maybe R4, so it's different from this PR.
Yes, I couldn't agree more with you.
@shewu-quic Hey, could you provide command line instructions to repro the llama3 export result?
Oh, just in time! I submitted a PR about how to deploy Llama3 8B Instruct with the QNN backend.
Summary: