
Qualcomm AI Engine Direct - Apply spin quant R1 and R2 #5175


Merged: 5 commits into pytorch:main on Sep 10, 2024

Conversation

shewu-quic
Collaborator

Summary:


pytorch-bot bot commented Sep 9, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/5175

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit ee3a21c with merge base c5a385e:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Sep 9, 2024
@shewu-quic shewu-quic force-pushed the dev1/hutton/enable_spin_quant_R1_R2 branch from 236c84d to 241604f Compare September 9, 2024 23:47
@shewu-quic
Collaborator Author

Hi @cccclai,

This PR adds the argument (optimized_rotation_path) and a transform to apply R1 and R2 of SpinQuant.
If possible, could you take a look?
Thank you :)

@facebook-github-bot
Contributor

@cccclai has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

Contributor

@cccclai cccclai left a comment

A few questions: do we only use R1/R2? And is hadamard_utils.py used anywhere?


if args.optimized_rotation_path:
transforms.append(fuse_layer_norms)
transforms.append(get_rotate_model(args.optimized_rotation_path))
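The transform above folds an orthogonal rotation into the model's weights before quantization. The key property that makes this safe is that applying a rotation R to the input and fusing R into the weight are numerically equivalent: y = W(Rx) = (WR)x. A minimal pure-Python sketch of that identity (toy 2x2 matrices; `fuse_rotation` is a hypothetical helper for illustration, not the PR's actual `get_rotate_model`):

```python
import math

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def matvec(a, x):
    """Apply matrix a to vector x."""
    return [sum(a[i][k] * x[k] for k in range(len(x))) for i in range(len(a))]

def fuse_rotation(w, r):
    """Hypothetical helper: fold an orthogonal matrix R into the weight W."""
    return matmul(w, r)

theta = 0.3
R = [[math.cos(theta), -math.sin(theta)],
     [math.sin(theta),  math.cos(theta)]]     # orthogonal: R^T R = I
W = [[2.0, -1.0], [0.5, 3.0]]
x = [1.0, 2.0]

y_rotate_then_apply = matvec(W, matvec(R, x))  # y = W (R x)
y_fused = matvec(fuse_rotation(W, R), x)       # y = (W R) x

# The two paths agree up to floating-point error.
assert all(abs(a - b) < 1e-9 for a, b in zip(y_rotate_then_apply, y_fused))
```

Because the fused weight WR produces identical outputs, the rotation can be baked in once as a source transform and the rest of the export flow runs unchanged.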
Contributor

I'm trying to follow: why do we need to do get_rotate_model as a transform?

Collaborator Author

As I understand it, we need to rotate the weights before running the PTQ flow.
I referred to this function in the SpinQuant repo.
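The reason to rotate before PTQ is that orthogonal rotations spread outlier channels across all channels, which shrinks the dynamic range a per-tensor quantizer has to cover; this is the motivation behind SpinQuant's R matrices. A small illustrative sketch (assumed symmetric int8 fake-quantization and a 2x2 Hadamard rotation; not code from the SpinQuant repo):

```python
import math

def quantize_int8(v):
    """Symmetric per-tensor int8 fake-quantization (quantize, then dequantize)."""
    scale = max(abs(c) for c in v) / 127.0
    return [round(c / scale) * scale for c in v]

def mse(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

# A 2-d activation with one outlier channel.
x = [0.1, 8.0]

# 2x2 Hadamard rotation: orthogonal, and its own inverse.
s = 1.0 / math.sqrt(2.0)
H = [[s, s], [s, -s]]
rotate = lambda m, v: [sum(m[i][j] * v[j] for j in range(2)) for i in range(2)]

err_plain = mse(x, quantize_int8(x))      # quantize directly: outlier sets the scale

x_rot = rotate(H, x)                      # outlier energy spread across channels
x_rot_q = quantize_int8(x_rot)
err_rotated = mse(x, rotate(H, x_rot_q))  # rotate back after quantization

# Rotating before quantization reduces the reconstruction error here.
assert err_rotated < err_plain
```

Since the rotation is orthogonal it preserves the signal exactly; only the quantization step differs, so the comparison isolates the effect of spreading the outlier.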

Contributor

Oh I see, so you just take R1/R2 from SpinQuant directly, instead of producing a new weight checkpoint?

Collaborator Author

Yes, that's what I do.
Due to hardware constraints ... as you know.

@shewu-quic
Collaborator Author

A few questions: do we only use R1/R2? And is hadamard_utils.py used anywhere?

For R3, according to my experiments, it does not seem to have a big impact on accuracy.
But it does affect performance, so I am adding R1 and R2 first.

Whoops, hadamard_utils.py is for R3 and R4. Let me remove it first.

@facebook-github-bot
Contributor

@cccclai has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

Sheng Feng Wu and others added 3 commits September 10, 2024 11:21
Summary:
- Add an argument optimized_rotation_path to specify the optimized rotation file
- Refer to https://github.com/facebookresearch/SpinQuant?tab=readme-ov-file to apply R1 and R2
@shewu-quic shewu-quic force-pushed the dev1/hutton/enable_spin_quant_R1_R2 branch from 20ffb98 to c5e09d7 Compare September 10, 2024 03:21
@facebook-github-bot
Contributor

@cccclai has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@@ -51,6 +51,7 @@
)
from .source_transformation.rms_norm import replace_rms_norm_with_native_rms_norm
from .source_transformation.rope import materialze_broadcast_of_rope_freq_cis
from .source_transformation.rotation import fuse_layer_norms, get_rotate_model
Contributor

@cccclai cccclai Sep 10, 2024

Just a really minor comment: can we rename the rotation file to a longer name, like apply_spin_quant_r1_r2 or something more specific? The reason is that the CPU backend is also trying to apply SpinQuant, but using the new checkpoint instead. I'm just trying to make it a bit less confusing.

Collaborator Author

Yes, I found another PR trying SpinQuant. I think that is a good method to apply SpinQuant.
Great, let me rename it.

Contributor

Is it possible to merge with the other PR? Or will it be too much work before the branch cut?

Collaborator Author

Do you mean this PR #4962?

Contributor

Hmm, I guess that specific PR only applies R3 or maybe R4, so it's different from this PR.

Collaborator Author

Yes, I couldn't agree more.

@facebook-github-bot
Contributor

@cccclai has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@facebook-github-bot
Contributor

@cccclai has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@cccclai cccclai merged commit 657789e into pytorch:main Sep 10, 2024
36 of 38 checks passed
@cccclai
Contributor

cccclai commented Sep 13, 2024

@shewu-quic Hey could you provide command line instructions to repro the llama3 export result?

@shewu-quic
Collaborator Author

@shewu-quic Hey could you provide command line instructions to repro the llama3 export result?

Oh, just in time! I submitted a PR about how to deploy Llama3 8B Instruct with the QNN Backend.
#5335
