Qualcomm AI Engine Direct - Apply spin quant R1 and R2 #5175
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/5175
✅ No failures as of commit ee3a21c with merge base c5a385e.
Force-pushed from 236c84d to 241604f.
Hi @cccclai, this PR adds an argument (optimized_rotation_path) and a transform to apply R1 and R2 of SpinQuant.
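For context, SpinQuant's R1/R2 rotations multiply the weights of adjacent linear layers by an orthogonal matrix so that activations become easier to quantize while the end-to-end function is unchanged. A minimal sketch of the idea (the helper names here are illustrative, not the PR's actual functions):

```python
import torch

def random_orthogonal(n: int) -> torch.Tensor:
    # QR decomposition of a random Gaussian matrix yields an orthogonal Q.
    q, _ = torch.linalg.qr(torch.randn(n, n))
    return q

def rotate_linear_pair(w_a: torch.Tensor, w_b: torch.Tensor, r: torch.Tensor):
    # For nn.Linear weights (y = x @ W.T), fold R into the producer and
    # R.T into the consumer; since R is orthogonal, R @ R.T = I, so
    # x @ (R.T @ W_a).T @ (W_b @ R).T == x @ W_a.T @ W_b.T.
    return r.T @ w_a, w_b @ r
```

The point is that the rotation is absorbed into the checkpoint offline; at inference time the network topology and output are unchanged, only the weight values differ.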
@cccclai has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
A few questions: do we only use R1/R2? And is the hardmard_utils.py used anywhere?
    if args.optimized_rotation_path:
        transforms.append(fuse_layer_norms)
        transforms.append(get_rotate_model(args.optimized_rotation_path))
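The snippet above appends callables to a transform list that is applied to the model before quantization. A hedged sketch of that pattern, with a hypothetical rotation-folding transform (the real get_rotate_model loads precomputed R1/R2 matrices from the optimized_rotation_path checkpoint; this toy version just takes a matrix directly):

```python
import torch

def make_rotation_transform(rotation: torch.Tensor):
    # Hypothetical transform factory: returns a callable that folds the
    # orthogonal matrix `rotation` into every nn.Linear weight it visits.
    def transform(model: torch.nn.Module) -> torch.nn.Module:
        with torch.no_grad():
            for module in model.modules():
                if isinstance(module, torch.nn.Linear):
                    module.weight.copy_(module.weight @ rotation)
        return model
    return transform

# Transforms compose the same way the PR's list does:
transforms = [make_rotation_transform(torch.eye(4))]
model = torch.nn.Sequential(torch.nn.Linear(4, 4, bias=False))
for t in transforms:
    model = t(model)
```

Returning the transform from a factory (rather than passing the path everywhere) keeps each entry in the list a uniform model -> model callable.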
I'm trying to follow: why do we need to do get_rotate_model as a transform?
In my understanding, we need to rotate the weights before running the PTQ flow. I'm referring to this function in the SpinQuant repo.
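Rotating before PTQ only works if the rotation can pass through the layer norms, which is why fuse_layer_norms runs first in the transform list: an RMSNorm's elementwise scale gamma does not commute with an orthogonal rotation, so it is folded into the following linear weight, leaving a scale-free normalization. A small sketch under that assumption (fuse_rms_scale is an illustrative name, not the PR's function):

```python
import torch

def fuse_rms_scale(gamma: torch.Tensor, w_next: torch.Tensor) -> torch.Tensor:
    # Fold the RMSNorm elementwise scale gamma into the following linear
    # weight (y = x @ W.T convention, so gamma broadcasts over the input
    # dimension). The norm is left with an all-ones scale, i.e. a pure
    # normalization that commutes with an orthogonal rotation.
    return w_next * gamma
```

After this fusion, (x * gamma) @ W.T equals x @ fuse_rms_scale(gamma, W).T, so the network output is unchanged and R1/R2 can be applied across the norm.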
Oh I see, so you just got R1/R2 from SpinQuant directly, instead of getting a new weight checkpoint?
Yes, that's what I do. Due to hardware constraints ....... as you know.
As for R3, according to my experiments it does not seem to have a big impact on accuracy.
Summary:
- Add an argument optimized_rotation_path to specify the optimized rotation file
- Refer to https://github.com/facebookresearch/SpinQuant?tab=readme-ov-file to apply R1 and R2
Force-pushed from 20ffb98 to c5e09d7.
    @@ -51,6 +51,7 @@
     )
     from .source_transformation.rms_norm import replace_rms_norm_with_native_rms_norm
     from .source_transformation.rope import materialze_broadcast_of_rope_freq_cis
    +from .source_transformation.rotation import fuse_layer_norms, get_rotate_model
Just a really minor comment: can we rename the rotation file to a longer, more specific name, like apply_spin_quant_r1_r2? The reason is that the CPU backend is also trying to apply SpinQuant, but using a new checkpoint instead. I'm just trying to make it a bit less confusing.
Yes, I found the other PR that tries SpinQuant. I think that is a good method to apply SpinQuant.
Sure, let me rename it.
Is it possible to merge with the other PR? Or will it be too much work before the branch cut?
Do you mean this PR #4962?
Hmm, I guess that specific PR only applies R3 or maybe R4, so it's different from this PR.
Yes, I couldn't agree more with you.
@shewu-quic Hey, could you provide command line instructions to repro the llama3 export result?
Oh, just in time! I submitted a PR about how to deploy Llama3 8B Instruct with the QNN backend.
Summary: