Manually apply 4bit weight packing #7274
Summary:

Context

Currently, exporting llama models to Vulkan with 4-bit weight quantization is broken because the behaviour of the `groupwise_affine_quantize_tensor` utility function from `torchao` was recently changed so that the packing of two 4-bit integers into a single 8-bit value no longer occurs. To fix this, have the `VkInt4WeightOnlyQuantizer` perform that packing step itself.

Differential Revision: D67051119
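
For readers unfamiliar with the packing step referenced above, the sketch below shows one way two 4-bit values can be packed into a single uint8, in the spirit of what the quantizer now does on its own. The function name `pack_int4_weights` and the nibble ordering (even columns in the low nibble) are illustrative assumptions, not the exact layout used by `VkInt4WeightOnlyQuantizer`.

```python
import torch

def pack_int4_weights(qweight: torch.Tensor) -> torch.Tensor:
    """Pack pairs of 4-bit values (one value per uint8) into single uint8 bytes.

    Assumes every element is in [0, 15] and the last dimension has even length.
    Illustrative only; the real quantizer may use a different nibble order or layout.
    """
    assert qweight.dtype == torch.uint8
    assert qweight.shape[-1] % 2 == 0
    low = qweight[..., ::2]    # even columns -> low nibble (assumed ordering)
    high = qweight[..., 1::2]  # odd columns  -> high nibble (assumed ordering)
    return (high << 4) | low

# Example: a 4x8 tensor of 4-bit values packs down to 4x4 bytes.
unpacked = torch.randint(0, 16, (4, 8), dtype=torch.uint8)
packed = pack_int4_weights(unpacked)
print(unpacked.shape, packed.shape)  # torch.Size([4, 8]) torch.Size([4, 4])
```

Halving the last dimension this way is what downstream Vulkan shaders expect when they read two weights per byte, which is why the export breaks when `groupwise_affine_quantize_tensor` stops doing it.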