Summary:
## Context
Exporting llama models to Vulkan with 4-bit weight quantization is currently broken because the behavior of the `groupwise_affine_quantize_tensor` utility function from `torchao` recently changed: it no longer packs two 4-bit integers into a single 8-bit value.
To fix this, have `VkInt4WeightOnlyQuantizer` perform the packing step itself.
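For illustration, here is a minimal sketch of the packing step the quantizer now performs itself. The function name and the nibble order (which element lands in the high vs. low bits) are assumptions for this sketch; the real convention must match what the Vulkan compute shaders expect.

```python
import torch

def pack_int4(qweight: torch.Tensor) -> torch.Tensor:
    """Pack pairs of 4-bit values (one per byte) into single uint8 bytes.

    `qweight` is assumed to be a uint8 tensor with values in [0, 15] and an
    even size along the last dimension.
    """
    assert qweight.shape[-1] % 2 == 0, "last dim must be even to pack pairs"
    low = qweight[..., ::2]    # even-indexed nibbles -> low 4 bits (assumed order)
    high = qweight[..., 1::2]  # odd-indexed nibbles  -> high 4 bits (assumed order)
    return ((high << 4) | low).to(torch.uint8)

# Example: an (4, 8) tensor of 4-bit values packs down to (4, 4) bytes.
q = torch.randint(0, 16, (4, 8), dtype=torch.uint8)
packed = pack_int4(q)
assert packed.shape == (4, 4)
```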
Reviewed By: jorgep31415
Differential Revision: D67051119