Commit 1e2efa3

Manually apply 4-bit weight packing

Differential Revision: D67051119
Pull Request resolved: #7274
1 parent 66dcd40 commit 1e2efa3

1 file changed (+6, -0)


backends/vulkan/_passes/int4_weight_only_quantizer.py

Lines changed: 6 additions & 0 deletions
```diff
@@ -226,6 +226,12 @@ def _create_quantized_state_dict(
                 self.groupsize,
                 self.precision,  # dtype for scales_and_zeros
             )
+            # If the packing of 2 4-bit values into a single 8-bit value was not
+            # performed in the previous function call, then do it manually now.
+            if w_int4x8.shape == weight.shape:
+                w_int4x8 = (w_int4x8[::, ::2] << 4 | w_int4x8[::, 1::2]).to(
+                    torch.uint8
+                )
             # In the original implementation, w_int4x8 is packed via calling the
             # _convert_weight_to_int4pack operator before storing the weight. However
             # the Vulkan implementation does not expect the weights to be packed, so
```
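The added branch detects the unpacked case by shape: if the quantized tensor still has the same shape as the original weight, each element holds one 4-bit value per byte, so the pass packs adjacent column pairs itself (even column into the high nibble, odd column into the low nibble), halving the last dimension. The sketch below illustrates that same slicing and bitwise logic using NumPy arrays in place of torch tensors (the array `w_int4x8` and its values are made up for illustration; the semantics of `[:, ::2] << 4 | [:, 1::2]` match the diff):

```python
import numpy as np

# Hypothetical unpacked 4-bit quantized weights: one value per byte,
# each entry in the range 0..15 (fits in a nibble).
w_int4x8 = np.array(
    [[1, 2, 3, 4],
     [15, 0, 7, 8]],
    dtype=np.uint8,
)

# Pack pairs of adjacent columns into single bytes: the even-indexed
# column becomes the high nibble, the odd-indexed column the low nibble.
# The last dimension shrinks from 4 to 2.
packed = (w_int4x8[:, ::2] << 4 | w_int4x8[:, 1::2]).astype(np.uint8)
print(packed)  # [[ 18  52] [240 120]]

# Unpacking with a shift and a mask recovers the original values.
high = packed >> 4
low = packed & 0x0F
```

The shape check is what makes the packing idempotent: once packed, the tensor's last dimension is half of `weight`'s, so the branch is skipped on any later pass over the same state dict.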
