Commit 68a16df

kimishpatel authored and facebook-github-bot committed
Revert erroneous 4bit quant changes
Summary: Earlier changes to the working 4-bit diff left 4-bit support broken. This diff restores the original behavior and avoids using the computed min/max. The earlier change would also have interfered with 8-bit quant, which expects symmetric min/max, unlike 4-bit.
Reviewed By: digantdesai
Differential Revision: D54198222
fbshipit-source-id: 035d34bbd7f87f8eb7fa61ac6b938ecac651cb00
1 parent 1d74652 commit 68a16df

1 file changed: +6 -6 lines changed


examples/models/llama2/export_llama_lib.py

Lines changed: 6 additions & 6 deletions
@@ -156,13 +156,13 @@ def check_embedding_byte_registered():
             "At the moment only per channel weight quantization is supported."
         )
         if quant_params.quantize_linear.is_qc4:
-            nbits = 4
+            operator_config_dynamic = get_symmetric_quantization_config(
+                is_per_channel=True, is_dynamic=True, weight_qmin=-8, weight_qmax=7
+            )
         else:
-            nbits = 8
-        qmin, qmax = -2 ^ (nbits), 2 ^ (nbits) - 1
-        operator_config_dynamic = get_symmetric_quantization_config(
-            is_per_channel=True, is_dynamic=True, weight_qmin=qmin, weight_qmax=qmax
-        )
+            operator_config_dynamic = get_symmetric_quantization_config(
+                is_per_channel=True, is_dynamic=True
+            )
         dynamic_quantizer.set_global(operator_config_dynamic)
         quantizers.append(dynamic_quantizer)
     return quantizers
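
One likely reason the removed lines misbehaved: in Python, `^` is bitwise XOR rather than exponentiation, and it binds more loosely than `-`, so `2 ^ (nbits) - 1` parses as `2 ^ (nbits - 1)`. The sketch below is not part of the commit. It evaluates the removed expression verbatim and compares it with the full two's-complement range, which for 4 bits matches the `weight_qmin=-8, weight_qmax=7` values restored here. The helper names `removed_bounds` and `signed_bounds` are hypothetical and exist only for illustration.

```python
def removed_bounds(nbits):
    # Expression copied from the removed line. `^` is bitwise XOR in Python,
    # and `-` binds tighter, so the second term is 2 ^ (nbits - 1).
    return -2 ^ (nbits), 2 ^ (nbits) - 1


def signed_bounds(nbits):
    # Full two's-complement range of an nbits-wide signed integer.
    return -(2 ** (nbits - 1)), 2 ** (nbits - 1) - 1


print(removed_bounds(4))  # (-6, 1)
print(removed_bounds(8))  # (-10, 5)
print(signed_bounds(4))   # (-8, 7)
```

Neither value pair produced by the removed expression is a usable quantization range, which is consistent with the summary's note that the change would also have affected 8-bit. After this commit the 8-bit branch passes no explicit bounds and relies on the defaults of `get_symmetric_quantization_config`.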
