Revert erroneous 4-bit quant changes

kimishpatel · facebook-github-bot · commit 68a16df59a94 · 2024-02-26T09:15:26.000-08:00
Summary:
Earlier changes to the working 4-bit diff broke 4-bit support.

This diff restores the working code and avoids passing explicit min/max
in the 8-bit case. The earlier change would also have interfered with
8-bit quant, which expects symmetric min/max, unlike 4-bit.
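
For reference, a minimal sketch of why the reverted expression produced
wrong ranges: in Python `^` is bitwise XOR, not exponentiation, and
unary minus and subtraction bind tighter than `^`. The helper names
below (`reverted_range`, `signed_range`) are hypothetical and are not
part of export_llama_lib.py. `signed_range` is the asymmetric
two's-complement range, which matches the 4-bit values used in this
diff; it is not what the 8-bit path should pass, since that path relies
on the library's symmetric defaults.

```python
def reverted_range(nbits):
    # Expression copied from the reverted change.
    # Parses as ((-2) ^ nbits, 2 ^ (nbits - 1)), where ^ is XOR.
    return -2 ^ (nbits), 2 ^ (nbits) - 1


def signed_range(nbits):
    # Asymmetric two's-complement range for an nbits-wide signed integer.
    return -(2 ** (nbits - 1)), 2 ** (nbits - 1) - 1


print(reverted_range(4))  # (-6, 1)
print(signed_range(4))    # (-8, 7), the values hard-coded in this diff
print(reverted_range(8))  # (-10, 5)
print(signed_range(8))    # (-128, 127)
```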

Reviewed By: digantdesai

Differential Revision: D54198222

fbshipit-source-id: 035d34bbd7f87f8eb7fa61ac6b938ecac651cb00
diff --git a/examples/models/llama2/export_llama_lib.py b/examples/models/llama2/export_llama_lib.py
@@ -156,13 +156,13 @@ def check_embedding_byte_registered():
                 "At the moment only per channel weight quantization is supported."
             )
         if quant_params.quantize_linear.is_qc4:
-            nbits = 4
+            operator_config_dynamic = get_symmetric_quantization_config(
+                is_per_channel=True, is_dynamic=True, weight_qmin=-8, weight_qmax=7
+            )
         else:
-            nbits = 8
-        qmin, qmax = -2 ^ (nbits), 2 ^ (nbits) - 1
-        operator_config_dynamic = get_symmetric_quantization_config(
-            is_per_channel=True, is_dynamic=True, weight_qmin=qmin, weight_qmax=qmax
-        )
+            operator_config_dynamic = get_symmetric_quantization_config(
+                is_per_channel=True, is_dynamic=True
+            )
         dynamic_quantizer.set_global(operator_config_dynamic)
         quantizers.append(dynamic_quantizer)
     return quantizers