^The a8w4dq quantization scheme requires the model to be converted to fp32, due to lack of support for fp16 and bf16.
*These are the only valid bitwidth options.
**There are many valid group size options, including 512, 1024, etc. Note that a smaller groupsize tends to better preserve model quality and accuracy, while a larger groupsize further improves performance. Set groupsize to 0 for channelwise quantization. A combined usage sketch follows below.
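As a rough illustration of how these notes combine, the sketch below runs generation with a8w4dq quantization at a chosen groupsize. The command shape (`generate.py`, `llama3`, `--prompt`, `--quantize`) follows the examples in this document; the `--dtype fp32` flag and the JSON keys `"linear:a8w4dq"` and `"groupsize"` are assumptions about the config schema, not confirmed by this section.

```bash
# Sketch only: a8w4dq needs the model in fp32 (see the note above), and the
# quantize config selects the groupsize. The --dtype flag and the JSON keys
# ("linear:a8w4dq", "groupsize") are assumed here; check your checkout's docs.
python3 generate.py llama3 \
  --dtype fp32 \
  --prompt "Hello, my name is" \
  --quantize '{"linear:a8w4dq": {"groupsize": 256}}'
```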