One reason for this option to exist is for model quantizers who want an initial GGUF with the highest fidelity to the original model while still using a 16-bit float type instead of 32-bit floats.
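Concretely, the idea is that if the source model's weights are stored as bfloat16, writing BF16 preserves the most fidelity, and otherwise F16 does. Below is a minimal sketch of that selection, assuming torch tensors and the gguf Python package; pick_16bit_file_type is a hypothetical helper for illustration, not part of the convert script.

# Minimal sketch (not the actual convert script logic) of resolving the
# highest-fidelity 16-bit output type from the first loaded tensor's dtype.
import torch  # assumes the source weights are loaded as torch tensors

import gguf   # provides gguf.LlamaFileType


def pick_16bit_file_type(first_tensor: torch.Tensor) -> gguf.LlamaFileType:
    # bfloat16 source weights lose the least precision when written as BF16;
    # anything else falls back to F16.
    if first_tensor.dtype == torch.bfloat16:
        return gguf.LlamaFileType.MOSTLY_BF16
    return gguf.LlamaFileType.MOSTLY_F16


# Example: a bfloat16 tensor resolves "auto-f16" to MOSTLY_BF16.
print(pick_16bit_file_type(torch.zeros(1, dtype=torch.bfloat16)))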
help="output format - use f32 for float32, f16 for float16, bf16 for bfloat16, auto-f16 for the highest-fidelity 16-bit float type depending on the first loaded tensor type",
2399
2408
)
2400
2409
parser.add_argument(
2401
2410
"--bigendian", action="store_true",
@@ -2453,6 +2462,7 @@ def main() -> None:
2453
2462
"f32": gguf.LlamaFileType.ALL_F32,
2454
2463
"f16": gguf.LlamaFileType.MOSTLY_F16,
2455
2464
"bf16": gguf.LlamaFileType.MOSTLY_BF16,
2465
+
"auto-f16": gguf.LlamaFileType.GUESSED, # TODO: use a more appropriate "auto" type