
Commit 77c74db

Nexesenex authored and hodlen committed
llama : correction of the attn.v.weight quantization for IQ3_XS (ggml-org#6209)
IQ3_XS was not mentioned, while IQ3_S and IQ3_M were present twice. This PR corrects that in the manner which was probably intended initially.
1 parent 8d01895 commit 77c74db

File tree

1 file changed: +1 −7 lines changed

llama.cpp

Lines changed: 1 addition & 7 deletions
@@ -12027,13 +12027,7 @@ static ggml_type llama_tensor_get_type(quantize_state_internal & qs, ggml_type n
         else if (ftype == LLAMA_FTYPE_MOSTLY_IQ3_XXS) {
             new_type = qs.model.hparams.n_gqa() >= 4 ? GGML_TYPE_Q4_K : !qs.has_imatrix ? GGML_TYPE_IQ3_S : GGML_TYPE_IQ3_XXS;
         }
-        else if (ftype == LLAMA_FTYPE_MOSTLY_IQ3_S && qs.model.hparams.n_gqa() >= 4) {
-            new_type = GGML_TYPE_Q4_K;
-        }
-        else if (ftype == LLAMA_FTYPE_MOSTLY_IQ3_M) {
-            new_type = GGML_TYPE_Q4_K;
-        }
-        else if (ftype == LLAMA_FTYPE_MOSTLY_IQ3_S && qs.model.hparams.n_gqa() >= 4) {
+        else if ((ftype == LLAMA_FTYPE_MOSTLY_IQ3_XS || ftype == LLAMA_FTYPE_MOSTLY_IQ3_S) && qs.model.hparams.n_gqa() >= 4) {
             new_type = GGML_TYPE_Q4_K;
         }
         else if (ftype == LLAMA_FTYPE_MOSTLY_IQ3_M) {
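
For context, the corrected branch structure can be exercised in isolation. The sketch below is illustrative only: the enums, the free function pick_attn_v_type, and its parameters are simplified stand-ins invented for this example, not llama.cpp's real quantize_state_internal, ftype enums, or API. It shows the post-fix behavior: IQ3_XXS keeps its ternary, IQ3_XS and IQ3_S share one branch gated on GQA, and IQ3_M appears exactly once.

// Minimal, self-contained sketch (hypothetical names, not llama.cpp's real
// types) mirroring the corrected selection of the attn.v.weight quant type.
#include <cstdio>

enum FtypeSketch { FT_IQ3_XXS, FT_IQ3_XS, FT_IQ3_S, FT_IQ3_M, FT_OTHER };
enum TypeSketch  { T_Q4_K, T_IQ3_S, T_IQ3_XXS, T_DEFAULT };

// Post-fix logic: IQ3_XS and IQ3_S now share one branch gated on n_gqa >= 4
// (before the fix, IQ3_XS never matched any branch here, while IQ3_S and
// IQ3_M each had a duplicate, unreachable second branch).
static TypeSketch pick_attn_v_type(FtypeSketch ftype, int n_gqa, bool has_imatrix) {
    if (ftype == FT_IQ3_XXS) {
        return n_gqa >= 4 ? T_Q4_K : (!has_imatrix ? T_IQ3_S : T_IQ3_XXS);
    }
    if ((ftype == FT_IQ3_XS || ftype == FT_IQ3_S) && n_gqa >= 4) {
        return T_Q4_K;
    }
    if (ftype == FT_IQ3_M) {
        return T_Q4_K;
    }
    return T_DEFAULT;
}

int main() {
    // A GQA model (e.g. n_gqa = 8) quantized as IQ3_XS now gets Q4_K for
    // attn.v.weight, matching IQ3_S; prints "1 1".
    printf("%d %d\n",
           pick_attn_v_type(FT_IQ3_XS, 8, true)  == T_Q4_K,
           pick_attn_v_type(FT_IQ3_S,  8, false) == T_Q4_K);
    return 0;
}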
