
Commit 969be5d

llama : fix non-quantization of expert gating tensors
This reverts a single line from ggml-org#5475
1 parent: cb49e0f

File tree

1 file changed: +2 -1 lines changed


llama.cpp

Lines changed: 2 additions & 1 deletion
@@ -11213,7 +11213,8 @@ static void llama_model_quantize_internal(const std::string & fname_inp, const s
     quantize &= !params->only_copy;
 
     // do not quantize expert gating tensors
-    quantize &= name != LLM_TN(model.arch)(LLM_TENSOR_FFN_GATE_INP, "weight");
+    // NOTE: can't use LLM_TN here because the layer number is not known
+    quantize &= name.find("ffn_gate_inp.weight") == std::string::npos;
 
     // do not quantize positional embeddings and token types (BERT)
     quantize &= name != LLM_TN(model.arch)(LLM_TENSOR_POS_EMBD, "weight");
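
Why the substring check: LLM_TN(model.arch)(LLM_TENSOR_FFN_GATE_INP, "weight") builds the tensor name without a layer index, so the exact-equality check introduced in ggml-org#5475 could not match the per-layer expert gating tensors, and they ended up quantized. Searching for the "ffn_gate_inp.weight" suffix catches the gating tensor of every layer. The standalone C++ sketch below illustrates just this name filter; the example tensor names (e.g. "blk.0.ffn_gate_inp.weight") are assumed for illustration and are not part of this commit.

    // Standalone sketch of the name filter (illustrative only; not the full
    // llama_model_quantize_internal logic). The tensor names used below are
    // assumed examples following the "blk.<N>.ffn_gate_inp.weight" convention.
    #include <iostream>
    #include <string>
    #include <vector>

    static bool should_quantize(const std::string & name) {
        bool quantize = true;
        // substring match catches the gating tensor of every layer,
        // regardless of the layer index embedded in the name
        quantize &= name.find("ffn_gate_inp.weight") == std::string::npos;
        return quantize;
    }

    int main() {
        const std::vector<std::string> names = {
            "blk.0.ffn_gate_inp.weight", // expert gating tensor -> left unquantized
            "blk.0.ffn_down.0.weight",   // regular expert weight -> quantized
            "token_embd.weight",         // embeddings -> quantized
        };
        for (const auto & name : names) {
            std::cout << name << " -> "
                      << (should_quantize(name) ? "quantize" : "keep") << "\n";
        }
        return 0;
    }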
