Commit 4852a5b

dranger003 authored and arthw committed
Use model->gguf_kv for loading the template instead of using the C API. (ggml-org#10868)
* Bump model_template to 16384 bytes to support larger chat templates.
* Use `model->gguf_kv` for efficiency.
1 parent 0cb43f2 commit 4852a5b
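
The commit message motivates the change by the size limit of a fixed buffer. The sketch below illustrates that limit. It assumes the C API getter copies the value with `snprintf`-like semantics, which is an assumption and not something this commit shows. `copy_value` and `read_with_fixed_buffer` are hypothetical helpers, not llama.cpp functions.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical stand-in for a C-style getter that copies a value into a
// caller-provided buffer. Assumption: snprintf semantics, so the copy is
// truncated to buf_size - 1 characters plus a terminating NUL.
int copy_value(const std::string & value, char * buf, std::size_t buf_size) {
    return std::snprintf(buf, buf_size, "%s", value.c_str());
}

// Reads a value the way the removed lines did: fill a fixed-size buffer,
// then build a std::string from the whole buffer, not from the copied length.
std::string read_with_fixed_buffer(const std::string & value, std::size_t buf_size) {
    std::vector<char> buf(buf_size, 0);
    copy_value(value, buf.data(), buf.size());
    return std::string(buf.data(), buf.size());
}
```

Under these assumptions, a value longer than the buffer is silently cut off, and a shorter one comes back padded with NUL bytes up to the buffer size. Reading the stored `std::string` directly avoids both effects.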

1 file changed, with 8 additions and 8 deletions

src/llama.cpp

Lines changed: 8 additions & 8 deletions
@@ -22660,15 +22660,15 @@ int32_t llama_chat_apply_template(
     std::string curr_tmpl(tmpl == nullptr ? "" : tmpl);
     if (tmpl == nullptr) {
         GGML_ASSERT(model != nullptr);
-        // load template from model
-        std::vector<char> model_template(2048, 0); // longest known template is about 1200 bytes
-        std::string template_key = "tokenizer.chat_template";
-        int32_t res = llama_model_meta_val_str(model, template_key.c_str(), model_template.data(), model_template.size());
-        if (res < 0) {
+
+        // load template from model, if available
+        const auto & it = model->gguf_kv.find("tokenizer.chat_template");
+        if (it != model->gguf_kv.end() && it->second.size() > 0) {
+            curr_tmpl = it->second;
+        }
+        else {
             // worst case: there is no information about template, we will use chatml by default
-            curr_tmpl = "chatml"; // see llama_chat_apply_template_internal
-        } else {
-            curr_tmpl = std::string(model_template.data(), model_template.size());
+            curr_tmpl = "chatml"; // see llama_chat_apply_template_internal
         }
     }
 
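
The lookup-with-fallback pattern in the added lines can be sketched in isolation as follows. This is a minimal illustration, not the llama.cpp code. `kv_map` is a hypothetical stand-in for `model->gguf_kv`, assumed here to behave like a string-to-string map (the diff only shows `find()`, `end()`, and assignment of the value to a `std::string`). `resolve_chat_template` is likewise a made-up name.

```cpp
#include <string>
#include <unordered_map>

// Hypothetical stand-in for the model's metadata map (assumption: string keys
// mapping to string values, as the find()/second usage in the diff suggests).
using kv_map = std::unordered_map<std::string, std::string>;

// Returns the stored chat template, or "chatml" when the key is missing
// or its value is empty, mirroring the branch structure of the diff.
std::string resolve_chat_template(const kv_map & kv) {
    const auto it = kv.find("tokenizer.chat_template");
    if (it != kv.end() && !it->second.empty()) {
        return it->second; // copies the full value, whatever its length
    }
    return "chatml";
}
```

Because the value is copied as a `std::string`, no buffer size has to be chosen up front, so a template of any length comes back intact.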
