Add Yandex instruct model template support #12621

Merged · 7 commits merged on Mar 30, 2025

Conversation

vorobyov01 (Contributor) commented Mar 28, 2025

I have read the contributing guidelines

Self-reported review complexity:

  • Low

Hello!
We at Yandex are planning to release our 8B instruct model to open-source. The pre-trained model can be found here: YandexGPT-5-Lite-8B-pretrain.
I created this pull request to add support for our custom chat template in llama.cpp. Could you please take a look and review my changes? @ggerganov

I ran test-chat-template locally — it works as expected. I believe my changes should not affect other parts of the project.

Thank you!

The github-actions bot added the testing (Everything test related) label on Mar 28, 2025
vorobyov01 (Contributor, Author) commented:

I encountered an issue related to the difference between Jinja and C++ templates. For some reason, the system prompt "You are a helpful assistant" in Jinja was included in the first user message. I suspect this is due to some peculiarities in Jinja templating. As a workaround, I hardcoded it in the tests:

/* .expected_output= */ "<s> Пользователь: Hello\n\n Ассистент: Hi there\n\n Пользователь: Who are you\n\n Ассистент:    I am an assistant   \n\n Пользователь: Another question\n\n Ассистент:[SEP]",

/* .expected_output_jinja= */ "<s> Пользователь: You are a helpful assistant\nHello\n\n Ассистент: Hi there\n\n Пользователь: Who are you\n\n Ассистент:    I am an assistant   \n\n Пользователь: Another question\n\n Ассистент:[SEP]",
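
For reference, here is a minimal sketch of how a formatter for this kind of template could apply a conversation (the struct and function names below are illustrative only, not the actual code this PR adds to llama-chat.cpp):

```cpp
#include <sstream>
#include <string>
#include <vector>

struct chat_msg { std::string role, content; };

// Illustrative sketch, not the PR's implementation. System-prompt handling is
// exactly where the two code paths above diverge: the Jinja template folds the
// system prompt into the first user message, hence the second expected string.
static std::string format_yandex_chat(const std::vector<chat_msg> & chat, bool add_assistant_prefix) {
    std::ostringstream ss;
    ss << "<s>";
    for (const auto & m : chat) {
        if (m.role == "assistant") {
            ss << " Ассистент: " << m.content << "\n\n";
        } else if (m.role == "user") {
            ss << " Пользователь: " << m.content << "\n\n";
        }
        // system messages are omitted here for brevity; see the note above
    }
    if (add_assistant_prefix) {
        ss << " Ассистент:[SEP]"; // the model generates one reply after this prefix
    }
    return ss.str();
}
```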

vorobyov01 (Contributor, Author) commented:

@compilade @ngxson Could you please take a look when you have time?

ngxson (Collaborator) commented Mar 30, 2025

This template doesn't seem to use an EOT (end of turn) token. Have you tried with llama-cli to make sure the generation correctly stops after each turn?

vorobyov01 (Contributor, Author) commented Mar 30, 2025

Yes, it works fine. We used the EOS token as EOT during alignment:

== Running in interactive mode. ==
 - Press Ctrl+C to interject at any time.
 - Press Return to return control to the AI.
 - To return control without starting a new line, end your input with '/'.
 - If you want to submit another line, end your input with '\'.
 - Not using system message. To change it, set a different value via -sys PROMPT


> Hello! 
 Hello! How can I assist you?

> Write a haiku about tokenizer
 A tokenizer splits,
Breaking words into parts,
Streams flow in lines.

> 

We have a prefix space in the first token of the reply, which is a consequence of converting our tokenizer to the HF infra.

ngxson previously approved these changes Mar 30, 2025
ngxson (Collaborator) left a comment:

Seems good to me.

Btw if you're from the Yandex team, please do not use \n\n as the EOT next time you make an instruction-tuned model. The reason is that this prevents the model from generating long code or markdown content, which makes it completely useless for real-life use cases.

vorobyov01 (Contributor, Author) commented Mar 30, 2025

It uses the </s> token as an EOT. There will indeed be \n\n</s> at the end of each turn, but not every \n\n sequence should end with </s>.

vorobyov01 (Contributor, Author) commented:

Thanks a lot for reviewing!

ngxson (Collaborator) commented Mar 30, 2025

> It uses the </s> token as an EOT. There will indeed be \n\n</s> at the end of each turn, but not every \n\n sequence should end with </s>.

Then I think the current template is wrong:

ss << " Пользователь: " << chat[i]->content << "\n\n";

It must be "\n\n</s>" after each turn, not just "\n\n".

ngxson (Collaborator) commented Mar 30, 2025

In any case, for the future I would recommend using a template like ChatML or Llama 3, as they are very easy to work with and you don't have to deal with trailing-space issues.

ngxson dismissed their stale review March 30, 2025 14:56

Dismiss until EOS/EOT is confirmed

vorobyov01 (Contributor, Author) commented:

Our model was trained to respond with only one turn after Ассистент:[SEP], so Ассистент:[SEP] appears only before the very last reply. I might be mistaken (please correct me if I'm wrong), but when calling llama-cli, it reconstructs this dialog each time, meaning our last reply will always start with Ассистент:[SEP], no matter how long the sequence we generate. It always ends with the EOS token </s>. We don't actually have a concept of an end_of_turn token; sorry for the confusion.

I agree that it would be much better and more convenient to use special tokens for this. We realized this as we began the open-source process. We will definitely switch to special tokens in future model updates :)

ngxson (Collaborator) commented Mar 30, 2025

> when calling llama-cli, it reconstructs this dialog each time

No, that is not the case; the way llama-cli works is to keep already-processed tokens as-is.

From what you said, I imagine the expected behavior is as follows. For the first turn, we will have:

... Ассистент:[SEP]

When it is done generating:

... Ассистент: the response</s>

Then for the next turn, we have:

... Ассистент: the response ... Ассистент:[SEP]

But in reality, this is what you will get in llama-cli, because we can't go back and modify past tokens:

... Ассистент:[SEP] the response</s> ... Ассистент:[SEP]

Currently, only llama-server can handle this case, but it will invalidate the KV cache for the last assistant response, because we now have to go back and modify a single token (which is very inefficient).

So, please, just use an existing chat template next time. If you don't know how chat templates work and try to invent a new one like this, you risk degrading both performance and quality a lot!

ngxson (Collaborator) commented Mar 30, 2025

I will approve & merge this for now, but please note that the quality and performance will be degraded compared to other models (as explained above).

Let's hope that next time we can reuse one of the existing templates, so your model will "just work" out of the box with the best performance.

For example, with ChatML, you can simply do:

<|im_start|>Пользователь\n{user_message}<|im_end|>\n<|im_start|>Ассистент\n{assistant_message}<|im_end|>...
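
To make that concrete, here is a minimal sketch of a ChatML-style formatter (illustrative only; it reuses the chat_msg struct from the sketch earlier in the thread, and standard English ChatML role names are assumed, though Пользователь/Ассистент would work the same way):

```cpp
// Illustrative sketch of a ChatML formatter, not llama.cpp's implementation.
static std::string format_chatml(const std::vector<chat_msg> & chat, bool add_generation_prompt) {
    std::ostringstream ss;
    for (const auto & m : chat) {
        ss << "<|im_start|>" << m.role << "\n" << m.content << "<|im_end|>\n";
    }
    if (add_generation_prompt) {
        // no trailing space to worry about, and <|im_end|> gives a clean EOT
        ss << "<|im_start|>assistant\n";
    }
    return ss.str();
}
```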

ngxson merged commit 7242dd9 into ggml-org:master on Mar 30, 2025
48 checks passed
vorobyov01 (Contributor, Author) commented:

Ok thanks a lot!
We will inform users that performance may suffer due to a non-standard template.
