Add CLI arg generate_until_token to support reasoning and CoT models #617
As noted in #8 and #513, LightEval expects models to follow a question with an immediate answer, but chain-of-thought and reasoning models (such as DeepSeek) generate many tokens to arrive at a more accurate / thought-out result before answering.
This PR would add `--generate-until-token '</think>'` as the syntax to support these models. It must be run with `--use-chat-template` and a `TransformerModel` model, or it will raise an Exception.
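As a rough illustration, an invocation might look like the sketch below. The model name, task spec, and overall command layout are placeholders that depend on your lighteval version; only `--use-chat-template` and `--generate-until-token` are the flags this PR concerns.

```bash
# Hypothetical invocation: model name, task spec, and CLI layout are
# illustrative placeholders, not part of this PR.
lighteval accelerate \
    "pretrained=deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B" \
    "lighteval|gsm8k|0|0" \
    --use-chat-template \
    --generate-until-token '</think>'
```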
I have a CoLab notebook running a BigBench task which I didn't run to the end, but I used `logger.info` to confirm it was generating reasoning text. In a previous test linked in #513 I confirmed this method works on a short task.

Notes:
- Should `do_sample=True` be set when generating the reasoning text? Is that reproducible?
- Does `logger.debug()` show up when calling lighteval from the command line? I can remove the logging of reasoning text if it isn't helpful.
- Some evals' targets may need to change from `["A", "B", ...]` to `["The answer is A", ...]` - thoughts about using the template string to set post-reasoning text and be compatible with more evals?