
Add CLI arg generate_until_token to support reasoning and CoT models #617


Open
wants to merge 3 commits into base: main

Conversation

@mapmeld (Contributor) commented Mar 17, 2025

As noted in #8 and #513, LightEval expects models to follow a question with an immediate answer, but chain-of-thought and reasoning models (such as DeepSeek) generate many tokens to reach a more accurate, thought-out result before answering.

This PR would add --generate-until-token '</think>' as the syntax to support these models.
It must be run with --use-chat-template and a TransformerModel model, or it will raise an Exception.
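The control flow behind the flag can be sketched roughly like this. This is a minimal illustration of the "generate until a stop token, then answer" idea, not LightEval's actual code; `fake_generate` and the helper name are made up for the example:

```python
# Minimal sketch of the "generate until token" idea: keep generating until
# the stop token appears, then the usual answer step runs on the
# reasoning-augmented text. `fake_generate` stands in for a real model call.

STOP_TOKEN = "</think>"

def fake_generate(text, chunk=" reasoning step..."):
    # A real model would produce new tokens here; this stub just appends
    # text and eventually emits the stop token.
    return text + chunk + (STOP_TOKEN if "step..." in text else "")

def generate_until_token(prompt, stop_token=STOP_TOKEN, max_rounds=8):
    text = prompt
    for _ in range(max_rounds):
        text = fake_generate(text)
        if stop_token in text:
            # Truncate just past the stop token, like a stop sequence would.
            return text[: text.index(stop_token) + len(stop_token)]
    return text  # fall back if the model never closes its reasoning

reasoning = generate_until_token("<think>")
```

With a real model the inner call would be `model.generate(...)` with the stop token configured as a stopping criterion, and the returned reasoning text would be prepended to the answer-scoring step.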

I have a Colab notebook running a BigBench task which I didn't run to completion, but I used logger.info to confirm it was generating reasoning text. In a previous test linked in #513, I confirmed this method works on a short task.

Notes:

  • "generate until token" or "wait until..." is the clearest name I could think of to remind people to use the ending token
  • this doesn't look at the tokenizer's chat template string, but that could be a helpful way to detect the appropriate token
  • should I set do_sample=True when generating the reasoning text? Is that reproducible?
  • is there a way to see logger.debug() output when calling lighteval from the command line? I can remove the logging of reasoning text if it isn't helpful
  • for better results on my custom task in <think> tags for thinking models #513, I had to change the answers ["A", "B", ...] to ["The answer is A", ...]. Any thoughts on using the template string to set post-reasoning text and stay compatible with more evals?
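The answer rewrite in the last bullet is mechanical; here is a hedged sketch of what that remapping looks like for a multiple-choice task (the function name and template are illustrative, not LightEval's task API):

```python
# Rewriting bare letter choices into full sentences so the model's
# post-reasoning continuation can match them. Purely illustrative;
# not the actual task-definition API.
def remap_choices(choices, template="The answer is {}"):
    return [template.format(c) for c in choices]

remapped = remap_choices(["A", "B", "C", "D"])
# remapped[0] == "The answer is A"
```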

@EdwardSJ151

Hi! Is this currently working?

@mapmeld (Contributor, Author) commented Apr 3, 2025

@EdwardSJ151 you should be able to run it with the code in this PR. If you run into issues, please comment.
This might help:

  • change the logger.debug line to logger.info to confirm that reasoning text is there
  • if your answers are "A", "B", "C", you might need to change them to "The answer is A"
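On the first bullet: if editing the source isn't convenient, one way to surface the debug output is to raise the logging level from a small wrapper script. This is the standard `logging` pattern; that the package's loggers live under the "lighteval" namespace is an assumption, as I haven't verified the logger names:

```python
import logging

# Assumption: lighteval uses module-level loggers under the "lighteval"
# namespace, as most packages do. Raising the level makes logger.debug
# lines (e.g. the reasoning text) visible without editing the source.
logging.basicConfig(level=logging.DEBUG)
logging.getLogger("lighteval").setLevel(logging.DEBUG)
```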

@NathanHB NathanHB linked an issue May 15, 2025 that may be closed by this pull request
Successfully merging this pull request may close these issues.

<think> tags for thinking models