llama : keep track of all EOG tokens in the vocab #9609
Merged
fix #9606
Upon vocab construction, iterate over all tokens and store all that look like a token that might cause an "end of generation" event (e.g. `<EOT>`, `<endoftext>`, `<im_end>`, etc.). `llama_token_is_eog` will now check this set of tokens to determine the EOG status.

Detected EOG tokens are printed at load time (e.g. for Qwen2.5-Coder).

This is yet another hack for handling end-of-... tokens. The best way to fix this is to have proper tokenizer configurations, but as discussed in #9606, this is unlikely to happen.
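A minimal sketch of the idea, assuming a simplified vocab structure; the names `vocab`, `special_eog_ids`, `collect_eog_tokens`, and `token_is_eog` are illustrative here, not necessarily llama.cpp's actual internals:

```cpp
#include <cstdint>
#include <set>
#include <string>
#include <vector>

// Hypothetical, simplified vocab for illustration.
struct vocab {
    std::vector<std::string> token_text;       // id -> token piece
    std::set<int32_t>        special_eog_ids;  // all tokens treated as end-of-generation

    // Run once at vocab construction: scan every token and record the ones
    // whose text looks like an "end of generation" marker.
    void collect_eog_tokens() {
        static const char * eog_like[] = {
            "<EOT>", "<endoftext>", "<im_end>", // etc.
        };
        for (int32_t id = 0; id < (int32_t) token_text.size(); ++id) {
            for (const char * pat : eog_like) {
                if (token_text[id].find(pat) != std::string::npos) {
                    special_eog_ids.insert(id);
                    break;
                }
            }
        }
    }

    // The EOG check then reduces to a set membership test, instead of
    // comparing against a single hardcoded EOS/EOT id.
    bool token_is_eog(int32_t id) const {
        return special_eog_ids.count(id) > 0;
    }
};
```

Keeping the detected ids in a set means the per-token check stays cheap no matter how many EOG variants a given model's vocab happens to contain.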