llama : keep track of all EOG tokens in the vocab #9609

Merged: 1 commit into master on Sep 24, 2024

Conversation

ggerganov (Member)

fix #9606

Upon vocab construction, iterate over all tokens and store every one that looks like it might cause an "end of generation" event (e.g. <EOT>, <endoftext>, <im_end>, etc.). llama_token_is_eog now checks this set to determine a token's EOG status.
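
To illustrate the mechanism, below is a minimal standalone sketch of the detection pass. The names (vocab_sketch, build_eog_cache, token_is_eog) and the exact pattern list are assumptions for illustration only; the real scan lives in llama.cpp's vocab loading code and fills the set that llama_token_is_eog consults.

// Standalone sketch; illustrative names, and the pattern list is an assumption
#include <cstdint>
#include <cstdio>
#include <set>
#include <string>
#include <vector>

struct vocab_sketch {
    std::vector<std::string> id_to_piece;     // token id -> text piece
    std::set<int32_t>        special_eog_ids; // filled once, at load time

    // single pass over the whole vocab, run during vocab construction
    void build_eog_cache() {
        static const std::set<std::string> eog_pieces = {
            "<EOT>", "<|endoftext|>", "<|im_end|>", "<|eot_id|>", "<|end|>",
        };
        for (size_t id = 0; id < id_to_piece.size(); ++id) {
            if (eog_pieces.count(id_to_piece[id]) > 0) {
                special_eog_ids.insert((int32_t) id);
            }
        }
    }

    // what llama_token_is_eog() now consults instead of comparing
    // against a single EOS/EOT id
    bool token_is_eog(int32_t id) const {
        return special_eog_ids.count(id) > 0;
    }
};

int main() {
    vocab_sketch v;
    v.id_to_piece = { "hello", "<|endoftext|>", "world", "<|im_end|>" };
    v.build_eog_cache();
    for (size_t id = 0; id < v.id_to_piece.size(); ++id) {
        std::printf("%zu %-15s eog=%d\n", id, v.id_to_piece[id].c_str(),
                    (int) v.token_is_eog((int32_t) id));
    }
    return 0;
}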

Detected EOG tokens are printed like this (Qwen2.5-Coder):

0.00.190.685 I llm_load_print_meta: model size       = 14.19 GiB (16.00 BPW) 
0.00.190.685 I llm_load_print_meta: general.name     = Qwen2.5 Coder 7B Instruct
0.00.190.685 I llm_load_print_meta: BOS token        = 151643 '<|endoftext|>'
0.00.190.686 I llm_load_print_meta: EOS token        = 151645 '<|im_end|>'
0.00.190.686 I llm_load_print_meta: PAD token        = 151643 '<|endoftext|>'
0.00.190.686 I llm_load_print_meta: LF token         = 148848 'ÄĬ'
0.00.190.696 I llm_load_print_meta: EOT token        = 151645 '<|im_end|>'
0.00.190.697 I llm_load_print_meta: EOG token        = 151643 '<|endoftext|>'
0.00.190.697 I llm_load_print_meta: EOG token        = 151645 '<|im_end|>'
0.00.190.698 I llm_load_print_meta: max token length = 256
0.00.190.734 I llm_load_tensors: ggml ctx size =    0.30 MiB
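
On the caller side nothing changes: a generation loop that already stops on llama_token_is_eog picks up the extra EOG ids automatically. A rough sketch of such a call site, assuming the llama.h sampler API available at the time of this PR (setup of model, ctx, smpl and n_predict omitted):

// Sketch only, not a complete program: model, ctx and smpl are assumed
// to be an already-initialized llama_model, llama_context and
// llama_sampler, with the prompt already decoded.
for (int i = 0; i < n_predict; ++i) {
    // sample the next token from the latest logits
    const llama_token tok = llama_sampler_sample(smpl, ctx, -1);

    // with this PR this is true for *any* detected EOG token,
    // e.g. both '<|endoftext|>' and '<|im_end|>' on Qwen2.5-Coder
    if (llama_token_is_eog(model, tok)) {
        break;
    }

    // ... append tok to the output and llama_decode() the next batch ...
}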

This is yet another hack for handling end-of-... tokens. The best way to fix this is to have proper tokenizer configurations, but as discussed in #9606, this is unlikely to happen.

tristandruyen (Contributor)

Seems to fix Qwen2.5-Coder at least #9606 (comment)

ggerganov merged commit 31ac583 into master on Sep 24, 2024
59 checks passed
ggerganov deleted the gg/eog-ids branch on September 24, 2024 at 07:16
dsx1986 pushed a commit to dsx1986/llama.cpp that referenced this pull request Oct 29, 2024
arthw pushed a commit to arthw/llama.cpp that referenced this pull request Nov 15, 2024
arthw pushed a commit to arthw/llama.cpp that referenced this pull request Nov 18, 2024
Closes #9606: Bug: Qwen2.5-Coder variants do not properly stop in FIM mode