fix: only use eos_token_id as pad_token_id if int #2774

dvrogozh · 2024-11-22T18:45:53Z

Llama 3 has a list of values as eos_token_id:

  "['<|end_of_text|>', '<|eom_id|>', '<|eot_id|>']"

This breaks the tokenizer, which expects a single value for pad_token_id. When eos_token_id in the model config is not an int, this commit uses tokenizer.eos_token_id instead.
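For illustration, the selection rule described above might look roughly like the sketch below. The helper name `choose_pad_token_id` and the numeric ids are hypothetical and chosen only for the example; this is not the code from the commit, just a minimal stand-alone version of the "only use eos_token_id if it is an int" check.

```python
from typing import List, Optional, Union


def choose_pad_token_id(
    config_eos_token_id: Union[int, List[int], None],
    tokenizer_eos_token_id: Optional[int],
) -> Optional[int]:
    """Pick a single pad token id (hypothetical helper, for illustration only).

    config_eos_token_id stands in for model.config.eos_token_id, which may be
    an int, a list of ints, or None. tokenizer_eos_token_id stands in for
    tokenizer.eos_token_id, which is a single id or None.
    """
    # Use the config value only when it is a single int.
    if isinstance(config_eos_token_id, int):
        return config_eos_token_id
    # Otherwise (list or None) fall back to the tokenizer's own eos id.
    return tokenizer_eos_token_id


# Illustrative ids only, not real vocabulary values.
single = choose_pad_token_id(2, 2)
from_list = choose_pad_token_id([10, 11, 12], 10)
missing = choose_pad_token_id(None, None)
```

With a list-valued config entry, the helper ignores the list and returns the tokenizer's single id, so the caller never assigns a list as the pad token id.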

Fixes: #2440

CC: @Narsil @zucchini-nlp @ladi-pomsar

zucchini-nlp · 2024-11-24T11:12:22Z

@dvrogozh Cool, looks good to me. We also need a review from the TGI team.

dvrogozh · 2024-11-26T02:42:52Z

@Narsil, @OlivierDehaene: please help review.

Narsil

LGTM

This was referenced Nov 22, 2024

[Volta] [No flash attention] Llama 3.1 8B Instruct failed to start - "< not supported between instances of 'NoneType' and 'int'" #2440

Closed

Setting tokenizer.pad_token_id = model.config.eos_token_id fails for LLama 3 huggingface/transformers#34869

Closed

Narsil approved these changes Dec 2, 2024

Narsil merged commit 535149d into huggingface:main Dec 2, 2024

tishizaki mentioned this pull request Jan 27, 2025

Llama-3.2-3B-Instruct failed to use with HuggingfacePipeline because of setting a non-string value as the pad_token langchain-ai/langchain#29431

Closed
