llama : optimize long word tokenization with WPM #8034

ggerganov · 2024-06-20T11:48:37Z

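- More efficient "longest token" search for very long words: candidate substrings are capped at vocab.max_token_len, the length of the longest token in the vocabulary.
- Reuse a single llm_tokenizer_wpm instance across the loop instead of constructing a new one on each iteration.
- Reserve the result array up front in unicode_cpts_from_utf8.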
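The bounded search in the first item can be sketched as a toy greedy longest-match tokenizer. This is a minimal illustration under stated assumptions, not the llama.cpp code: toy_vocab, tokenize_word and unk_id are hypothetical names, and words are treated as raw bytes without the word-boundary markers a real WordPiece vocabulary uses.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical toy vocabulary (not llama.cpp's vocab type): token text -> id,
// plus the length of the longest token, tracked as tokens are added.
struct toy_vocab {
    std::unordered_map<std::string, int> token_to_id;
    int max_token_len = 0; // length in bytes of the longest token
    int unk_id = -1;       // id emitted when a word cannot be tokenized

    void add(const std::string & text, int id) {
        token_to_id[text] = id;
        max_token_len = std::max(max_token_len, (int) text.size());
    }
};

// Greedy longest-match tokenization of one word. At each position i the
// candidate lengths are tried from longest to shortest, but never beyond
// max_token_len: no vocabulary entry can match a longer substring, so those
// lookups could only fail. If some position has no match, the whole word
// becomes a single unk_id (WordPiece-style all-or-nothing behavior).
std::vector<int> tokenize_word(const toy_vocab & vocab, const std::string & word) {
    std::vector<int> out;
    const int n = (int) word.size();
    int i = 0;
    while (i < n) {
        bool match = false;
        for (int j = std::min(n, i + vocab.max_token_len); j > i; --j) {
            auto it = vocab.token_to_id.find(word.substr(i, j - i));
            if (it != vocab.token_to_id.end()) {
                out.push_back(it->second);
                i = j; // continue right after the matched token
                match = true;
                break;
            }
        }
        if (!match) {
            return {vocab.unk_id};
        }
    }
    return out;
}
```

With the cap, each position costs at most max_token_len lookups on substrings no longer than max_token_len, so the work per word grows linearly with its length instead of quadratically or worse.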

I have read the contributing guidelines
Self-reported review complexity:
- Low
- Medium
- High

ggml-ci

llama : optimize long word tokenization with WPM

677bf2e

ggml-ci

ggerganov force-pushed the gg/max-token-length branch from fb29bda to 677bf2e on June 20, 2024 11:50

ggerganov mentioned this pull request Jun 20, 2024

Bug: Embedding endpoint takes exponential time to process a long unknown token #8029 (closed)
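To illustrate why an unbounded search stalls on a long unknown token, the sketch below counts the work in a hypothetical worst case where only single-byte tokens match, so every position tries every candidate length before advancing one byte. The model and the names search_cost and count_cost are assumptions for illustration; real vocabularies and per-lookup costs differ.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Work done by greedy longest-match search over one word, in a toy model.
struct search_cost {
    std::uint64_t lookups = 0; // hash-map lookups (one per candidate substring)
    std::uint64_t bytes = 0;   // bytes copied/hashed to build those substrings
};

// Hypothetical worst case: only single-byte tokens exist, so at every position
// all candidate lengths fail except length 1, and the cursor advances one byte.
// max_len == 0 models an unbounded search (candidates up to the end of the word);
// otherwise candidates are capped at max_len, like a max_token_len bound.
search_cost count_cost(std::uint64_t n, std::uint64_t max_len) {
    search_cost c;
    for (std::uint64_t i = 0; i < n; ++i) {
        std::uint64_t longest = n - i;
        if (max_len > 0) {
            longest = std::min(longest, max_len);
        }
        // lengths tried: longest, longest - 1, ..., 1 (the last one matches)
        c.lookups += longest;
        c.bytes += longest * (longest + 1) / 2;
    }
    return c;
}
```

In this model a 1000-byte word costs 500,500 lookups touching 167,167,000 bytes without a bound, versus 15,880 lookups and 134,640 bytes with a cap of 16. The unbounded cost is polynomial (cubic in bytes) rather than literally exponential, but it still grows steeply with word length.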

ggerganov merged commit a927b0f into master Jun 21, 2024
72 checks passed

ggerganov deleted the gg/max-token-length branch June 21, 2024 05:51

arthw pushed a commit to arthw/llama.cpp that referenced this pull request Jun 30, 2024

llama : optimize long word tokenization with WPM (ggml-org#8034)

ca8dbd6

ggml-ci
