Recognize IBM Granite 3.3 FIM tokens. Makes llama-server /infill usable. #12988

Noeda · 2025-04-17T01:33:10Z

The model in question was just released by IBM: https://huggingface.co/ibm-granite/granite-3.3-8b-base

Granite 3.3's FIM tokens are very similar to Qwen's; they just use an underscore instead of a dash, for example <fim_middle> instead of <fim-middle>.

Opening tokenizer_config.json in ibm-granite/granite-3.3-8b-base (https://huggingface.co/ibm-granite/granite-3.3-8b-base/blob/main/tokenizer_config.json) shows:

    "<fim_prefix>",
    "<fim_middle>",
    "<fim_suffix>",
    "<fim_pad>",
    ...
    "<reponame>",
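As a rough illustration of what these tokens are for, a fill-in-the-middle prompt could be assembled as below. The prefix-suffix-middle ordering is an assumption borrowed from other models that use the same token names; tokenizer_config.json does not state it. `build_fim_prompt` is a hypothetical helper, not part of llama.cpp.

```python
# Granite 3.3 FIM token texts, as listed in tokenizer_config.json.
FIM_PREFIX = "<fim_prefix>"
FIM_SUFFIX = "<fim_suffix>"
FIM_MIDDLE = "<fim_middle>"


def build_fim_prompt(prefix: str, suffix: str) -> str:
    """Assemble a prompt asking the model to fill in text between prefix and suffix.

    Assumption: prefix-suffix-middle ordering. Check the model card before
    relying on it.
    """
    return f"{FIM_PREFIX}{prefix}{FIM_SUFFIX}{suffix}{FIM_MIDDLE}"


if __name__ == "__main__":
    # The model would be expected to generate the body of main().
    print(build_fim_prompt("int main() {\n    ", "\n    return 0;\n}\n"))
```

With llama-server, this assembly is done by the /infill endpoint once the tokens are recognized, so clients normally send only the prefix and suffix.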

Tested with Granite 3.3 models that I converted to .gguf files. I noticed that the llama.cpp code completion Vim extension didn't work with llama-server, so I checked whether the tokens were missing, added them, tested them, and filed this PR.

I could not find an equivalent for the file separator token, but I mapped the 5 tokens I found that had clear llama.cpp equivalents.
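The mapping described above can be sketched as a lookup table. This is only an illustration of the idea: the role names on the right are hypothetical labels chosen for readability, not llama.cpp identifiers, and the actual change lives in llama.cpp's C++ vocabulary code.

```python
from typing import Optional

# Granite 3.3 token text -> FIM role it plays.
# Role names are hypothetical labels for this sketch, not llama.cpp identifiers.
GRANITE_FIM_ROLES = {
    "<fim_prefix>": "prefix",
    "<fim_middle>": "middle",
    "<fim_suffix>": "suffix",
    "<fim_pad>": "pad",
    "<reponame>": "repo",
}


def fim_role(token_text: str) -> Optional[str]:
    """Return the FIM role for a token text, or None if it is not a known FIM token."""
    return GRANITE_FIM_ROLES.get(token_text)


if __name__ == "__main__":
    print(fim_role("<fim_middle>"))  # underscore variant is recognized
    print(fim_role("<fim-middle>"))  # dash variant is not a Granite token
```

No entry exists for a file separator role, which matches the note that no equivalent token was found.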

Testing:

Checked tokenization (i.e. whether llama.cpp tokenizes each of them to a single token):

$ ./build/bin/llama-tokenize --model ~/text-generation-webui/models/granite-3.3-base-q8.gguf --prompt "<fim_prefix><fim_middle><fim_suffix><fim_pad><reponame>"
... omitted verbose output ...
     1 -> '<fim_prefix>'
     2 -> '<fim_middle>'
     3 -> '<fim_suffix>'
     4 -> '<fim_pad>'
    18 -> '<reponame>'
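The same check can be automated. The sketch below parses `id -> 'piece'` lines in the shape shown above and confirms that each expected token came back as exactly one piece. The helper names are hypothetical, and the line format is inferred only from the output pasted here; if a build also prints a BOS token, it would need to be filtered out first.

```python
import re

# Matches lines such as "     1 -> '<fim_prefix>'".
TOKEN_LINE = re.compile(r"^\s*(\d+) -> '(.*)'$")


def parse_tokenize_output(text):
    """Extract (token_id, piece) pairs from llama-tokenize style output."""
    pairs = []
    for line in text.splitlines():
        match = TOKEN_LINE.match(line)
        if match:
            pairs.append((int(match.group(1)), match.group(2)))
    return pairs


def all_single_tokens(text, expected):
    """True if the output consists of exactly the expected pieces, one token each."""
    pieces = [piece for _, piece in parse_tokenize_output(text)]
    return pieces == list(expected)


SAMPLE = """\
     1 -> '<fim_prefix>'
     2 -> '<fim_middle>'
     3 -> '<fim_suffix>'
     4 -> '<fim_pad>'
    18 -> '<reponame>'
"""

EXPECTED = ["<fim_prefix>", "<fim_middle>", "<fim_suffix>", "<fim_pad>", "<reponame>"]

if __name__ == "__main__":
    print(all_single_tokens(SAMPLE, EXPECTED))
```

If a token were not recognized as special, it would typically show up split across several pieces, and the comparison would fail.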

Also saw:

I also tried it empirically while coding with the extension:

code_example_granite.mp4

(I thought it would offer printf() instead...C++ bias? 😉 ) I love that extension.
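For anyone who wants to try this without the editor extension, a request to llama-server's /infill endpoint could look roughly like the sketch below. It assumes a server listening on http://127.0.0.1:8080 and uses the `input_prefix`, `input_suffix`, and `n_predict` request fields and the `content` response field; the address and the helper names are assumptions for illustration, so check the server README for your build.

```python
import json
import urllib.request


def build_infill_request(prefix, suffix, base_url="http://127.0.0.1:8080", n_predict=64):
    """Build (but do not send) a POST request for the /infill endpoint."""
    payload = {
        "input_prefix": prefix,   # text before the cursor
        "input_suffix": suffix,   # text after the cursor
        "n_predict": n_predict,   # cap on generated tokens
    }
    return urllib.request.Request(
        f"{base_url}/infill",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def infill(prefix, suffix, **kwargs):
    """Send the request and return the generated middle text.

    Requires a running llama-server with a FIM-capable model loaded.
    """
    request = build_infill_request(prefix, suffix, **kwargs)
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["content"]


if __name__ == "__main__":
    print(infill('#include <stdio.h>\nint main() {\n    ', '\n    return 0;\n}\n'))
```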

ExtReMLapin · 2025-04-17T05:40:22Z

It would be nice to have a list of FIM-supported models somewhere.

ggerganov approved these changes Apr 17, 2025

ggerganov merged commit 971f245 into ggml-org:master Apr 17, 2025
48 of 51 checks passed
