
More checks before assuming FIM tokens #7644


Merged: 4 commits merged into ggml-org:master from non-llama-fim-fix on Jun 14, 2024

Conversation

CISC (Collaborator) commented May 30, 2024

Added a bare minimum of extra checks before assuming CodeLlama/CodeGemma FIM tokens.

This fixes Codestral and any other non-CodeLlama/CodeGemma model with the word "code" in general.name.

BTW, Codestral was initially released with the wrong vocab (missing the FIM tokens); an updated vocab was released today. However, Codestral does not use the middle token, so I'm generating GGUFs without it. This might require a separate fix in the server.cpp /infill route!

The GGUFs are up in case anyone wants to test (and perhaps implement an spm_infill argument to switch to the Suffix/Prefix/Middle sequence).

Fixes #7592

mofosyne added labels on May 30, 2024:
  • Review Complexity : Low (trivial changes to code that most beginner devs, or those who want a break, can tackle, e.g. UI fix)
  • bugfix (fixes an issue or bug)
  • medium severity (used to report medium severity bugs in llama.cpp, e.g. malfunctioning features but still usable)
mofosyne added the merge ready label (indicates that this may be ready to merge soon and is just holding out in case of objections) on May 30, 2024
github-actions bot (Contributor) commented May 30, 2024

📈 llama.cpp server for bench-server-baseline on Standard_NC4as_T4_v3 for phi-2-q4_0: 527 iterations 🚀

Details:
  • Concurrent users: 8, duration: 10m
  • HTTP request : avg=8901.42ms p(95)=21984.39ms fails=, finish reason: stop=475 truncated=52
  • Prompt processing (pp): avg=106.63tk/s p(95)=484.55tk/s
  • Token generation (tg): avg=59.75tk/s p(95)=49.26tk/s
  • ggml-org/models/phi-2/ggml-model-q4_0.gguf parallel=8 ctx-size=16384 ngl=33 batch-size=2048 ubatch-size=256 pp=1024 pp+tg=2048 branch=non-llama-fim-fix commit=f4df3ffdf718eb8c5637991a59ca2f8ec30117cc

[Chart: llamacpp:prompt_tokens_seconds (llama.cpp bench-server-baseline on Standard_NC4as_T4_v3, duration=10m, 527 iterations)]
[Chart: llamacpp:predicted_tokens_seconds (llama.cpp bench-server-baseline on Standard_NC4as_T4_v3, duration=10m, 527 iterations)]

[Chart: llamacpp:kv_cache_usage_ratio (llama.cpp bench-server-baseline on Standard_NC4as_T4_v3, duration=10m, 527 iterations)]
[Chart: llamacpp:requests_processing (llama.cpp bench-server-baseline on Standard_NC4as_T4_v3, duration=10m, 527 iterations)]

@CISC CISC requested a review from ggerganov June 3, 2024 14:47
CISC changed the title from "More checks before assuming FIM tokens for Llama arch" to "More checks before assuming FIM tokens" on Jun 6, 2024
mofosyne (Collaborator) commented Jun 9, 2024

@CISC what's going on with the CI? Can you double-check? If it's not related to your code, you may want to rebase against the last known good CI build on master.

CISC (Collaborator, Author) commented Jun 9, 2024

@mofosyne I have no idea, but the Server checks seem to randomly fail for everyone, not just here.

CISC (Collaborator, Author) commented Jun 9, 2024

@mofosyne So, syncing seems to have "fixed" it, just the normal failures now, no clue what changed. :)

CISC (Collaborator, Author) commented Jun 13, 2024

@ggerganov Do the checks look ok now?

SPM infill just got merged in llama-cpp-python, so this is the only remaining issue blocking Codestral Fill-in-Middle.

ggerganov (Member) commented
Let's merge it, but I think we are doing it wrong when it comes to FIM tokens. I haven't looked into the details deeply, but this code definitely looks like a big hack. We have to try to figure out something more robust.

@ggerganov ggerganov merged commit 6fcd133 into ggml-org:master Jun 14, 2024
59 of 72 checks passed
@CISC CISC deleted the non-llama-fim-fix branch June 14, 2024 10:22
CISC (Collaborator, Author) commented Jun 14, 2024

The main problem is that there's no way to know if

  1. there are FIM tokens in the vocab
  2. the FIM tokens are used by the model

IMO it is unsafe to make assumptions here; you really need to know that the model in question has the correct tokens and supports using them.

@CISC CISC mentioned this pull request Jun 26, 2024
Labels
  • bugfix: fixes an issue or bug
  • medium severity: used to report medium severity bugs in llama.cpp (e.g. malfunctioning features but still usable)
  • merge ready: indicates that this may be ready to merge soon and is just holding out in case of objections
  • Review Complexity : Low: trivial changes to code that most beginner devs (or those who want a break) can tackle, e.g. UI fix
Development

Successfully merging this pull request may close these issues.

Bug: Some "code" models invoke undefined behavior at load time after #6745
3 participants