
bugfix: Respect n_predict=-2 in server (#12264) #12323

Closed

Conversation

ishaangandhi
Contributor

This pull request fixes issue #12264: Eval bug: server API endpoint not respecting n_predict with -2 (until context filled).

Previously, if you set n_predict to -2, the server ignored its special meaning ("generate until the context is filled") and immediately stopped producing tokens with finish_reason "length". For example, with this request:

curl --location 'http://localhost:8080/v1/chat/completions' \
    --header 'Content-Type: application/json' \
    --header 'Authorization: Bearer no-key' \
    --data '{
    "messages": [
        {
            "role": "user",
            "content": "Write a minimum 5,000 word essay (30+ pages) on the history of the United States, starting with the American Revolution."
        }
    ],
    "n_predict": -2
    }'

After the change, we get this (correct) output:

{
  "choices": [
    {
      "finish_reason": "length",
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "<think>\nAlright, I'm supposed to write a minimum 5,000-word essay on the history of the United States,...His leadership was a symbol of unity and cooperation, and he was also seen as a symbol of unity that changed the way the colonies behaved.\n\n"
      }
    }
  ],
  "created": 1741646088,
  "model": "gpt-3.5-turbo",
  "system_fingerprint": "b4869-2c9f833d",
  "object": "chat.completion",
  "usage": {
    "completion_tokens": 4096,
    "prompt_tokens": 34,
    "total_tokens": 4130
  },
  "id": "chatcmpl-LFedbjPLZLJyrIP4mc9ax0uuXLMaX4a9",
  "timings": {
    "prompt_n": 32,
    "prompt_ms": 165.535,
    "prompt_per_token_ms": 5.17296875,
    "prompt_per_second": 193.31259250309603,
    "predicted_n": 4096,
    "predicted_ms": 79655.512,
    "predicted_per_token_ms": 19.447146484375,
    "predicted_per_second": 51.42142580164446
  }
}

We still stop with finish_reason "length", but only after producing 4096 completion tokens.
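For context, the immediate stop in the old behavior follows from the previous budget computation (visible as the removed lines in the diff further down): with n_predict == -2, the != -1 branch yields a negative remaining budget. Below is a minimal, self-contained sketch of that arithmetic; it is not the server's actual code, and the assumption that a non-positive budget is treated as exhausted is mine, though it matches the observed immediate "length" stop.

#include <cstdio>

// Sketch only: mirrors the old branch shown in the removed lines of the diff
// below. Assumes a non-positive remaining budget means "stop generating".
int main() {
    int n_predict   = -2;  // request value whose special meaning was ignored
    int n_decoded   =  0;  // tokens generated so far

    int n_remaining = -1;  // -1 = unlimited
    if (n_predict != -1) {
        n_remaining = n_predict - n_decoded;  // -2 - 0 = -2
    }
    std::printf("n_remaining = %d\n", n_remaining); // negative, so generation stops at once
    return 0;
}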

@ngxson
Collaborator

ngxson commented Mar 11, 2025

Could you add a small test case for it? See server/tests/test_completion.py

@ishaangandhi
Contributor Author

@ngxson Done!

github-actions bot added the python (python script changes) label Mar 11, 2025
ngxson previously approved these changes Mar 12, 2025
@ngxson
Collaborator

ngxson commented Mar 12, 2025

I think this may break another test case, could you check it @ishaangandhi ?

ngxson dismissed their stale review March 12, 2025 10:26

CI didn't pass

@ishaangandhi
Contributor Author

@ngxson Got it. I rebased and allowed n_predict to be unlimited for that test.

The issue was that the generated texts matched on the first token, since the server was only producing one token.

They should now pass. Can you re-run the CI tests?

@ggerganov
Member

Restarted the tests - let's see how it goes.

@@ -143,13 +143,15 @@ def test_consistent_result_same_seed(n_slots: int):
 def test_different_result_different_seed(n_slots: int):
     global server
     server.n_slots = n_slots
+    server.n_predict = -1
Collaborator

I don't get why it is needed here. The default value of n_predict is already -1; you should not change a test case that is unrelated to the current change. That's the whole idea of having tests: to make sure that your change does not break existing use cases.

Contributor Author

There is a bit of a readability and maintainability issue in this file: the server object is global and, as a test fixture, is reused across tests, even though every test changes its parameters.

Comment on lines +1334 to +1341

+    // The request or server have specified limits on the number of tokens to generate.
+    if ((params.n_predict >= 0) || (global_params.n_predict >= 0)) {
+        n_remaining = std::min(n_remaining, params.n_predict - n_decoded);
+    }
+
-    if (params.n_predict != -1) {
-        n_remaining = params.n_predict - n_decoded;
-    } else if (global_params.n_predict != -1) {
-        n_remaining = global_params.n_predict - n_decoded;
+    // The request or server have limits based on the context window.
+    if (params.n_predict == -2 || global_params.n_predict == -2) {
+        n_remaining = std::min(n_remaining, n_ctx - n_decoded);
Collaborator

ngxson commented Mar 13, 2025

Tbh this code becomes a bit hard to understand now (and that's probably why the CI fails).

What I recommend here: instead of having this double if check, it should be an if..else like before.

Indeed, my idea is much simpler: why risk modifying all of this when we can just set n_predict = n_ctx - n_prompt when n_predict == -2?
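To make that suggestion concrete, here is a rough, self-contained sketch of the shape being described; the function name and parameter list are illustrative and do not come from server.cpp. The idea is to resolve the -2 sentinel once, after which the original single if..else is enough:

// Illustrative sketch only: resolves the "until context filled" sentinel (-2)
// up front, as suggested above, so the rest of the logic only ever sees -1
// (unlimited) or a concrete cap. Names mirror the discussion, not the server's code.
int remaining_budget(int n_predict_req, int n_predict_global,
                     int n_ctx, int n_prompt, int n_decoded) {
    if (n_predict_req    == -2) n_predict_req    = n_ctx - n_prompt;
    if (n_predict_global == -2) n_predict_global = n_ctx - n_prompt;

    if (n_predict_req != -1) {
        return n_predict_req - n_decoded;      // per-request cap
    } else if (n_predict_global != -1) {
        return n_predict_global - n_decoded;   // server default cap
    }
    return -1;                                 // unlimited
}

Keeping the special case in one place means the stopping check itself does not need to change.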

@ngxson
Collaborator

ngxson commented Mar 13, 2025

superseded by #12364

ngxson closed this Mar 13, 2025
Labels: examples, python (python script changes), server