
Server: Enable setting default sampling parameters via command-line #8402


Merged
HanClinto merged 2 commits into ggml-org:master on Jul 9, 2024

Conversation

HanClinto
Collaborator

In the discussion of #8279, @hopto-dot was (rightfully) confused by the fact that the server CLI will accept a grammar parameter, but then not do anything with it for the individual requests that are served from it. @ngxson helpfully suggested:

llama-server does read the grammar if you specify one. It's just not taken by the slot.

You can patch launch_slot_with_task so that slot takes grammar from server_context.params

I was curious what it would take to patch launch_slot_with_task, and -- as best I can tell -- this one-liner seems to be all that's needed to make it work...?
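
For context, the idea is just to seed each slot's sampling parameters from the server-wide defaults when the slot is launched, before any per-request overrides are applied. A minimal sketch of what that might look like (the member names here are illustrative and may not match server.cpp exactly):

// Inside launch_slot_with_task(), before the per-request JSON is applied:
// start the slot from the server-wide defaults so that any field the request
// omits (grammar, temperature, ...) falls back to the CLI values.
// Member names below are illustrative, not necessarily the exact ones in server.cpp.
slot.sparams = params.sparams; // per-request fields still override these afterwards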

To test:

  1. Compiled server:
    make -j LLAMA_CURL=1

  2. Created a grammar file that doesn't permit the usage of the letter 'e':

root ::= [^eE]*
  3. Ran the server with the model set and a default grammar file specified:

./llama-server -mu https://huggingface.co/TheBloke/phi-2-GGUF/resolve/main/phi-2.Q4_K_M.gguf --grammar-file "./grammars/no-e.gbnf"

  4. Sent a request to the server via curl that did not set a grammar:
curl \
--request POST \
--url http://localhost:8080/completion \
--header "Content-Type: application/json" \
--data '{"prompt": "Please write a Haiku about the sun rising in the morning.","n_predict":128}'

Even though I didn't specify any grammar in the request, the generated output complied with the grammar specified when starting the server, and generated poetry without using that particular vowel:

"content": "\n## INPUT\n\n##OUTPUT\nSunlight's warm cafton,\nAkin to a symphony,\nMorning's symphony.\n",

Note that this still lets any user request override individual parameters -- this only sets the default sampling parameters for the server.

Also note that requests made through the web GUI always set the grammar parameter to whatever is in the grammar box on the site (even when that box is blank), so they override anything set when initializing the server (hence why we needed to test this with curl).

If we care about that, then maybe a later PR could prevent the web GUI from sending parameters when they're still set to the default. For now, I'd love to know if anyone can think of any gotchas with my simplistic approach, or if it's really as simple as it looks here.

@HanClinto
Collaborator Author

Some output from a small testing session; I'm enjoying these e-less haikus. :)

Bright light of day
Sun's warmth on my skin
Rising to start

In this second one, it clearly wanted to use "ascends" in the final line, but had to change after the first token was already chosen.

In morning light,
Sorrow turns to joy,
Sun ascists high.

And I don't know what to call this one. It started off great, and then that last line just comes out of nowhere. :D

Sun slowly awaking
Colors fill up, day is born
A brand-spanking-smooth start

All that to say, it's highly unlikely that the model would have generated all this poetry without 'e' if it wasn't using the grammar -- so I think it's effective. :)

@HanClinto HanClinto changed the title from "Load server sampling parameters from the server context by default." to "Server: Enable setting default sampling parameters via command-line" on Jul 9, 2024

@ngxson
Collaborator

Ha, clever solution. Thanks! It works well on my side:

make llama-server && ./llama-server -m ../models/gemma-2-9b-it-Q4_K_M.gguf --grammar-file ./grammars/json.gbnf

Request body:
{
    "messages": [
        {"role": "user", "content": "hello"}
    ],
    "max_tokens": 20,
    "temperature": 0
}

Response:

"content": "{\n  \"response\": \"Hello! 👋 How can I help you today?\"\n}",

@HanClinto
Collaborator Author

Ha, clever solution. Thanks! It works well on my side:

Glad to hear it! :)

I started looking into what it would take to load the chat interface's default parameters from the server, and there is already an endpoint that serves the parameters -- http://localhost:8080/props

As much as I'd love for this to be a one-line PR, handle_props should probably be updated to grab the new global parameters, rather than serializing the parameters of an uninitialized slot to JSON.
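
A rough sketch of the direction that could take (the handler shape, member names, and the sampling_params_to_json helper below are assumptions for illustration, not the actual server.cpp code):

// Hypothetical sketch: have /props report the server-wide default sampling
// parameters instead of serializing an uninitialized slot.
// ctx_server, params.sparams, and sampling_params_to_json() are assumed names.
const auto handle_props = [&ctx_server](const httplib::Request &, httplib::Response & res) {
    json props = {
        { "total_slots",                 ctx_server.params.n_parallel },
        { "default_generation_settings", sampling_params_to_json(ctx_server.params.sparams) },
    };
    res.set_content(props.dump(), "application/json; charset=utf-8");
};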

@HanClinto
Collaborator Author

HanClinto commented Jul 9, 2024

Ugh. That's too much work to do right now. I think I'm just going to merge this as-is and worry about that edge case later. I strongly suspect that this endpoint is rarely (if ever) used.

@HanClinto HanClinto merged commit a59f8fd into ggml-org:master Jul 9, 2024
53 checks passed

@ngxson
Collaborator

ngxson commented Jul 10, 2024

Sorry, I misread your comment (I thought we were talking about the chat template). Yeah, it seems like /props does not return anything other than some basic server props, so it makes sense to return default sampling params (and maybe model params too, though I'm not sure if that should be the job of /models).

@HanClinto
Collaborator Author

Sorry, I misread your comment (I thought we were talking about the chat template). Yeah, it seems like /props does not return anything other than some basic server props, so it makes sense to return default sampling params (and maybe model params too, though I'm not sure if that should be the job of /models).

Okay, sounds good. I'll see what I can do about updating this. I'm not sure whether anyone or anything else uses this endpoint, but unless I have a reason to change it, I'll try to keep the existing format (with all the same fields) and simply update it to return the default server parameters.

Nexesenex pushed a commit to Nexesenex/croco.cpp that referenced this pull request Jul 11, 2024
Server: Enable setting default sampling parameters via command-line (ggml-org#8402)

* Load server sampling parameters from the server context by default.

* Wordsmithing comment
arthw pushed a commit to arthw/llama.cpp that referenced this pull request Jul 13, 2024
Server: Enable setting default sampling parameters via command-line (ggml-org#8402)

* Load server sampling parameters from the server context by default.

* Wordsmithing comment