Server: Enable setting default sampling parameters via command-line #8402
Small testing session output, and I'm enjoying these e-less Haikus. :)
In this second one, it clearly wanted to use "ascends" in the final line, but had to change after the first token was already chosen.
And I don't know what to call this one. It started off great, and then that last line just comes out of nowhere. :D
All that to say, it's highly unlikely that the model would have generated all this poetry without 'e' if it wasn't using the grammar -- so I think it's effective. :)
Ha, clever solution. Thanks! It works well on my side:
make llama-server && ./llama-server -m ../models/gemma-2-9b-it-Q4_K_M.gguf --grammar-file ./grammars/json.gbnf
{
"messages": [
{"role": "user", "content": "hello"}
],
"max_tokens": 20,
"temperature": 0
}
Response:
"content": "{\n \"response\": \"Hello! 👋 How can I help you today?\"\n}",
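One quick way to double-check a response like the one above is to confirm that the returned `content` string parses as JSON, which is what `json.gbnf` is meant to enforce. This is only a minimal sketch in Python; the helper name `is_valid_json` is hypothetical and not part of llama.cpp, and it assumes the `content` value has already been pulled out of the response body.

```python
import json


def is_valid_json(content: str) -> bool:
    """Return True if the model output parses as a JSON document."""
    try:
        json.loads(content)
    except json.JSONDecodeError:
        return False
    return True


# The content string from the response above.
content = "{\n \"response\": \"Hello! 👋 How can I help you today?\"\n}"
print(is_valid_json(content))

# Free-form text, as a model without the grammar might produce.
print(is_valid_json("Hello! How can I help you today?"))
```

Note that a low `max_tokens` value can cut the output off mid-object, in which case a check like this would fail even though the grammar was applied.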
Glad to hear it! :) I started looking into what it would take to load the chat interface's default parameters from the server, and there is already an endpoint that will serve the parameters -- http://localhost:8080/props. As much as I'd love for this to be a one-line PR,
Ugh. That's too much work to do right now. I think I'm just going to merge this as-is and worry about that edge case later. I strongly suspect that this endpoint is rarely (if ever) used.
Sorry, I misread your comment (I thought we were talking about the chat template). Yeah seems like
Okay, sounds good. I'll see what I can do about updating this. I'm not sure if this endpoint is used by anyone or anything else, but unless I have reason to change it, I'll try to keep it in the existing format (with all the same fields) and simply update it to return the default server parameters.
…gml-org#8402) * Load server sampling parameters from the server context by default. * Wordsmithing comment
In the discussion of #8279, @hopto-dot was (rightfully) confused by the fact that the server CLI will accept a grammar parameter, but then not do anything with it for the individual requests that are served from it. @ngxson helpfully suggested:
I was curious what it would take to patch `launch_slot_with_task`, and -- best I can tell -- this one-liner seems to be all that one needs to do in order for this to work...?
To test:
Compiled server:
make -j LLAMA_CURL=1
Created a grammar file that doesn't permit the usage of the letter 'e'
./llama-server -mu https://huggingface.co/TheBloke/phi-2-GGUF/resolve/main/phi-2.Q4_K_M.gguf --grammar-file "./grammars/no-e.gbnf"
Even though I didn't specify any grammar in the request, the generated output complied with the grammar specified when starting the server, and generated poetry without using that particular vowel:
Note that this still lets any user request override individual parameters -- this only sets the default sampling parameters for the server.
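The precedence described here can be pictured as a dictionary merge in which request values win over the server-wide defaults. This is only an illustrative Python sketch, not the server's actual C++ code; `effective_params` and all parameter values below are hypothetical.

```python
def effective_params(server_defaults: dict, request_params: dict) -> dict:
    """Start from the server-wide defaults, then let the request override them."""
    merged = dict(server_defaults)
    merged.update(request_params)
    return merged


# Hypothetical defaults, as if set on the command line at server start.
defaults = {"grammar": "<contents of no-e.gbnf>", "temperature": 0.8}

# A request that names no grammar inherits the server default.
print(effective_params(defaults, {"temperature": 0}))

# A request that names its own grammar overrides the default.
print(effective_params(defaults, {"grammar": "<some other grammar>"}))
```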
Also note that if one goes through the web GUI, then those requests (even those that have a blank grammar box on the site) will set the `grammar` parameter to whatever is in that box, so it will override anything set when initializing the server (hence why we needed to test this with `curl`).
If we care about that, then maybe a later PR could prevent the web GUI from sending parameters when they're set to the default. For now, I'd love to know if anyone can think of any "gotchas" with my simplistic approach, or if it's really as simple as we see here.
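The follow-up idea mentioned above (have the web GUI leave out parameters that are still at their default) could look roughly like this. It is a sketch only, written in Python for brevity rather than the GUI's own JavaScript; `strip_defaults` and the `GUI_DEFAULTS` values are assumptions made for illustration.

```python
# Hypothetical GUI defaults; the real web GUI's values may differ.
GUI_DEFAULTS = {"grammar": "", "temperature": 0.7}


def strip_defaults(form_values: dict, gui_defaults: dict) -> dict:
    """Keep only the fields the user actually changed from the GUI default."""
    return {
        key: value
        for key, value in form_values.items()
        if key not in gui_defaults or value != gui_defaults[key]
    }


# A blank grammar box is dropped from the request, so a grammar set when
# starting the server would remain in effect.
form = {"prompt": "Write a haiku", "grammar": "", "temperature": 0.7}
print(strip_defaults(form, GUI_DEFAULTS))
```

One trade-off with this approach: a user could no longer explicitly ask for "no grammar" by leaving the box blank, since a blank box would now mean "use the server default".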