server: allow to get default generation settings for completion #5307
What does it do?
The PR adds a new field to the `/props` response: `default_generation_settings`. This object contains the default server params that will be used to generate the response. Its contents are exactly the same as in the `generation_settings` object from the `/completion` endpoint.
What does it solve?
This PR mainly addresses one of my points in #4216 (comment).
Now API clients can get the context size without any trickery. Before that I used a `{"n_predict": 0}` request, but recently it stopped working (#5246) and it was a hack anyway.
Also, this PR will allow API clients to get the default inference params before doing any inference, for example if the API client decides to populate its GUI with these default values.
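As a rough client-side illustration of the use case above, here is a minimal Python sketch that reads one default value out of the `/props` response. The helper names `fetch_props` and `default_setting` are hypothetical, and the key `n_ctx` in the usage note is an assumption about how the context size is named inside `generation_settings`; adjust it to whatever the server actually returns.

```python
import json
from urllib.request import urlopen


def fetch_props(base_url: str) -> dict:
    """GET <base_url>/props and return the parsed JSON body."""
    with urlopen(f"{base_url}/props") as resp:
        return json.load(resp)


def default_setting(props: dict, key: str, fallback=None):
    """Look up one default generation setting.

    Returns `fallback` when the server predates this PR (no
    default_generation_settings object) or the key is absent.
    """
    return props.get("default_generation_settings", {}).get(key, fallback)
```

A client could then call `default_setting(fetch_props("http://localhost:8080"), "n_ctx")` before sending any completion request; the address is only a placeholder.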
Implementation
For the new `default_generation_settings` object, I take the first slot available. Here the code assumes that all slots have identical default params. Not sure if this is true or not.
The JSON is stored right after creating the slot. We can't get info from the slot itself after that, because it may be polluted by API params that the user has already sent to the server.
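The reason for capturing the defaults early can be shown with a small model. This is a Python sketch, not the server's C++ code: `Slot`, `Server`, and the default values are all hypothetical stand-ins, and it carries over the assumption that every slot starts with identical defaults.

```python
import copy


class Slot:
    """Hypothetical stand-in for a server slot; not the real structure."""

    def __init__(self):
        # Made-up default values, for illustration only.
        self.params = {"temperature": 0.8, "n_predict": -1}

    def apply_request(self, request: dict) -> None:
        # Per-request parameters overwrite the slot's current values.
        self.params.update(request)


class Server:
    def __init__(self, n_slots: int):
        self.slots = [Slot() for _ in range(n_slots)]
        # Snapshot taken right after slot creation, before any request
        # arrives. Assumes all slots share the same defaults.
        self.default_generation_settings = copy.deepcopy(self.slots[0].params)

    def props(self) -> dict:
        # Serve the stored snapshot rather than the live slot state.
        return {"default_generation_settings": self.default_generation_settings}
```

Because `props()` returns the stored copy, a request that later changes a slot's parameters does not leak into the reported defaults.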
This line is kind of awkward:
But I'm not sure how to do it "properly".
Example
The resulting `/props` response example: