
Commit fe73ead

jparkerweb authored and jordankanter committed
server : update /props with "total_slots" value (ggml-org#5373)
* include total "num_slots" in default_generation_settings_for_props
* cleanup total_slots return value in /props endpoint
* update /props endpoint docs with total_slots
* remove num_slots from default_generation_settings_for_props
* update /props endpoint section
1 parent 306656d commit fe73ead
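
In practice, this moves the slot count from `default_generation_settings.num_slots` to a top-level `total_slots` field in the `/props` response. Below is a minimal client-side migration sketch (not part of the commit), assuming the response body has already been parsed with nlohmann::json, the JSON library the server itself uses:

```cpp
// Hypothetical helper: reads the server's slot count from a parsed /props
// response, handling both the new and the old field location.
#include <nlohmann/json.hpp>

int total_slots_from_props(const nlohmann::json & props) {
    // After this commit: top-level "total_slots".
    if (props.contains("total_slots")) {
        return props["total_slots"].get<int>();
    }
    // Before this commit (assumption): "num_slots" nested inside
    // default_generation_settings.
    return props.at("default_generation_settings").value("num_slots", 1);
}
```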

2 files changed: +5 −3 lines changed

examples/server/README.md

Lines changed: 3 additions & 1 deletion
@@ -276,13 +276,15 @@ Notice that each `probs` is an array of length `n_probs`.
 {
   "assistant_name": "",
   "user_name": "",
-  "default_generation_settings": { ... }
+  "default_generation_settings": { ... },
+  "total_slots": 1
 }
 ```
 
 - `assistant_name` - the required assistant name to generate the prompt in case you have specified a system prompt for all slots.
 - `user_name` - the required anti-prompt to generate the prompt in case you have specified a system prompt for all slots.
 - `default_generation_settings` - the default generation settings for the `/completion` endpoint, has the same fields as the `generation_settings` response object from the `/completion` endpoint.
+- `total_slots` - the total number of slots for process requests (defined by `--parallel` option)
 
 - **POST** `/v1/chat/completions`: OpenAI-compatible Chat Completions API. Given a ChatML-formatted json description in `messages`, it returns the predicted completion. Both synchronous and streaming mode are supported, so scripted and interactive applications work fine. While no strong claims of compatibility with OpenAI API spec is being made, in our experience it suffices to support many apps. Only ChatML-tuned models, such as Dolphin, OpenOrca, OpenHermes, OpenChat-3.5, etc can be used with this endpoint. Compared to `api_like_OAI.py` this API implementation does not require a wrapper to be served.
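
A minimal client sketch for the `/props` field documented above, assuming a server already running on localhost:8080 and the cpp-httplib and nlohmann::json headers shipped with the server example (include paths may differ in your build):

```cpp
// Sketch: query /props and print the server's total_slots.
#include <cstdio>
#include "httplib.h"
#include "json.hpp"

int main() {
    httplib::Client cli("localhost", 8080);
    auto res = cli.Get("/props");
    if (!res || res->status != 200) {
        fprintf(stderr, "failed to query /props\n");
        return 1;
    }
    const auto props = nlohmann::json::parse(res->body);
    // total_slots reflects the server's --parallel setting (default 1).
    const int total_slots = props.value("total_slots", 1);
    printf("server exposes %d slot(s)\n", total_slots);
    return 0;
}
```

A client could use this value, for example, to cap the number of completion requests it issues concurrently.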

examples/server/server.cpp

Lines changed: 2 additions & 2 deletions
@@ -432,7 +432,6 @@ struct llama_server_context
         }
 
         default_generation_settings_for_props = get_formated_generation(slots.front());
-        default_generation_settings_for_props["num_slots"] = params.n_parallel;
         default_generation_settings_for_props["seed"] = -1;
 
         batch = llama_batch_init(n_ctx, 0, params.n_parallel);
@@ -2639,7 +2638,8 @@ int main(int argc, char **argv)
             json data = {
                 { "user_name", llama.name_user.c_str() },
                 { "assistant_name", llama.name_assistant.c_str() },
-                { "default_generation_settings", llama.default_generation_settings_for_props }
+                { "default_generation_settings", llama.default_generation_settings_for_props },
+                { "total_slots", llama.params.n_parallel }
             };
             res.set_content(data.dump(), "application/json; charset=utf-8");
         });
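
The handler assembles this payload with nlohmann::json's initializer-list syntax and serializes it with `dump()`. A standalone sketch of the same pattern, with placeholder values where the real handler reads from the server context and `params`:

```cpp
// Standalone sketch of how the /props payload is assembled (placeholder
// values; the real handler reads them from llama_server_context and params).
#include <cstdio>
#include "json.hpp"

using json = nlohmann::json;

int main() {
    json default_generation_settings = json::object(); // normally get_formated_generation(...)
    int n_parallel = 1;                                 // normally params.n_parallel (--parallel)

    json data = {
        { "user_name", "" },
        { "assistant_name", "" },
        { "default_generation_settings", default_generation_settings },
        { "total_slots", n_parallel }
    };

    // The endpoint serves this string as "application/json; charset=utf-8".
    printf("%s\n", data.dump(2).c_str());
    return 0;
}
```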
