tool-call: fix llama 3.x and functionary 3.2, play nice w/ pydantic_ai package, update readme (ggml-org#11539)
* An empty tool_call_id is better than none!
* sync: minja (tool call name optional google/minja#36)
* Force-disable parallel_tool_calls if template doesn't support it
* More debug logs
* Llama 3.x tools: accept / trigger on more varied spaced outputs
* Fix empty content for functionary v3.2 tool call
* Add proper tool call docs to server README
* readme: function calling *is* supported now
* Apply suggestions from code review
Co-authored-by: Georgi Gerganov <[email protected]>
Changes to `examples/server/README.md` (+103 lines, −9 lines):
@@ -126,7 +126,7 @@ The project is under active development, and we are [looking for feedback and co
 |`--grammar GRAMMAR`| BNF-like grammar to constrain generations (see samples in grammars/ dir) (default: '') |
 |`--grammar-file FNAME`| file to read grammar from |
 |`-j, --json-schema SCHEMA`| JSON schema to constrain generations (https://json-schema.org/), e.g. `{}` for any JSON object<br/>For schemas w/ external $refs, use --grammar + example/json_schema_to_grammar.py instead |
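As a quick illustration of the `-j, --json-schema` flag above, a small schema can be passed inline on the command line. The schema and model path below are hypothetical examples, not taken from the PR:

```shell
# Hypothetical schema: require a JSON object with a string "name" field.
SCHEMA='{"type": "object", "properties": {"name": {"type": "string"}}, "required": ["name"]}'

# Sanity-check that the schema is itself valid JSON before handing it to the server:
echo "$SCHEMA" | python3 -m json.tool > /dev/null && echo "schema ok"

# Then (assuming a local model file named model.gguf):
# llama-server -m model.gguf -j "$SCHEMA"
```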
@@ -1069,7 +1069,7 @@ Given a ChatML-formatted json description in `messages`, it returns the predicte

 *Options:*

-See [OpenAI Chat Completions API documentation](https://platform.openai.com/docs/api-reference/chat). While some OpenAI-specific features such as function calling aren't supported, llama.cpp `/completion`-specific features such as `mirostat` are supported.
+See [OpenAI Chat Completions API documentation](https://platform.openai.com/docs/api-reference/chat). llama.cpp `/completion`-specific features such as `mirostat` are also supported.

 The `response_format` parameter supports both plain JSON output (e.g. `{"type": "json_object"}`) and schema-constrained JSON (e.g. `{"type": "json_object", "schema": {"type": "string", "minLength": 10, "maxLength": 100}}` or `{"type": "json_schema", "schema": {"properties": { "name": { "title": "Name", "type": "string" }, "date": { "title": "Date", "type": "string" }, "participants": { "items": { "type": "string" }, "title": "Participants", "type": "array" } } } }`), similar to other OpenAI-inspired API providers.
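The schema-constrained variant above can be sketched as a full request body. This is a minimal sketch, assuming the server exposes the usual OpenAI-compatible `/v1/chat/completions` route; the `model` value is a placeholder (llama.cpp serves whatever model it was started with):

```python
import json

# Sketch of a /v1/chat/completions request body using response_format
# with an inline schema, matching the README example above.
payload = {
    "model": "placeholder",  # ignored by llama.cpp, required by some clients
    "messages": [{"role": "user", "content": "Describe the event in one sentence."}],
    "response_format": {
        "type": "json_object",
        "schema": {"type": "string", "minLength": 10, "maxLength": 100},
    },
}

# Serialize to the JSON string that would be POSTed to the server.
body = json.dumps(payload)
print(len(json.loads(body)["messages"]))
```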
+[Function calling](https://platform.openai.com/docs/guides/function-calling) is supported for all models (see https://github.com/ggerganov/llama.cpp/pull/9639):
+
+- Requires `--jinja` flag
+- Native tool call formats supported:
+  - Llama 3.1 / 3.3 (including builtin tools support - tool names for `wolfram_alpha`, `web_search` / `brave_search`, `code_interpreter`), Llama 3.2
+  - Functionary v3.1 / v3.2
+  - Hermes 2/3, Qwen 2.5
+  - Mistral Nemo
+  - Firefunction v2
+  - DeepSeek R1 (WIP / seems reluctant to call any tools?)
+
+<details>
+<summary>Show some common templates and which format handler they use</summary>
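The tool-calling support described above takes OpenAI-style `tools` definitions in the chat request. The sketch below shows the shape of such a request body; the `get_weather` tool and its parameters are purely illustrative assumptions, not part of the PR, and the server is assumed to have been started with `--jinja`:

```python
import json

# Sketch of an OpenAI-style tool call request body for
# llama.cpp's /v1/chat/completions endpoint.
payload = {
    "messages": [{"role": "user", "content": "What is the weather in Tokyo?"}],
    "tools": [
        {
            "type": "function",
            "function": {
                # Hypothetical tool, for illustration only.
                "name": "get_weather",
                "description": "Look up the current weather for a city",
                "parameters": {
                    "type": "object",
                    "properties": {"city": {"type": "string"}},
                    "required": ["city"],
                },
            },
        }
    ],
}

# Serialize to the JSON string that would be POSTed to the server.
body = json.dumps(payload)
print(json.loads(body)["tools"][0]["function"]["name"])
```

If the model decides to call the tool, the response carries a `tool_calls` entry naming the function and its JSON arguments, which the client executes and feeds back as a `tool` role message.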