Commit cb0b06a

json: update grammars/README w/ examples & note about additionalProperties (#8132)
* json: update grammars/README
* mention broken prefixItems
* add mention to llama-gbnf-validator
* json: explicit type: object for nested items object in cli example
1 parent 558f44b

1 file changed: 235 additions, 10 deletions

grammars/README.md

@@ -126,19 +126,244 @@ You can use GBNF grammars:
- in CLI, with [examples/json_schema_to_grammar.py](../examples/json_schema_to_grammar.py)
- in JavaScript with [json-schema-to-grammar.mjs](../examples/server/public/json-schema-to-grammar.mjs) (this is used by the [server](../examples/server)'s Web UI)

~~Take a look at [tests](../../tests/test-json-schema-to-grammar.cpp) to see which features are likely supported (you'll also find usage examples in https://github.com/ggerganov/llama.cpp/pull/5978, https://github.com/ggerganov/llama.cpp/pull/6659 & https://github.com/ggerganov/llama.cpp/pull/6555).~~

Take a look at [tests](../tests/test-json-schema-to-grammar.cpp) to see which features are likely supported (you'll also find usage examples in https://github.com/ggerganov/llama.cpp/pull/5978, https://github.com/ggerganov/llama.cpp/pull/6659 & https://github.com/ggerganov/llama.cpp/pull/6555).

```bash
llama-cli \
  -hfr bartowski/Phi-3-medium-128k-instruct-GGUF \
  -hff Phi-3-medium-128k-instruct-Q8_0.gguf \
  -j '{
    "type": "array",
    "items": {
        "type": "object",
        "properties": {
            "name": {
                "type": "string",
                "minLength": 1,
                "maxLength": 100
            },
            "age": {
                "type": "integer",
                "minimum": 0,
                "maximum": 150
            }
        },
        "required": ["name", "age"],
        "additionalProperties": false
    },
    "minItems": 10,
    "maxItems": 100
  }' \
  -p 'Generate a {name, age}[] JSON array with famous actors of all ages.'
```
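The converter command in the collapsed section below reads the schema from a file named `name-age-schema.json`. Assuming you want to convert exactly the schema passed to `-j` above, a short Python script can write it out (a convenience sketch, not part of llama.cpp):

```python
import json

# Same schema as the `-j` argument of the llama-cli example above.
schema = {
    "type": "array",
    "items": {
        "type": "object",
        "properties": {
            "name": {"type": "string", "minLength": 1, "maxLength": 100},
            "age": {"type": "integer", "minimum": 0, "maximum": 150},
        },
        "required": ["name", "age"],
        "additionalProperties": False,  # written out as JSON `false`
    },
    "minItems": 10,
    "maxItems": 100,
}

# Write it under the filename the converter command expects.
with open("name-age-schema.json", "w", encoding="utf-8") as f:
    json.dump(schema, f, indent=2)
```

You can then hand the same file to `llama-cli`, e.g. `-j "$(cat name-age-schema.json)"` in bash, so the CLI and the converter see an identical schema.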

<details>

<summary>Show grammar</summary>

You can convert any schema from the command line with:

```bash
examples/json_schema_to_grammar.py name-age-schema.json
```

```
char ::= [^"\\\x7F\x00-\x1F] | [\\] (["\\bfnrt] | "u" [0-9a-fA-F]{4})
item ::= "{" space item-name-kv "," space item-age-kv "}" space
item-age ::= ([0-9] | ([1-8] [0-9] | [9] [0-9]) | "1" ([0-4] [0-9] | [5] "0")) space
item-age-kv ::= "\"age\"" space ":" space item-age
item-name ::= "\"" char{1,100} "\"" space
item-name-kv ::= "\"name\"" space ":" space item-name
root ::= "[" space item ("," space item){9,99} "]" space
space ::= | " " | "\n" [ \t]{0,20}
```

</details>
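The `item-age` rule in the grammar above is how the converter spells out the integer bounds `minimum: 0` / `maximum: 150`: digit by digit, with no leading zeros. As a quick sanity check, here is our own hand translation of that rule into a Python regular expression (not something the converter produces), tested exhaustively against the intended range:

```python
import re

# Hand translation of the GBNF rule (trailing `space` omitted):
# item-age ::= ([0-9] | ([1-8] [0-9] | [9] [0-9]) | "1" ([0-4] [0-9] | [5] "0")) space
ITEM_AGE = re.compile(r"[0-9]|(?:[1-8][0-9]|[9][0-9])|1(?:[0-4][0-9]|[5]0)")


def accepts(text: str) -> bool:
    """True if the whole string matches the rule (a grammar rule must consume all of it)."""
    return ITEM_AGE.fullmatch(text) is not None


# Every integer 0..150 in canonical form is accepted...
assert all(accepts(str(n)) for n in range(0, 151))
# ...while out-of-range values, leading zeros and signs are rejected.
assert not any(accepts(s) for s in ["151", "200", "1000", "-1", "007", "00", ""])
print("item-age accepts exactly 0..150")
```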

Here is also a list of known limitations (contributions welcome):

- Unsupported features are skipped silently. For now, it is advised to run the command-line Python converter (see above) to see any warnings, and to inspect the resulting grammar and/or test it w/ [llama-gbnf-validator](../examples/gbnf-validator/gbnf-validator.cpp).
- Can't mix `properties` w/ `anyOf` / `oneOf` in the same type (https://github.com/ggerganov/llama.cpp/issues/7703)
- [prefixItems](https://json-schema.org/draft/2020-12/json-schema-core#name-prefixitems) is broken (but [items](https://json-schema.org/draft/2020-12/json-schema-core#name-items) works)
- `minimum`, `exclusiveMinimum`, `maximum`, `exclusiveMaximum`: only supported for `"type": "integer"` for now, not `number`
- Nested `$ref`s are broken (https://github.com/ggerganov/llama.cpp/issues/8073)
- [pattern](https://json-schema.org/draft/2020-12/json-schema-validation#name-pattern)s must start with `^` and end with `$`
- Remote `$ref`s are not supported in the C++ version (the Python & JavaScript versions fetch https refs)
- `string` [formats](https://json-schema.org/draft/2020-12/json-schema-validation#name-defined-formats): `uri` and `email` are not supported
- No [`patternProperties`](https://json-schema.org/draft/2020-12/json-schema-core#name-patternproperties)
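Since the converter expects anchored `pattern`s, one option is to normalize a schema before converting it. The sketch below is a hypothetical pre-processing helper (`anchor_patterns` is our own name, not part of llama.cpp) that rewrites every unanchored `pattern` as `^(...)$`. Two caveats: JSON Schema `pattern`s are not implicitly anchored (a match anywhere in the string is enough), so this tightens each one to a full-string match, which you should double-check is what you mean; and the helper does not distinguish schema keywords from data (e.g. objects inside `enum` / `const`).

```python
import copy
from typing import Any


def anchor_patterns(schema: Any) -> Any:
    """Hypothetical helper: return a deep copy of `schema` in which every string
    `pattern` not already of the form ^...$ is rewritten as ^(...)$.

    A plain capturing group is used so that a top-level alternation like "a|b"
    stays correctly scoped once the anchors are added.
    """
    schema = copy.deepcopy(schema)

    def visit(node: Any) -> None:
        if isinstance(node, dict):
            p = node.get("pattern")
            if isinstance(p, str) and not (p.startswith("^") and p.endswith("$")):
                body = p[1:] if p.startswith("^") else p
                # Drop a lone trailing "$" anchor, but keep an escaped literal "\$".
                if body.endswith("$") and not body.endswith("\\$"):
                    body = body[:-1]
                node["pattern"] = f"^({body})$"
            for value in node.values():
                visit(value)
        elif isinstance(node, list):
            for item in node:
                visit(item)

    visit(schema)
    return schema


schema = {"type": "array", "items": {"type": "string", "pattern": "- .{5,}"}}
print(anchor_patterns(schema)["items"]["pattern"])  # ^(- .{5,})$
```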

~~Here is also a non-exhaustive list of **unsupported** features:~~

And a non-exhaustive list of other unsupported features that are unlikely to be implemented (hard and/or too slow to support w/ stateless grammars):

- ~~`additionalProperties`: to be fixed in https://github.com/ggerganov/llama.cpp/pull/7840~~
- ~~`minimum`, `exclusiveMinimum`, `maximum`, `exclusiveMaximum`~~
- ~~`integer` constraints to be implemented in https://github.com/ggerganov/llama.cpp/pull/7797~~
- ~~Remote `$ref`s in the C++ version (Python & JavaScript versions fetch https refs)~~
- ~~Mixing `properties` w/ `anyOf` / `oneOf` in the same type (https://github.com/ggerganov/llama.cpp/issues/7703)~~
- ~~`string` formats `uri`, `email`~~
- [`uniqueItems`](https://json-schema.org/draft/2020-12/json-schema-validation#name-uniqueitems)
- [`contains`](https://json-schema.org/draft/2020-12/json-schema-core#name-contains) / `minContains`
- ~~`uniqueItems`~~
- `$anchor` (cf. [dereferencing](https://json-schema.org/draft/2020-12/json-schema-core#name-dereferencing))
- [`not`](https://json-schema.org/draft/2020-12/json-schema-core#name-not)
- [Conditionals](https://json-schema.org/draft/2020-12/json-schema-core#name-keywords-for-applying-subsche) `if` / `then` / `else` / `dependentSchemas`
- ~~[`patternProperties`](https://json-schema.org/draft/2020-12/json-schema-core#name-patternproperties)~~
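Because unsupported features are skipped silently, it can help to scan a schema for them before converting. Below is a minimal sketch of such a lint pass; `find_unsupported` is a hypothetical helper, and its keyword sets are copied from the two lists above, so they may drift from what the converter actually supports. It does not follow `$ref`s and only recognizes `"type": "number"` written as a plain string.

```python
from typing import Any, Iterator

# Taken from the limitation lists above (may drift from the actual converter).
UNSUPPORTED_KEYWORDS = {
    "prefixItems", "patternProperties", "uniqueItems", "contains", "minContains",
    "$anchor", "not", "if", "then", "else", "dependentSchemas",
}
UNSUPPORTED_FORMATS = {"uri", "email"}
NUMERIC_BOUNDS = {"minimum", "exclusiveMinimum", "maximum", "exclusiveMaximum"}
# Keywords whose value maps arbitrary names (not keywords) to subschemas.
NAME_TO_SCHEMA = {"properties", "$defs", "definitions", "patternProperties", "dependentSchemas"}
# Keywords whose value is instance data, not a schema.
DATA_KEYWORDS = {"enum", "const", "default", "examples"}


def find_unsupported(schema: Any, path: str = "#") -> Iterator[str]:
    """Yield a 'path: problem' message for each construct listed as unsupported."""
    if isinstance(schema, list):
        for i, sub in enumerate(schema):
            yield from find_unsupported(sub, f"{path}/{i}")
        return
    if not isinstance(schema, dict):
        return
    for key, value in schema.items():
        if key in DATA_KEYWORDS:
            continue
        if key in UNSUPPORTED_KEYWORDS:
            yield f"{path}: '{key}' is unsupported or broken"
        elif key == "format" and isinstance(value, str) and value in UNSUPPORTED_FORMATS:
            yield f"{path}: format '{value}' is not supported"
        elif key in NUMERIC_BOUNDS and schema.get("type") == "number":
            yield f"{path}: '{key}' is only enforced for integers"
        if key in NAME_TO_SCHEMA and isinstance(value, dict):
            for name, sub in value.items():  # names are not keywords: only visit subschemas
                yield from find_unsupported(sub, f"{path}/{key}/{name}")
        elif isinstance(value, (dict, list)):
            yield from find_unsupported(value, f"{path}/{key}")


# The schema produced by the Zod example further down:
zod_schema = {
    "type": "object",
    "properties": {
        "age": {"type": "number", "exclusiveMinimum": 0},
        "email": {"type": "string", "format": "email"},
    },
    "required": ["age", "email"],
    "additionalProperties": False,
}
for problem in find_unsupported(zod_schema):
    print(problem)
# #/properties/age: 'exclusiveMinimum' is only enforced for integers
# #/properties/email: format 'email' is not supported
```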

### A word about additionalProperties

> [!WARNING]
> By default, `object`s accept [additional properties](https://json-schema.org/understanding-json-schema/reference/object#additionalproperties), which you might not want or expect, and which will make sampling slower (not just because of the extra tokens, but also because the resulting grammar is slower).
> You can set `"additionalProperties": false` on the schema of any object to ensure only properties listed in `properties` are generated (not needed for non-`object` types, e.g. `array` or `string`).
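If you can't change how the schema is generated, a similar effect can be obtained by post-processing it. The sketch below is a hypothetical helper (not part of llama.cpp) that adds `"additionalProperties": false` to every object schema that does not already set it. It assumes object schemas can be recognized by `"type": "object"` or a `properties` key (so e.g. `"type": ["object", "null"]` is not handled), and it leaves explicit `true` / subschema values alone.

```python
import copy
from typing import Any

# Keywords whose value maps arbitrary names (not keywords) to subschemas.
NAME_TO_SCHEMA = {"properties", "$defs", "definitions", "patternProperties", "dependentSchemas"}
# Keywords whose value is instance data, not a schema.
DATA_KEYWORDS = {"enum", "const", "default", "examples"}


def forbid_additional_properties(schema: Any) -> Any:
    """Hypothetical helper: return a deep copy of `schema` in which every object schema
    without an explicit `additionalProperties` gets `"additionalProperties": false`."""
    schema = copy.deepcopy(schema)

    def visit(node: Any) -> None:
        if isinstance(node, list):
            for item in node:
                visit(item)
            return
        if not isinstance(node, dict):
            return
        # Heuristic: an object schema either says so or lists properties.
        if (node.get("type") == "object" or "properties" in node) and "additionalProperties" not in node:
            node["additionalProperties"] = False
        for key, value in node.items():
            if key in DATA_KEYWORDS:
                continue
            if key in NAME_TO_SCHEMA and isinstance(value, dict):
                for sub in value.values():  # skip the names, visit the subschemas
                    visit(sub)
            else:
                visit(value)

    visit(schema)
    return schema


schema = {
    "type": "array",
    "items": {
        "type": "object",
        "properties": {"name": {"type": "string"}, "age": {"type": "integer"}},
    },
}
strict = forbid_additional_properties(schema)
print(strict["items"]["additionalProperties"])  # False
print("additionalProperties" in strict)         # False (arrays are left alone)
```

Walking `properties` maps by value rather than as schemas matters: otherwise a property that happens to be named `properties` or `type` would be mistaken for a keyword.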

If you're using [Pydantic](https://pydantic.dev/) to generate schemas, you can disable additional properties with the `extra` config on each model class:

```python
# pip install "pydantic>=2"
import json
from typing import Annotated, List
from pydantic import BaseModel, ConfigDict, Field

class QAPair(BaseModel):
    model_config = ConfigDict(extra='forbid')  # triggers additionalProperties: false in the JSON schema

    question: str
    concise_answer: str
    justification: str

class Summary(BaseModel):
    model_config = ConfigDict(extra='forbid')

    # Pattern anchored w/ ^...$, as the grammar converter requires (see limitations above)
    key_facts: List[Annotated[str, Field(pattern='^- .{5,}$')]]
    question_answers: List[Annotated[List[QAPair], Field(min_length=5)]]

print(json.dumps(Summary.model_json_schema(), indent=2))
```

<details>
<summary>Show JSON schema & grammar</summary>

```json
{
  "$defs": {
    "QAPair": {
      "additionalProperties": false,
      "properties": {
        "question": {
          "title": "Question",
          "type": "string"
        },
        "concise_answer": {
          "title": "Concise Answer",
          "type": "string"
        },
        "justification": {
          "title": "Justification",
          "type": "string"
        }
      },
      "required": [
        "question",
        "concise_answer",
        "justification"
      ],
      "title": "QAPair",
      "type": "object"
    }
  },
  "additionalProperties": false,
  "properties": {
    "key_facts": {
      "items": {
        "pattern": "^- .{5,}$",
        "type": "string"
      },
      "title": "Key Facts",
      "type": "array"
    },
    "question_answers": {
      "items": {
        "items": {
          "$ref": "#/$defs/QAPair"
        },
        "minItems": 5,
        "type": "array"
      },
      "title": "Question Answers",
      "type": "array"
    }
  },
  "required": [
    "key_facts",
    "question_answers"
  ],
  "title": "Summary",
  "type": "object"
}
```

```
QAPair ::= "{" space QAPair-question-kv "," space QAPair-concise-answer-kv "," space QAPair-justification-kv "}" space
QAPair-concise-answer-kv ::= "\"concise_answer\"" space ":" space string
QAPair-justification-kv ::= "\"justification\"" space ":" space string
QAPair-question-kv ::= "\"question\"" space ":" space string
char ::= [^"\\\x7F\x00-\x1F] | [\\] (["\\bfnrt] | "u" [0-9a-fA-F]{4})
dot ::= [^\x0A\x0D]
key-facts ::= "[" space (key-facts-item ("," space key-facts-item)*)? "]" space
key-facts-item ::= "\"" "- " key-facts-item-1{5,} "\"" space
key-facts-item-1 ::= dot
key-facts-kv ::= "\"key_facts\"" space ":" space key-facts
question-answers ::= "[" space (question-answers-item ("," space question-answers-item)*)? "]" space
question-answers-item ::= "[" space question-answers-item-item ("," space question-answers-item-item){4,} "]" space
question-answers-item-item ::= QAPair
question-answers-kv ::= "\"question_answers\"" space ":" space question-answers
root ::= "{" space key-facts-kv "," space question-answers-kv "}" space
space ::= | " " | "\n" [ \t]{0,20}
string ::= "\"" char* "\"" space
```

</details>
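The grammar above and the name/age one earlier encode array bounds the same way: the first item, then a bounded repetition of `"," space item` (`{9,99}` for 10 to 100 items, `{4,}` for at least 5, and `*` wrapped in `(...)?` when the array may be empty). The sketch below reproduces that shape so you can predict or hand-write such rules. `array_rule` is a hypothetical helper written for illustration, not the converter's own code, and the converter may emit different (equivalent) rules in edge cases such as `maxItems: 1`.

```python
from typing import Optional


def array_rule(item: str, min_items: int = 0, max_items: Optional[int] = None) -> str:
    """Hypothetical helper: GBNF rule body for a JSON array of `item` elements,
    following the shape of the array rules in the grammars above."""
    if min_items < 0 or (max_items is not None and max_items < min_items):
        raise ValueError("expected 0 <= min_items <= max_items")
    if max_items == 0:
        return '"[" space "]" space'
    # After the first item, between `lo` and `hi` more `"," space item` follow.
    lo = max(min_items - 1, 0)
    hi = None if max_items is None else max_items - 1
    if hi is None:
        suffix = "*" if lo == 0 else f"{{{lo},}}"
    elif hi == 0:
        suffix = None  # at most one item: no separator repetition
    elif lo == hi:
        suffix = f"{{{lo}}}"
    else:
        suffix = f"{{{lo},{hi}}}"
    items = item if suffix is None else f'{item} ("," space {item}){suffix}'
    if min_items == 0:
        items = f"({items})?"  # the array may be empty
    return f'"[" space {items} "]" space'


print(array_rule("item", 10, 100))
# "[" space item ("," space item){9,99} "]" space
```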

If you're using [Zod](https://zod.dev/), you can make your objects explicitly strict w/ `z.object(...).strict()` or `z.strictObject(...)`.

Note however that [zod-to-json-schema](https://github.com/StefanTerdell/zod-to-json-schema) currently always seems to set `"additionalProperties": false` anyway (even w/ zod schemas on which `nonstrict()` / `passthrough()` was called).

```js
import { z } from 'zod';
import { zodToJsonSchema } from 'zod-to-json-schema';

const Foo = z.object({
  age: z.number().positive(),
  email: z.string().email(),
}).strict();

// Print as JSON (console.log on the object itself would print JS object notation)
console.log(JSON.stringify(zodToJsonSchema(Foo), null, 2));
```

<details>
<summary>Show JSON schema & grammar</summary>

```json
{
  "type": "object",
  "properties": {
    "age": {
      "type": "number",
      "exclusiveMinimum": 0
    },
    "email": {
      "type": "string",
      "format": "email"
    }
  },
  "required": [
    "age",
    "email"
  ],
  "additionalProperties": false,
  "$schema": "http://json-schema.org/draft-07/schema#"
}
```

```
age-kv ::= "\"age\"" space ":" space number
char ::= [^"\\\x7F\x00-\x1F] | [\\] (["\\bfnrt] | "u" [0-9a-fA-F]{4})
decimal-part ::= [0-9]{1,16}
email-kv ::= "\"email\"" space ":" space string
integral-part ::= [0] | [1-9] [0-9]{0,15}
number ::= ("-"? integral-part) ("." decimal-part)? ([eE] [-+]? integral-part)? space
root ::= "{" space age-kv "," space email-kv "}" space
space ::= | " " | "\n" [ \t]{0,20}
string ::= "\"" char* "\"" space
```

</details>
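In the grammar above, `age` is a plain `number` (its `exclusiveMinimum` is dropped, in line with the limitations list) and `email` is a plain `string` (no `email` format support), so a model could still produce `"age": -3` or `"email": "nope"`. If those constraints matter, re-check the parsed output after generation. The sketch below hand-codes the two checks for this one schema; `check_foo` is a hypothetical name and the email regex is a deliberately loose assumption, not RFC 5322 validation. For arbitrary schemas, a general-purpose validator such as the Python `jsonschema` package is the more robust option (it only checks `format` when given a format checker).

```python
import json
import re
from typing import List

# Deliberately loose email check (an assumption, not RFC 5322 validation).
EMAIL = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")


def check_foo(output: str) -> List[str]:
    """Hypothetical post-generation check for the `Foo` schema above.

    The grammar already guarantees the JSON shape and both keys, so only the
    constraints it could not express are re-checked here.
    """
    data = json.loads(output)
    errors = []
    if not data["age"] > 0:  # exclusiveMinimum: 0 on a `number` is not enforced
        errors.append(f"age must be > 0, got {data['age']}")
    if not EMAIL.fullmatch(data["email"]):  # `format: email` is not enforced
        errors.append(f"email looks invalid: {data['email']!r}")
    return errors


print(check_foo('{"age": 42, "email": "jane@example.com"}'))  # []
print(check_foo('{"age": -3, "email": "nope"}'))
# ['age must be > 0, got -3', "email looks invalid: 'nope'"]
```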
