Description
[Filing at the request of @epoberezkin in #15]
Background
This is a proposed solution for one of the ways people want to re-use schemas that declare "additionalProperties": false. It could be made part of the JSON Schema specification, either as a required or optional feature, or it could simply be published to illustrate how a completely separate tool could be built to work around the limitation. This separability hinges on the proposal being in the form of a preprocessing step.
The reason to have the separability is that there is no other way to address this use case without violating at least one desirable property of JSON Schema validation, or introducing arbitrary JSON transformation capabilities into a standard that should be focused on schemas.
Allowing for partial implementation may also be a good idea, as any full solution will be complex to implement. A useful but limited solution is quite easy.
Keywords $combine and $combinable
There are two keywords in this proposal, both marked with a $ to indicate that they (like $ref) are structural elements outside of validation proper, and can be preprocessed out (either entirely, or lazily as needed in the case of $ref cycles).
$combine implements the oft-requested re-use pattern of applying additionalProperties only after inventorying all properties and patternProperties in all constituent schemas.
$combinable is a boolean keyword. If false, then $combine has no special effect on the schema; it behaves like allOf (explained in detail below). This preserves the use case of using additionalProperties to intentionally close a schema off to this sort of "consider all properties" extension.
If $combinable defaults to true, then current schemas must opt out of being usable with $combine. If it defaults to false, then current schemas must opt in. Opting in would be compatible with the current draft, so I recommend that $combinable default to false to avoid automatically circumventing any intent to close a schema to re-use.
Formal description of the limitation of additionalProperties
Note: This assumes that #101 (allow true and false for all child schemas) is implemented, as it makes this whole thing about ten times clearer.
For nearly all validation keywords k1, k2, k3... the following (which @awwright referred to as "linearity" in issue #65) holds true:
{
"k1": "something",
"k2": {},
"k3": ["etc."]
}
is equivalent to
{
"allOf": [
{"k1": "something"},
{"k2": {}},
{"k3": ["etc."]}
]
}
There are only a few exceptions (see also issue #77), and they tend to be vexing. additionalProperties is the one that trips people up the most. Here are its properties in terms of "schema algebra" (for lack of a better term):
{
"required": ["bar", "x-awesomeness"],
"minProperties": 5,
"maxProperties": 10,
"properties": {
"foo": {"type": "number"},
"bar": {"type": "boolean"}
},
"patternProperties": {
"^x-": {"type": "string"},
"maybe$": {"type": ["boolean", "null"]}
},
"additionalProperties": {
"type": "object"
}
}
is equivalent to
{
"allOf": [
{"required": ["bar"]},
{"required": ["x-awesomeness"]},
{"minProperties": 5},
{"maxProperties": 10},
{"properties": {"foo": {"type": "number"}}},
{"properties": {"bar": {"type": "boolean"}}},
{"patternProperties": {"^x-": {"type": "string"}}},
{"patternProperties": {"maybe$": {"type": ["boolean", "null"]}}},
{
"additionalProperties": {"type": "object"},
"properties": {"foo": true, "bar": true},
"patternProperties": {"^x-": true, "maybe$": true}
}
]
}
You can separate each entry in properties and patternProperties, and handle the child validation in each of those separate entries, but the only simplification that can be done to a factored-out additionalProperties is to reduce the properties and patternProperties down to just validating the instance object property names (by giving blank or true child schemas) and leaving the child validation to the separated schemas.
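The factoring shown above can be sketched in Python. This is only an illustration: the function name factor_out is hypothetical, and the sketch assumes the schema uses no other non-linear keywords besides additionalProperties.

```python
def factor_out(schema):
    """Split a schema into an allOf of single-keyword branches.

    additionalProperties stays grouped with the names and patterns it
    depends on; their child schemas are replaced with the blank schema
    true, because child validation lives in the separated branches.
    """
    branches = []
    for keyword, value in schema.items():
        if keyword == "required":
            # Each required name can stand alone.
            branches.extend({"required": [name]} for name in value)
        elif keyword in ("properties", "patternProperties"):
            # Each entry validates its own child value independently.
            branches.extend({keyword: {key: sub}} for key, sub in value.items())
        elif keyword != "additionalProperties":
            branches.append({keyword: value})
    if "additionalProperties" in schema:
        closing = {"additionalProperties": schema["additionalProperties"]}
        for keyword in ("properties", "patternProperties"):
            if keyword in schema:
                closing[keyword] = {key: True for key in schema[keyword]}
        branches.append(closing)
    return {"allOf": branches}
```

Applied to the first schema above, this produces the allOf form shown after it, branch for branch.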
This is because additionalProperties determines how to behave by taking the set of instance properties, removing all properties that match either properties or patternProperties, and applying its child schema to the value of any properties that remain. So the child schemas in properties and patternProperties have no effect on additionalProperties and can be factored out, but the property names and patterns do.
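As a rough illustration of that selection rule, here is a small Python sketch. The helper name additional_names is hypothetical, and Python's re module only approximates the regular expression dialect JSON Schema expects.

```python
import re


def additional_names(instance, schema):
    """Return the instance property names that additionalProperties applies to.

    Only the names in properties and the patterns in patternProperties
    matter here; their child schemas are never consulted.
    """
    named = schema.get("properties", {})
    patterns = schema.get("patternProperties", {})
    return [
        name
        for name in instance
        if name not in named
        and not any(re.search(pattern, name) for pattern in patterns)
    ]
```

With the properties and patternProperties from the example schema above, an instance with the properties "foo", "x-a", "other", and "maybe" leaves only "other" for additionalProperties to check.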
This is fundamentally why re-using schemas that use additionalProperties is so problematic for many people. It comes up most when it's set to false, but the underlying problem affects any use of a non-default additionalProperties.
Note that required, minProperties, and maxProperties do not have to be carried into the factored-out additionalProperties schema, as they do not impact the behavior of additionalProperties. Issue #65 proposes also removing any property named in required from the set of properties affected by additionalProperties, which would mean that required would also need to be carried along in the factored-out schema. This is why I oppose #65.
Re-use and additionalProperties
In the thread about use cases for JSON Schema re-use, we agreed that there have been several use cases motivating various attempts to get around this algebraic limitation. A simple example of what's desired is that people want this:
{
"type": "object",
"allOf": [
{"properties": {"foo": {"type": "boolean"}}, "additionalProperties": false},
{"properties": {"bar": {"type": "number"}}, "required": ["bar"]}
]
}
to be equivalent to
{
"type": "object",
"properties": {"foo": {"type": "boolean"}, "bar": {"type": "number"}},
"additionalProperties": false,
"required": ["bar"]
}
Instead, that allOf combination can never validate: the foo schema forbids any property other than "foo", while the bar schema requires the property "bar".
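To make the difference concrete, here is a deliberately tiny validator in Python. It is a sketch, not a real implementation: it supports only the handful of keywords used in the two schemas above, and the names check_type, is_valid, all_of_form, and merged_form are made up for this illustration.

```python
def check_type(value, name):
    """Check the few JSON types used in this example."""
    if name == "boolean":
        return isinstance(value, bool)
    if name == "number":
        # bool is a subclass of int in Python, so exclude it explicitly.
        return isinstance(value, (int, float)) and not isinstance(value, bool)
    if name == "object":
        return isinstance(value, dict)
    raise ValueError(f"type not supported by this sketch: {name}")


def is_valid(instance, schema):
    if isinstance(schema, bool):  # true accepts everything, false nothing
        return schema
    if "type" in schema and not check_type(instance, schema["type"]):
        return False
    if not all(is_valid(instance, sub) for sub in schema.get("allOf", [])):
        return False
    if isinstance(instance, dict):
        if any(name not in instance for name in schema.get("required", [])):
            return False
        props = schema.get("properties", {})
        extra = schema.get("additionalProperties", True)
        for name, value in instance.items():
            # Names not listed in properties fall through to additionalProperties.
            if not is_valid(value, props[name] if name in props else extra):
                return False
    return True


all_of_form = {
    "type": "object",
    "allOf": [
        {"properties": {"foo": {"type": "boolean"}}, "additionalProperties": False},
        {"properties": {"bar": {"type": "number"}}, "required": ["bar"]},
    ],
}
merged_form = {
    "type": "object",
    "properties": {"foo": {"type": "boolean"}, "bar": {"type": "number"}},
    "additionalProperties": False,
    "required": ["bar"],
}
```

Under these rules {"foo": true, "bar": 1} passes merged_form but fails all_of_form, and removing "bar" to satisfy the first branch fails the second.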
Note that an analogous problem occurs if additionalProperties is set to any non-blank schema. For instance, if it were set to {"type": "string"}, it would still be an impossible schema, as that would be applied to "bar", whose value is already required to be a number.
patternProperties behaves in an analogous way, so I will mostly ignore it to keep things (relatively) simple.
Solving the problem with a pre-processing step
I propose to solve this with a keyword, $combine, indicating a pre-processing step:
- It should start with a $ to indicate that, like $ref, there is something structural going on
- The keyword simply applies the sort of factoring-out of additionalProperties shown at the beginning of this proposal
- As a pre-processing step, it can easily be published separately, outside of the JSON Schema spec proper
- It preserves the context-free validation property by ensuring that all schemas can be programmatically converted to schemas that validate in a context-free manner
Example conversion
Here is the conversion, replacing allOf with $combine. After preprocessing, this:
{
"type": "object",
"$combine": [
{"properties": {"foo": {"type": "boolean"}}, "additionalProperties": false},
{"properties": {"bar": {"type": "number"}}, "required": ["bar"]}
]
}
becomes this:
{
"type": "object",
"allOf": [
{"properties": {"foo": {"type": "boolean"}}},
{"properties": {"bar": {"type": "number"}}, "required": ["bar"]},
{"properties": {"foo": true, "bar": true}, "additionalProperties": false}
]
}
The only change to the first two entries is removing additionalProperties. The new third entry just collects the various properties (and patternProperties if they were present) with blank schemas, and includes them with the factored-out "additionalProperties": false.
The same thing would happen when considering our alternate example with "additionalProperties": {"type": "string"}. The first two would look exactly as they do above, and the third would look the same except for {"type": "string"} in place of false.
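The simple case of this conversion could look like the following Python sketch. The function name expand_combine is hypothetical. The sketch ignores $combinable and treats every constituent as combinable, does not recurse into child schemas, and assumes at most one constituent declares additionalProperties, since this proposal does not say how several should interact.

```python
def expand_combine(schema):
    """Rewrite a top-level $combine into an equivalent allOf."""
    if "$combine" not in schema:
        return schema
    branches, names, patterns, additional = [], {}, {}, []
    for constituent in schema["$combine"]:
        branch = dict(constituent)  # shallow copy; the input is left untouched
        if "additionalProperties" in branch:
            additional.append(branch.pop("additionalProperties"))
        # Inventory names and patterns, keeping only blank (true) child schemas.
        names.update(dict.fromkeys(branch.get("properties", {}), True))
        patterns.update(dict.fromkeys(branch.get("patternProperties", {}), True))
        branches.append(branch)
    if len(additional) > 1:
        raise ValueError("this sketch supports one additionalProperties at most")
    if additional:
        closing = {"additionalProperties": additional[0]}
        if names:
            closing["properties"] = names
        if patterns:
            closing["patternProperties"] = patterns
        branches.append(closing)
    result = {key: value for key, value in schema.items() if key != "$combine"}
    # An existing allOf is also an AND, so the new branches can simply be appended.
    result["allOf"] = list(result.get("allOf", [])) + branches
    return result
```

Running it on the $combine example above yields the three-entry allOf shown after it; swapping false for {"type": "string"} changes only the last entry.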
Difficulty and optional-ness
Making these conversions is easy in the simple case. Even when there are child schemas, it's pretty easy to implement:
{
"$combine": [
{"properties": {
"foo": {"properties": {"bar": X}}
}},
{"properties": {
"foo": {"properties": {"bar": Y}}
}}
]
}
is equivalent to:
{
"properties": {
"foo": {"properties": {"bar": {"$combine": [X, Y]}}}
}
}
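One way to push $combine down into child schemas, sketched in Python. The name push_down is hypothetical, and the sketch assumes every constituent consists of nothing but a properties keyword, as in the example above.

```python
def push_down(schemas):
    """Merge constituents that contain only "properties".

    Child schemas declared for the same property name are grouped; the
    grouping recurses while the children are themselves properties-only,
    and otherwise leaves a $combine of the children in place.
    """
    grouped = {}
    for schema in schemas:
        for name, child in schema["properties"].items():
            grouped.setdefault(name, []).append(child)
    merged = {}
    for name, children in grouped.items():
        if len(children) == 1:
            merged[name] = children[0]
        elif all(isinstance(c, dict) and set(c) == {"properties"} for c in children):
            merged[name] = push_down(children)
        else:
            merged[name] = {"$combine": children}
    return {"properties": merged}
```

With concrete schemas substituted for X and Y, passing the two constituents above to push_down gives the merged form, with {"$combine": [X, Y]} under "bar".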
$combine gets challenging when combining complex schemas that themselves involve boolean keywords. Implementing $combine in such a situation requires implementing a kind of "schema algebra". Requiring this of all implementations seems excessively burdensome. As with format, I'd recommend specifying this but leaving it up to implementations whether to support all or part of the feature. In particular, an implementation may want to implement $combine except not support combining schemas that contain boolean keywords.
Here are the challenges for the boolean keywords:
allOf
Since we are converting to an allOf already, and boolean AND is associative, this is straightforward. It's just more branches to combine but the mechanism is the same. So {"$combine": [A, {"allOf": [B, C]}]} is the same as {"$combine": [A, B, C]}.
anyOf
An anyOf needs to be factored out, as the collection of properties and patternProperties should not draw from all branches, but only from one at a time. So {"$combine": [A, {"anyOf": [B, C]}]} becomes {"anyOf": [{"$combine": [A, B]}, {"$combine": [A, C]}]}, which reduces the problem to our existing rules.
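The allOf and anyOf rules can be sketched together in Python. The names alternatives and rewrite_combine are hypothetical, and the sketch assumes a nested allOf or anyOf is the only keyword in its constituent; anything else is treated as an opaque schema.

```python
def alternatives(schemas):
    """Return a list of flat constituent lists, one per anyOf choice."""
    result = [[]]
    for schema in schemas:
        if isinstance(schema, dict) and set(schema) == {"allOf"}:
            # AND is associative: splice the nested branches in.
            options = alternatives(schema["allOf"])
        elif isinstance(schema, dict) and set(schema) == {"anyOf"}:
            # Each anyOf branch starts its own alternative.
            options = [alt for branch in schema["anyOf"] for alt in alternatives([branch])]
        else:
            options = [[schema]]
        result = [head + tail for head in result for tail in options]
    return result


def rewrite_combine(schemas):
    """Flatten nested allOf and hoist nested anyOf out of a $combine list."""
    alts = alternatives(schemas)
    if len(alts) == 1:
        return {"$combine": alts[0]}
    return {"anyOf": [{"$combine": alt} for alt in alts]}
```

For example, rewrite_combine([A, {"anyOf": [B, C]}]) returns an anyOf with one $combine per branch, each of which the simple conversion can then handle.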
not
This gets a little tricky. The fundamental principle with not is that:
{
"not": {
"properties": {"foo": {"type": "number"}},
"patternProperties": {"^x-": {"type": "string"}},
"additionalProperties": {"type": "boolean"}
}
}
is equivalent to
{
"not": {
"allOf": [
{"properties": {"foo": {"type": "number"}}},
{"patternProperties": {"^x-": {"type": "string"}}},
{
"properties": {"foo": {}},
"patternProperties": {"^x-": {}},
"additionalProperties": {"type": "boolean"}
}
]
}
}
which, by DeMorgan's law, is equivalent to:
{
"anyOf": [
{"properties": {"foo": {"not": {"type": "number"}}}},
{"patternProperties": {"^x-": {"not": {"type": "string"}}}},
{
"properties": {"foo": true},
"patternProperties": {"^x-": true},
"additionalProperties": {"not": {"type": "boolean"}}
}
]
}
The not doesn't affect the validation of the presence or absence of the properties that are named or matched by a pattern. It only affects the validation of child values.
NOTE: Fully covering all of the things that can happen with not is complicated, and may require adding some new keywords like minAdditionalProperties to make it possible in all cases. I'll elaborate on these issues if there is ever momentum behind this proposal. They are all solvable.
oneOf
As {"oneOf": [A, B, C]} is equivalent to:
{
"anyOf": [
{"allOf": [A, {"not": B}, {"not": C}]},
{"allOf": [{"not": A}, B, {"not": C}]},
{"allOf": [{"not": A}, {"not": B}, C]}
]
}
it can be handled from this transformed form with the already established rules.
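The oneOf expansion is mechanical for any number of branches. A small Python sketch, with the hypothetical name expand_one_of:

```python
def expand_one_of(branches):
    """Rewrite oneOf as anyOf over "this branch and none of the others"."""
    return {
        "anyOf": [
            {
                "allOf": [
                    branch if index == chosen else {"not": branch}
                    for index, branch in enumerate(branches)
                ]
            }
            for chosen in range(len(branches))
        ]
    }
```

Note that the result grows quadratically with the number of branches, which is part of why full support is a burden on implementations.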
dependencies
Name dependencies are unaffected, but when dealing with schema dependencies we have to handle the conditional nature of how the dependent schemas are applied. This involves observing that:
{
"dependencies": {"foo": X},
Y
}
where Y is the rest of the schema containing dependencies, is equivalent to:
{
"oneOf": [
{
"required": ["foo"],
"allOf": [X, Y]
},
{
"properties": {"foo": false},
Y
}
]
}
This reduces the problem to a combination of oneOf and allOf, which can therefore be reduced to a combination of allOf, anyOf, and not, which gets us back to our known rules.
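A Python sketch of the schema-dependency rewrite, with the hypothetical name expand_schema_dependency. It handles one dependency at a time and wraps the rest of the schema in allOf rather than merging keys, so that a properties keyword in the rest of the schema cannot collide with the one added here. When the property is absent, only the rest of the schema applies.

```python
def expand_schema_dependency(schema, name):
    """Rewrite the schema dependency on `name` as a oneOf of two cases."""
    dependent = schema["dependencies"][name]
    others = {k: v for k, v in schema["dependencies"].items() if k != name}
    rest = {k: v for k, v in schema.items() if k != "dependencies"}
    if others:
        rest["dependencies"] = others  # remaining dependencies stay as they are
    return {
        "oneOf": [
            # Property present: the dependent schema and the rest both apply.
            {"required": [name], "allOf": [dependent, rest]},
            # Property absent: only the rest of the schema applies.
            {"allOf": [{"properties": {name: False}}, rest]},
        ]
    }
```

The two branches are mutually exclusive because one requires the property and the other forbids it, so oneOf and anyOf would behave identically here.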
Conclusion
This is, as far as I can tell, the only way to get this behavior while staying within the philosophical boundaries of JSON Schema.
I am not personally convinced that JSON Schema should directly incorporate any of the proposed "solutions" for the "additionalProperties": false "problem", but @epoberezkin requested in #15 that this be filed for the record.
I strongly prefer this (plus #98) over introducing arbitrary JSON transformations as is done by #15, but I would also be fine with rejecting both proposals (or even all three proposals including #98). All three proposals could be implemented separately and used either as a build step, or as an extended vocabulary indicated by another meta-schema.