-
Notifications
You must be signed in to change notification settings - Fork 265
draft: augment yaml definitions with test schema #1659
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: v2.x
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very good idea to have the schema of the source collection.
@@ -1,6 +1,6 @@ | |||
# $schema: ../schema.json | |||
name: $accumulator | |||
link: 'https://www.mongodb.com/docs/manual/reference/operator/aggregation/accumulator/' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible to keep quotes around strings? This style changes make the diff hard to read. Also, I use quotes by default around every string to prevent unexpected parsing rule from Yaml.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey, looks like the quotes usage is inconsistent in these files - I'd be inclined to define a prettier rule and reformat them in a separate PR to make this one less noisy if that'd make sense to you. The reason for these getting removed is that the way we're adding the schema information is parsing the yaml, fetching the schema from the docs, then writing it, which results in the file getting reformatted based on the tool conventions (we use https://www.npmjs.com/package/js-yaml here).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, at least prettier
supports YAML out of the box, so that feels like a great way to remove any formatting preference ambiguity
@@ -25,7 +25,7 @@ arguments: | |||
- | |||
name: p | |||
type: | |||
- resolvesToArray # of resolvesToNumber |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was keeping this comments as a note for later, as we would like to extend type definition for generic lists and hashes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, that makes sense - same as above, those got removed due to the yaml parser not preserving comments, but will try and find a way to avoid that.
pipeline: | ||
- | ||
$sample: | ||
size: 100 | ||
- | ||
$group: | ||
_id: ~ | ||
_id: null |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Noted that both ~
and null
can be used according to the spec.
onNull: | ||
bytes: !!binary AAAAAAAAAAAAAAAAAABAMA== |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This new notation seems less legible to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed, will look at tweaking the generator to ensure it preserves the old notation.
@@ -24,99 +24,122 @@ arguments: | |||
A positive integral expression that is either a constant or depends on the _id value for $group. | |||
tests: | |||
- | |||
name: 'Null and Missing Values' | |||
link: 'https://www.mongodb.com/docs/manual/reference/operator/aggregation/firstN/#null-and-missing-values' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed a few tests that no longer seem to point to anything in the docs - e.g. "firstN:Null and Missing Values".
This example is in the "Behavior" part of the docs.
https://www.mongodb.com/docs/manual/reference/operator/aggregation/firstN/#null-and-missing-values
"type": "string", | ||
"enum": [ | ||
"array", | ||
"object", | ||
"single", | ||
"search" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch, this value is not used.
generator/config/schema.json
Outdated
"enum": [ | ||
"Array", | ||
"Binary", | ||
"Boolean", | ||
"Code", | ||
"CodeWScope", | ||
"Date", | ||
"Decimal128", | ||
"Double", | ||
"Int32", | ||
"Int64", | ||
"MaxKey", | ||
"MinKey", | ||
"Null", | ||
"ObjectId", | ||
"BSONRegExp", | ||
"String", | ||
"BSONSymbol", | ||
"Timestamp", | ||
"Undefined", | ||
"Document", | ||
"Number" | ||
], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We'll have to converge on the same type names for operator return types and argument types. I'm fine with changing the names in expressions.php
to be consistent with other tools.
"Binary", | ||
"Boolean", | ||
"Code", | ||
"CodeWScope", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the CodeWScope
type specific to mongosh?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the deprecated code_w_s bson type. Realistically, I don't expect any of the more advanced types to be found in the docs, but I generated the schema from the type definitions and opted not to manually edit it to make it easier to maintain if it ever changes. If we wanted to clean it up, we can definitely remove the deprecated types.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@GromNaN: For additional context, PHPC conditionally encodes Code or CodeWScope based on whether the MongoDB\BSON\Javascript object was constructed with a non-null $scope
parameter (see: php_phongo_javascript_init
).
generator/config/stage/project.yaml
Outdated
@@ -11,7 +11,7 @@ arguments: | |||
name: specification | |||
type: | |||
- expression | |||
variadic: object | |||
variadic: object # XXX: This should somehow allow explicit typings for { fieldName: 1|0 } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The type is "expression", because any expression is accepted to compute a value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
True. For IDE integrations/user support it's still valuable to be able to say "while this can be an arbitrary document, frequently it will have keys that match document keys and values which are either 0 or 1"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So we need an additional type for the projection. Maybe an enum:
0: Suppress
1: Include
And this parameter accepts an union of ProjectEnum|expression
This reverts commit 56a801b.
- resolvesToArray | ||
- expression |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was conflicting with the docs, which say that it can be any expression. It was particularly problematic in the $group
example since the actual input field doesn't need to be an array, it only becomes one in the context of a $group
, which is hard to statically enforce. I've relaxed the requirement in the interest of reducing the false negatives, at the cost of potentially accepting more false positives.
generator/config/expression/avg.yaml
Outdated
$avg: | ||
- $quizzes | ||
$avg: $quizzes | ||
labAvg: | ||
$avg: | ||
- $labs | ||
$avg: $labs |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These seem to have been errors, right? At least in the docs, those are field expressions rather than arrays.
@@ -9,7 +9,7 @@ description: | | |||
arguments: | |||
- name: value | |||
type: | |||
- any | |||
- expression |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure why this was any previously, but happy to revert if expression is incorrect here.
This is very much a POC, but I figured I'd open up a PR to get early feedback on the direction we're heading in.
For context, the devtools team wants to use the yaml definitions from the PHP driver to build an autocomplete experience for the aggregation pipelines in mongosh/compass. We've built a tool to convert the yaml to typescript definitions and it seems to be working well for the base case, but to polish it, we'd need to enhance the yaml definitions to allow us to generate more precise type definitions.
Here's a list of the high-level changes we're proposing (will be updated over time):
$comment
fields (since most json parsers are not happy about it).SimplifiedSchema
from themongodb-schema
package. This is a little bit verbose since it supports multiple types per field, even though the docs rarely have heterogeneous schemas.mongodb-schema
package to infer a probabilistic schema from the example collection.firstN:Null and Missing Values
.