Auto-generate Pydantic models from JSON Schemas #2

Merged
merged 16 commits into from
Aug 21, 2024
1 change: 1 addition & 0 deletions .github/workflows/test.yaml
@@ -43,6 +43,7 @@ jobs:
        run: |
          source .venv/bin/activate
          python tests/run_validation_tests.py schema
          python tests/run_serialization_tests.py

      # - name: Upload test results
      #   uses: actions/upload-artifact@v2
3 changes: 2 additions & 1 deletion .gitignore
@@ -2,4 +2,5 @@
.DS_Store
*.pdf
*.png
__pycache__
__pycache__
build/
4 changes: 4 additions & 0 deletions docs/differences-from-notion.md
@@ -0,0 +1,4 @@
- Certain block types are not supported.
  - TBD: List which ones.
- Certain fields can be omitted, in which case they fall back to default values.
- A `metadata` field is present in blocks.
45 changes: 45 additions & 0 deletions docs/notes-on-python-implementation.md
@@ -0,0 +1,45 @@
# JSON-DOC Python Implementation

JSON Schemas are the single source of truth for page and block types.

We use [datamodel-code-generator](https://koxudaxi.github.io/datamodel-code-generator) to generate Pydantic `BaseModel`s from JSON-DOC JSON schemas. This saves us from writing the Pydantic models by hand.

The JSON Schemas refer to each other, and in some cases there are circular dependencies: for example, blocks can have child blocks whose type is the general block defined in `/block/block_schema.json`.
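To make the circular dependency concrete, here is a minimal sketch of detecting a `$ref` cycle with a depth-first search. The reference graph is hypothetical (only `block/block_schema.json` is named in these notes; the other file names are assumptions), and `find_cycle` is an illustrative helper, not part of the project.

```python
from typing import Dict, List, Optional

# Hypothetical reference graph: schema file -> schema files it refers to via $ref.
REFS: Dict[str, List[str]] = {
    "block/block_schema.json": ["block/types/toggle/toggle_schema.json"],
    "block/types/toggle/toggle_schema.json": ["block/block_schema.json"],
    "block/types/divider/divider_schema.json": [],
}


def find_cycle(graph: Dict[str, List[str]], start: str) -> Optional[List[str]]:
    """Return one reference cycle reachable from `start`, or None if there is none."""
    path: List[str] = []   # current chain of references
    on_path = set()        # nodes in the current chain, for O(1) membership checks
    visited = set()        # nodes already fully explored or in progress

    def visit(node: str) -> Optional[List[str]]:
        if node in on_path:
            # We came back to a schema that is still being resolved: a cycle.
            return path[path.index(node):] + [node]
        if node in visited:
            return None
        visited.add(node)
        on_path.add(node)
        path.append(node)
        for target in graph.get(node, []):
            cycle = visit(target)
            if cycle:
                return cycle
        path.pop()
        on_path.discard(node)
        return None

    return visit(start)


print(find_cycle(REFS, "block/block_schema.json"))
```

A generator that naively inlined every `$ref` would recurse forever on such a graph, which is why the references need special handling during model generation.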

We use the datamodel-code-generator conventions `customBasePath` and `customTypePath` to rewire inheritance from base classes, so that blocks, rich text objects and file objects (e.g. for the image block) can be deserialized elegantly. `jsondoc.serialize` contains the logic that initializes Pydantic models from JSON-DOC files or dicts.

- `customBasePath` overrides the base class of a model, replacing the default `BaseModel`. This lets a model inherit from the correct base class, e.g. `BlockBase` for blocks, `RichTextBase` for rich text objects, etc.
- `customTypePath` overrides the type of a field in a model; the type is imported from the given path. When it is set, the generated code contains no definition for that object.
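Both keys hold dotted Python paths. The sketch below only illustrates how such a path corresponds to the import lines that appear in the generated modules; it is not datamodel-code-generator's internal code. The schema fragment and the helper names `split_dotted_path` and `import_line` are hypothetical.

```python
from typing import Tuple


def split_dotted_path(path: str) -> Tuple[str, str]:
    """Split 'pkg.module.Name' into ('pkg.module', 'Name')."""
    module, _, name = path.rpartition(".")
    return module, name


def import_line(path: str) -> str:
    """Render the import statement that a dotted path corresponds to."""
    module, name = split_dotted_path(path)
    return f"from {module} import {name}"


# Hypothetical schema fragment; only the two custom keys matter here.
schema = {
    "title": "BulletedListItemBlock",
    "customBasePath": "jsondoc.models.block.base.BlockBase",
    "properties": {
        "rich_text_item": {
            "customTypePath": "jsondoc.models.block.types.rich_text.base.RichTextBase",
        },
    },
}

print(import_line(schema["customBasePath"]))
print(import_line(schema["properties"]["rich_text_item"]["customTypePath"]))
```

The two printed lines have the same shape as the imports at the top of the generated `bulleted_list_item` module.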

> [!WARNING]
>
> This stuff is pretty mind-bending; it probably needs a diagram. It is also not as hacky as it looks.

During model generation, instead of simply substituting the `$ref`s with the actual types, we create dummy types and make use of `customTypePath` to refer to the actual type:

```python
# We use the existence of title as a control variable to determine if we
# should substitute the $ref with the actual type, or inherit from the
# referenced customTypePath.
title = ref_obj.get("title")
if title is None:
    return ref_obj

# We set customTypePath wherever necessary.
# If it's not found, we construct it from the ref.
if "customTypePath" in ref_obj:
    customTypePath = ref_obj["customTypePath"]
else:
    ref_tokens = ref.split("/")
    ref_tokens = ["jsondoc", "models"] + ref_tokens[:-1] + [title]
    customTypePath = ".".join(ref_tokens)

ret = {
    "type": "object",
    "title": title,
    "customTypePath": customTypePath,
    "properties": {},
}
```

See [autogen_pydantic.py](/scripts/autogen_pydantic.py) for the full implementation.
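As a worked example, the excerpt can be wrapped in a standalone function and run on a sample reference. This is a sketch under assumptions: the function name `placeholder_for_ref`, the schema file name and the title are hypothetical, chosen so that the result lines up with the generated module layout.

```python
from typing import Any, Dict


def placeholder_for_ref(ref: str, ref_obj: Dict[str, Any]) -> Dict[str, Any]:
    """Return a stand-in schema for a $ref target (hypothetical helper)."""
    title = ref_obj.get("title")
    if title is None:
        # Untitled schemas are substituted as-is.
        return ref_obj

    if "customTypePath" in ref_obj:
        custom_type_path = ref_obj["customTypePath"]
    else:
        # Drop the file name, keep the directories, append the class name.
        tokens = ["jsondoc", "models"] + ref.split("/")[:-1] + [title]
        custom_type_path = ".".join(tokens)

    # Empty "dummy" object that only points at where the real type lives.
    return {
        "type": "object",
        "title": title,
        "customTypePath": custom_type_path,
        "properties": {},
    }


# Hypothetical ref and title.
stub = placeholder_for_ref(
    "block/types/bulleted_list_item/bulleted_list_item_schema.json",
    {"title": "BulletedListItemBlock", "type": "object"},
)
print(stub["customTypePath"])
# jsondoc.models.block.types.bulleted_list_item.BulletedListItemBlock
```

Note that the path is derived purely from the directory part of the reference, so a reference with a leading `/` would need to be normalized first in this sketch.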

The `jsondoc.models` package contains auto-generated models; any manual edits there will be overwritten the next time the models are regenerated.
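Hand-written logic therefore needs to live outside the generated package, as the deserialization code in `jsondoc.serialize` does. The following is a rough sketch of what dispatching on the block `type` could look like. It is not the actual `jsondoc.serialize` implementation: the stub classes, the registry and `load_block` are all hypothetical, and plain dataclasses stand in for the generated Pydantic models.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Type


@dataclass
class BlockStub:
    """Stand-in for a generated block model (not the real Pydantic class)."""

    id: str
    type: str
    # Type-specific content, stored in the input under a key named after the type.
    payload: Dict[str, Any] = field(default_factory=dict)


class DividerStub(BlockStub):
    pass


class EquationStub(BlockStub):
    pass


# Registry from the `type` discriminator to the class to instantiate.
BLOCK_TYPES: Dict[str, Type[BlockStub]] = {
    "divider": DividerStub,
    "equation": EquationStub,
}


def load_block(data: Dict[str, Any]) -> BlockStub:
    """Pick the class for a block dict based on its `type` field."""
    block_type = data["type"]
    cls = BLOCK_TYPES.get(block_type)
    if cls is None:
        raise ValueError(f"Unsupported block type: {block_type!r}")
    return cls(id=data["id"], type=block_type, payload=data.get(block_type, {}))


block = load_block(
    {"id": "1", "type": "equation", "equation": {"expression": "E = mc^2"}}
)
print(type(block).__name__, block.payload)
```

Keeping the registry in hand-written code means the generated modules can be regenerated at any time without losing the dispatch logic.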
7 changes: 7 additions & 0 deletions docs/roadmap.md
@@ -116,3 +116,10 @@ These are not "official" blocks, but exist under the `rich_text` key in some blo
### Deprecated blocks

- ~~`type: template`~~


## Miscellaneous tasks

- [ ] Make non-essential fields optional with default values to make JSON files smaller. Start with rich text fields.
- [ ] Reserve jsondoc PyPI package name.
- [ ] Buy a JSON-DOC domain. json-doc.org and json-doc.com are available.
Empty file added jsondoc/models/__init__.py
Empty file.
34 changes: 34 additions & 0 deletions jsondoc/models/block/__init__.py
@@ -0,0 +1,34 @@
# generated by datamodel-codegen:
# filename: example.json
# timestamp: 2024-08-21T17:19:54+00:00

from __future__ import annotations

from enum import Enum

from jsondoc.models.block.base import BlockBase


class Type(Enum):
    paragraph = 'paragraph'
    to_do = 'to_do'
    bulleted_list_item = 'bulleted_list_item'
    numbered_list_item = 'numbered_list_item'
    code = 'code'
    column = 'column'
    column_list = 'column_list'
    divider = 'divider'
    equation = 'equation'
    heading_1 = 'heading_1'
    heading_2 = 'heading_2'
    heading_3 = 'heading_3'
    image = 'image'
    quote = 'quote'
    equation_1 = 'equation'
    table = 'table'
    table_row = 'table_row'
    toggle = 'toggle'


class Block(BlockBase):
    type: Type
50 changes: 50 additions & 0 deletions jsondoc/models/block/base/__init__.py
@@ -0,0 +1,50 @@
# generated by datamodel-codegen:
# filename: example.json
# timestamp: 2024-08-21T17:19:54+00:00

from __future__ import annotations

from typing import Any, Dict, Optional

from pydantic import AwareDatetime, BaseModel, ConfigDict
from typing_extensions import Literal


class Parent(BaseModel):
    model_config = ConfigDict(
        extra='forbid',
    )
    type: str
    block_id: Optional[str] = None
    page_id: Optional[str] = None


class CreatedBy(BaseModel):
    model_config = ConfigDict(
        extra='forbid',
    )
    object: Literal['user']
    id: str


class LastEditedBy(BaseModel):
    model_config = ConfigDict(
        extra='forbid',
    )
    object: Literal['user']
    id: str


class BlockBase(BaseModel):
    object: Literal['block']
    id: str
    parent: Optional[Parent] = None
    type: str
    created_time: AwareDatetime
    created_by: Optional[CreatedBy] = None
    last_edited_time: Optional[AwareDatetime] = None
    last_edited_by: Optional[LastEditedBy] = None
    archived: bool
    in_trash: Optional[bool] = None
    has_children: bool
    metadata: Optional[Dict[str, Any]] = None
Empty file.
31 changes: 31 additions & 0 deletions jsondoc/models/block/types/bulleted_list_item/__init__.py
@@ -0,0 +1,31 @@
# generated by datamodel-codegen:
# filename: example.json
# timestamp: 2024-08-21T17:19:54+00:00

from __future__ import annotations

from typing import List, Optional

from jsondoc.models.block.base import BlockBase
from jsondoc.models.block.types.rich_text.base import RichTextBase
from jsondoc.models.shared_definitions import Color
from pydantic import BaseModel, ConfigDict
from typing_extensions import Literal


class BulletedListItem(BaseModel):
    model_config = ConfigDict(
        extra='forbid',
        arbitrary_types_allowed=True,
    )
    rich_text: List[RichTextBase]
    color: Optional[Color] = None


class BulletedListItemBlock(BlockBase):
    model_config = ConfigDict(
        arbitrary_types_allowed=True,
    )
    type: Literal['bulleted_list_item']
    bulleted_list_item: BulletedListItem
    children: Optional[List[BlockBase]] = None
107 changes: 107 additions & 0 deletions jsondoc/models/block/types/code/__init__.py
@@ -0,0 +1,107 @@
# generated by datamodel-codegen:
# filename: example.json
# timestamp: 2024-08-21T17:19:54+00:00

from __future__ import annotations

from enum import Enum
from typing import List, Optional

from jsondoc.models.block.base import BlockBase
from jsondoc.models.block.types.rich_text.base import RichTextBase
from pydantic import BaseModel, ConfigDict
from typing_extensions import Literal


class Language(Enum):
    abap = 'abap'
    arduino = 'arduino'
    bash = 'bash'
    basic = 'basic'
    c = 'c'
    clojure = 'clojure'
    coffeescript = 'coffeescript'
    c__ = 'c++'
    c_ = 'c#'
    css = 'css'
    dart = 'dart'
    diff = 'diff'
    docker = 'docker'
    elixir = 'elixir'
    elm = 'elm'
    erlang = 'erlang'
    flow = 'flow'
    fortran = 'fortran'
    f_ = 'f#'
    gherkin = 'gherkin'
    glsl = 'glsl'
    go = 'go'
    graphql = 'graphql'
    groovy = 'groovy'
    haskell = 'haskell'
    html = 'html'
    java = 'java'
    javascript = 'javascript'
    json = 'json'
    julia = 'julia'
    kotlin = 'kotlin'
    latex = 'latex'
    less = 'less'
    lisp = 'lisp'
    livescript = 'livescript'
    lua = 'lua'
    makefile = 'makefile'
    markdown = 'markdown'
    markup = 'markup'
    matlab = 'matlab'
    mermaid = 'mermaid'
    nix = 'nix'
    objective_c = 'objective-c'
    ocaml = 'ocaml'
    pascal = 'pascal'
    perl = 'perl'
    php = 'php'
    plain_text = 'plain text'
    powershell = 'powershell'
    prolog = 'prolog'
    protobuf = 'protobuf'
    python = 'python'
    r = 'r'
    reason = 'reason'
    ruby = 'ruby'
    rust = 'rust'
    sass = 'sass'
    scala = 'scala'
    scheme = 'scheme'
    scss = 'scss'
    shell = 'shell'
    sql = 'sql'
    swift = 'swift'
    typescript = 'typescript'
    vb_net = 'vb.net'
    verilog = 'verilog'
    vhdl = 'vhdl'
    visual_basic = 'visual basic'
    webassembly = 'webassembly'
    xml = 'xml'
    yaml = 'yaml'
    java_c_c___c_ = 'java/c/c++/c#'


class Code(BaseModel):
    model_config = ConfigDict(
        extra='forbid',
        arbitrary_types_allowed=True,
    )
    caption: Optional[List[RichTextBase]] = None
    rich_text: List[RichTextBase]
    language: Optional[Language] = None


class CodeBlock(BlockBase):
    model_config = ConfigDict(
        arbitrary_types_allowed=True,
    )
    type: Literal['code']
    code: Code
    children: Optional[List[BlockBase]] = None
20 changes: 20 additions & 0 deletions jsondoc/models/block/types/column/__init__.py
@@ -0,0 +1,20 @@
# generated by datamodel-codegen:
# filename: example.json
# timestamp: 2024-08-21T17:19:54+00:00

from __future__ import annotations

from typing import Any, Dict, List, Optional

from jsondoc.models.block.base import BlockBase
from pydantic import ConfigDict
from typing_extensions import Literal


class ColumnBlock(BlockBase):
    model_config = ConfigDict(
        arbitrary_types_allowed=True,
    )
    type: Literal['column']
    column: Dict[str, Any]
    children: Optional[List[BlockBase]] = None
21 changes: 21 additions & 0 deletions jsondoc/models/block/types/column_list/__init__.py
@@ -0,0 +1,21 @@
# generated by datamodel-codegen:
# filename: example.json
# timestamp: 2024-08-21T17:19:54+00:00

from __future__ import annotations

from typing import Any, Dict, List, Optional

from jsondoc.models.block.base import BlockBase
from jsondoc.models.block.types.column import ColumnBlock
from pydantic import ConfigDict
from typing_extensions import Literal


class ColumnListBlock(BlockBase):
    model_config = ConfigDict(
        arbitrary_types_allowed=True,
    )
    type: Literal['column_list']
    column_list: Dict[str, Any]
    children: Optional[List[ColumnBlock]] = None
15 changes: 15 additions & 0 deletions jsondoc/models/block/types/divider/__init__.py
@@ -0,0 +1,15 @@
# generated by datamodel-codegen:
# filename: example.json
# timestamp: 2024-08-21T17:19:54+00:00

from __future__ import annotations

from typing import Any, Dict

from jsondoc.models.block.base import BlockBase
from typing_extensions import Literal


class DividerBlock(BlockBase):
    type: Literal['divider']
    divider: Dict[str, Any]
21 changes: 21 additions & 0 deletions jsondoc/models/block/types/equation/__init__.py
@@ -0,0 +1,21 @@
# generated by datamodel-codegen:
# filename: example.json
# timestamp: 2024-08-21T17:19:54+00:00

from __future__ import annotations

from jsondoc.models.block.base import BlockBase
from pydantic import BaseModel, ConfigDict
from typing_extensions import Literal


class Equation(BaseModel):
    model_config = ConfigDict(
        extra='forbid',
    )
    expression: str


class EquationBlock(BlockBase):
    type: Literal['equation']
    equation: Equation