Commit 9b93acf

Auto-generate Pydantic models from JSON Schemas (#2)
JSON Schemas are the single source of truth for page and block types. We implement auto-generated Pydantic models for manipulating JSON-DOC files in Python. See https://koxudaxi.github.io/datamodel-code-generator

The module path `jsondoc.models` contains said auto-generated models and any edits will be overwritten.

We make use of datamodel-code-generator conventions of `customBasePath` and `customTypePath` to rewire inheritance from base classes, so that blocks, rich text objects and file objects (e.g. for image block) can be deserialized elegantly. `jsondoc.serialize` contains logic that initializes Pydantic models from JSON-DOC files or dicts.

* Copy over classes from Markdownify, add datamodel-code-generator for automatically creating Pydantic models from JSON Schemas
* Generated Pydantic BaseModels under jsondoc/models/autogen
* Figured out how to inherit with datamodel-code-generator
* Fix File object inheritance
* Fix children being at wrong level for some blocks
* Add serialization tests, wip
* Block counts match, still wip
* Handle rich text
* Handle rest of the rich text fields
* Serialization test passes with 0 diff
* Add notes
1 parent 59b4d36 commit 9b93acf

82 files changed: +2380 -346 lines changed

.github/workflows/test.yaml

Lines changed: 1 addition & 0 deletions
@@ -43,6 +43,7 @@ jobs:
4343
run: |
4444
source .venv/bin/activate
4545
python tests/run_validation_tests.py schema
46+
python tests/run_serialization_tests.py
4647
4748
# - name: Upload test results
4849
# uses: actions/upload-artifact@v2

.gitignore

Lines changed: 2 additions & 1 deletion
@@ -2,4 +2,5 @@
22
.DS_Store
33
*.pdf
44
*.png
5-
__pycache__
5+
__pycache__
6+
build/

docs/differences-from-notion.md

Lines changed: 4 additions & 0 deletions
@@ -0,0 +1,4 @@
1+
- Certain block types are not supported.
2+
- TBD: List which ones
3+
- Certain fields can be omitted, which would fall back to default values.
4+
- Metadata field is present in blocks.
Lines changed: 45 additions & 0 deletions
@@ -0,0 +1,45 @@
1+
# JSON-DOC Python Implementation
2+
3+
JSON Schemas are the single source of truth for page and block types.
4+
5+
We use [datamodel-code-generator](https://koxudaxi.github.io/datamodel-code-generator) to generate Pydantic `BaseModel`s from JSON-DOC JSON schemas. This saves us from writing the Pydantic models by hand.
6+
7+
The JSON Schemas refer to each other, and there are circular dependencies in some cases, such as blocks whose children are themselves of the general block type under `/block/block_schema.json`.
8+
9+
We make use of datamodel-code-generator conventions of `customBasePath` and `customTypePath` to rewire inheritance from base classes, so that blocks, rich text objects and file objects (e.g. for image block) can be deserialized elegantly. `jsondoc.serialize` contains logic that initializes Pydantic models from JSON-DOC files or dicts.
10+
11+
- `customBasePath` is used to override the base class of a model, instead of `BaseModel`. This is used to inherit from the correct base class, e.g. `BlockBase` for blocks, `RichTextBase` for rich text objects, etc.
12+
- `customTypePath` is used to override the type of a field in a model, which will be imported from the given path. If this is given, there won't be a definition for that object in the generated code.
13+
14+
> [!WARNING]
15+
>
16+
> This stuff is pretty mind-bending, it probably needs a diagram. It is also not as hacky as it looks.
17+
18+
During model generation, instead of simply substituting the `$ref`s with the actual types, we create dummy types and make use of `customTypePath` to refer to the actual type:
19+
20+
```python
21+
# We use the existence of title as a control variable to determine if we should substitute the $ref with the actual type, or inherit from the referenced customTypePath.
22+
title = ref_obj.get("title")
23+
if title is None:
24+
return ref_obj
25+
26+
# We set customTypePath wherever necessary.
27+
# If it's not found, we construct it from the ref.
28+
if "customTypePath" in ref_obj:
29+
customTypePath = ref_obj["customTypePath"]
30+
else:
31+
ref_tokens = ref.split("/")
32+
ref_tokens = ["jsondoc", "models"] + ref_tokens[:-1] + [title]
33+
customTypePath = ".".join(ref_tokens)
34+
35+
ret = {
36+
"type": "object",
37+
"title": title,
38+
"customTypePath": customTypePath,
39+
"properties": {},
40+
}
41+
```
42+
43+
See [autogen_pydantic.py](/scripts/autogen_pydantic.py) for the full implementation.
44+
45+
The module path `jsondoc.models` contains auto-generated models and any edits will be overwritten.
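
The fragment in the document above is shown without its enclosing function. As a minimal sketch, it can be wrapped into a standalone function to see what it produces. The name `resolve_ref` and the assumption that `ref` is a relative path without a leading slash (e.g. `block/base/base_schema.json`) are illustrative only; the actual code lives in `scripts/autogen_pydantic.py` and may differ.

```python
from typing import Any, Dict


def resolve_ref(ref: str, ref_obj: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical wrapper around the fragment shown in the document.

    `ref` is assumed to be a relative schema path such as
    "block/base/base_schema.json"; `ref_obj` is the schema it points to.
    """
    # Schemas without a title are substituted in place, unchanged.
    title = ref_obj.get("title")
    if title is None:
        return ref_obj

    # Prefer an explicit customTypePath; otherwise derive one from the ref
    # by dropping the file name and appending the title.
    if "customTypePath" in ref_obj:
        custom_type_path = ref_obj["customTypePath"]
    else:
        ref_tokens = ref.split("/")
        ref_tokens = ["jsondoc", "models"] + ref_tokens[:-1] + [title]
        custom_type_path = ".".join(ref_tokens)

    # Return an empty stub so that the generator imports the type from
    # customTypePath instead of emitting a second definition for it.
    return {
        "type": "object",
        "title": title,
        "customTypePath": custom_type_path,
        "properties": {},
    }


stub = resolve_ref("block/base/base_schema.json", {"title": "BlockBase"})
print(stub["customTypePath"])  # jsondoc.models.block.base.BlockBase
```

Under these assumptions the derived path matches the import that appears in the generated modules, `from jsondoc.models.block.base import BlockBase`.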

docs/roadmap.md

Lines changed: 7 additions & 0 deletions
@@ -116,3 +116,10 @@ These are not "official" blocks, but exist under the `rich_text` key in some blo
116116
### Deprecated blocks
117117

118118
- ~~`type: template`~~
119+
120+
121+
## Miscellaneous tasks
122+
123+
- [ ] Make non-essential fields optional with default values to make JSON files smaller. Start with rich text fields.
124+
- [ ] Reserve jsondoc PyPI package name.
125+
- [ ] Buy a JSON-DOC domain. json-doc.org and json-doc.com are available.

jsondoc/models/__init__.py

Whitespace-only changes.

jsondoc/models/block/__init__.py

Lines changed: 34 additions & 0 deletions
@@ -0,0 +1,34 @@
1+
# generated by datamodel-codegen:
2+
# filename: example.json
3+
# timestamp: 2024-08-21T17:19:54+00:00
4+
5+
from __future__ import annotations
6+
7+
from enum import Enum
8+
9+
from jsondoc.models.block.base import BlockBase
10+
11+
12+
class Type(Enum):
13+
paragraph = 'paragraph'
14+
to_do = 'to_do'
15+
bulleted_list_item = 'bulleted_list_item'
16+
numbered_list_item = 'numbered_list_item'
17+
code = 'code'
18+
column = 'column'
19+
column_list = 'column_list'
20+
divider = 'divider'
21+
equation = 'equation'
22+
heading_1 = 'heading_1'
23+
heading_2 = 'heading_2'
24+
heading_3 = 'heading_3'
25+
image = 'image'
26+
quote = 'quote'
27+
equation_1 = 'equation'
28+
table = 'table'
29+
table_row = 'table_row'
30+
toggle = 'toggle'
31+
32+
33+
class Block(BlockBase):
34+
type: Type
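
The generated `Type` enum above assigns `'equation'` to two names, `equation` and `equation_1`. This suggests that the source schema's enum lists that value twice, although the schema is not shown here, so that is an assumption. The sketch below uses a trimmed-down copy of the enum to show how Python's `enum` module treats a repeated value: the second name becomes an alias of the first, not a separate member.

```python
from enum import Enum


class Type(Enum):
    # Trimmed-down copy of the generated enum, for illustration only.
    divider = 'divider'
    equation = 'equation'
    quote = 'quote'
    equation_1 = 'equation'  # same value as `equation`, so this is an alias
    table = 'table'


# The alias resolves to the very same member object.
print(Type.equation_1 is Type.equation)  # True

# Iteration yields canonical members only; aliases are skipped.
print([member.name for member in Type])  # ['divider', 'equation', 'quote', 'table']

# Aliases are still visible through __members__.
print('equation_1' in Type.__members__)  # True
```

The alias does not affect validation, since `Type('equation')` resolves to `Type.equation` either way.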

jsondoc/models/block/base/__init__.py

Lines changed: 50 additions & 0 deletions
@@ -0,0 +1,50 @@
1+
# generated by datamodel-codegen:
2+
# filename: example.json
3+
# timestamp: 2024-08-21T17:19:54+00:00
4+
5+
from __future__ import annotations
6+
7+
from typing import Any, Dict, Optional
8+
9+
from pydantic import AwareDatetime, BaseModel, ConfigDict
10+
from typing_extensions import Literal
11+
12+
13+
class Parent(BaseModel):
14+
model_config = ConfigDict(
15+
extra='forbid',
16+
)
17+
type: str
18+
block_id: Optional[str] = None
19+
page_id: Optional[str] = None
20+
21+
22+
class CreatedBy(BaseModel):
23+
model_config = ConfigDict(
24+
extra='forbid',
25+
)
26+
object: Literal['user']
27+
id: str
28+
29+
30+
class LastEditedBy(BaseModel):
31+
model_config = ConfigDict(
32+
extra='forbid',
33+
)
34+
object: Literal['user']
35+
id: str
36+
37+
38+
class BlockBase(BaseModel):
39+
object: Literal['block']
40+
id: str
41+
parent: Optional[Parent] = None
42+
type: str
43+
created_time: AwareDatetime
44+
created_by: Optional[CreatedBy] = None
45+
last_edited_time: Optional[AwareDatetime] = None
46+
last_edited_by: Optional[LastEditedBy] = None
47+
archived: bool
48+
in_trash: Optional[bool] = None
49+
has_children: bool
50+
metadata: Optional[Dict[str, Any]] = None

jsondoc/models/block/types/__init__.py

Whitespace-only changes.
Lines changed: 31 additions & 0 deletions
@@ -0,0 +1,31 @@
1+
# generated by datamodel-codegen:
2+
# filename: example.json
3+
# timestamp: 2024-08-21T17:19:54+00:00
4+
5+
from __future__ import annotations
6+
7+
from typing import List, Optional
8+
9+
from jsondoc.models.block.base import BlockBase
10+
from jsondoc.models.block.types.rich_text.base import RichTextBase
11+
from jsondoc.models.shared_definitions import Color
12+
from pydantic import BaseModel, ConfigDict
13+
from typing_extensions import Literal
14+
15+
16+
class BulletedListItem(BaseModel):
17+
model_config = ConfigDict(
18+
extra='forbid',
19+
arbitrary_types_allowed=True,
20+
)
21+
rich_text: List[RichTextBase]
22+
color: Optional[Color] = None
23+
24+
25+
class BulletedListItemBlock(BlockBase):
26+
model_config = ConfigDict(
27+
arbitrary_types_allowed=True,
28+
)
29+
type: Literal['bulleted_list_item']
30+
bulleted_list_item: BulletedListItem
31+
children: Optional[List[BlockBase]] = None
Lines changed: 107 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,107 @@
1+
# generated by datamodel-codegen:
2+
# filename: example.json
3+
# timestamp: 2024-08-21T17:19:54+00:00
4+
5+
from __future__ import annotations
6+
7+
from enum import Enum
8+
from typing import List, Optional
9+
10+
from jsondoc.models.block.base import BlockBase
11+
from jsondoc.models.block.types.rich_text.base import RichTextBase
12+
from pydantic import BaseModel, ConfigDict
13+
from typing_extensions import Literal
14+
15+
16+
class Language(Enum):
17+
abap = 'abap'
18+
arduino = 'arduino'
19+
bash = 'bash'
20+
basic = 'basic'
21+
c = 'c'
22+
clojure = 'clojure'
23+
coffeescript = 'coffeescript'
24+
c__ = 'c++'
25+
c_ = 'c#'
26+
css = 'css'
27+
dart = 'dart'
28+
diff = 'diff'
29+
docker = 'docker'
30+
elixir = 'elixir'
31+
elm = 'elm'
32+
erlang = 'erlang'
33+
flow = 'flow'
34+
fortran = 'fortran'
35+
f_ = 'f#'
36+
gherkin = 'gherkin'
37+
glsl = 'glsl'
38+
go = 'go'
39+
graphql = 'graphql'
40+
groovy = 'groovy'
41+
haskell = 'haskell'
42+
html = 'html'
43+
java = 'java'
44+
javascript = 'javascript'
45+
json = 'json'
46+
julia = 'julia'
47+
kotlin = 'kotlin'
48+
latex = 'latex'
49+
less = 'less'
50+
lisp = 'lisp'
51+
livescript = 'livescript'
52+
lua = 'lua'
53+
makefile = 'makefile'
54+
markdown = 'markdown'
55+
markup = 'markup'
56+
matlab = 'matlab'
57+
mermaid = 'mermaid'
58+
nix = 'nix'
59+
objective_c = 'objective-c'
60+
ocaml = 'ocaml'
61+
pascal = 'pascal'
62+
perl = 'perl'
63+
php = 'php'
64+
plain_text = 'plain text'
65+
powershell = 'powershell'
66+
prolog = 'prolog'
67+
protobuf = 'protobuf'
68+
python = 'python'
69+
r = 'r'
70+
reason = 'reason'
71+
ruby = 'ruby'
72+
rust = 'rust'
73+
sass = 'sass'
74+
scala = 'scala'
75+
scheme = 'scheme'
76+
scss = 'scss'
77+
shell = 'shell'
78+
sql = 'sql'
79+
swift = 'swift'
80+
typescript = 'typescript'
81+
vb_net = 'vb.net'
82+
verilog = 'verilog'
83+
vhdl = 'vhdl'
84+
visual_basic = 'visual basic'
85+
webassembly = 'webassembly'
86+
xml = 'xml'
87+
yaml = 'yaml'
88+
java_c_c___c_ = 'java/c/c++/c#'
89+
90+
91+
class Code(BaseModel):
92+
model_config = ConfigDict(
93+
extra='forbid',
94+
arbitrary_types_allowed=True,
95+
)
96+
caption: Optional[List[RichTextBase]] = None
97+
rich_text: List[RichTextBase]
98+
language: Optional[Language] = None
99+
100+
101+
class CodeBlock(BlockBase):
102+
model_config = ConfigDict(
103+
arbitrary_types_allowed=True,
104+
)
105+
type: Literal['code']
106+
code: Code
107+
children: Optional[List[BlockBase]] = None
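
Values such as `'c++'`, `'c#'` and `'plain text'` are not valid Python identifiers, so the generated member names above are mangled (`c__`, `c_`, `plain_text`). As a small sketch, using a trimmed-down copy of the `Language` enum, looking members up by value avoids depending on those generated names:

```python
from enum import Enum


class Language(Enum):
    # Trimmed-down copy of the generated enum, for illustration only.
    c = 'c'
    c__ = 'c++'
    c_ = 'c#'
    plain_text = 'plain text'
    python = 'python'


# Look up by the value stored in the JSON document, not by member name.
lang = Language('c++')
print(lang is Language.c__)  # True
print(lang.value)            # c++

# Unknown values raise ValueError, which callers can catch.
try:
    Language('cobol')
except ValueError as exc:
    print(f"rejected: {exc}")
```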
Lines changed: 20 additions & 0 deletions
@@ -0,0 +1,20 @@
1+
# generated by datamodel-codegen:
2+
# filename: example.json
3+
# timestamp: 2024-08-21T17:19:54+00:00
4+
5+
from __future__ import annotations
6+
7+
from typing import Any, Dict, List, Optional
8+
9+
from jsondoc.models.block.base import BlockBase
10+
from pydantic import ConfigDict
11+
from typing_extensions import Literal
12+
13+
14+
class ColumnBlock(BlockBase):
15+
model_config = ConfigDict(
16+
arbitrary_types_allowed=True,
17+
)
18+
type: Literal['column']
19+
column: Dict[str, Any]
20+
children: Optional[List[BlockBase]] = None
Lines changed: 21 additions & 0 deletions
@@ -0,0 +1,21 @@
1+
# generated by datamodel-codegen:
2+
# filename: example.json
3+
# timestamp: 2024-08-21T17:19:54+00:00
4+
5+
from __future__ import annotations
6+
7+
from typing import Any, Dict, List, Optional
8+
9+
from jsondoc.models.block.base import BlockBase
10+
from jsondoc.models.block.types.column import ColumnBlock
11+
from pydantic import ConfigDict
12+
from typing_extensions import Literal
13+
14+
15+
class ColumnListBlock(BlockBase):
16+
model_config = ConfigDict(
17+
arbitrary_types_allowed=True,
18+
)
19+
type: Literal['column_list']
20+
column_list: Dict[str, Any]
21+
children: Optional[List[ColumnBlock]] = None
Lines changed: 15 additions & 0 deletions
@@ -0,0 +1,15 @@
1+
# generated by datamodel-codegen:
2+
# filename: example.json
3+
# timestamp: 2024-08-21T17:19:54+00:00
4+
5+
from __future__ import annotations
6+
7+
from typing import Any, Dict
8+
9+
from jsondoc.models.block.base import BlockBase
10+
from typing_extensions import Literal
11+
12+
13+
class DividerBlock(BlockBase):
14+
type: Literal['divider']
15+
divider: Dict[str, Any]
Lines changed: 21 additions & 0 deletions
@@ -0,0 +1,21 @@
1+
# generated by datamodel-codegen:
2+
# filename: example.json
3+
# timestamp: 2024-08-21T17:19:54+00:00
4+
5+
from __future__ import annotations
6+
7+
from jsondoc.models.block.base import BlockBase
8+
from pydantic import BaseModel, ConfigDict
9+
from typing_extensions import Literal
10+
11+
12+
class Equation(BaseModel):
13+
model_config = ConfigDict(
14+
extra='forbid',
15+
)
16+
expression: str
17+
18+
19+
class EquationBlock(BlockBase):
20+
type: Literal['equation']
21+
equation: Equation
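
Every generated block class above pins its `type` field to a single literal, so a loader can pick the concrete class from that field. The sketch below illustrates the idea with standard-library dataclasses as stand-ins for the generated Pydantic models. All names in it (`BlockStub`, `BLOCK_CLASSES`, `load_block`) are hypothetical, and it makes no claim about how `jsondoc.serialize` is actually implemented.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Type


@dataclass
class BlockStub:
    """Hypothetical stand-in for the generated BlockBase model."""
    id: str
    type: str
    children: List["BlockStub"] = field(default_factory=list)


@dataclass
class ColumnStub(BlockStub):
    pass


@dataclass
class DividerStub(BlockStub):
    pass


@dataclass
class EquationStub(BlockStub):
    expression: str = ""


# Map each `type` literal to the class that represents it.
BLOCK_CLASSES: Dict[str, Type[BlockStub]] = {
    "column": ColumnStub,
    "divider": DividerStub,
    "equation": EquationStub,
}


def load_block(data: Dict[str, Any]) -> BlockStub:
    """Build a block (and, recursively, its children) from a plain dict."""
    block_type = data["type"]
    try:
        cls = BLOCK_CLASSES[block_type]
    except KeyError:
        raise ValueError(f"Unsupported block type: {block_type!r}") from None
    kwargs: Dict[str, Any] = {
        "id": data["id"],
        "type": block_type,
        "children": [load_block(child) for child in data.get("children") or []],
    }
    # The type-specific payload lives under a key named after the type.
    if cls is EquationStub:
        kwargs["expression"] = data["equation"]["expression"]
    return cls(**kwargs)


doc = {
    "id": "col-1",
    "type": "column",
    "column": {},
    "children": [
        {"id": "eq-1", "type": "equation", "equation": {"expression": "E = mc^2"}},
        {"id": "div-1", "type": "divider", "divider": {}},
    ],
}
block = load_block(doc)
print(type(block).__name__, [type(child).__name__ for child in block.children])
```

A registry keyed on `type` keeps the dispatch in one place, so supporting another block type only needs one more entry.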
