Commit c3af7d1

Add some documentation for how to write grammar rules
1 parent 101c424 commit c3af7d1

2 files changed: 124 additions and 0 deletions


docs/authoring.md

Lines changed: 4 additions & 0 deletions
@@ -214,3 +214,7 @@ r[foo.bar.edition2021]
> [!EDITION-2021]
> Describe what changed in 2021.
```

## Grammar

See [Grammar](grammar.md) for details on how to write grammar rules.

docs/grammar.md

Lines changed: 120 additions & 0 deletions
@@ -0,0 +1,120 @@
# Grammar

The Reference grammar is written in markdown code blocks using a modified BNF-like syntax (blending in regex-style operators and a few ad hoc extensions). The `mdbook-spec` extension parses these rules and converts them to a renderable format, including railroad diagrams.

The code block's info string should consist of the word "grammar", a comma, and the category of the grammar, like this:

~~~
```grammar,items
ProductionName -> SomeExpression
```
~~~

The category is used to group similar productions on the grammar summary page in the appendix.
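
As a rough illustration of how the category could drive that grouping, here is a minimal Python sketch. It is not how `mdbook-spec` is implemented; the function name `group_by_category` and the line-by-line scanning are assumptions made for the example.

```python
from collections import defaultdict

# Built at runtime so this example can itself sit inside a fenced block.
FENCE = "`" * 3


def group_by_category(markdown: str) -> dict[str, list[str]]:
    """Collect the bodies of grammar code blocks, keyed by their category.

    Hypothetical helper: only fences opened with "grammar,<category>" are
    collected; every other line is ignored.
    """
    groups = defaultdict(list)
    category = None  # category of the grammar block being read, if any
    body = []
    for line in markdown.splitlines():
        if category is None:
            if line.startswith(FENCE + "grammar,"):
                category = line[len(FENCE + "grammar,"):].strip()
                body = []
        elif line.strip() == FENCE:
            # Closing fence: store the block under its category.
            groups[category].append("\n".join(body))
            category = None
        else:
            body.append(line)
    return dict(groups)


page = "\n".join([
    FENCE + "grammar,items",
    "ProductionName -> SomeExpression",
    FENCE,
])
print(group_by_category(page))
```

Keeping the category as a plain dictionary key mirrors the idea of a summary page with one section per category.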

## Grammar syntax

The syntax for writing grammar rules is close to what is described in the [Notation chapter](../src/notation.md), though there are some rendering differences.

The syntax of the notation, written in the notation itself (hopefully that's not too confusing), is:

```
Grammar -> Production+

BACKTICK -> U+0060

LF -> U+000A

Production -> Name ` ->` Expression

Name -> <Alphanumeric or `_`>+

Expression -> Sequence (` `* `|` ` `* Sequence)*

Sequence -> (` `* AdornedExpr)+

AdornedExpr -> ExprRepeat Suffix? Footnote?

Suffix -> ` _` <not underscore, unless in backtick>* `_`

Footnote -> `[^` ~[`]` LF]+ `]`

ExprRepeat ->
      Expr1 `?`
    | Expr1 `*?`
    | Expr1 `*`
    | Expr1 `+?`
    | Expr1 `+`
    | Expr1 `{` Range? `..` Range? `}`

Range -> [0-9]+

Expr1 ->
      Unicode
    | NonTerminal
    | Break
    | Terminal
    | Charset
    | Prose
    | Group
    | NegativeExpression

Unicode -> `U+` [`A`-`Z` `0`-`9`]{4..4}

NonTerminal -> Name

Break -> LF ` `+

Terminal -> BACKTICK ~[LF]+ BACKTICK

Charset -> `[` (` `* Characters)+ ` `* `]`

Characters ->
      CharacterRange
    | CharacterTerminal
    | CharacterName

CharacterRange -> BACKTICK <any char> BACKTICK `-` BACKTICK <any char> BACKTICK

CharacterTerminal -> Terminal

CharacterName -> Name

Prose -> `<` ~[`>` LF]+ `>`

Group -> `(` ` `* Expression ` `* `)`

NegativeExpression -> `~` ( Charset | Terminal | NonTerminal )
```
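
To make a few of the leaf rules above more concrete, the following Python sketch classifies a single token as `Unicode`, `NonTerminal`, `Terminal`, or `Prose`. The regular expressions are loose approximations written for this example (for instance, `Name` is assumed to be ASCII alphanumerics plus `_`), and `classify_leaf` is a hypothetical helper, not part of `mdbook-spec`.

```python
import re

# Approximate patterns for some of the Expr1 alternatives, tried in the
# same order as the grammar lists them. These are assumptions, not the
# parser's real rules.
LEAF_PATTERNS = [
    ("Unicode", re.compile(r"U\+[A-Z0-9]{4}")),
    ("NonTerminal", re.compile(r"[A-Za-z0-9_]+")),
    ("Terminal", re.compile(r"`[^\n]+`")),
    ("Prose", re.compile(r"<[^>\n]+>")),
]


def classify_leaf(token: str):
    """Return the name of the first leaf rule that matches the whole token."""
    for kind, pattern in LEAF_PATTERNS:
        if pattern.fullmatch(token):
            return kind
    return None  # not one of the leaf forms covered by this sketch


for token in ["U+0060", "FunctionParameters", "`fn`", "<any char>"]:
    print(token, "is", classify_leaf(token))
```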

The general format is a series of productions separated by blank lines. The expressions are:

| Expression | Example | Description |
|------------|---------|-------------|
| Unicode | U+0060 | A single Unicode character. |
| NonTerminal | FunctionParameters | A reference to another production by name. |
| Break | | This is used internally by the renderer to detect line breaks and indentation. |
| Terminal | \`example\` | A sequence of exact characters, surrounded by backticks. |
| Charset | [ \`A\`-\`Z\` \`0\`-\`9\` \`_\` ] | A choice from a set of characters, space separated. There are three different forms, described in the next three rows. |
| CharacterRange | [ \`A\`-\`Z\` ] | A range of characters; each character should be in backticks. |
| CharacterTerminal | [ \`x\` ] | A single character, surrounded by backticks. |
| CharacterName | [ LF ] | A nonterminal, referring to another production. |
| Prose | \<any ASCII character except CR\> | An English description of what should be matched, surrounded by angle brackets. |
| Group | (\`,\` Parameter)+ | Groups an expression for the purpose of precedence, such as applying a repetition operator to a sequence of other expressions. |
| NegativeExpression | ~[\` \` LF] | Matches anything except the given Charset, Terminal, or NonTerminal. |
| Sequence | \`fn\` Name Parameters | A sequence of expressions, where they must match in order. |
| Alternation | Expr1 \| Expr2 | Matches only one of the given expressions, separated by the vertical pipe character. |
| Suffix | \_except \[LazyBooleanExpression\]\_ | Adds a suffix to the previous expression to provide an additional English description, rendered in subscript. This can have limited markdown, but try to avoid anything except basics like links. |
| Footnote | \[^extern-safe\] | Adds a footnote, which can supply some extra information that may be helpful to the user. The footnote itself should be defined outside of the code block like a normal markdown footnote. |
| Optional | Expr? | The preceding expression is optional. |
| Repeat | Expr* | The preceding expression is repeated 0 or more times. |
| Repeat (non-greedy) | Expr*? | The preceding expression is repeated 0 or more times without being greedy. |
| RepeatPlus | Expr+ | The preceding expression is repeated 1 or more times. |
| RepeatPlus (non-greedy) | Expr+? | The preceding expression is repeated 1 or more times without being greedy. |
| RepeatRange | Expr{2..4} | The preceding expression is repeated a number of times within the specified range. Either bound can be omitted, which works just like Rust ranges. |
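
The "productions separated by blank lines" layout can be sketched in a few lines of Python. This is only an illustration under simple assumptions (every production contains ` ->`, and continuation lines are indented with spaces); `split_productions` is a hypothetical name, and the real parser in `mdbook-spec` may work differently.

```python
import re


def split_productions(grammar: str) -> dict[str, str]:
    """Map each production name to its expression text."""
    productions = {}
    # Productions are separated by one or more blank lines.
    for chunk in re.split(r"\n\s*\n", grammar.strip()):
        name, arrow, expression = chunk.partition(" ->")
        if not arrow:
            raise ValueError(f"missing ' ->' in production: {chunk!r}")
        # Fold indented continuation lines back onto a single line.
        productions[name.strip()] = re.sub(r"\n +", " ", expression).strip()
    return productions


sample = """\
LF -> U+000A

Characters ->
      CharacterRange
    | CharacterTerminal
    | CharacterName
"""
for name, expression in split_productions(sample).items():
    print(f"{name}: {expression}")
```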

## Automatic linking

The plugin automatically adds markdown link definitions for all the production names on every page. To link directly to a production name, surround it with square brackets, like `[ArrayExpression]`.

In some cases the automatic linking of rule names can collide with other link names. In that case, disambiguate with the `grammar-` prefix, such as `[Type][grammar-Type]`. You can also use the prefix simply to be more explicit.
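
A small Python sketch of what such generated link definitions might look like follows. The function `link_definitions` and the target URL are hypothetical and chosen only for illustration; the actual definitions and link targets produced by the plugin may differ.

```python
def link_definitions(productions: dict[str, str]) -> list[str]:
    """Build markdown link definitions for each production name.

    `productions` maps a production name to the URL of its definition.
    Both the plain name and the `grammar-` prefixed name are defined, so
    that `[Type]` and `[Type][grammar-Type]` can both resolve.
    """
    lines = []
    for name, url in sorted(productions.items()):
        lines.append(f"[{name}]: {url}")
        lines.append(f"[grammar-{name}]: {url}")
    return lines


# The URL below is a made-up placeholder, not a real Reference path.
for line in link_definitions({"ArrayExpression": "path/to/page.md#array-expression"}):
    print(line)
```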
