Commit ddfb9c9

Add a “intro to syntax trees” guide
1 parent 9e64b86 commit ddfb9c9

5 files changed: +158 -9 lines changed

dictionary.txt

Lines changed: 7 additions & 0 deletions
@@ -1,4 +1,7 @@
 // Jargon
+APIs
+ASTs
+AST
 CLI
 XSS
 attacher
@@ -19,6 +22,7 @@ syntaxes
 whitespace
 
 // Names, products, etc.
+ATX
 BundlePhobia
 CDN
 CommonMark
@@ -28,10 +32,13 @@ HSL
 JSDoc
 JSON
 JSX
+MDXs
 MDX
 MacBook
 Otander
+PostCSS
 Preact
+Setext
 gemoji
 mdast
 nlcst

doc/learn/introduction-to-syntax-trees.md

Lines changed: 148 additions & 6 deletions
@@ -1,15 +1,157 @@
 ---
+authorGithub: wooorm
+authorTwitter: wooorm
+author: Titus Wormer
 description: Guide that shows the basics of syntax trees (ASTs)
 group: guide
-modified: 2024-08-05
-published: 2019-12-12
+modified: 2024-08-14
+published: 2024-08-14
+tags:
+- syntax tree
+- unist
 title: Intro to syntax trees
 ---
 
 ## Introduction to syntax trees
 
-Unfortunately, this guide is not yet written.
-We’re looking for help with that, if you want please edit this file on
-[GitHub][].
+unified uses abstract syntax trees (abbreviated as ASTs),
+that plugins can work on.
+This guide introduces what ASTs are and how to work with them.
 
-[github]: https://github.com/unifiedjs/unifiedjs.github.io/blob/main/doc/learn/introduction-to-syntax-trees.md
+### Contents
+
+* [What is an AST?](#what-is-an-ast)
+* [What is unist?](#what-is-unist)
+* [When to use an AST?](#when-to-use-an-ast)
+
+### What is an AST?
+
+An abstract syntax tree (AST) is a tree representation of the syntax of
+programming languages.
+For us that’s typically markup languages.
+
+As a JavaScript developer you may already know things that are like ASTs:
+The DOM and React’s virtual DOM.
+Or you may have heard of Babel, ESLint, PostCSS, Prettier, or TypeScript.
+They all use ASTs to inspect and transform code.
+
+In unified,
+we support *several* ASTs.
+The reason for different ASTs is that each markup language has several aspects
+that do not translate 1-to-1 to other markup languages.
+Taking markdown and HTML as an example,
+in some cases markdown has more info than HTML:
+markdown has several ways to add a link
+(“autolinks”: `<https://url>`,
+resource links: `[label](url)`,
+and reference links with definitions: `[label][id]` and `[id]: url`).
+In other cases,
+HTML has more info than markdown.
+It has many tags,
+which add new meaning (semantics),
+that aren’t available in markdown.
+If there was one AST,
+it would be quite hard to do the tasks that several remark and rehype plugins
+now do.
+
+See [“How to build a syntax tree”][build-a-syntax-tree] for more info on how to
+make a tree.
+See [“Syntax trees in TypeScript”][syntax-trees-in-typescript] on how to work
+with ASTs in TypeScript.
+
+### What is unist?
+
+But all our ASTs have things in common.
+The bit in common is called unist.
+By having a shared interface,
+we can also share tools that work on all ASTs.
+In practice,
+that means you can use for example [`unist-util-visit`][unist-util-visit]
+to visit nodes in any supported AST.
+
+See [“Tree traversal”][tree-traversal] for more info on `unist-util-visit`.
+
+unist is different from the ASTs used in other tools.
+Quite noticeable because it uses a particular set of names for things:
+`type`, `children`, `position`.
+But perhaps harder to see is that it’s compatible with JSON.
+It’s all objects and arrays.
+Strings,
+numbers.
+Where other tools use instances with methods,
+we use plain data.
+Years ago in retext we started out like that too.
+But we found that we preferred to be able to read and write a tree from/to a
+JSON file,
+to treat ASTs as data,
+and use more functional utilities.
+
+### When to use an AST?
+
+You can use an AST when you want to inspect or transform content.
+
+Say you wanted to count the number of headings in a markdown file.
+You could also do that with a regex:
+
+```js twoslash
+/// <reference types="node" />
+// ---cut---
+const value = `# Pluto
+
+Pluto is a dwarf planet in the Kuiper belt.
+
+## History
+
+### Discovery
+
+In the 1840s, Urbain Le Verrier used Newtonian mechanics to predict the
+position of…`
+
+const expression = /^#+[^\r\n]+/gm
+const headings = [...value.matchAll(expression)].length
+
+console.log(headings) //=> 3
+```
+
+But what if the headings were in a code block?
+Or if Setext headings were used instead of ATX headings?
+The grammar of markdown is more complex than a regex can handle.
+That’s where an AST can help.
+
+```js twoslash
+/// <reference types="node" />
+// ---cut---
+import {fromMarkdown} from 'mdast-util-from-markdown'
+import {visit} from 'unist-util-visit'
+
+const value = `# Pluto
+
+Pluto is a dwarf planet in the Kuiper belt.
+
+## History
+
+### Discovery
+
+In the 1840s, Urbain Le Verrier used Newtonian mechanics to predict the
+position of…`
+
+const tree = fromMarkdown(value)
+
+let headings = 0
+
+visit(tree, 'heading', function () {
+  headings++
+})
+
+console.log(headings) //=> 3
+```
+
+See [“Tree traversal”][tree-traversal] for more info on `unist-util-visit`.
+
+[unist-util-visit]: https://github.com/syntax-tree/unist-util-visit
+
+[build-a-syntax-tree]: /learn/recipe/build-a-syntax-tree/
+
+[syntax-trees-in-typescript]: /learn/guide/syntax-trees-typescript/
+
+[tree-traversal]: /learn/recipe/tree-traversal/

doc/learn/introduction-to-unified.md

Lines changed: 1 addition & 1 deletion
@@ -140,7 +140,7 @@ generate a table of contents,
 and (potentially) much more:
 that’s when to opt for unified.
 
-> A large part of MDX’s success has been leveraging the unified and remark
+> A large part of MDXs success has been leveraging the unified and remark
 > ecosystem.
 > I was able to get a prototype working in a few hours because I didn’t have to
 > worry about markdown parsing: remark gave it to me for free.

generate/pipeline/article.js

Lines changed: 1 addition & 1 deletion
@@ -90,7 +90,7 @@ export const article = unified()
     properties: {ariaLabel: 'Link to self', className: ['anchor']}
   })
   .use(rehypeAbbreviate, {
-    ignore: ['ECMAScript', 'ID', 'JSDoc', 'JSX', 'MDX'],
+    ignore: ['ATX', 'ECMAScript', 'ESLint', 'ID', 'JSDoc', 'JSX', 'MDX'],
     titles: {
       API: 'Application programming interface',
       ARIA: 'Accessible rich internet applications',

readme.md

Lines changed: 1 addition & 1 deletion
@@ -15,7 +15,7 @@ This basic build uses two shortcuts over the full build:
 The full build is a slow site to properly build!
 Takes about 20 minutes (🤯) on my tiny trusted 12 inch MacBook.
 The reason for this is that it crawls the whole ecosystem.
-We contact 5 API’s: GitHub, npm, npms, OpenCollective, and BundlePhobia.
+We contact 5 APIs: GitHub, npm, npms, OpenCollective, and BundlePhobia.
 When generating, it builds a performant static site.
 Everything is minified.
 Images are highly optimized.

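Note on the guide added in this commit: its point that unist trees are plain, JSON-compatible data (objects and arrays with `type` and `children`) can be illustrated with a small sketch. The tree below is hand-written to roughly mirror what the Pluto example might parse to, with `position` fields omitted. `walk` and `countHeadings` are hypothetical helpers introduced only for this illustration; `walk` is a simplified stand-in for `unist-util-visit`, not its actual implementation.

```js
// Hand-written, mdast-like tree: only plain objects, arrays, strings, numbers.
// Assumption: `position` fields are left out to keep the sketch short.
const tree = {
  type: 'root',
  children: [
    {type: 'heading', depth: 1, children: [{type: 'text', value: 'Pluto'}]},
    {
      type: 'paragraph',
      children: [
        {type: 'text', value: 'Pluto is a dwarf planet in the Kuiper belt.'}
      ]
    },
    {type: 'heading', depth: 2, children: [{type: 'text', value: 'History'}]},
    {type: 'heading', depth: 3, children: [{type: 'text', value: 'Discovery'}]}
  ]
}

// Hypothetical helper: call `visitor` on every node whose `type` matches.
function walk(node, type, visitor) {
  if (node.type === type) visitor(node)
  for (const child of node.children || []) walk(child, type, visitor)
}

// Hypothetical helper: count heading nodes anywhere in a tree.
function countHeadings(root) {
  let headings = 0
  walk(root, 'heading', function () {
    headings++
  })
  return headings
}

// Because the tree is plain data, it survives a JSON round trip unchanged.
const copy = JSON.parse(JSON.stringify(tree))

console.log(countHeadings(copy)) //=> 3
```

Since nodes carry no methods, the same function works on the original tree and on a copy read back from JSON, which is the property the guide describes.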