|
1 | 1 | ---
|
| 2 | +authorGithub: wooorm |
| 3 | +authorTwitter: wooorm |
| 4 | +author: Titus Wormer |
2 | 5 | description: Guide that shows the basics of syntax trees (ASTs)
|
3 | 6 | group: guide
|
4 |
| -modified: 2024-08-05 |
5 |
| -published: 2019-12-12 |
| 7 | +modified: 2024-08-14 |
| 8 | +published: 2024-08-14 |
| 9 | +tags: |
| 10 | + - syntax tree |
| 11 | + - unist |
6 | 12 | title: Intro to syntax trees
|
7 | 13 | ---
|
8 | 14 |
|
9 | 15 | ## Introduction to syntax trees
|
10 | 16 |
|
11 |
| -Unfortunately, this guide is not yet written. |
12 |
| -We’re looking for help with that, if you want please edit this file on |
13 |
| -[GitHub][]. |
| 17 | +unified uses abstract syntax trees (abbreviated as ASTs) |
| 18 | +that plugins can work on. |
| 19 | +This guide introduces what ASTs are and how to work with them. |
14 | 20 |
|
15 |
| -[github]: https://github.com/unifiedjs/unifiedjs.github.io/blob/main/doc/learn/introduction-to-syntax-trees.md |
| 21 | +### Contents |
| 22 | + |
| 23 | +* [What is an AST?](#what-is-an-ast) |
| 24 | +* [What is unist?](#what-is-unist) |
| 25 | +* [When to use an AST?](#when-to-use-an-ast) |
| 26 | + |
| 27 | +### What is an AST? |
| 28 | + |
| 29 | +An abstract syntax tree (AST) is a tree representation of the syntax of a |
| 30 | +programming language. |
| 31 | +For us, that’s typically a markup language. |
| 32 | + |
| 33 | +As a JavaScript developer you may already know things that are like ASTs: |
| 34 | +the DOM and React’s virtual DOM. |
| 35 | +Or you may have heard of Babel, ESLint, PostCSS, Prettier, or TypeScript. |
| 36 | +They all use ASTs to inspect and transform code. |
| 37 | + |
| 38 | +In unified, |
| 39 | +we support *several* ASTs. |
| 40 | +The reason for different ASTs is that each markup language has several aspects |
| 41 | +that do not translate 1-to-1 to other markup languages. |
| 42 | +Taking markdown and HTML as an example, |
| 43 | +in some cases markdown has more info than HTML: |
| 44 | +markdown has several ways to add a link |
| 45 | +(“autolinks”: `<https://url>`, |
| 46 | +resource links: `[label](url)`, |
| 47 | +and reference links with definitions: `[label][id]` and `[id]: url`). |
| 48 | +In other cases, |
| 49 | +HTML has more info than markdown. |
| 50 | +It has many tags, |
| 51 | +which add new meaning (semantics), |
| 52 | +that aren’t available in markdown. |
| 53 | +If there were only one AST, |
| 54 | +it would be quite hard to do the tasks that several remark and rehype plugins |
| 55 | +now do. |
| 56 | + |
| 57 | +See [“How to build a syntax tree”][build-a-syntax-tree] for more info on how to |
| 58 | +make a tree. |
| 59 | +See [“Syntax trees in TypeScript”][syntax-trees-in-typescript] on how to work |
| 60 | +with ASTs in TypeScript. |
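
To make the link example above concrete, here is a sketch of roughly what those syntaxes look like as nodes.
The objects are hand-written approximations of mdast (markdown) and hast (HTML) nodes, not output from a parser:
`position` fields are left out, and the URL and label values are made up for illustration.

```js
// Hand-written approximations of mdast nodes; `position` fields are omitted.

// An autolink (`<https://example.com>`) and a resource link
// (`[label](https://example.com)`) both become a `link` node.
const resourceLink = {
  type: 'link',
  url: 'https://example.com',
  title: null,
  children: [{type: 'text', value: 'label'}]
}

// A reference link (`[label][id]`) and its definition
// (`[id]: https://example.com`) are two separate nodes,
// connected through `identifier`.
const referenceLink = {
  type: 'linkReference',
  identifier: 'id',
  label: 'id',
  referenceType: 'full',
  children: [{type: 'text', value: 'label'}]
}

const definition = {
  type: 'definition',
  identifier: 'id',
  label: 'id',
  url: 'https://example.com',
  title: null
}

// In hast, a link is an `a` element; the markdown-specific distinction
// between resource and reference links has no place here.
const anchor = {
  type: 'element',
  tagName: 'a',
  properties: {href: 'https://example.com'},
  children: [{type: 'text', value: 'label'}]
}

const mdastTypes = [resourceLink, referenceLink, definition].map(
  (node) => node.type
)

console.log(mdastTypes) //=> ['link', 'linkReference', 'definition']
console.log(anchor.tagName) //=> 'a'
```

A single AST would either have to drop the `linkReference` and `definition` distinction or carry markdown-only concepts into HTML, which is why the trees are kept separate.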
| 61 | + |
| 62 | +### What is unist? |
| 63 | + |
| 64 | +Although we support several ASTs, they have things in common. |
| 65 | +The bit in common is called unist. |
| 66 | +By having a shared interface, |
| 67 | +we can also share tools that work on all ASTs. |
| 68 | +In practice, |
| 69 | +that means you can use, for example, [`unist-util-visit`][unist-util-visit] |
| 70 | +to visit nodes in any supported AST. |
| 71 | + |
| 72 | +See [“Tree traversal”][tree-traversal] for more info on `unist-util-visit`. |
| 73 | + |
| 74 | +unist is different from the ASTs used in other tools. |
| 75 | +The most noticeable difference is its particular set of names for things: |
| 76 | +`type`, `children`, `position`. |
| 77 | +Perhaps harder to see is that it’s compatible with JSON. |
| 78 | +It’s all objects, arrays, |
| 79 | +strings, |
| 80 | +and numbers. |
| 81 | +Where other tools use instances with methods, |
| 82 | +we use plain data. |
| 83 | +Years ago in retext we started out with instances and methods too. |
| 84 | +But we found that we preferred to be able to read and write a tree from/to a |
| 85 | +JSON file, |
| 86 | +to treat ASTs as data, |
| 87 | +and use more functional utilities. |
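
As a small illustration of “plain data”, the sketch below builds a tiny tree by hand, sends it through JSON, and walks the copy with an ordinary function.
The tree content and the `countTypes` helper are hypothetical and exist only for this example;
in practice you would likely reach for a utility such as `unist-util-visit` instead.

```js
// A small hand-written tree that follows the unist shape: every node has a
// `type`, parents have `children`, and literals have a `value`.
const tree = {
  type: 'root',
  children: [
    {type: 'heading', depth: 1, children: [{type: 'text', value: 'Pluto'}]},
    {
      type: 'paragraph',
      children: [{type: 'text', value: 'Pluto is a dwarf planet.'}]
    }
  ]
}

// Because the tree is plain data, it survives a round trip through JSON.
const copy = JSON.parse(JSON.stringify(tree))

// Hypothetical helper (not part of any package): count nodes per `type`.
function countTypes(node, counts = {}) {
  counts[node.type] = (counts[node.type] || 0) + 1

  for (const child of node.children || []) {
    countTypes(child, counts)
  }

  return counts
}

const counts = countTypes(copy)

console.log(counts) //=> {root: 1, heading: 1, text: 2, paragraph: 1}
```

No classes or methods are involved: the walker only relies on the `type` and `children` fields, which is what lets the same kind of function work on any unist-compatible tree.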
| 88 | + |
| 89 | +### When to use an AST? |
| 90 | + |
| 91 | +You can use an AST when you want to inspect or transform content. |
| 92 | + |
| 93 | +Say you wanted to count the number of headings in a markdown file. |
| 94 | +You could also do that with a regex: |
| 95 | + |
| 96 | +```js twoslash |
| 97 | +/// <reference types="node" /> |
| 98 | +// ---cut--- |
| 99 | +const value = `# Pluto |
| 100 | +
|
| 101 | +Pluto is a dwarf planet in the Kuiper belt. |
| 102 | +
|
| 103 | +## History |
| 104 | +
|
| 105 | +### Discovery |
| 106 | +
|
| 107 | +In the 1840s, Urbain Le Verrier used Newtonian mechanics to predict the |
| 108 | +position of…` |
| 109 | + |
| 110 | +const expression = /^#+[^\r\n]+/gm |
| 111 | +const headings = [...value.matchAll(expression)].length |
| 112 | + |
| 113 | +console.log(headings) //=> 3 |
| 114 | +``` |
| 115 | + |
| 116 | +But what if a line starting with `#` were inside a code block? |
| 117 | +Or if setext headings (underlined with `=` or `-`) were used instead of ATX headings? |
| 118 | +The grammar of markdown is more complex than a regex can handle. |
| 119 | +That’s where an AST can help. |
| 120 | + |
| 121 | +```js twoslash |
| 122 | +/// <reference types="node" /> |
| 123 | +// ---cut--- |
| 124 | +import {fromMarkdown} from 'mdast-util-from-markdown' |
| 125 | +import {visit} from 'unist-util-visit' |
| 126 | + |
| 127 | +const value = `# Pluto |
| 128 | +
|
| 129 | +Pluto is a dwarf planet in the Kuiper belt. |
| 130 | +
|
| 131 | +## History |
| 132 | +
|
| 133 | +### Discovery |
| 134 | +
|
| 135 | +In the 1840s, Urbain Le Verrier used Newtonian mechanics to predict the |
| 136 | +position of…` |
| 137 | + |
| 138 | +const tree = fromMarkdown(value) |
| 139 | + |
| 140 | +let headings = 0 |
| 141 | + |
| 142 | +visit(tree, 'heading', function () { |
| 143 | + headings++ |
| 144 | +}) |
| 145 | + |
| 146 | +console.log(headings) //=> 3 |
| 147 | +``` |
| 148 | + |
| 149 | +See [“Tree traversal”][tree-traversal] for more info on `unist-util-visit`. |
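
The two failure cases of the regex approach mentioned above can be checked directly.
This sketch reuses the same expression on two made-up inputs (both are assumptions for illustration, not taken from a real document):
one with a `#` comment inside a fenced code block, and one that uses setext headings.

```js
const expression = /^#+[^\r\n]+/gm

// Same counting approach as the regex example: count lines that start with `#`.
function countWithRegex(value) {
  return [...value.matchAll(expression)].length
}

// Build the fence marker from single backticks so this example can itself
// sit inside a fenced code block.
const fence = '`'.repeat(3)

// One real heading, plus a shell comment inside fenced code.
const fenced = [
  '# Pluto',
  '',
  fence + 'sh',
  '# Not a heading, but a shell comment',
  fence
].join('\n')

// Two real headings, written in setext style.
const setext = ['Pluto', '=====', '', 'History', '-------'].join('\n')

const fencedCount = countWithRegex(fenced)
const setextCount = countWithRegex(setext)

console.log(fencedCount) //=> 2 (but there is only 1 heading)
console.log(setextCount) //=> 0 (but there are 2 headings)
```

A markdown parser knows that the comment is code and that the underlined lines are headings, so counting `heading` nodes in the tree avoids both mistakes.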
| 150 | + |
| 151 | +[unist-util-visit]: https://github.com/syntax-tree/unist-util-visit |
| 152 | + |
| 153 | +[build-a-syntax-tree]: /learn/recipe/build-a-syntax-tree/ |
| 154 | + |
| 155 | +[syntax-trees-in-typescript]: /learn/guide/syntax-trees-typescript/ |
| 156 | + |
| 157 | +[tree-traversal]: /learn/recipe/tree-traversal/ |