Commit 956d3ea

bagel897 and tkucar authored and tkucar committed
docs: Add docs for incremental recomputation (#311)
1 parent fbeb41b commit 956d3ea

3 files changed: +127 -3 lines changed

architecture/6. incremental-computation/A. Overview.md

Lines changed: 41 additions & 1 deletion
@@ -1,6 +1,46 @@
11
# Incremental Computation
22

3-
TODO
3+
After we have made changes to the codebase, we may need to recompute the codebase graph.
4+
This is not a trivial task, because we need to be able to recompute the codebase graph incrementally and efficiently.
5+
6+
## Use Cases
7+
8+
### 1. Repeated Moves
9+
10+
```python
11+
# file1.py
12+
def foo():
13+
return bar()
14+
15+
16+
def bar():
17+
return 42
18+
```
19+
20+
Let's move the symbol `bar` to `file2.py`:
21+
22+
```python
23+
# file2.py
24+
def bar():
25+
return 42
26+
```
27+
28+
Then we move the symbol `foo` to `file3.py`:
29+
30+
```python
31+
# file3.py
32+
from file2 import bar
33+
34+
35+
def foo():
36+
return bar()
37+
```
38+
39+
You'll notice that we added an import from `file2`, not `file1`. This means that before we can move `foo` to `file3.py`, we need to sync the graph to reflect the changes in `file2.py`.
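
To make the ordering concrete, here is a minimal toy sketch. It assumes the "graph" only records which file defines each symbol; `import_line` and the two dictionaries are hypothetical names used for illustration, not part of the SDK.

```python
from pathlib import PurePath


def import_line(symbol: str, symbol_locations: dict) -> str:
    """Build the import a moved symbol would need, based on where the graph thinks it lives."""
    module = PurePath(symbol_locations[symbol]).stem
    return f"from {module} import {symbol}"


# Hypothetical state: the graph and the filesystem agree at first.
graph = {"foo": "file1.py", "bar": "file1.py"}
filesystem = dict(graph)

# Move `bar` on the filesystem only; the graph is now stale.
filesystem["bar"] = "file2.py"
stale = import_line("bar", graph)

# Sync the graph, then resolve the import that `foo` needs in file3.py.
graph = dict(filesystem)
fresh = import_line("bar", graph)
```

Without the sync, the second move would generate an import from `file1`, where `bar` no longer exists.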
40+
41+
### 2. Branching
42+
43+
If we want to check out a different branch, we need to update the baseline state to the git commit of the new branch and recompute the codebase graph.
444

545
## Next Step
646

architecture/6. incremental-computation/B. Change Detection.md

Lines changed: 52 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1,6 +1,57 @@
11
# Change Detection
22

3-
TODO
3+
## Lifecycle of an operation on the codebase graph
4+
5+
Changes go through four states. By default, we do not apply changes to the codebase graph, only to the filesystem.
6+
7+
### Pending transactions
8+
9+
After calling an edit or other transaction method, the changes are stored in a pending transaction. Pending transactions will be committed as described in the previous chapter.
10+
11+
### Pending syncs
12+
13+
After a transaction is committed, the file is marked as a pending sync. This means the filesystem state has been updated, but the codebase graph has not been updated yet.
14+
15+
### Applied syncs
16+
17+
When we sync the graph, we apply all the pending syncs and clear them. The codebase graph is updated to reflect the changes. We track all the applied syncs in the codebase graph.
18+
19+
### Saved/baseline state
20+
21+
Finally, we can set the baseline state to a git commit. This is the state we target when we reset the codebase graph. When we check out branches, we update the baseline state.
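
The four states can be read as a small state machine. The sketch below is only an illustration of that reading; `ChangeState`, `NEXT_STATE`, and `advance` are hypothetical names, not identifiers from the SDK.

```python
from enum import Enum, auto


class ChangeState(Enum):
    PENDING_TRANSACTION = auto()  # edit recorded, nothing written yet
    PENDING_SYNC = auto()         # written to the filesystem, graph not updated
    APPLIED_SYNC = auto()         # graph updated to reflect the change
    BASELINE = auto()             # captured in the baseline git commit


# Each transition corresponds to one operation described above (assumed mapping).
NEXT_STATE = {
    ChangeState.PENDING_TRANSACTION: ChangeState.PENDING_SYNC,  # commit the transaction
    ChangeState.PENDING_SYNC: ChangeState.APPLIED_SYNC,         # sync the graph
    ChangeState.APPLIED_SYNC: ChangeState.BASELINE,             # set the baseline
}


def advance(state: ChangeState) -> ChangeState:
    """Move a change to its next lifecycle state."""
    if state not in NEXT_STATE:
        raise ValueError(f"{state.name} is the final state")
    return NEXT_STATE[state]
```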
22+
23+
## Change Detection
24+
25+
When we sync or build the graph, we first build a list of all files in three categories:
26+
27+
- Removed files
28+
- Added files
29+
- Files to reparse
30+
31+
For example, if we move a file, it will appear in both the added and the removed files.
32+
If we add a file, it will stay in the added files even if we performed edits on it later.
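
A minimal sketch of this classification, assuming changes arrive as an ordered list of `(kind, path)` events and that a move is recorded as a remove followed by an add. The event format, the `classify` helper, and the handling of add-then-remove and remove-then-add are assumptions made for illustration.

```python
def classify(events):
    """Sort file events into (removed, added, to_reparse) sets."""
    removed, added, to_reparse = set(), set(), set()
    for kind, path in events:
        if kind == "add":
            if path in removed:
                # Assumption: a removed file that reappears is treated as modified.
                removed.discard(path)
                to_reparse.add(path)
            else:
                added.add(path)
        elif kind == "remove":
            if path in added:
                # Assumption: added and removed before any sync, so nothing to do.
                added.discard(path)
            else:
                to_reparse.discard(path)
                removed.add(path)
        elif kind == "edit":
            # An edited file that is already in `added` stays there.
            if path not in added and path not in removed:
                to_reparse.add(path)
        else:
            raise ValueError(f"unknown event kind: {kind}")
    return removed, added, to_reparse


events = [
    ("remove", "a.py"), ("add", "b.py"),  # move a.py -> b.py
    ("add", "c.py"), ("edit", "c.py"),    # new file, edited afterwards
    ("edit", "d.py"),                     # plain modification
]
removed, added, to_reparse = classify(events)
```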
33+
34+
## Codebase.commit logic
35+
36+
The commit follows these steps:
37+
38+
1. Commit all pending transactions
39+
1. Write all buffered files to the disk
40+
1. Record the written files as pending changes (if we commit without syncing the graph, we usually skip the remaining steps)
41+
1. Build the list of removed, added, and modified files from the pending changes
42+
1. For removed files, we need to remove all the edges that point to the file.
43+
1. For added files, we need to add all the edges that point to the file.
44+
1. For modified files, we remove all the edges that point to the file and add all the edges that point to the new file. This is complicated since edges may pass through the modified file and need to be intelligently updated.
45+
1. Mark all pending changes as applied
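
The steps above can be sketched with a toy model. `ToyCodebase` is a hypothetical stand-in, not the real `Codebase` class: its "graph" simply stores the file content it was built from, and the edge updates of steps 5 to 7 are stubbed out.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ToyCodebase:
    disk: dict = field(default_factory=dict)    # path -> content on the filesystem
    graph: dict = field(default_factory=dict)   # path -> content the graph was built from
    pending_transactions: list = field(default_factory=list)  # (path, content or None)
    pending_syncs: list = field(default_factory=list)
    applied_syncs: list = field(default_factory=list)

    def edit(self, path: str, content: Optional[str]) -> None:
        """Queue a write; content=None means delete the file."""
        self.pending_transactions.append((path, content))

    def commit(self, sync_graph: bool = True) -> None:
        # Steps 1-3: commit transactions, write to disk, record pending changes.
        for path, content in self.pending_transactions:
            if content is None:
                self.disk.pop(path, None)
            else:
                self.disk[path] = content
            self.pending_syncs.append(path)
        self.pending_transactions.clear()
        if not sync_graph:
            return
        # Step 4: build the removed / added / modified lists.
        changed = set(self.pending_syncs)
        removed = {p for p in changed if p not in self.disk}
        added = {p for p in changed if p in self.disk and p not in self.graph}
        modified = changed - removed - added
        # Steps 5-7: update the graph (real edge handling is stubbed here).
        for path in removed:
            self.graph.pop(path, None)
        for path in added | modified:
            self.graph[path] = self.disk[path]
        # Step 8: mark the pending changes as applied.
        self.applied_syncs.extend(self.pending_syncs)
        self.pending_syncs.clear()
```

Calling `commit(sync_graph=False)` leaves the paths in `pending_syncs`, so a later sync still picks them up.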
46+
47+
## Reset logic
48+
49+
Reset is just the inverse of commit. We need to:
50+
51+
1. Cancel all pending transactions
52+
1. Restore the file state to that of the target git commit
53+
1. Clear all pending changes to the graph
54+
1. Reverse all applied syncs to the graph
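
One way to picture the reset steps is an undo log. The sketch below assumes each applied sync records the path it touched and the previous graph entry (`None` if the file was new); the `reset` function and this log format are hypothetical, and the real implementation may reverse syncs differently.

```python
def reset(disk, graph, baseline_files, pending_transactions, pending_syncs, applied_syncs):
    """Return all (mutable) state to the baseline."""
    # 1. Cancel all pending transactions.
    pending_transactions.clear()
    # 2. Restore the file state to that of the target commit.
    disk.clear()
    disk.update(baseline_files)
    # 3. Clear all pending changes to the graph.
    pending_syncs.clear()
    # 4. Reverse applied syncs, most recent first.
    while applied_syncs:
        path, previous = applied_syncs.pop()
        if previous is None:
            graph.pop(path, None)   # the sync added this file
        else:
            graph[path] = previous  # the sync modified this file


baseline = {"a.py": "v1"}
disk = {"a.py": "v3", "b.py": "v1"}
graph = {"a.py": "v3", "b.py": "v1"}
applied = [("a.py", "v1"), ("b.py", None), ("a.py", "v2")]
reset(disk, graph, baseline, [("c.py", "v1")], ["c.py"], applied)
```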
455

556
## Next Step
657

architecture/6. incremental-computation/C. Graph Recomputation.md

Lines changed: 34 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1,6 +1,39 @@
11
# Graph Recomputation
22

3-
TODO
3+
## Node Reparsing
4+
5+
Some limitations we encounter are:
6+
7+
- It is non-trivial to update tree-sitter nodes, and the SDK has no method to do this.
8+
- Therefore, all existing nodes are invalidated and need to be recomputed every time filesystem state changes.
9+
10+
Therefore, to recompute the graph, we must first have the filesystem state updated. Then we can remove all nodes in the modified files and create new nodes in the modified files.
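
As a rough illustration of "remove all nodes in the modified files, then create new ones", the sketch below uses Python's built-in `ast` module as a stand-in for tree-sitter. Nodes are modelled as `(path, name)` pairs; `parse_nodes` and `reparse_files` are hypothetical helpers.

```python
import ast


def parse_nodes(path: str, source: str) -> set:
    """Collect top-level function and class definitions as (path, name) pairs."""
    tree = ast.parse(source)
    kinds = (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)
    return {(path, node.name) for node in tree.body if isinstance(node, kinds)}


def reparse_files(nodes: set, disk: dict, modified: set) -> set:
    """Drop every node from the modified files, then rebuild them from disk."""
    kept = {node for node in nodes if node[0] not in modified}
    for path in modified:
        kept |= parse_nodes(path, disk[path])
    return kept


# The filesystem must already reflect the move of `bar` before we reparse.
disk = {
    "file1.py": "def foo():\n    return bar()\n",
    "file2.py": "def bar():\n    return 42\n",
}
stale_nodes = {("file1.py", "foo"), ("file1.py", "bar")}
fresh_nodes = reparse_files(stale_nodes, disk, {"file1.py", "file2.py"})
```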
11+
12+
## Edge Recomputation
13+
14+
- Nodes may either use (out edges) or be used by (in edges) other nodes.
15+
- Recomputing the out-edges is straightforward: we just need to reparse the file and compute dependencies again.
16+
- Recomputing the in-edges is more difficult.
17+
- The basic algorithm of any incremental computation engine is to:
18+
- Detect what changed
19+
- Update that query with the new data
20+
- If the output of the query changed, we need to update all the queries that depend on that query.
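
The basic algorithm can be sketched as a worklist that only follows dependents when a query's output actually changes. This is a generic illustration, not the SDK's engine; `propagate`, `compute`, and the example queries are hypothetical, and the sketch assumes the dependency graph is acyclic.

```python
from collections import deque


def propagate(changed, compute, values, dependents):
    """Recompute the changed queries, following dependents only on a real change."""
    queue = deque(changed)
    recomputed = []
    while queue:
        query = queue.popleft()
        new_value = compute(query, values)
        recomputed.append(query)
        if new_value != values.get(query):
            values[query] = new_value
            queue.extend(dependents.get(query, ()))
    return recomputed


inputs = {"a": 5}  # the input behind query "a" changed from 1 to 5


def compute(query, values):
    if query == "a":
        return inputs["a"]
    if query == "b":
        return values["a"] + 1
    if query == "c":
        return values["b"] * 0  # output never changes
    return values["c"] + 1      # query "d"


values = {"a": 1, "b": 2, "c": 0, "d": 1}
dependents = {"a": ["b"], "b": ["c"], "c": ["d"]}
recomputed = propagate(["a"], compute, values, dependents)
```

Because the output of `c` is unchanged, `d` is never recomputed. That early cutoff is what makes the approach incremental.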
21+
22+
### Detecting what changed
23+
24+
A difficulty is that the nodes are completely refreshed for updated files. Therefore, by default, the set of changed nodes includes all nodes in updated files.
25+
26+
### Updating the query
27+
28+
To do this, we:
29+
30+
- Wipe the entire cache of the query engine
31+
- Remove all existing out edges of the node
32+
- Recompute dependencies of that node
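
A minimal sketch of these three steps, assuming the graph is stored as two adjacency maps that must be kept consistent. `update_node`, `out_edges`, `in_edges`, and `compute_dependencies` are hypothetical names.

```python
def update_node(node, out_edges, in_edges, cache, compute_dependencies):
    """Refresh one node's out edges and keep the reverse (in-edge) map consistent."""
    # Wipe the entire query cache, since cached results may reference stale nodes.
    cache.clear()
    # Remove all existing out edges of the node.
    for target in out_edges.pop(node, set()):
        in_edges.get(target, set()).discard(node)
    # Recompute the node's dependencies and re-add the edges.
    dependencies = set(compute_dependencies(node))
    out_edges[node] = dependencies
    for target in dependencies:
        in_edges.setdefault(target, set()).add(node)


out_edges = {"foo": {"file1.bar"}}
in_edges = {"file1.bar": {"foo"}}
cache = {"resolved:foo": "file1.bar"}
update_node("foo", out_edges, in_edges, cache, lambda node: ["file2.bar"])
```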
33+
34+
### Update what changed
35+
36+
This part has not been fully implemented yet. Currently, we update all the nodes that are descendants of the changed node and all the nodes in the file.
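
The current, conservative behaviour can be sketched as follows. This sketch reads "descendants" as the nodes that transitively use the changed node, which is an assumption; `nodes_to_update`, `used_by`, and `file_of` are hypothetical names.

```python
def nodes_to_update(changed, used_by, file_of):
    """Return every node in the changed node's file plus all transitive users."""
    same_file = {node for node, path in file_of.items() if path == file_of[changed]}
    seen = set()
    stack = [changed]
    while stack:
        node = stack.pop()
        for user in used_by.get(node, ()):
            if user not in seen:  # guard against cycles
                seen.add(user)
                stack.append(user)
    return same_file | seen


file_of = {
    "bar": "file2.py", "helper": "file2.py",
    "foo": "file3.py", "main": "file4.py", "other": "file5.py",
}
used_by = {"bar": ["foo"], "foo": ["main"]}
to_update = nodes_to_update("bar", used_by, file_of)
```

Here `helper` is updated only because it shares a file with `bar`, which shows how conservative this strategy is.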
437

538
## Next Step
639
