
Commit e756064

docs: add supervision tree graph and update the rest of the architecture document (#1100)
1 parent b3c86e2 commit e756064

File tree: 1 file changed

docs/architecture.md (+226 / -77 lines changed)

# Architecture of the consensus node

## Processes summary

### Supervision tree

This is our complete supervision tree.

```mermaid
graph LR
Application[Application <br> <:one_for_one>]
BeaconNode[BeaconNode <br> <:one_for_all>]
P2P.IncomingRequests[P2P.IncomingRequests <br> <:one_for_one>]
ValidatorManager[ValidatorManager <br> <:one_for_one>]
Telemetry[Telemetry <br> <:one_for_one>]

Application --> Telemetry
Application --> DB
Application --> Blocks
Application --> BlockStates
Application --> Metadata
Application --> BeaconNode
Application --> BeaconApi.Endpoint

BeaconNode -->|genesis_time,<br>genesis_validators_root,<br> fork_choice_data, time| BeaconChain
BeaconNode -->|store, head_slot, time| ForkChoice
BeaconNode -->|listen_addr, <br>enable_discovery, <br> discovery_addr, <br>bootnodes| P2P.Libp2pPort
BeaconNode --> P2P.Peerbook
BeaconNode --> P2P.IncomingRequests
BeaconNode --> PendingBlocks
BeaconNode --> SyncBlocks
BeaconNode --> Attestation
BeaconNode --> BeaconBlock
BeaconNode --> BlobSideCar
BeaconNode --> OperationsCollector
BeaconNode -->|slot, head_root| ValidatorManager
BeaconNode -->|genesis_time, snapshot, votes| ExecutionChain
ValidatorManager --> ValidatorN

P2P.IncomingRequests --> IncomingRequests.Handler
P2P.IncomingRequests --> IncomingRequests.Receiver

Telemetry --> :telemetry_poller
Telemetry --> TelemetryMetricsPrometheus
```

Each box is a process. If it has children, it's a supervisor, and its restart strategy is noted.

If it's a leaf in the tree, it's a GenServer, task, or other non-supervisor process. The tags on the edges/arrows are the init args passed to the children when they are started (either on startup or on restart after a crash).
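
As a rough illustration of these conventions (the module and its child are hypothetical, not taken from this repo), a supervisor node of the tree declares its restart strategy and forwards its init args to its children roughly like this:

```elixir
defmodule ExampleSupervisor do
  # Hypothetical sketch: a box with children in the tree above is a module
  # like this one, with a declared restart strategy and init args that are
  # passed down to the children on start (and again on restart after a crash).
  use Supervisor

  def start_link(init_args) do
    Supervisor.start_link(__MODULE__, init_args, name: __MODULE__)
  end

  @impl true
  def init({slot, head_root}) do
    children = [
      # Each child receives the same init args every time it is (re)started.
      {Agent, fn -> %{slot: slot, head_root: head_root} end}
    ]

    Supervisor.init(children, strategy: :one_for_one)
  end
end
```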

### High level interaction

This is the high level interaction between the processes.

```mermaid
graph LR

ExecutionChain

BlobDb
BlockDb

subgraph "P2P"
Libp2pPort
Peerbook
IncomingRequests
Attestation
BeaconBlock
BlobSideCar
Metadata
end

subgraph "Node"
Validator
BeaconChain
ForkChoice
PendingBlocks
OperationsCollector
end

BeaconChain <-->|on_tick <br> get_fork_digest, get_| Validator
BeaconChain -->|on_tick| BeaconBlock
BeaconChain <-->|on_tick <br> update_fork_choice_cache| ForkChoice
BeaconBlock -->|add_block| PendingBlocks
Validator -->|get_eth1_data <br>to build blocks| ExecutionChain
Validator -->|publish block| Libp2pPort
Validator -->|collect, stop_collecting| Attestation
Validator -->|get slashings, <br>attestations,<br> voluntary exits|OperationsCollector
Validator -->|store_blob| BlobDb
ForkChoice -->|notify new block|Validator
ForkChoice <-->|notify new block <br> on_attestation|OperationsCollector
ForkChoice -->|notify new block|ExecutionChain
ForkChoice -->|store_block| BlockDb
PendingBlocks -->|on_block| ForkChoice
PendingBlocks -->|get_blob_sidecar|BlobDb
Libp2pPort <-->|gossipsub <br> validate_message| BlobSideCar
Libp2pPort <-->|gossipsub <br> validate_message<br> subscribe_to_topic| BeaconBlock
Libp2pPort <-->|gossipsub <br> validate_message<br> subscribe_to_topic| Attestation
Libp2pPort -->|store_blob| BlobDb
Libp2pPort -->|new_peer| Peerbook
BlobSideCar -->|store_blob| BlobDb
Attestation -->|set_attnet|Metadata
IncomingRequests -->|get seq_number|Metadata
PendingBlocks -->|penalize/get<br>on downloading|Peerbook
Libp2pPort -->|new_request| IncomingRequests
```

## Sequences

This section contains sequence diagrams representing the interaction of processes through time in response to a stimulus. The main entry point for new events is the gossip and request-response protocols, which is how nodes communicate with each other.

Request-response is a simple protocol where a client requests specific data, such as old blocks it may be missing, or other clients' metadata.

Gossip allows clients to subscribe to the different topics (hence the name "gossipsub") they are interested in, and get updates for them. This is how a node receives new blocks or attestations from its peers.

We use the `go-libp2p` library for the networking primitives, which is an implementation of the `libp2p` networking stack. We use ports to communicate with a Go application and Broadway to process notifications. This port has a GenServer owner called `Libp2pPort`.
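
As a hedged sketch of that setup (the module name, binary path handling, and frame format here are assumptions for illustration, not the repo's actual `Libp2pPort` implementation), owning a port from a GenServer looks roughly like this:

```elixir
defmodule PortOwnerSketch do
  # Illustrative only: a GenServer that owns a port running a Go binary
  # and exchanges length-prefixed binary messages with it.
  use GenServer

  def start_link(go_binary_path) do
    GenServer.start_link(__MODULE__, go_binary_path, name: __MODULE__)
  end

  def send_command(encoded_command) do
    GenServer.cast(__MODULE__, {:command, encoded_command})
  end

  @impl true
  def init(go_binary_path) do
    # Assumes the Go side reads/writes 4-byte length-prefixed frames on stdin/stdout.
    port = Port.open({:spawn_executable, go_binary_path}, [:binary, packet: 4])
    {:ok, %{port: port}}
  end

  @impl true
  def handle_cast({:command, encoded_command}, state) do
    Port.command(state.port, encoded_command)
    {:noreply, state}
  end

  @impl true
  def handle_info({port, {:data, notification}}, %{port: port} = state) do
    # The real node decodes the notification here and dispatches it, e.g. to
    # the process whose pid was attached to the subscription.
    IO.inspect(byte_size(notification), label: "notification bytes")
    {:noreply, state}
  end
end
```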

## Gossipsub

### Subscribing

At the beginning of the application we subscribe a series of handler processes that will react to new gossipsub events:

- `Gossip.BeaconBlock` will handle the topic `/eth2/<context>/beacon_block/ssz_snappy`.
- `Gossip.BlobSideCar` will subscribe to all blob subnet topics. Their names are of the form `/eth2/<context>/blob_sidecar_<subnet_index>`.
- `Gossip.OperationsCollector` will subscribe to the operations `beacon_aggregate_and_proof` (attestations), `voluntary_exit`, `proposer_slashing`, `attester_slashing`, and `bls_to_execution_change`.

This is the process of subscribing, taking the operations collector as an example:

```mermaid
sequenceDiagram
participant sync as SyncBlocks
participant ops as OperationsCollector
participant p2p as Libp2pPort <br> (GenServer)
participant port as Go Libp2p<br>(Port)

ops ->>+ ops: init()

loop
ops ->> p2p: join_topic
p2p ->> port: join_command
end
deactivate ops

sync ->>+ ops: start()

loop
ops ->> p2p: subscribe_to_topic
p2p ->> port: subscribe_command <br> from: operations_collector_pid
end
deactivate ops

port ->>+ p2p: gossipsub_message <br> handler: operations_collector_pid
p2p ->> ops: {:gossipsub, topic, message}
deactivate p2p
activate ops
ops ->> ops: handle_msg(message)
deactivate ops
```

Joining a topic allows the node to get the messages and participate in gossip for that topic. Subscribing means that the node will actually read the contents of those messages.

We delay the subscription until the sync is finished, to guarantee that we're at a point where we can process the messages that we receive.

This will send the following message to the go libp2p app:

```elixir
%Command{
  from: self() |> :erlang.term_to_binary(),
  c: {:subscribe, %SubscribeToTopic{name: topic_name}}
}
```

`self()` here is the caller, which is `OperationsCollector`'s pid. The go side will save that binary representation of the pid and attach it to the gossip messages that arrive for that topic. That is, the messages notified to `Libp2pPort` will be of the form:

```elixir
%Gossipsub{
  handler: operations_collector_pid,
  message: message,
  msg_id: id
}
```

The operations collector then handles that message in `handle_info`: it decompresses and deserializes each message and then calls the specific handler for that topic.
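
For illustration, a handler along these lines sits behind that `handle_info` (the module name, the `decode/1` placeholder, and the dispatch clauses are hypothetical stand-ins, not the repo's actual code):

```elixir
defmodule GossipHandlerSketch do
  # Hypothetical sketch of the handler side: the real OperationsCollector
  # does this inside its own GenServer with repo-specific decoding.
  use GenServer

  def start_link(opts), do: GenServer.start_link(__MODULE__, opts, name: __MODULE__)

  @impl true
  def init(opts), do: {:ok, opts}

  @impl true
  def handle_info({:gossipsub, topic, compressed_message}, state) do
    # Placeholder for snappy decompression + SSZ deserialization.
    message = decode(compressed_message)
    {:noreply, handle_msg({topic, message}, state)}
  end

  # Dispatch on topic: one clause per gossip operation we subscribed to.
  defp handle_msg({"beacon_aggregate_and_proof", aggregate}, state) do
    # e.g. hand the aggregate over to fork choice / the operations pool.
    IO.inspect(aggregate, label: "attestation aggregate")
    state
  end

  defp handle_msg({_other_topic, _message}, state), do: state

  defp decode(binary), do: binary
end
```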

### Receiving an attestation

This is the intended way to process attestations in the current architecture, although the fork choice call is disabled and only the attestations included in blocks are being processed.

```mermaid
sequenceDiagram
participant p2p as Libp2pPort
participant ops as OperationsCollector
participant fc as ForkChoice

p2p ->>+ ops: {:gossipsub, "beacon_aggregate_and_proof", att}
ops ->> ops: Decompress and deserialize message
ops ->>+ fc: on_attestation() <br> (disabled)
fc ->>- fc: Handlers.on_attestation(store, attestation, false)
ops ->>- ops: handle_msg({:attestation, aggregate}, state)

```

When receiving an attestation, the ForkChoice GenServer takes the current store object and modifies it using the [`on_attestation`](https://eth2book.info/capella/annotated-spec/#on_attestation) handler. It validates the attestation and updates the fork tree weights and target checkpoints. The attestation is only processed if it is the latest message by that validator; if there's a newer one, it should be discarded.

The most relevant piece of the spec here is the [`get_weight`](https://eth2book.info/capella/annotated-spec/#get_weight) function, which is the core of the fork-choice algorithm. In the specs, this function is called on demand when calling [`get_head`](https://eth2book.info/capella/annotated-spec/#get_head): it works with the store's values and recalculates them each time. In our case, we cache the weights and the head root each time we add a block or attestation, so we don't need to do the same calculations again.
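
As a toy illustration of that caching idea only (the struct, field, and function names are made up, and the real head selection walks the block tree rather than taking a global maximum):

```elixir
defmodule WeightCacheSketch do
  # Toy version of the idea: keep cumulative weights and the head root
  # up to date on every insertion instead of recomputing them in get_head.
  defstruct weights: %{}, head_root: nil

  # Called whenever a block or attestation adds `delta` weight to `root`
  # (the real implementation also propagates it to the ancestors).
  def add_weight(%__MODULE__{} = cache, root, delta) do
    weights = Map.update(cache.weights, root, delta, &(&1 + delta))
    head_root = Enum.max_by(Map.keys(weights), &Map.fetch!(weights, &1))
    %__MODULE__{cache | weights: weights, head_root: head_root}
  end

  # get_head becomes a constant-time lookup on the cached value.
  def get_head(%__MODULE__{head_root: head_root}), do: head_root
end
```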

**To do**: we should probably save the latest messages in persistent storage as well, so that if the node crashes we can recover the tree weights.

### Receiving a block

A block is first received and sent to the `PendingBlocks` GenServer, which checks that the block has everything it needs, and that it isn't a duplicate, before letting it be processed.

```mermaid
sequenceDiagram
participant port as Libp2pPort
participant block as BeaconBlock
participant pending as PendingBlocks

activate block
port ->> block: {:gossipsub, {topic, id, message}}
block ->> block: Decompress and deserialize message
block ->> port: validate_message(id, :accept)
block ->> pending: {:add_block, SignedBeaconBlock}
deactivate block
```

However, the block isn't processed immediately. Once every 500ms, `PendingBlocks` checks whether there are blocks ready to be processed and processes them.
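
A hedged sketch of that scheduling pattern follows (the 500ms interval comes from the description above; the module name, state shape, and task handling are illustrative, not the repo's actual `PendingBlocks`/`ForkChoice` code). It also shows the kind of async task used to recompute the head, described after the diagram below:

```elixir
defmodule PendingBlocksSketch do
  # Illustrative only: re-sends itself :process_blocks every 500ms and
  # offloads the expensive head recomputation to an async Task.
  use GenServer

  @interval_ms 500

  def start_link(opts), do: GenServer.start_link(__MODULE__, opts, name: __MODULE__)

  @impl true
  def init(_opts) do
    schedule_tick()
    {:ok, %{pending: %{}}}
  end

  @impl true
  def handle_info(:process_blocks, state) do
    # Check which pending blocks have their parent downloaded and processed,
    # then hand them to fork choice (omitted here), and re-schedule the tick.
    schedule_tick()
    {:noreply, state}
  end

  def handle_info({ref, _head_root}, state) when is_reference(ref) do
    # Result of the async recompute_head task; subscribers would be notified here.
    Process.demonitor(ref, [:flush])
    {:noreply, state}
  end

  def handle_info({:DOWN, _ref, :process, _pid, _reason}, state), do: {:noreply, state}

  defp schedule_tick, do: Process.send_after(self(), :process_blocks, @interval_ms)

  # The expensive part runs outside the GenServer so it doesn't block it.
  def recompute_head_async(store) do
    Task.async(fn -> get_head(store) end)
  end

  defp get_head(_store), do: :some_head_root
end
```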

```mermaid
sequenceDiagram
participant pending as PendingBlocks
participant FC as Fork-choice store
participant rec as recompute_head <br> (async task)
participant StoreDB
participant subs as OperationsCollector <br> Validators <br> ExecutionClient <br> BeaconChain

pending ->> pending: :process_blocks
activate pending
loop
pending ->> pending: check which blocks are pending <br> check parent is downloaded and processed
pending ->>+ FC: {:on_block, block_root, signed_block, from}
deactivate pending
end

FC ->> FC: process_block
FC ->>+ rec: recompute_head(store)
FC ->> FC: prune_old_states
FC ->> pending: {:block_processed, block_root, true}
deactivate FC

rec ->> StoreDB: pruned_store
rec ->> rec: Handlers.get_head()
rec ->> subs: notify_new_block

```

For the happy path, shown above, the fork choice store calculates the state transition and notifies the `PendingBlocks` GenServer that the block was correctly processed, so it can mark it as such.

Asynchronously, a new task is started to recompute the new head, as this takes a significant amount of time. When the head is recomputed, multiple processes are notified.

## Request-Response

**TO DO**: document how ports work for this.

### Checkpoint sync