-
Notifications
You must be signed in to change notification settings - Fork 411
Fix gossip using gossip_timestamp_filter
instead of queries
#1382
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix gossip using gossip_timestamp_filter
instead of queries
#1382
Conversation
CC @t-bast please tell me there's something substantial I've missed here and there is a way to keep the graph in sync that is smarter than this. |
Codecov Report
@@ Coverage Diff @@
## main #1382 +/- ##
==========================================
+ Coverage 90.81% 90.84% +0.03%
==========================================
Files 73 73
Lines 41209 41294 +85
Branches 41209 41294 +85
==========================================
+ Hits 37424 37514 +90
+ Misses 3785 3780 -5
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Granted, this is far from ideal, but I'm not sure we can do much better today.
// | ||
// Obnoxiously, `gossip_timestamp_filter` isn't *just* a filter, but its also a request for | ||
// your peer to send you the full routing graph. Thus, in order to tell a peer to send you | ||
// any updates as it sees them, you have to also ask for the full routing graph to be |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand that part, what do you mean here by "full routing graph"?
If you sent a gossip_timestamp_filter
set to now() - 1 hour
for example, your peer should initially send you what changed in the last hour, and will then send new updates as it sees them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIUC it sends you what messages have a timestamp field in the last hour, not what messages it has received in the last hour. Given gossip can take some time to propagate, presumably this could lead to missing updates.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's correct, but when you say its also a request for your peer to send you the full routing graph
that sentence is quite misleading, it's only a request for your peer to send you all the updates starting from that timestamp (it's unrelated to the fact that you may miss updates by providing a timestamp that's too close from now)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, thanks, I added a (subject to the filter)
.
// In an attempt to cut a middle ground between always fetching the full graph from all of | ||
// our peers and never receiving gossip from peers at all, we send all of our peers a | ||
// `gossip_timestamp_filter`, with the filter time set either two weeks ago or an hour ago. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is indeed quite a hard problem and I believe this is a good middle ground. You don't know what you previously missed, so you should somewhat regularly do a full sync (gossip_timestamp_filter
of two weeks). It also depends a lot on whether your node is mostly online or mostly offline. If you're mostly online, you will receive updates through staggered broadcast, which can make up for some of the previous updates that you missed, even if you don't do a two weeks sync. If you're mostly offline though, you should do more frequent full syncs (also because the last time you attempted a full sync, you may have disconnected before it was complete).
You could also imagine more complex strategies, where you would have some heuristics based on the graph size and/or path-finding success to trigger a full sync (if path-finding fails to find routes too often, you're probably missing things so you may want to initiate a full sync), but I believe that doing a full sync with at least one of your peers when you reconnect is a wasteful but much simpler way to cover this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIUC, the please-include-timestamps extension to gossip queries would let us actually figure out exactly what we're missing, except (a) lnd doesn't support it, I think? and (b) after we did that we'd really like to send a gossip timestamp filter of 0/2 weeks ago without getting a full graph dump, but it seems you can't do that in the current protocol?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(a) lnd doesn't support it, I think?
Damn, I didn't remember that (I haven't checked though). But that's why we did support the timestamps extension in eclair, because it made it almost practical to sync on eclair-mobile when connected to nodes that supported this extension (like ours).
(b) after we did that we'd really like to send a gossip timestamp filter of 0/2 weeks ago without getting a full graph dump
There may be something I'm misunderstanding here when you say "getting a full graph dump".
At least in eclair, we don't do anything proactively when you send us a gossip_timestamp_filter
, we only attach this value to the connection and will not propagate updates that happened before this timestamp through staggered broadcast. But we don't actively send you anything in response to gossip_timestamp_filter
...?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But we don't actively send you anything in response to gossip_timestamp_filter...?
Ah, you interpreted the spec the way I did...which is different from c-lightning and lnd. The spec says this:
The receiver:
SHOULD send all gossip messages whose timestamp is greater or equal to first_timestamp, and less than first_timestamp plus timestamp_range.
MAY wait for the next outgoing gossip flush to send these.
I (and apparently you) thought that just meant "going forward", c-lightning and lnd think that means "send it now". Thus, if you set the first_timestamp
field to two weeks ago, both will send you a full gossip dump!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting! I now understand your comments better, I never realized c-lightning and lnd understood it this way (which shows I didn't do enough cross-compat testing of gossip...). I did understand it the same way as you did: gossip_timestamp_filter
is just something that applies to (future) staggered broadcast.
It would make sense to clarify the spec and see which behavior is more desirable.
// peer receives after you connect, you must send a `gossip_timestamp_filter` message. This | ||
// message, as the name implies, tells the peer to not forward any gossip messages with a | ||
// timestamp older than a given value (not the time the peer received the filter, but the | ||
// timestamp in the update message, which is often hours behind when the peer received the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you have data on this? The last time Rusty did some analysis, I believe it took a lot less than an hour for most updates to propagate on mainnet.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't have any new data, no, my recollection from Rusty's last analysis was there was quite a long tail, even though most updates propagated relatively quickly. That said, given nodes may use block time instead of wall clock (LDK does this), timestamps may be quite a while back when a message is generated. Worse, clock-off-by-an-hour is a pretty common thing for users who mis-set clocks, though a bit less so with time sync these days.
See large comment added for more details
83ebc1d
to
1386a0c
Compare
Squashed without change. |
See large comment added for more details