Skip to content

Fix gossip using gossip_timestamp_filter instead of queries #1382

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

TheBlueMatt
Copy link
Collaborator

See large comment added for more details

@TheBlueMatt
Copy link
Collaborator Author

CC @t-bast please tell me there's something substantial I've missed here and there is a way to keep the graph in sync that is smarter than this.

@codecov-commenter
Copy link

codecov-commenter commented Mar 24, 2022

Codecov Report

Merging #1382 (83ebc1d) into main (f6fa8e9) will increase coverage by 0.03%.
The diff coverage is 90.90%.

❗ Current head 83ebc1d differs from pull request most recent head 1386a0c. Consider uploading reports for the commit 1386a0c to get more accurate results

@@            Coverage Diff             @@
##             main    #1382      +/-   ##
==========================================
+ Coverage   90.81%   90.84%   +0.03%     
==========================================
  Files          73       73              
  Lines       41209    41294      +85     
  Branches    41209    41294      +85     
==========================================
+ Hits        37424    37514      +90     
+ Misses       3785     3780       -5     
Impacted Files Coverage Δ
lightning/src/routing/network_graph.rs 89.03% <90.90%> (-0.51%) ⬇️
lightning/src/ln/chanmon_update_fail_tests.rs 98.07% <0.00%> (+0.35%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update f6fa8e9...1386a0c. Read the comment docs.

Copy link

@t-bast t-bast left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Granted, this is far from ideal, but I'm not sure we can do much better today.

//
// Obnoxiously, `gossip_timestamp_filter` isn't *just* a filter, but its also a request for
// your peer to send you the full routing graph. Thus, in order to tell a peer to send you
// any updates as it sees them, you have to also ask for the full routing graph to be
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand that part, what do you mean here by "full routing graph"?

If you sent a gossip_timestamp_filter set to now() - 1 hour for example, your peer should initially send you what changed in the last hour, and will then send new updates as it sees them.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIUC it sends you what messages have a timestamp field in the last hour, not what messages it has received in the last hour. Given gossip can take some time to propagate, presumably this could lead to missing updates.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's correct, but when you say its also a request for your peer to send you the full routing graph that sentence is quite misleading, it's only a request for your peer to send you all the updates starting from that timestamp (it's unrelated to the fact that you may miss updates by providing a timestamp that's too close from now)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, thanks, I added a (subject to the filter).

Comment on lines +449 to +451
// In an attempt to cut a middle ground between always fetching the full graph from all of
// our peers and never receiving gossip from peers at all, we send all of our peers a
// `gossip_timestamp_filter`, with the filter time set either two weeks ago or an hour ago.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is indeed quite a hard problem and I believe this is a good middle ground. You don't know what you previously missed, so you should somewhat regularly do a full sync (gossip_timestamp_filter of two weeks). It also depends a lot on whether your node is mostly online or mostly offline. If you're mostly online, you will receive updates through staggered broadcast, which can make up for some of the previous updates that you missed, even if you don't do a two weeks sync. If you're mostly offline though, you should do more frequent full syncs (also because the last time you attempted a full sync, you may have disconnected before it was complete).

You could also imagine more complex strategies, where you would have some heuristics based on the graph size and/or path-finding success to trigger a full sync (if path-finding fails to find routes too often, you're probably missing things so you may want to initiate a full sync), but I believe that doing a full sync with at least one of your peers when you reconnect is a wasteful but much simpler way to cover this.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIUC, the please-include-timestamps extension to gossip queries would let us actually figure out exactly what we're missing, except (a) lnd doesn't support it, I think? and (b) after we did that we'd really like to send a gossip timestamp filter of 0/2 weeks ago without getting a full graph dump, but it seems you can't do that in the current protocol?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(a) lnd doesn't support it, I think?

Damn, I didn't remember that (I haven't checked though). But that's why we did support the timestamps extension in eclair, because it made it almost practical to sync on eclair-mobile when connected to nodes that supported this extension (like ours).

(b) after we did that we'd really like to send a gossip timestamp filter of 0/2 weeks ago without getting a full graph dump

There may be something I'm misunderstanding here when you say "getting a full graph dump".
At least in eclair, we don't do anything proactively when you send us a gossip_timestamp_filter, we only attach this value to the connection and will not propagate updates that happened before this timestamp through staggered broadcast. But we don't actively send you anything in response to gossip_timestamp_filter...?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But we don't actively send you anything in response to gossip_timestamp_filter...?

Ah, you interpreted the spec the way I did...which is different from c-lightning and lnd. The spec says this:

The receiver:
SHOULD send all gossip messages whose timestamp is greater or equal to first_timestamp, and less than first_timestamp plus timestamp_range.
MAY wait for the next outgoing gossip flush to send these.

I (and apparently you) thought that just meant "going forward", c-lightning and lnd think that means "send it now". Thus, if you set the first_timestampfield to two weeks ago, both will send you a full gossip dump!

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting! I now understand your comments better, I never realized c-lightning and lnd understood it this way (which shows I didn't do enough cross-compat testing of gossip...). I did understand it the same way as you did: gossip_timestamp_filter is just something that applies to (future) staggered broadcast.

It would make sense to clarify the spec and see which behavior is more desirable.

// peer receives after you connect, you must send a `gossip_timestamp_filter` message. This
// message, as the name implies, tells the peer to not forward any gossip messages with a
// timestamp older than a given value (not the time the peer received the filter, but the
// timestamp in the update message, which is often hours behind when the peer received the
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you have data on this? The last time Rusty did some analysis, I believe it took a lot less than an hour for most updates to propagate on mainnet.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have any new data, no, my recollection from Rusty's last analysis was there was quite a long tail, even though most updates propagated relatively quickly. That said, given nodes may use block time instead of wall clock (LDK does this), timestamps may be quite a while back when a message is generated. Worse, clock-off-by-an-hour is a pretty common thing for users who mis-set clocks, though a bit less so with time sync these days.

@jkczyz jkczyz added this to the 0.0.106 milestone Mar 24, 2022
jkczyz
jkczyz previously approved these changes Mar 25, 2022
See large comment added for more details
@TheBlueMatt
Copy link
Collaborator Author

Squashed without change.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants