Skip to content

Use unique per-node "node_counter"s rather than a node hashmap in routing #3104

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

TheBlueMatt
Copy link
Collaborator

During routing, we spend most of our time doing hashmap lookups. It turns out, we can drop two of them, the first requires a good bit of work - assigning each node in memory a random u32 "node counter", we can then drop the main per-node routefinding state map and replace it with a vec. Once we do that, we can also drop the first-hop hashmap lookup that we do on a per node basis as we walk the network graph, replacing it with a check in the same vec.

This is just the minimal node_id -> counter change from #2803

@TheBlueMatt TheBlueMatt force-pushed the 2024-06-routing-counters branch from 0d4a06f to 515d757 Compare June 6, 2024 18:49
@codecov-commenter
Copy link

codecov-commenter commented Jun 6, 2024

⚠️ Please install the 'codecov app svg image' to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

Attention: Patch coverage is 93.78698% with 21 lines in your changes missing coverage. Please review.

Project coverage is 91.56%. Comparing base (b8cdde8) to head (4f5e17b).
Report is 122 commits behind head on main.

Files Patch % Lines
lightning/src/routing/router.rs 94.93% 12 Missing ⚠️
lightning/src/routing/gossip.rs 91.66% 3 Missing and 5 partials ⚠️
lightning/src/blinded_path/mod.rs 0.00% 1 Missing ⚠️

❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3104      +/-   ##
==========================================
+ Coverage   89.82%   91.56%   +1.74%     
==========================================
  Files         119      121       +2     
  Lines       97991   113379   +15388     
  Branches    97991   113379   +15388     
==========================================
+ Hits        88016   103816   +15800     
+ Misses       7400     7032     -368     
+ Partials     2575     2531      -44     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@tnull
Copy link
Contributor

tnull commented Jun 12, 2024

IMO, feel free to squash, if that's fine with @valentinewallace?

@valentinewallace
Copy link
Contributor

Yup, feel free to squash and rebase!

@TheBlueMatt
Copy link
Collaborator Author

Rebased and squashed fixups.

@TheBlueMatt TheBlueMatt force-pushed the 2024-06-routing-counters branch from 515d757 to 402fcf8 Compare June 12, 2024 15:23
node_entry.into_mut().channels.push(short_channel_id);
let node = node_entry.into_mut();
node.channels.push(short_channel_id);
**node_counter = node.node_counter;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not following this because the node should already have a counter set, I think? 🤔

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

node_counter is a reference to the channel's node_counter field, not the node's one. I updated the variable name to be clearer.

Copy link
Contributor

@valentinewallace valentinewallace left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Basically LGTM

Comment on lines +3024 to +3039
if target_node_counter.is_none() {
break 'path_walk;
}
if target_node_counter == Some(payee_node_counter) { break 'path_walk; }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The previous code was a bit more readable, is this change necessary?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, the next line unwraps the target_node_counter, so it seemed nice to have the sanity check here that its set?

@valentinewallace
Copy link
Contributor

Btw, some warnings were introduced in CI

@TheBlueMatt TheBlueMatt force-pushed the 2024-06-routing-counters branch from 402fcf8 to 8a2ad24 Compare June 18, 2024 19:09
@tnull tnull self-requested a review June 19, 2024 09:40
These counters are simply a unique number describing each node.
They have no specific meaning, but start at 0 and count up, with
counters being reused after a node has been deleted.
In the next commit, we'll use the new `node_counter`s to remove a
`HashMap` from the router, using a `Vec` to store all our per-node
information. In order to make finding entries in that `Vec` cheap,
here we store the source and destintaion `node_counter`s in
`ChannelInfo`, givind us the counters for both ends of a channel
without doing a second `HashMap` lookup.
@TheBlueMatt TheBlueMatt force-pushed the 2024-06-routing-counters branch from 8a2ad24 to 7bbac2e Compare June 19, 2024 13:57
@TheBlueMatt
Copy link
Collaborator Author

Squashed fixups.

Copy link
Contributor

@tnull tnull left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did a first high-level pass for the new approach, but have yet to get fully into the details. Having the builder already makes a lot of difference w.r.t. readability/maintainability.

As this is still a lot of added complexity, do you have any benchmark numbers indicating how significant the performance improvement actually is?


impl PartialEq for ChannelInfo {
fn eq(&self, o: &ChannelInfo) -> bool {
self.features == o.features &&
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any idea if/how we could ensure we don't forget to add any new fields to these PartialEq implementations for ChannelInfo/NodeInfo in the future? Maybe it would be worth to create CountedNodeInfo/CountedChannelInfo wrapper structs and maintain the simple derives on the inner structs?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea, there's no way with standard rust outside of what you describe. I'm a bit heasitant to jump to a substruct given we currently expose the node/channel maps publicly, and having some kind of wrapper struct would then get exposed in our public API, which would suck :/

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, rather than exposing the IndexedMap directly (which, due to its custom nature doesn't play nice with a lot of the Rust ecosystem anyways), we might want to provide iterators over the underlying data directly, and possibly a few other accessors that would allow us to keep only exposing NodeInfo/ChannelInfo, while the internal representation includes the wrapper?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would be a lot of work...I'm not quite sure why the IndexedMap doesn't play nice with existing Rust stuff - it has methods almost the same as a standard Rust map (and we could certainly add more), exposes iterators and Entrys and everything. Doing away with the map being exposed would mean we'd need to replicate most of that in the NetworkGraph API with iterators and accessors, adding a handful of methods. I'm not really sure its worth it here :(

})
}

fn node_counter_from_pubkey(&mut self, pubkey: PublicKey) -> u32 {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While we're adding helpers, could we also add a newtype for the counter itself? Could improve readability over having yet another non-descript u32.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aside from CLTVs, I think this is our only u32? I certainly can add a newtype, but given I believe its only ever accessed/stored in fields titled *_counter I'm not really clear on it improving readability/reducing the chance of bugs.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All the block height values are u32, all the fee values are u32, CLTVs, Timestamps are u32s, etc...

IMO we should start to introduce newtypes for a good part of the ubiquitous u64/u32 values internally, but especially when it comes to the API. Besides such instances as here, Amount / FeeRate might be very valuable to avoid any conversion errors, as discussed elsewhere before.

Another example would be to have a ShortChannelId data type that would hide the (somewhat internal/LDK-specific u64 representation) and could display the different kinds of SCIDs differently and allow encoding as BOLT7 BBBxTTTxOOO (without external helpers).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO we should start to introduce newtypes for a good part of the ubiquitous u64/u32 values

There's a lot of value in newtypes when there's room for confusion, but I don't think that's always the case just because two things have the same type. Otherwise its just boilerplate .0s everywhere for the sake of it and the need to go double-check the type isn't doing something out from under us when we use it.

Besides such instances as here, Amount / FeeRate might be very valuable to avoid any conversion errors, as discussed elsewhere before.

Indeed, mostly because feerates are multiplied to create amounts, so we use two elements of the different types in the same line, but maybe more importantly it would let us consolidate our feerate -> fee amount conversion logic, which we have pasted in a lot of places right now.

Another example would be to have a ShortChannelId data type that would hide the (somewhat internal/LDK-specific u64 representation) and could display the different kinds of SCIDs differently and allow encoding as BOLT7 BBBxTTTxOOO (without external helpers).

Right, this would be nice for the display implementation.

@TheBlueMatt
Copy link
Collaborator Author

TheBlueMatt commented Jun 19, 2024

Very unscientific benchmarks on my laptop (comparing current git with this PR rebased on git) show mean improvement between 14% and ~0. Note of course that if we further optimize the scorer (which we can do quite a bit) these numbers would look better.

Benchmarking generate_routes_with_zero_penalty_scorer: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 6.0s, or reduce sample count to 80.
generate_routes_with_zero_penalty_scorer
                        time:   [55.518 ms 62.282 ms 70.580 ms]
                        change: [-9.9271% +3.7391% +20.115%] (p = 0.64 > 0.05)
                        No change in performance detected.
Found 17 outliers among 100 measurements (17.00%)
  13 (13.00%) low mild
  4 (4.00%) high severe

Benchmarking generate_mpp_routes_with_zero_penalty_scorer: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 6.0s, or reduce sample count to 80.
generate_mpp_routes_with_zero_penalty_scorer
                        time:   [58.569 ms 62.549 ms 66.635 ms]
                        change: [-23.334% -13.029% -1.7878%] (p = 0.03 < 0.05)
                        Performance has improved.
Found 21 outliers among 100 measurements (21.00%)
  11 (11.00%) low mild
  4 (4.00%) high mild
  6 (6.00%) high severe

Benchmarking generate_routes_with_probabilistic_scorer: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 16.9s, or reduce sample count to 20.
generate_routes_with_probabilistic_scorer
                        time:   [158.19 ms 163.96 ms 169.43 ms]
                        change: [-16.324% -12.044% -7.7884%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
  2 (2.00%) low severe
  4 (4.00%) low mild
  2 (2.00%) high mild

Benchmarking generate_mpp_routes_with_probabilistic_scorer: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 16.4s, or reduce sample count to 30.
generate_mpp_routes_with_probabilistic_scorer
                        time:   [153.42 ms 162.01 ms 170.47 ms]
                        change: [-20.280% -14.764% -9.0255%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) low mild

Benchmarking generate_large_mpp_routes_with_probabilistic_scorer: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 80.6s, or reduce sample count to 10.
generate_large_mpp_routes_with_probabilistic_scorer
                        time:   [649.67 ms 708.47 ms 766.42 ms]
                        change: [-16.700% -5.7015% +7.0523%] (p = 0.37 > 0.05)
                        No change in performance detected.

Benchmarking generate_routes_with_nonlinear_probabilistic_scorer: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 16.9s, or reduce sample count to 20.
generate_routes_with_nonlinear_probabilistic_scorer
                        time:   [160.20 ms 166.87 ms 173.08 ms]
                        change: [-10.685% -5.8941% -1.0357%] (p = 0.03 < 0.05)
                        Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
  5 (5.00%) low mild

Benchmarking generate_mpp_routes_with_nonlinear_probabilistic_scorer: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 17.1s, or reduce sample count to 20.
Benchmarking generate_mpp_routes_with_nonlinear_probabilistic_scorer: Collecting 100 samples in estimated 17.120 s (100 iteratigenerate_mpp_routes_with_nonlinear_probabilistic_scorer
                        time:   [166.54 ms 173.06 ms 179.44 ms]
                        change: [-5.5326% -0.2423% +5.7249%] (p = 0.93 > 0.05)
                        No change in performance detected.
Found 6 outliers among 100 measurements (6.00%)
  2 (2.00%) low severe
  4 (4.00%) high mild

Benchmarking generate_large_mpp_routes_with_nonlinear_probabilistic_scorer: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 79.9s, or reduce sample count to 10.
Benchmarking generate_large_mpp_routes_with_nonlinear_probabilistic_scorer: Collecting 100 samples in estimated 79.915 s (100 igenerate_large_mpp_routes_with_nonlinear_probabilistic_scorer
                        time:   [649.03 ms 720.50 ms 791.29 ms]
                        change: [-21.702% -9.3020% +4.8850%] (p = 0.20 > 0.05)
                        No change in performance detected.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild

@TheBlueMatt TheBlueMatt force-pushed the 2024-06-routing-counters branch from 7bbac2e to 3db5f1e Compare June 19, 2024 14:58
Copy link
Contributor

@tnull tnull left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll need to do another closer pass on the routing logic updates.

In general, I'm still not the biggest fan of embedding these internal counters into publicly exposed datatypes that users might use differently than expected by us. If we could find a way to further encapsulate the node counter logic and leave the public types untouched, it would be preferable IMO.

I also ran some benchmarks, which however paint a pretty clear picture (so far, the nonlinear case is still running). On an older Linux Server (Intel Xeon E3-1226 v3 @ 3.30Ghz), with 10000 samples:

Benchmarking generate_routes_with_zero_penalty_scorer: Collecting 10000 samples in estimated 1032.3 s (10k iterations)
Benchmarking generate_routes_with_zero_penalty_scorer: Analyzing
generate_routes_with_zero_penalty_scorer
                        time:   [100.72 ms 101.46 ms 102.20 ms]
                        change: [+14.158% +15.781% +17.449%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 200 outliers among 10000 measurements (2.00%)
  200 (2.00%) high mild

Benchmarking generate_mpp_routes_with_zero_penalty_scorer: Collecting 10000 samples in estimated 1032.9 s (10k iterations)
Benchmarking generate_mpp_routes_with_zero_penalty_scorer: Analyzing
generate_mpp_routes_with_zero_penalty_scorer
                        time:   [100.76 ms 101.61 ms 102.48 ms]
                        change: [+11.333% +12.728% +14.099%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 200 outliers among 10000 measurements (2.00%)
  200 (2.00%) high mild

Benchmarking generate_routes_with_probabilistic_scorer: Collecting 10000 samples in estimated 1427.9 s (10k iterations)
Benchmarking generate_routes_with_probabilistic_scorer: Analyzing
generate_routes_with_probabilistic_scorer
                        time:   [141.39 ms 142.10 ms 142.80 ms]
                        change: [-55.456% -55.182% -54.911%] (p = 0.00 < 0.05)
                        Performance has improved.

Benchmarking generate_mpp_routes_with_probabilistic_scorer: Collecting 10000 samples in estimated 1365.2 s (10k iterations)
Benchmarking generate_mpp_routes_with_probabilistic_scorer: Analyzing
generate_mpp_routes_with_probabilistic_scorer
                        time:   [131.85 ms 132.59 ms 133.33 ms]
                        change: [-57.247% -56.991% -56.732%] (p = 0.00 < 0.05)
                        Performance has improved.

Benchmarking generate_large_mpp_routes_with_probabilistic_scorer: Collecting 10000 samples in estimated 3147.2 s (10k iterations)
Benchmarking generate_large_mpp_routes_with_probabilistic_scorer: Analyzing
generate_large_mpp_routes_with_probabilistic_scorer
                        time:   [352.87 ms 356.96 ms 361.08 ms]
                        change: [-73.539% -73.138% -72.724%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 200 outliers among 10000 measurements (2.00%)
  200 (2.00%) high mild

I think I'll rerun these over the weekend as they almost seem to good to be true.

///
/// These IDs allow the router to avoid a `HashMap` lookup by simply using this value as an
/// index in a `Vec`, skipping a big step in some of the hottest code when routing.
pub(crate) node_counter: u32,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be honest, I'm still not too keen on embedding this index in the NodeInfo type itself, as it semantically clearly should live in the enclosing data structure.

As this is part of the public API and Clone, users might hang on to NodeInfos whose node_counter is stale as we might have dropped the node already from the network graph and reused by now. If we'd ever accept NodeInfo inputs to some API, we might read these stale counters. In fact, users might already run into this if they'd use our serialization logic in a manner we don't predict.

IIUC, a wrapper struct would allow us to clearly distinguish the LDK-internal accounting data (node_counter) from the user-facing data type that users may want to reuse independently of our current NetworkGraph state.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, we don't, however, expose the counters nor does LDK access them via NodeInfos which the user might hold. Thus, it seems like a stretch to be too concerned about it now, though I can see why its a bit awkward. Obviously if the router were to ever take NodeInfos from users we'd be screwed, but I don't see any reason why we'd ever do that. Worse, wrapping NodeInfos would mean we can't directly expose the graph's node map anymore, which would be a pretty annoying API change.

})
}

fn node_counter_from_pubkey(&mut self, pubkey: PublicKey) -> u32 {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All the block height values are u32, all the fee values are u32, CLTVs, Timestamps are u32s, etc...

IMO we should start to introduce newtypes for a good part of the ubiquitous u64/u32 values internally, but especially when it comes to the API. Besides such instances as here, Amount / FeeRate might be very valuable to avoid any conversion errors, as discussed elsewhere before.

Another example would be to have a ShortChannelId data type that would hide the (somewhat internal/LDK-specific u64 representation) and could display the different kinds of SCIDs differently and allow encoding as BOLT7 BBBxTTTxOOO (without external helpers).


impl PartialEq for ChannelInfo {
fn eq(&self, o: &ChannelInfo) -> bool {
self.features == o.features &&
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, rather than exposing the IndexedMap directly (which, due to its custom nature doesn't play nice with a lot of the Rust ecosystem anyways), we might want to provide iterators over the underlying data directly, and possibly a few other accessors that would allow us to keep only exposing NodeInfo/ChannelInfo, while the internal representation includes the wrapper?

@TheBlueMatt
Copy link
Collaborator Author

In general, I'm still not the biggest fan of embedding these internal counters into publicly exposed datatypes that users might use differently than expected by us. If we could find a way to further encapsulate the node counter logic and leave the public types untouched, it would be preferable IMO.

I believe none of the new counters are public, otherwise I'd definitely agree with you :).

I think I'll rerun these over the weekend as they almost seem to good to be true.

50% from this alone does seem a bit high, but I wouldn't have been surprised to see that on the zero-penalty-scorer cases. The router does a ton of lookups in dist (one per heap pop, IIRC) and this change basically entirely removes those lookups. Now, the regressions that you show (and that were in my results) are fairly surprising, but I'm not all that worried anyway given no one actually uses that anyway.

@TheBlueMatt TheBlueMatt force-pushed the 2024-06-routing-counters branch 2 times, most recently from 4b34513 to 3654954 Compare June 28, 2024 19:25
@tnull
Copy link
Contributor

tnull commented Jul 9, 2024

The second round of benches seems consistent:

Benchmarking generate_routes_with_zero_penalty_scorer: Collecting 10000 samples in estimated 1085.9 s (10k iterations)
Benchmarking generate_routes_with_zero_penalty_scorer: Analyzing
generate_routes_with_zero_penalty_scorer
                        time:   [102.36 ms 103.06 ms 103.77 ms]
                        change: [+16.136% +17.439% +18.754%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 200 outliers among 10000 measurements (2.00%)
  200 (2.00%) high mild

Benchmarking generate_mpp_routes_with_zero_penalty_scorer: Collecting 10000 samples in estimated 1018.4 s (10k iterations)
Benchmarking generate_mpp_routes_with_zero_penalty_scorer: Analyzing
generate_mpp_routes_with_zero_penalty_scorer
                        time:   [95.104 ms 95.822 ms 96.534 ms]
                        change: [+13.495% +14.898% +16.293%] (p = 0.00 < 0.05)
                        Performance has regressed.

Benchmarking generate_routes_with_probabilistic_scorer: Collecting 10000 samples in estimated 1186.2 s (10k iterations)
Benchmarking generate_routes_with_probabilistic_scorer: Analyzing
generate_routes_with_probabilistic_scorer
                        time:   [122.36 ms 123.28 ms 124.20 ms]
                        change: [-57.518% -57.152% -56.754%] (p = 0.00 < 0.05)
                        Performance has improved.

Benchmarking generate_mpp_routes_with_probabilistic_scorer: Collecting 10000 samples in estimated 1388.4 s (10k iterations)
Benchmarking generate_mpp_routes_with_probabilistic_scorer: Analyzing
generate_mpp_routes_with_probabilistic_scorer
                        time:   [142.66 ms 143.32 ms 143.97 ms]
                        change: [-55.878% -55.593% -55.303%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 400 outliers among 10000 measurements (4.00%)
  400 (4.00%) low mild

Benchmarking generate_large_mpp_routes_with_probabilistic_scorer: Collecting 10000 samples in estimated 3897.9 s (10k iterations)
Benchmarking generate_large_mpp_routes_with_probabilistic_scorer: Analyzing
generate_large_mpp_routes_with_probabilistic_scorer
                        time:   [400.45 ms 405.30 ms 410.16 ms]
                        change: [-67.074% -66.557% -66.019%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 1000 outliers among 10000 measurements (10.00%)
  600 (6.00%) high mild
  400 (4.00%) high severe

Benchmarking generate_routes_with_nonlinear_probabilistic_scorer: Collecting 10000 samples in estimated 1324.3 s (10k iterations)
Benchmarking generate_routes_with_nonlinear_probabilistic_scorer: Analyzing
generate_routes_with_nonlinear_probabilistic_scorer
                        time:   [134.10 ms 135.00 ms 135.89 ms]
                        change: [-55.028% -54.692% -54.343%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 400 outliers among 10000 measurements (4.00%)
  400 (4.00%) low mild

Benchmarking generate_mpp_routes_with_nonlinear_probabilistic_scorer: Collecting 10000 samples in estimated 1467.1 s (10k iterations)
Benchmarking generate_mpp_routes_with_nonlinear_probabilistic_scorer: Analyzing
generate_mpp_routes_with_nonlinear_probabilistic_scorer
                        time:   [145.04 ms 145.76 ms 146.48 ms]
                        change: [-49.909% -49.575% -49.270%] (p = 0.00 < 0.05)
                        Performance has improved.

Benchmarking generate_large_mpp_routes_with_nonlinear_probabilistic_scorer: Collecting 10000 samples in estimated 5201.3 s (10k iterations)
Benchmarking generate_large_mpp_routes_with_nonlinear_probabilistic_scorer: Analyzing
generate_large_mpp_routes_with_nonlinear_probabilistic_scorer
                        time:   [404.36 ms 410.29 ms 416.29 ms]
                        change: [-69.657% -69.115% -68.601%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 600 outliers among 10000 measurements (6.00%)
  400 (4.00%) high mild
  200 (2.00%) high severe

Copy link
Contributor

@tnull tnull left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed offline, in general I'm still not the biggest fan of introducing this additional complexity and adding private fields to public datatypes (which has several drawbacks/consequences, such as prohibiting that the types are constructed outside of the crate's context, that we'd ever add an API taking them as an argument, etc.).

That said, the performance benefits shown by the benchmarks seem significant, and the changes here look good to me I think. I'll let @valentinewallace have another look, but happy to ACK from my side.

Copy link
Contributor

@valentinewallace valentinewallace left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, feel free to squash.

}
})
.collect::<Vec<_>>();
match &payment_params.payee {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be nice to separate these checks into a method, given that get_route is ~1200 lines.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NIce

In the next commit we'll stop using `NodeId`s to look up nodes when
routing, instead using the new per-node counters. Here we take the
first step, adding a local struct which tracks temporary counters
for route hints/source/destination nodes.

Because we must ensure we have a 1-to-1 mapping from node ids to
`node_counter`s, even across first-hop and last-hop hints, we have
to be careful to check the network graph first, then a new
`private_node_id_to_node_counter` map to ensure we only ever end up
with one counter per node id.
With the new `NodeCounters` have have a all the `NodeId`s we'll
need during routing, so there's no need to keep the
`private_hop_key_cache` which existed to provide references to
`NodeId`s which are needed during routing.
The router's `introduction_node_id_cache` is used to cache the
`NodeId`s of blinded path introduction points so that we don't
have to look them up every time we go around the main router loop.
When using it, if the introduction point isn't a public node we
then look up the introduction in our first-hops map.

In either case, we have to end up with a reference to a `NodeId`
that outlives our `dist` map.

Here we consolidate both the initial cache building and the
first-hops map lookup to one place, storing only a reference to a
`NodeId` either in the `NetworkGraph` or in the new `NodeCounters`
to get the required lifetime without needing to reference into the
first-hops map.

We then take this opportunity to avoid `clone`ing the first-hops
map entries as we now no longer reference into it.
Now that we have unique, dense, 32-bit identifiers for all the
nodes in our network graph, we can store the per-node information
when routing in a simple `Vec` rather than a `HashMap`. This avoids
the overhead of hashing and table scanning entirely, for a nice
"simple" optimization win.
Now that `PathBuildingHop` is stored in a `Vec` (as `Option`s),
rather than `HashMap` entries, they can grow to fill a full two
cache lines without a memory access performance cost. In the next
commit we'll take advantage of this somewhat, but here we update
the assertions and drop the `repr(C)`, allowing rust to lay the
memory out as it wishes.
This marginally reduces the size of `get_route` by moving a the
blinded path introduction point resolution and blinded path checks
into a helper method.
@TheBlueMatt TheBlueMatt force-pushed the 2024-06-routing-counters branch from 3654954 to 4f5e17b Compare July 10, 2024 01:32
@TheBlueMatt
Copy link
Collaborator Author

Squashed without change and then added one more commit:

$ git diff-tree -U1 3654954e 4f5e17b5^1
$ 

@valentinewallace valentinewallace merged commit 78c0eaa into lightningdevkit:main Jul 10, 2024
12 of 17 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants