Add probing_diversity_penalty_msat to the scorer parameters #3422
Conversation
Needs a test, should otherwise work; I plan to start using this ASAP, though. |
Hmm. I think we had previously discussed this and I expressed the need for path diversity in background probing to explore the wider network. However, I'm not sure why this should be a parameter on our regular scorer we use to send payments. Shouldn't this rather be handled by target/path selection of the background probing software? |
Force-pushed from bc8a318 to 3fe0c91
I'm not sure exactly how we'd build it as a separate router. I think "just do the normal thing but try to explore backup options" seems like a pretty good strategy. Doing that as a separate router/scorer would be kinda hard - it really needs access to the internal scorer fields and scorer support.
It still would be. No sensible use of the scorer would ever set this parameter; you'd only ever set it when doing background probing. |
Codecov Report: All modified and coverable lines are covered by tests ✅
@@            Coverage Diff             @@
##             main    #3422       +/-   ##
==========================================
+ Coverage   88.66%   90.00%    +1.34%
==========================================
  Files         149      151         +2
  Lines      116893   130296    +13403
  Branches   116893   130296    +13403
==========================================
+ Hits       103640   117273    +13633
+ Misses      10753    10463      -290
- Partials     2500     2560       +60
|
Force-pushed from 3fe0c91 to 957f93f
Any further thoughts here, @tnull? |
/// will be applied.
///
/// It obviously does not make sense to assign a non-0 value here unless you are using the
/// pathfinding result for background probing.
This way of probing where you set this exploration/exploitation trade-off is just one way to do it, isn't it?
From https://lightningdevkit.org/probing:
Recall that earlier, we discussed the concept of using multiple scorers. One scorer that we actually save results to and another un-updated scorer that can be used to identify routes with a "beginner's mind" - so to speak. If we'd like to leverage this approach, we can pass an un-updated scorer here and then update a different scorer with the results of the probe.
This is a different approach where no scores are used for probe route selection, which is different again from what is in the PR description (using a single scorer).
What's your current thinking on what's best?
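To make the quoted "two scorer" idea concrete, here is a minimal sketch of the pattern with made-up types and method names (this is not LDK's scorer API, just the shape of the approach): route probes against a scorer that never receives updates, and record probe results only in the scorer used for real payments.

```rust
use std::collections::HashMap;

/// Stand-in for a scorer; the real LDK types (ProbabilisticScorer, ScoreLookUp,
/// ScoreUpdate) are more involved, this only shows the data flow.
#[derive(Default)]
struct ToyScorer {
    /// Accumulated penalty per short channel id, learned from past results.
    penalties: HashMap<u64, u64>,
}

impl ToyScorer {
    fn channel_penalty(&self, scid: u64) -> u64 {
        *self.penalties.get(&scid).unwrap_or(&0)
    }
    fn record_failure(&mut self, scid: u64) {
        *self.penalties.entry(scid).or_insert(0) += 1_000;
    }
}

/// Pick the probe's channel using the un-updated scorer (the "beginner's mind"),
/// but feed the outcome into the scorer real payments use.
fn handle_probe(fresh: &ToyScorer, live: &mut ToyScorer, candidates: &[u64], failed: bool) {
    let chosen = candidates.iter().copied()
        .min_by_key(|scid| fresh.channel_penalty(*scid))
        .expect("at least one candidate channel");
    if failed {
        live.record_failure(chosen);
    }
}

fn main() {
    let fresh = ToyScorer::default();    // never updated, used only to pick probe paths
    let mut live = ToyScorer::default(); // updated with probe results, used for payments
    handle_probe(&fresh, &mut live, &[42, 43, 44], true);
    assert_eq!(live.channel_penalty(42), 1_000);
}
```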
Hmm, certainly depends on your goal, but I think one model of probing I've kinda come around to is the idea of finding "backup routes" to destinations you will likely pay in the future. I.e., it's great when we repeatedly pay a destination and we have a route we always use that works basically all the time, but what happens when that route suddenly fails? Now we have no data on any other path because we're always paying over that same route.
This PR is certainly not a thorough way to get good "backup routes", of course, but with regular probing my gut says it should do at least an okay job.
I can follow your thoughts there, but I am quite uncomfortable with the level of gut feeling involved in these decisions. This is probably something that keeps coming back. In #2709 we were also guessing a bit at the best approach, and there are probably more areas where it's not very clear what the best approach or parameters are.
It does make me think more about the idea of setting up a test framework that involves a virtual network of nodes.
Yea, there's definitely a bit of a gut approach here, and a sim-framework might help, but the amount of gut feeling that would go into building a sim-framework would, I assume, end up producing results that are even more gut-driven than this :/. IMO we'd be better off A/B testing two nodes with identical channels to the same peers, probing on the same interval and making test payments on the same interval, to evaluate such things, but that requires a lot of time to get results.
Maybe that is true. With a sim network though, I think the discussion will move to a higher level. We'd discuss whether the network is realistic enough, and parameters / probe methods / pathfinding algorithms will follow from that.
To me, A/B testing sounds even more prone to getting non-representative results and is probably too slow to scan parameter spaces.
I fear that in a sim network the impact of guesses/gut-informed decisions is multiplicative (each guess compounds the effects of previous ones), but the only way to find out would be to build it and see what we get, plus compare with reality (via some dataset... not sure what dataset...). Maybe comparing with my probing dataset would work; when we shipped the new scorer in 0.1, a data scientist who works on c= helped run some analysis and at least found that our results were robust across a number of dimensions (channel size, etc), so there's at least some reason to think that those results are moderately representative of the broader network.
Oh grrrr, I forgot about the first commit here; we should backport that since it fixes the build for some 32-bit platforms with newer rustc. |
How are you planning to use this? Using it sounds like testing as well, and perhaps that is a more functional test than the unit test in this PR. |
I've been running this patch on my personal node for a few weeks now. I haven't tried to do any analysis of whether it resulted in higher success rates, though. |
Do you think this is sufficient for merging what's in this PR? And, do you have thoughts on how that analysis could be carried out? |
I mean up to you and @tnull, but I think the maintenance cost of this code is relatively small (and we can drop it in the future if we have a better method of probing) and it seems unlikely to me that using the new penalty when doing regular (eg. minute-ly) probes would get you materially worse data - fundamentally you trade off gathering data for your "preferred" paths for paths you have less recent data for. This would be a bad tradeoff if you aren't gathering enough datapoints for paths you otherwise care about, but if you're doing focused probing relatively often you gather quite a few datapoints (eg probing the top 100 destinations that you pay to once a minute gets you 14 attempts to each destination per day), so ISTM this is pretty obviously a useful feature at least for that kind of probing. Is it the best way to do probing? Probably not, but it seems pretty Obvious (TM) to me that it's better than the current logic.
I'd have to A/B test, probably. |
/// pathfinding result for background probing.
///
/// Specifically, the following penalty is applied
/// `probing_diversity_penalty_msat * max(0, (86400 - current time + last update))^2 / 86400^2` is
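To make the formula in that doc comment concrete, here is a small self-contained sketch of the arithmetic (the function name and plain-u64 interface are illustrative; the real implementation lives inside the scorer and works on Durations):

```rust
/// Illustrative only: the diversity penalty described above, where `now_secs` stands in
/// for the scorer's last-update time and `last_datapoint_secs` is the last time we got
/// a result for the channel (both in seconds since epoch).
fn probing_diversity_penalty(
    probing_diversity_penalty_msat: u64, now_secs: u64, last_datapoint_secs: u64,
) -> u64 {
    const DAY_SECS: u64 = 86_400;
    let time_since_update = now_secs.saturating_sub(last_datapoint_secs);
    // The multiplier decays from DAY_SECS (fresh data) down to 0 (data a day or more old).
    let mul = DAY_SECS.saturating_sub(time_since_update);
    // Use u128 so the square cannot overflow before dividing by 86400^2.
    ((probing_diversity_penalty_msat as u128 * (mul as u128 * mul as u128))
        / (DAY_SECS as u128 * DAY_SECS as u128)) as u64
}

fn main() {
    // With the 100_000 msat setting mentioned later in this thread:
    assert_eq!(probing_diversity_penalty(100_000, 86_400, 86_400), 100_000); // just probed
    assert_eq!(probing_diversity_penalty(100_000, 86_400, 43_200), 25_000);  // 12h ago -> 25%
    assert_eq!(probing_diversity_penalty(100_000, 200_000, 0), 0);           // >1 day -> no penalty
}
```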
Just talked offline with Joost a bit about this and during that I wondered what a good magic value would be for the penalty, i.e., from what threshold does it actually have an impact on diversity of the picked paths?
Could we maybe have the applied penalty be proportional to the number of datapoints tracked (maybe in the last day)? This way, even if we'd start with the penalty having no impact on diversity and we'd probe the same path over and over again, it would accrue over time and eventually the applied penalty would be so big that it would have an impact, having us move on to the next likely path at least?
Just talked offline with Joost a bit about this and during that I wondered what a good magic value would be for the penalty, i.e., from what threshold does it actually have an impact on diversity of the picked paths?
I set mine to 100_000. 100 sats is about the max of where fees lie anyway, so a penalty between 25 sats (for 12 hours) and 100 sats (absolute max) should have a material impact. In general, I'm not sure I understand your question - it's the same type of penalty as any of our other penalties, and we have to pick penalties which appropriately prefer other paths which have lower fees, i.e. they need to be somehow proportional to fees on the network.
Could we maybe have the applied penalty be proportional to the number of datapoints tracked (maybe in the last day)? This way, even if we'd start with the penalty having no impact on diversity and we'd probe the same path over and over again, it would accrue over time and eventually the applied penalty would be so big that it would have an impact, having us move on to the next likely path at least?
We currently don't track any information that would allow us to do that, and tracking it would probably be nontrivial in size (eg a list of the last N times, as u64s, we got a result for each channel). I'm not entirely convinced that makes a difference, though - the theory here is that we probably don't need more results for a channel than once or twice a day if we're probing constantly over the course of months. Achieving that by calculating the number of times we probed a channel in the last day vs just looking at the last time we probed a channel should come out in the wash.
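For a sense of the storage cost being discussed, a purely hypothetical sketch of what per-direction datapoint tracking could look like (nothing like this exists in the PR or in LDK; the struct and the N = 16 bound are made up):

```rust
use std::time::Duration;

/// Hypothetical: remember the last 16 result timestamps for one channel direction so a
/// "datapoints in the last day" count can be derived. Even at N = 16 this is 128 bytes
/// of u64s per direction, which is the size concern raised above.
#[derive(Default)]
struct RecentDatapoints {
    times_secs: [u64; 16], // seconds since epoch, zero means "no entry yet"
    next_idx: usize,
}

impl RecentDatapoints {
    fn record(&mut self, now: Duration) {
        self.times_secs[self.next_idx] = now.as_secs();
        self.next_idx = (self.next_idx + 1) % 16;
    }
    fn datapoints_in_last_day(&self, now: Duration) -> usize {
        let cutoff = now.as_secs().saturating_sub(86_400);
        self.times_secs.iter().filter(|&&t| t != 0 && t >= cutoff).count()
    }
}

fn main() {
    let mut tracker = RecentDatapoints::default();
    tracker.record(Duration::from_secs(100_000));
    tracker.record(Duration::from_secs(150_000));
    assert_eq!(tracker.datapoints_in_last_day(Duration::from_secs(160_000)), 2);
}
```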
How about if you'd - on your node - probe the same destination let's say 20 times and do that with and without this patch? Perhaps this is exactly your A/B test suggestion, but not sure if you also wanted to simplify to probing just a single node without randomization. Looking at the paths taken with and without should give some confidence that paths are indeed diversified. We can merge this code, and then go from there. |
I started comparing paths between a scorer with and without this, and once it got going it picked a different path 4 out of the last 16 probes. I looked at a few of the hops on two of them and we generally preferred hops with less total data (I don't currently print the last-updated time). |
What are your thoughts on this result? If there isn't much time in between, I'd think that 12 duplicate probes is maybe not good enough? Or an indication that... What do you mean by 'once it got going', by the way? |
My node is picking random nodes in the graph to probe and I have 55 active channels, so I imagine for many probes it will pick a path that I haven't explored in a while (in part because many nodes are hard to pay/don't have any inbound liquidity, so it should constantly be trying new paths). To be clear, the thing I was looking at was just whether or not the path it picks is different with or without the new penalty - so these aren't repeated probes of one path; rather, in the remaining cases we simply picked the same path with or without the penalty applied. On a slightly longer time horizon, this morning I saw 14/52 probes take a different path with double the penalty I'd originally applied (maxing out at 200 sats), which at least indicates that the penalty logic is working somewhat correctly. |
Thanks for that clarification. And good that you got at least some indication that it is working. We already established that it isn't exact science at the moment, and also that improving that situation probably requires a significant investment (test framework, robust a/b testing, etc). In that light, would you say let's merge this PR to at least open up for some experimentation? I do sense that it is worth being healthily cautious about future intuitive additions that further complicate the scorer. |
Yea, IMO we should land it and move on. We should give more thought to probing broadly, and maybe take a more testing-oriented approach if we add more knobs for probing. For non-probing results we at least have a test framework now, which is helpful. |
lightning/src/routing/scoring.rs (Outdated)
}

/// The amount of padding required to make [`ChannelLiquidity`] (plus a u64) a full 4 cache lines.
const LIQ_PADDING_LEN: usize = (256 - ::core::mem::size_of::<(u64, DummyLiquidity)>()) / 8;
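For readers not familiar with the trick, a rough, self-contained illustration of what this padding constant is doing; the struct contents here and the assumption that entries end up cache-line aligned are illustrative only:

```rust
/// Hypothetical stand-in for the real per-channel liquidity data.
#[allow(dead_code)]
struct DummyData {
    a: u64,
    b: u64,
    c: u64,
}

// Pad each (key, value) entry out to 256 bytes (four 64-byte cache lines) so that,
// assuming the backing allocation is cache-line aligned, the hot leading fields of
// consecutive entries land on predictable cache lines.
const PADDING_LEN: usize = (256 - core::mem::size_of::<(u64, DummyData)>()) / 8;

#[allow(dead_code)]
struct PaddedEntry {
    key: u64,
    data: DummyData,
    _padding: [u64; PADDING_LEN],
}

fn main() {
    assert_eq!(core::mem::size_of::<PaddedEntry>(), 256);
}
```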
Do you have any idea on the effect size of this optimization?
I don't recall exactly anymore, but the commit that introduced it (4fd07ec) said "a few percent".
Not sure if it is worth it then, writing, reviewing, understanding and maintaining the code and making sure that it remains a win on all platforms.
Sorry for the delay here, was re-running benchmarks to see what's up. I only have some pretty old hardware handy, but on two old Intel machines: one seems to see no change (-3% to +3%, with an outlier at +8%), while the other sees around a 10% win from doing this (plus some related prefetches I keep meaning to upstream, which don't make much sense without the alignment) when doing linear scoring, and is relatively flat when doing nonlinear scoring. Though linear scoring isn't used very much, the performance gain there (which was actually robust, unlike the noise on the other machine, probably because it's NUMA) is nontrivial and likely indicative of gains on other platforms (I don't have an ARM machine, especially a relatively cheap one, handy to benchmark on atm).
Linear scoring - what does this mean exactly? That liquidity information is retrieved for channels in the order in which they are stored in the hashmap? Interested to hear where 'linear scoring' is happening in LDK.
The core PDF in LDK is switchable - you can use the crazy M + (x - 0.5)^N thing or you can just use the naive Rene scoring. The naive Rene scoring sticks with integer math, so it exercises a different part of the CPU and can be a bit faster due to the lack of float <-> int conversions (i.e., the conversions and float math appear to be the bottleneck on the machine in question, rather than the memory loads, but once we remove that bottleneck, or if we used a faster machine, the memory loads become the bottleneck). When we added the nonlinear PDF we weren't particularly confident in it, as we hadn't done much formal side-by-side testing of it, and thus made it switchable. Now that we have, we can maybe remove the option and only have nonlinear scoring.
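For readers wondering what the two modes look like computationally, here is a toy contrast between an integer-only linear estimate and a float-based nonlinear one. The actual LDK curve, constants, and function names are different; this only illustrates why one path involves float conversions and exponentiation while the other stays in integer math:

```rust
/// Toy linear estimate (integer math only): chance that at least `amount` can be sent,
/// assuming the channel's liquidity is uniformly distributed between `min` and `max`.
fn linear_success_prob_ppm(amount: u64, min: u64, max: u64) -> u64 {
    if amount <= min { return 1_000_000; }
    if amount >= max { return 0; }
    ((max - amount) * 1_000_000) / (max - min)
}

/// Toy nonlinear estimate (float math): same question, but with a probability density
/// bulged away from uniform, in the spirit of the "M + (x - 0.5)^N" shape mentioned
/// above. The real LDK PDF and its constants differ; this only shows the kind of float
/// conversion and power work the linear path avoids.
fn nonlinear_success_prob_ppm(amount: u64, min: u64, max: u64) -> u64 {
    if amount <= min { return 1_000_000; }
    if amount >= max { return 0; }
    let x = (amount - min) as f64 / (max - min) as f64;
    // CDF of a density proportional to 0.5 + 8*(x - 0.5)^4 on [0, 1].
    let cdf = |x: f64| 0.5 * x + 8.0 * ((x - 0.5f64).powi(5) + 0.5f64.powi(5)) / 5.0;
    let prob_fail = cdf(x) / cdf(1.0);
    ((1.0 - prob_fail) * 1_000_000.0) as u64
}

fn main() {
    // The two models agree at the extremes but differ away from the midpoint.
    println!("{}", linear_success_prob_ppm(250, 0, 1_000));    // 750000
    println!("{}", nonlinear_success_prob_ppm(250, 0, 1_000)); // ~710937
}
```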
You are saying that when linear scoring is removed, non-linear floating point conversion takes over, and this cache optimization isn't significant anymore. Is it worth keeping then?
lightning/src/routing/scoring.rs (Outdated)
read_tlv_fields!(r, {
    (0, channel_liquidities, required),
});
let mut last_duration_since_epoch = Duration::from_secs(0);
The idea is that when there is no data, the first pathfinding operation works with time zero for all the penalty calculations?
I think that makes sense, yea.
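In code terms, the logic under discussion is roughly the following (types simplified to plain Rust; the field names follow the diff shown further down this thread):

```rust
use std::cmp;
use std::time::Duration;

/// Simplified stand-in for the deserialized per-channel state.
struct ChannelLiquidity {
    last_updated: Duration,
}

/// Recover the "current time" stand-in after reading the scorer from disk: the newest
/// `last_updated` across all stored channels. With no stored data this stays at zero,
/// which is the "time zero" case asked about above.
fn recover_last_update_time(liquidities: &[ChannelLiquidity]) -> Duration {
    let mut last_update_time = Duration::from_secs(0);
    for liq in liquidities {
        last_update_time = cmp::max(last_update_time, liq.last_updated);
    }
    last_update_time
}

fn main() {
    let liqs = [
        ChannelLiquidity { last_updated: Duration::from_secs(1_000) },
        ChannelLiquidity { last_updated: Duration::from_secs(5_000) },
    ];
    assert_eq!(recover_last_update_time(&liqs), Duration::from_secs(5_000));
    assert_eq!(recover_last_update_time(&[]), Duration::from_secs(0));
}
```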
Rebased. |
Force-pushed from e1f41be to 40f6175
@TheBlueMatt given there is some back-and-forth here, mind splitting out the first commit into a dedicated PR to make sure it can land (and get backported) independently ASAP? |
There's back and forth on the first commit too :) |
Went ahead and addressed the build issues in 0.1 with #3610 so this doesn't need to be backported. Was gonna drop the first commit but realized the changes to the struct would also break the build, so it's easier to at least hash out whether we want the first commit or whether we want to drop the assertions here, and then move forward for 0.2. |
I'd think remove all the cache line optimization here if it breaks with the struct changes in this PR, and move reintroduction/discussion to a different PR? |
We generally expect `ChannelLiquidity` to be exactly three cache lines to ensure the first bytes we need are all on one cache line. This improves performance very marginally on some machines, but the assertions that this is true do not work on some Android 32-bit machines due to differing `Duration` sizes. Here we simply remove the assertions to fix the build on platforms where the struct size isn't exactly on cache lines. This may marginally harm performance but it shouldn't be that critical. Fixes lightningdevkit#3415
Force-pushed from 40f6175 to 0dc5fae
Okay, yea, sorry, was checking if more strict alignment than what this PR did would actually show up in benchmarks and it really didn't, so went ahead and swapped the first commit for the one in #3610. We can revisit alignment later if we have benchmarks showing it helps much (but sadly I think the floating point performance is the limiting factor). |
Spoke about this PR offline and went through and settled the remaining discussions. Just two remarks about naming, non-blocking.
In the next commit we'll enable users to assign a penalty to channels which we don't have any recent information for to better enable probing diversity. In order to do so, we need to actually track the last time we received new information for a channel, which we do here.
When doing background probing, it's important to get a diverse view of the network graph to ensure you have as many options when pathfinding as possible. Sadly, the naive probing we currently recommend users do does not accomplish that - using the same success-optimized pathfinder when sending as when probing results in always testing the same (good) path over and over and over again. Instead, here, we add a `probing_diversity_penalty_msat` parameter to the scorer, allowing us to penalize channels for which we already have recent data. It applies a penalty which scales with the square of the inverse time since the channel was last updated, up to one day.
Force-pushed from 0dc5fae to a245ca7
Squashed with only the two requested renames added. Figure we should just land this with one ack since this is all off-by-default stuff and not wildly complicated implementation-wise (low risk of imploding something), and at least tnull has looked at it.
$ git diff-tree -U1 0dc5faeb3 a245ca733
diff --git a/lightning/src/routing/scoring.rs b/lightning/src/routing/scoring.rs
index 9627521ff..d88bce4f5 100644
--- a/lightning/src/routing/scoring.rs
+++ b/lightning/src/routing/scoring.rs
@@ -480,3 +480,3 @@ where L::Target: Logger {
/// decayed every liquidity bound up to that time.
- last_duration_since_epoch: Duration,
+ last_update_time: Duration,
}
@@ -934,3 +934,3 @@ struct ChannelLiquidity {
/// ignoring decays).
- last_datapoint: Duration,
+ last_datapoint_time: Duration,
}
@@ -945,3 +945,3 @@ struct DirectedChannelLiquidity<L: Deref<Target = u64>, HT: Deref<Target = Histo
offset_history_last_updated: T,
- last_datapoint: T,
+ last_datapoint_time: T,
}
@@ -957,3 +957,3 @@ impl<G: Deref<Target = NetworkGraph<L>>, L: Deref> ProbabilisticScorer<G, L> whe
channel_liquidities: ChannelLiquidities::new(),
- last_duration_since_epoch: Duration::from_secs(0),
+ last_update_time: Duration::from_secs(0),
}
@@ -1189,3 +1189,3 @@ impl ChannelLiquidity {
offset_history_last_updated: last_updated,
- last_datapoint: last_updated,
+ last_datapoint_time: last_updated,
}
@@ -1222,3 +1222,3 @@ impl ChannelLiquidity {
offset_history_last_updated: &self.offset_history_last_updated,
- last_datapoint: &self.last_datapoint,
+ last_datapoint_time: &self.last_datapoint_time,
}
@@ -1246,3 +1246,3 @@ impl ChannelLiquidity {
offset_history_last_updated: &mut self.offset_history_last_updated,
- last_datapoint: &mut self.last_datapoint,
+ last_datapoint_time: &mut self.last_datapoint_time,
}
@@ -1414,3 +1414,3 @@ DirectedChannelLiquidity< L, HT, T> {
fn penalty_msat(
- &self, amount_msat: u64, inflight_htlc_msat: u64, last_duration_since_epoch: Duration,
+ &self, amount_msat: u64, inflight_htlc_msat: u64, last_update_time: Duration,
score_params: &ProbabilisticScoringFeeParameters,
@@ -1497,9 +1497,8 @@ DirectedChannelLiquidity< L, HT, T> {
if score_params.probing_diversity_penalty_msat != 0 {
- // We use `last_duration_since_epoch` as a stand-in for the current time as we don't
- // want to fetch the current time in every score call (slowing things down
- // substantially on some platforms where a syscall is required), don't want to add an
- // unnecessary `std` requirement. Assuming we're probing somewhat regularly, it should
- // reliably be close to the current time, (and using the last the last time we probed
- // is also fine here).
- let time_since_update = last_duration_since_epoch.saturating_sub(*self.last_datapoint);
+ // We use `last_update_time` as a stand-in for the current time as we don't want to
+ // fetch the current time in every score call (slowing things down substantially on
+ // some platforms where a syscall is required), don't want to add an unnecessary `std`
+ // requirement. Assuming we're probing somewhat regularly, it should reliably be close
+ // to the current time, (and using the last the last time we probed is also fine here).
+ let time_since_update = last_update_time.saturating_sub(*self.last_datapoint_time);
let mul = Duration::from_secs(60 * 60 * 24).saturating_sub(time_since_update).as_secs();
@@ -1606,3 +1605,3 @@ DirectedChannelLiquidity<L, HT, T> {
*self.last_updated = duration_since_epoch;
- *self.last_datapoint = duration_since_epoch;
+ *self.last_datapoint_time = duration_since_epoch;
}
@@ -1616,3 +1615,3 @@ DirectedChannelLiquidity<L, HT, T> {
*self.last_updated = duration_since_epoch;
- *self.last_datapoint = duration_since_epoch;
+ *self.last_datapoint_time = duration_since_epoch;
}
@@ -1660,3 +1659,3 @@ impl<G: Deref<Target = NetworkGraph<L>>, L: Deref> ScoreLookUp for Probabilistic
let capacity_msat = usage.effective_capacity.as_msat();
- let time = self.last_duration_since_epoch;
+ let time = self.last_update_time;
self.channel_liquidities
@@ -1711,3 +1710,3 @@ impl<G: Deref<Target = NetworkGraph<L>>, L: Deref> ScoreUpdate for Probabilistic
}
- self.last_duration_since_epoch = duration_since_epoch;
+ self.last_update_time = duration_since_epoch;
}
@@ -1739,3 +1738,3 @@ impl<G: Deref<Target = NetworkGraph<L>>, L: Deref> ScoreUpdate for Probabilistic
}
- self.last_duration_since_epoch = duration_since_epoch;
+ self.last_update_time = duration_since_epoch;
}
@@ -1752,3 +1751,3 @@ impl<G: Deref<Target = NetworkGraph<L>>, L: Deref> ScoreUpdate for Probabilistic
self.channel_liquidities.time_passed(duration_since_epoch, self.decay_params);
- self.last_duration_since_epoch = duration_since_epoch;
+ self.last_update_time = duration_since_epoch;
}
@@ -2409,5 +2408,5 @@ ReadableArgs<(ProbabilisticScoringDecayParameters, G, L)> for ProbabilisticScore
let channel_liquidities = ChannelLiquidities::read(r)?;
- let mut last_duration_since_epoch = Duration::from_secs(0);
+ let mut last_update_time = Duration::from_secs(0);
for (_, liq) in channel_liquidities.0.iter() {
- last_duration_since_epoch = cmp::max(last_duration_since_epoch, liq.last_updated);
+ last_update_time = cmp::max(last_update_time, liq.last_updated);
}
@@ -2418,3 +2417,3 @@ ReadableArgs<(ProbabilisticScoringDecayParameters, G, L)> for ProbabilisticScore
channel_liquidities,
- last_duration_since_epoch,
+ last_update_time,
})
@@ -2435,3 +2434,3 @@ impl Writeable for ChannelLiquidity {
(9, self.offset_history_last_updated, required),
- (11, self.last_datapoint, required),
+ (11, self.last_datapoint_time, required),
});
@@ -2452,3 +2451,3 @@ impl Readable for ChannelLiquidity {
let mut offset_history_last_updated = None;
- let mut last_datapoint = None;
+ let mut last_datapoint_time = None;
read_tlv_fields!(r, {
@@ -2462,3 +2461,3 @@ impl Readable for ChannelLiquidity {
(9, offset_history_last_updated, option),
- (11, last_datapoint, option),
+ (11, last_datapoint_time, option),
});
@@ -2487,3 +2486,3 @@ impl Readable for ChannelLiquidity {
offset_history_last_updated: offset_history_last_updated.unwrap_or(last_updated),
- last_datapoint: last_datapoint.unwrap_or(last_updated),
+ last_datapoint_time: last_datapoint_time.unwrap_or(last_updated),
})
@@ -2661,3 +2660,3 @@ mod tests {
let offset_history_last_updated = Duration::ZERO;
- let last_datapoint = Duration::ZERO;
+ let last_datapoint_time = Duration::ZERO;
let network_graph = network_graph(&logger);
@@ -2668,3 +2667,3 @@ mod tests {
min_liquidity_offset_msat: 700, max_liquidity_offset_msat: 100,
- last_updated, offset_history_last_updated, last_datapoint,
+ last_updated, offset_history_last_updated, last_datapoint_time,
liquidity_history: HistoricalLiquidityTracker::new(),
@@ -2674,3 +2673,3 @@ mod tests {
min_liquidity_offset_msat: 700, max_liquidity_offset_msat: 100,
- last_updated, offset_history_last_updated, last_datapoint,
+ last_updated, offset_history_last_updated, last_datapoint_time,
liquidity_history: HistoricalLiquidityTracker::new(),
@@ -2741,3 +2740,3 @@ mod tests {
let offset_history_last_updated = Duration::ZERO;
- let last_datapoint = Duration::ZERO;
+ let last_datapoint_time = Duration::ZERO;
let network_graph = network_graph(&logger);
@@ -2748,3 +2747,3 @@ mod tests {
min_liquidity_offset_msat: 200, max_liquidity_offset_msat: 400,
- last_updated, offset_history_last_updated, last_datapoint,
+ last_updated, offset_history_last_updated, last_datapoint_time,
liquidity_history: HistoricalLiquidityTracker::new(),
@@ -2802,3 +2801,3 @@ mod tests {
let offset_history_last_updated = Duration::ZERO;
- let last_datapoint = Duration::ZERO;
+ let last_datapoint_time = Duration::ZERO;
let network_graph = network_graph(&logger);
@@ -2809,3 +2808,3 @@ mod tests {
min_liquidity_offset_msat: 200, max_liquidity_offset_msat: 400,
- last_updated, offset_history_last_updated, last_datapoint,
+ last_updated, offset_history_last_updated, last_datapoint_time,
liquidity_history: HistoricalLiquidityTracker::new(),
@@ -2915,3 +2914,3 @@ mod tests {
let offset_history_last_updated = Duration::ZERO;
- let last_datapoint = Duration::ZERO;
+ let last_datapoint_time = Duration::ZERO;
let network_graph = network_graph(&logger);
@@ -2929,3 +2928,3 @@ mod tests {
min_liquidity_offset_msat: 40, max_liquidity_offset_msat: 40,
- last_updated, offset_history_last_updated, last_datapoint,
+ last_updated, offset_history_last_updated, last_datapoint_time,
liquidity_history: HistoricalLiquidityTracker::new(),
if score_params.probing_diversity_penalty_msat != 0 {
    // We use `last_update_time` as a stand-in for the current time as we don't want to
    // fetch the current time in every score call (slowing things down substantially on
    // some platforms where a syscall is required), don't want to add an unnecessary `std`
Re-reading this now, I am not so sure anymore that it is important to avoid std for probing specifically. It seems likely that probing is done only on servers that have std?
For attr failures we do a similar thing where we only report hold time data if the platform has std. This would avoid the extra disk writes, and make for a simpler changeset in general. Not just lines of code, but also reasoning about the time approximation. This comment basically.
I'm incredibly skeptical of making exceptions for more and more use-cases. While it's true that most active probers will be built with std, I'd be pretty skeptical of calling it a guarantee. There's not much reason to think that some edge-nodes will never decide to do some probing to improve their graph knowledge, and even if we assume they never would, we could also end up at some point in the future with forwarding nodes that happen to be running in a weird environment (eg some high-security, high-funds node running in a secure enclave or so that might not like regular std SystemTime).
We can cut some corners here and there, but diverging too much in terms of functionality between std and non-std builds really dilutes the value of LDK, IMO.
The alternative approach is to implement the minimum that is needed today for probing and don't support non-std probing diversity yet. Then if any of the potential requirements that you list above materializes, add it then. In the case of this PR, it wouldn't increase the total amount of work much. In this PR you'd just fetch time, and then in the potential future non-std extension, delete that one line and add the time derivation.
I'd much rather do it right the first time than have a user complain that something isn't working in their build and have to rush out a release to get it working. If it's a ton of complexity or not practical that's one thing, but it's fairly straightforward here.
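For reference, the alternative being weighed here looks roughly like the following hypothetical sketch (this is not what the PR does; the PR always uses the last ScoreUpdate time as the stand-in so std and no-std builds behave identically):

```rust
use std::time::Duration;

/// Hypothetical: fetch the wall clock only when `std` is available and fall back to the
/// last `ScoreUpdate` timestamp otherwise. The feature name and fallback behavior here
/// are assumptions for illustration.
fn current_time_for_scoring(last_update_time: Duration) -> Duration {
    #[cfg(feature = "std")]
    let now = std::time::SystemTime::now()
        .duration_since(std::time::UNIX_EPOCH)
        .unwrap_or(last_update_time);
    #[cfg(not(feature = "std"))]
    let now = last_update_time;
    now
}

fn main() {
    // In a build without the `std` feature this simply echoes the stand-in time.
    let t = current_time_for_scoring(Duration::from_secs(1_000));
    println!("{:?}", t);
}
```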
Two non-blocking naming nits, otherwise LGTM |
/// The last time when the liquidity bounds were updated with new payment information (i.e.
/// ignoring decays).
last_datapoint_time: Duration,
Can we name this last_datapoint_duration_since_epoch to clarify the format, since Duration is a ubiquitous type that doesn't tell us how time is measured?
LOL, I just realized this is exactly what @joostjager had you change the variable names from in his naming nits above.
Yea, I figured _time was pretty clear (what else could it possibly be?)
@@ -476,6 +476,9 @@ where L::Target: Logger {
network_graph: G,
logger: L,
channel_liquidities: ChannelLiquidities,
/// The last time we were given via a [`ScoreUpdate`] method. This does not imply that we've
/// decayed every liquidity bound up to that time.
last_update_time: Duration,
Same here, why not last_update_duration_since_epoch?
Landing but will add test coverage in a trivial followup. |
Should backport the first commit to 0.1 if we do another point release. |
Backported in #3610 |
When doing background probing, it's important to get a diverse view of the network graph to ensure you have as many options when pathfinding as possible.
Sadly, the naive probing we currently recommend users do does not accomplish that - using the same success-optimized pathfinder when sending as when probing results in always testing the same (good) path over and over and over again.
Instead, here, we add a probing_diversity_penalty_msat parameter to the scorer, allowing us to penalize channels for which we already have recent data. It applies a penalty which scales with the square of the inverse time since the channel was last updated, up to one day.
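For anyone wanting to try this out, a brief sketch of how the new parameter might be wired up; it assumes a lightning crate that includes this PR, and the 100_000 msat value is just the number mentioned earlier in the thread, not a recommendation:

```rust
use lightning::routing::scoring::ProbabilisticScoringFeeParameters;

fn main() {
    // Parameters used for real payments: leave the new knob at its disabled (0) default.
    let send_params = ProbabilisticScoringFeeParameters::default();

    // Parameters used only when finding routes for background probes: penalize channels
    // for which we already have data from the last day.
    let mut probe_params = send_params.clone();
    probe_params.probing_diversity_penalty_msat = 100_000;

    // `probe_params` would then be passed as the scoring parameters when building probe
    // routes, while `send_params` keeps being used for actual payments.
    let _ = (send_params, probe_params);
}
```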