Skip to content

Added DeserializeFailure to decode error enum for better error mapping #142

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

SWvheerden
Copy link
Contributor

Added From for DecodeError implementation so that a serialized error can return an error and not panic from unwrap().
This should allow for greater error handling in future as well.

@TheBlueMatt
Copy link
Collaborator

TheBlueMatt commented Aug 31, 2018 via email

@SWvheerden
Copy link
Contributor Author

I agree deserialization from a buffer should not fail, but I have seen weird stuff happen before.

But I will submit a update copying the channel id rather than deserializing it.

@TheBlueMatt
Copy link
Collaborator

Well in this case not just shouldn't, but in this case the implementation pretty clearly just calls read_u8 at https://github.com/rust-bitcoin/rust-bitcoin/blob/a61ad5d382c374b61fa6b111c5ca359c561a0120/src/network/encodable.rs#L67 in the loop at https://github.com/rust-bitcoin/rust-bitcoin/blob/a61ad5d382c374b61fa6b111c5ca359c561a0120/src/network/encodable.rs#L171, so unless RawDecoder fails to read from a Cursor (which it also can't), that unwrap is definitely unreachable. Adding return codes that are only used by actually-unreachable code isn't really all that more helpful than panic'ing, and given this is financial software, I generally feel more comfortable asserting things are behaving as they clearly should than not.

Obviously ACK the last commit, though.

@SWvheerden
Copy link
Contributor Author

I agree on the concept that theoretically it cant fail. In practice it can thou as local memory can get corrupted like in 1 out of a million times, so not a lot, but best to cater for those. And with more users the frequency increases that one of them will get a corrupted memory bit.

This brings it down to the what should happen when it fails. I have not looked yet on the level above that code, to see how the program will react when panic'ing. Depending on how the threads are used in the node (you will have give insight I think), either the channel open will fail, which I believe is fine. Or the lightning node will go down, which should not happen.

You might be able to give out of hand which of the two cases will happen.

@TheBlueMatt
Copy link
Collaborator

Hmm, yea well we should think a bit more about how we want to handle crazy-error-conditions. As this is financial software, I generally lean very strongly towards abort(), as we'd infinitely rather crash and alert the user that their hardware is corrupting memory (which is a nontrivial problem on overheating RPis - go look at all the failure-to-sync issues on Bitcoin Core where hardware tests immediately fail) than try to do something to recover the situation. Of course rust's panic!() story when embedding as a system shared library is maybe not the best, but I haven't dug into what that would look like in eg an app. Plus obviously we need to indicate to users that they must be using panic=abort.

@SWvheerden
Copy link
Contributor Author

I am not even talking about finicky or broken hardware. Just normal fully working pc's will also corrupt every now and then, it happens.

But coming back to error-conditions. I agree with you that we should be very strict in how errors are handled, but I believe we should avoid "crashing" the node.

Do you think it will be worth while if I deep dive into a proposal to see how and what we can do to the error's and how we should handle them?

@TheBlueMatt
Copy link
Collaborator

Hmm, I dunno, when I think about user stories around financial software, I'd think they are usually prefer a crash over potentially-unsafe continuation (assuming you have a watchtower, of course). Sure, it can happen randomly, but memory corruption or other errors are much more often a sign of overheating hardware or so than anything else, and if something unexpected happens that is either a sign of a bug in the program or a sign of potential corruption and continuation may be unpredictable, crashing seems way safter than almost anything else we could possibly come up with.

My only concern with that is ensuring crashing actually happens in rust in a intended-to-be-C-callable library is not obvious. In fact, the default is to just unwind and many libraries like tokio will just kill the thread instead of panic=abort. Worse still, if someone were to embed rust-lightning in a mobile app, will panic!() properly cross ffi boundaries and result in the app crashing (probably what we want) or will the user have to do something explicit to make sure that happens.

@SWvheerden
Copy link
Contributor Author

I am not a big fan of panic in production software. My opinion is that panics going off is letting the app go into a free fall crash and not something we have complete control off. That thread might have stopped immediately, but that thread was working in conjunction with more than one thread locally and then more than one remotely on different machines in this case. This might cause another thread to do something its not suppose to or wat we want it to do.

The opinion I am off is that every error case( at least an attempt on this) should be mapped and catered to. That way we have complete control of what happens. On an example of using the library on an android app over which we have no control over (which is a good user story).

Steps Panic Internal error handling
1 open channel panics open channel panics
2 app is suppressing error with try empty catch error is logged and try to reestablish channel
3 app is programmed to believe everything is fine and try's to send funds error is logged and channel is not opened
4 app can potentially do weird stuff close down library, perhaps try soft closing or hard forcing app to reinitialise library(or restart)

@SWvheerden
Copy link
Contributor Author

I am closing this one, because of #155.

@SWvheerden SWvheerden closed this Sep 10, 2018
@SWvheerden SWvheerden deleted the from-error-wrapping-to-prevent-panic-on-deserialize branch September 11, 2018 06:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants