Added DeserializeFailure to decode error enum for better error mapping #142

SWvheerden · 2018-08-31T08:24:39Z

Added From for DecodeError implementation so that a serialized error can return an error and not panic from unwrap().
This should allow for greater error handling in future as well.

… van return an error and not panic

TheBlueMatt · 2018-08-31T14:15:24Z

I don't believe it's possible for a Sha256dHash deserialization from a 32-byte buffer to fail? Though the temporary_channel_id field should probably be copied directly instead of using rust-bitcoin's deserializer (I probably forgot to change it over when the type changed).

…

On August 31, 2018 8:28:12 AM UTC, SW van Heerden ***@***.***> wrote: Added From for DecodeError implementation so that a serialized error can return an error and not panic from unwrap(). This should allow for greater error handling in future as well. You can view, comment on, or merge this pull request online at: #142 -- Commit Summary -- * Added From<SerializeError> for DecodeError so that a seriliazed error van return an error and not panic * added extra enum of decoderror -- File Changes -- M src/ln/msgs.rs (16) M src/ln/peer_handler.rs (2) -- Patch Links -- https://github.com/rust-bitcoin/rust-lightning/pull/142.patch https://github.com/rust-bitcoin/rust-lightning/pull/142.diff -- You are receiving this because you are subscribed to this thread. Reply to this email directly or view it on GitHub: #142

SWvheerden · 2018-09-01T13:28:24Z

I agree deserialization from a buffer should not fail, but I have seen weird stuff happen before.

But I will submit a update copying the channel id rather than deserializing it.

…lise.

TheBlueMatt · 2018-09-03T21:00:43Z

Well in this case not just shouldn't, but in this case the implementation pretty clearly just calls read_u8 at https://github.com/rust-bitcoin/rust-bitcoin/blob/a61ad5d382c374b61fa6b111c5ca359c561a0120/src/network/encodable.rs#L67 in the loop at https://github.com/rust-bitcoin/rust-bitcoin/blob/a61ad5d382c374b61fa6b111c5ca359c561a0120/src/network/encodable.rs#L171, so unless RawDecoder fails to read from a Cursor (which it also can't), that unwrap is definitely unreachable. Adding return codes that are only used by actually-unreachable code isn't really all that more helpful than panic'ing, and given this is financial software, I generally feel more comfortable asserting things are behaving as they clearly should than not.

Obviously ACK the last commit, though.

SWvheerden · 2018-09-04T04:41:25Z

I agree on the concept that theoretically it cant fail. In practice it can thou as local memory can get corrupted like in 1 out of a million times, so not a lot, but best to cater for those. And with more users the frequency increases that one of them will get a corrupted memory bit.

This brings it down to the what should happen when it fails. I have not looked yet on the level above that code, to see how the program will react when panic'ing. Depending on how the threads are used in the node (you will have give insight I think), either the channel open will fail, which I believe is fine. Or the lightning node will go down, which should not happen.

You might be able to give out of hand which of the two cases will happen.

TheBlueMatt · 2018-09-04T14:16:17Z

Hmm, yea well we should think a bit more about how we want to handle crazy-error-conditions. As this is financial software, I generally lean very strongly towards abort(), as we'd infinitely rather crash and alert the user that their hardware is corrupting memory (which is a nontrivial problem on overheating RPis - go look at all the failure-to-sync issues on Bitcoin Core where hardware tests immediately fail) than try to do something to recover the situation. Of course rust's panic!() story when embedding as a system shared library is maybe not the best, but I haven't dug into what that would look like in eg an app. Plus obviously we need to indicate to users that they must be using panic=abort.

SWvheerden · 2018-09-05T05:46:23Z

I am not even talking about finicky or broken hardware. Just normal fully working pc's will also corrupt every now and then, it happens.

But coming back to error-conditions. I agree with you that we should be very strict in how errors are handled, but I believe we should avoid "crashing" the node.

Do you think it will be worth while if I deep dive into a proposal to see how and what we can do to the error's and how we should handle them?

TheBlueMatt · 2018-09-05T14:32:49Z

Hmm, I dunno, when I think about user stories around financial software, I'd think they are usually prefer a crash over potentially-unsafe continuation (assuming you have a watchtower, of course). Sure, it can happen randomly, but memory corruption or other errors are much more often a sign of overheating hardware or so than anything else, and if something unexpected happens that is either a sign of a bug in the program or a sign of potential corruption and continuation may be unpredictable, crashing seems way safter than almost anything else we could possibly come up with.

My only concern with that is ensuring crashing actually happens in rust in a intended-to-be-C-callable library is not obvious. In fact, the default is to just unwind and many libraries like tokio will just kill the thread instead of panic=abort. Worse still, if someone were to embed rust-lightning in a mobile app, will panic!() properly cross ffi boundaries and result in the app crashing (probably what we want) or will the user have to do something explicit to make sure that happens.

SWvheerden · 2018-09-06T05:35:19Z

I am not a big fan of panic in production software. My opinion is that panics going off is letting the app go into a free fall crash and not something we have complete control off. That thread might have stopped immediately, but that thread was working in conjunction with more than one thread locally and then more than one remotely on different machines in this case. This might cause another thread to do something its not suppose to or wat we want it to do.

The opinion I am off is that every error case( at least an attempt on this) should be mapped and catered to. That way we have complete control of what happens. On an example of using the library on an android app over which we have no control over (which is a good user story).

Steps	Panic	Internal error handling
1	open channel panics	open channel panics
2	app is suppressing error with try empty catch	error is logged and try to reestablish channel
3	app is programmed to believe everything is fine and try's to send funds	error is logged and channel is not opened
4	app can potentially do weird stuff	close down library, perhaps try soft closing or hard forcing app to reinitialise library(or restart)

SWvheerden · 2018-09-10T06:17:52Z

I am closing this one, because of #155.

SWvheerden added 3 commits August 30, 2018 10:23

Added From<SerializeError> for DecodeError so that a seriliazed error…

3bcee11

… van return an error and not panic

added extra enum of decoderror

a4fc681

added deserialize to fuzz testing as well.

73fd37d

changed temporary_channel_id to copy from message rather than deseria…

1289a10

…lise.

SWvheerden mentioned this pull request Sep 6, 2018

patch to change temp_channel from deserialise to copy #155

Closed

SWvheerden closed this Sep 10, 2018

SWvheerden deleted the from-error-wrapping-to-prevent-panic-on-deserialize branch September 11, 2018 06:29

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Added DeserializeFailure to decode error enum for better error mapping #142

Added DeserializeFailure to decode error enum for better error mapping #142

Uh oh!

SWvheerden commented Aug 31, 2018

Uh oh!

TheBlueMatt commented Aug 31, 2018 via email

Uh oh!

SWvheerden commented Sep 1, 2018

Uh oh!

TheBlueMatt commented Sep 3, 2018

Uh oh!

SWvheerden commented Sep 4, 2018

Uh oh!

TheBlueMatt commented Sep 4, 2018

Uh oh!

SWvheerden commented Sep 5, 2018

Uh oh!

TheBlueMatt commented Sep 5, 2018

Uh oh!

SWvheerden commented Sep 6, 2018

Uh oh!

SWvheerden commented Sep 10, 2018

Uh oh!

Uh oh!

Added DeserializeFailure to decode error enum for better error mapping #142

Added DeserializeFailure to decode error enum for better error mapping #142

Uh oh!

Conversation

SWvheerden commented Aug 31, 2018

Uh oh!

TheBlueMatt commented Aug 31, 2018 via email

Uh oh!

SWvheerden commented Sep 1, 2018

Uh oh!

TheBlueMatt commented Sep 3, 2018

Uh oh!

SWvheerden commented Sep 4, 2018

Uh oh!

TheBlueMatt commented Sep 4, 2018

Uh oh!

SWvheerden commented Sep 5, 2018

Uh oh!

TheBlueMatt commented Sep 5, 2018

Uh oh!

SWvheerden commented Sep 6, 2018

Uh oh!

SWvheerden commented Sep 10, 2018

Uh oh!

Uh oh!