Skip to content

Add a bitcoin-io crate #2066

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 16 commits into from
Closed

Conversation

TheBlueMatt
Copy link
Member

@TheBlueMatt TheBlueMatt commented Sep 13, 2023

EDIT posthumous: This PR was merged not closed, some GitHub bug led to it being marked as closed.

In order to support standard (de)serialization of structs, the
rust-bitcoin ecosystem uses the standard std::io::{Read,Write}
traits. This works great for environments with std, however sadly
the std::io module has not yet been added to the core crate.

Thus, in no-std, the rust-bitcoin ecosystem has historically
used the core2 crate to provide copies of the std::io module
without any major dependencies. Sadly, its one dependency,
memchr, recently broke our MSRV.

Worse, because we didn't want to take on any excess dependencies
for std builds, rust-bitcoin has had to have
mutually-exclusive std and no-std builds. This breaks general
assumptions about how features work in Rust, causing substantial
pain for applications far downstream of rust-bitcoin crates.

This is mostly done, I'm still finalizing the io::Error commit at the end to drop the core2 required dep in no-std, but its getting there. Would love further feedback on the approach or code-level review on these first handful of commits.

@TheBlueMatt TheBlueMatt force-pushed the master branch 2 times, most recently from 1a4b012 to b579940 Compare September 13, 2023 05:28
@tcharding
Copy link
Member

tcharding commented Sep 13, 2023

I wrote this while reviewing, posting here in case its useful to add. It explains the impl_write macro feature gating

// !(core2 || std) : crate::io::Write
// std             : std::io::Write
// core2 && !std   : core2::io::Write

Partial review only, will come back, thanks for working on this man, its been a long time itch huh.

Comment on lines 907 to 918
// Note that we must temporarily use std::io::Cursor here until libsecp256k1 converts to
// bitcoin_io.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you planning on doing this yourself after this merges? If not can you please use the form // TODO: ... so I find it when I'm clearing todo's.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I opened the followup for removal at #2080. I may or may not get around to secp256k1 moving to bitcoin_io.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where's secp256k1 using IO? rg [^a-z]io[^a-z] doesn't return anything relevant neither does rg core2

Also Cursor wasn't required here in the first place, just use &mut &[u8].

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lol oops, I misread the error message and confused the secp publickey type for the rust-bitcoin publickey type (why are there two identically named types here 😭 )

@tcharding
Copy link
Member

tcharding commented Sep 14, 2023

Finished going over it, all looking pretty sane to me.

Its draft still so this isn't code style review but if it is at all helpful you could get this to a state that its technically correct then I could take over and do the boring stuff of getting it to conform to rust-bitcoin style (whitespace, cut'n pasting docs from core/std etc). Just an offer to save you time, in no way meaning to step of your toes. I'm going to have to be reading core/std docs on the implemented traits anyways to do a deeper review.

TheBlueMatt added a commit to TheBlueMatt/rust-lightning that referenced this pull request Sep 14, 2023
We're working with rust-bitcoin to remove the `core2` dependency
at rust-bitcoin/rust-bitcoin#2066 but until
that lands and we can upgrade rust-bitcoin we're stuck with it. In
the mean time, we should still pass our MSRV tests.
TheBlueMatt added a commit to TheBlueMatt/rust-lightning that referenced this pull request Sep 15, 2023
We're working with rust-bitcoin to remove the `core2` dependency
at rust-bitcoin/rust-bitcoin#2066 but until
that lands and we can upgrade rust-bitcoin we're stuck with it. In
the mean time, we should still pass our MSRV tests.
TheBlueMatt added a commit to TheBlueMatt/rust-lightning that referenced this pull request Sep 15, 2023
We're working with rust-bitcoin to remove the `core2` dependency
at rust-bitcoin/rust-bitcoin#2066 but until
that lands and we can upgrade rust-bitcoin we're stuck with it. In
the mean time, we should still pass our MSRV tests.
@TheBlueMatt TheBlueMatt marked this pull request as ready for review September 18, 2023 21:51
@TheBlueMatt
Copy link
Member Author

Sorry for the delay here, I finished up the patchset. I ended up dropping the error::Error type entirely since its not really practical to get compatibility with the std one, but the io::Error type is pretty decent, I think.

@TheBlueMatt TheBlueMatt force-pushed the master branch 3 times, most recently from 75042df to baf0fc2 Compare September 18, 2023 22:20
Copy link
Member

@tcharding tcharding left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK baf0fc26c198d9d238661cfa2a4b09febb0bfdfe

Although I'd like to see the package import suggestion used. All my other comments are just comments.

I'm guessing you mostly cut'n paste this from stdlib, right? What guided your choices about what trait methods to include? I'm just interested, for example why not read _to_string?

Nice PR to review, I glad it was not me working out the compiler errors that led to all the "Stop relying on ..." changes.

Thanks

Comment on lines 43 to 44

bitcoin_io = { version = "0.1", default-features = false }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
bitcoin_io = { version = "0.1", default-features = false }
io = { package = "bitcoin_io", version = "0.1", default-features = false }

Then we can:

  • use plain old "io" instead of "bitcoin_io" in the featuers above
  • remove use bitocin_io::io from lib.rs
  • remove all the use crate::io statements.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you choose to use this suggestion its a non-trivial amount of changes, I did it locally to prove it works, can push a branch for you if you want it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Branch tmp-review-core2-pr on my tree.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmmmmm I guess we could. When I'd originally written this I had an error type too, and was thus matching the std layout. I'm open to changing it, but want to make sure we want to do that before I go down the path, indeed its a good bit of work to walk back. @apoelstra?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think I understand the tradeoff here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question is mostly do we have an io module at the top level of this crate, I think.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I slightly prefer crate renaming but it's not worth huge rewrites.

//!
//! Further, with the `core2` feature, if the `std` feature is not set, the crate traits will be
//! implemented for `core2`'s `io` traits as well.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be nice to have:

// Coding conventions
#![warn(missing_docs)]

But adding it causes loads of warnings, I can add it and all the extra docs after this merges if you like.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was figuring we'd skip it here - the traits are identical to std, and either we copy the std docs or we have much worse docs than std, which I'm not a fan of either. I'd much rather just link to std and call it a day.


/// Standard I/O stream definitions which are API-equivalent to `std`'s `io` module. See
/// [`std::io`] for more info.
pub mod io {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Out of interest, why the module? Is this because of how the crate is imported? (I removed the module to get the "package" dependency trick to work.)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was trying to replicate std::*, and it made much more sense when I also had an error module. Now that there's no error module I think I agree with you, but #2066 (comment)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW I'm totally fine to move forward with the module layout as it is.

}
}

macro_rules! define_errorkind {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice macro!

fn from_std(o: std::io::ErrorKind) -> ErrorKind {
match o {
$(std::io::ErrorKind::$kind => ErrorKind::$kind),*,
_ => ErrorKind::Other
Copy link
Member

@tcharding tcharding Sep 19, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Flagging this even though I don't see a better way to handle the non_exhaustive. This is a maintenance burden, if std::io::ErrorKind gets a new variant we are going to silently convert it to Other, even if there are multiple new variants. Said another way, in reality we are never going to update this line. It would be nice if there was a way to break the build if another variant was added.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea, I can leave a TODO about including more any time we bump our MSRV, but indeed I'm not sure any better way.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We will get a warning when we bump MSRV so that's good but I also don't like it. Maybe we should have another hidden variant Uncategorized like std does.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean...isn't that Other?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not really, Other is meant for user-defined errors, Uncategorized for internal errors (and having an out-of-date library is one)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, so if we add an Uncategorized, what do we map that into when converting to std? We can't access the std Uncategorized as its marked unstable, so we'd have to make it Other, and then we're back to square one.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, good point. I guess let's keep it as is, I hope we have generic error at some point anyway and then all of this is moot.

io/src/lib.rs Outdated
Ok(0) =>
return Err(ErrorKind::UnexpectedEof.into()),
Ok(len) => {
buf = &mut buf[len..];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you have time to educate me, why does the stdlib implementation use tmp?

            Ok(n) => {
                let tmp = buf;
                buf = &mut tmp[n..];
            }

For what its worth, I rekon n is nicer than len :)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Absolutely no idea. I definitely prefer len over the generic n :)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe it is from time before NLL was a thing and nobody bothered to change it.

io/src/lib.rs Outdated
}
#[inline]
#[cfg(any(feature = "alloc", feature = "std"))]
fn read_to_end(&mut self, buf: &mut alloc::vec::Vec<u8>) -> Result<usize> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since stdlib's version has some heavy optimisations I wonder if we should just call through to it if the "std" feature is enabled?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In 9a1c9e5:

I am actually a little tempted to go in the opposite direction, changing behavior from stdlib to add a max allocation size and return an error if we over-read. This function has always made me a little nervous (same with read_to_string) since it's a memory-DoS vector.

In either case, can wait for a followup PR.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting, I didn't think of that.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't read the std implementations here cause it has a different license than us, but given our reads tend to be quite short, I'd be surprised if you could do much better than this. Maybe increase the read buffer to 4096.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's an argument to be made that bitcoin&co should never call this method so perhaps it'd make sense to remove it completely?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Everywhere that we read vectors we check the varint-encoded length and reject if it's too high.

AFAIK we don't use read_to_end or read_to_string anywhere in this library exactly because they are uncontrolled DoS vectors. It is true that you can simulate them to get a DoS anyway, but that's no reason for us to implement them.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Exactly, we wrote a whole bunch of trait methods just to solve the DoS vector for consensus-encoded types, read_to_end is a missing piece.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

read_to_end is used in psbt.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know, I already checked. But it's actually reimplemented there so the method is not needed. And still I wonder if we have a DoS vector. I wasn't able to find out.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I dropped the commit which inlines it in Read rather than just having it in psbt/raw.rs.

@tcharding
Copy link
Member

You can rebase on master to pick up #2077 to fix CI.

/// users using those features. In any case, this crate's `io::Write` feature will be implemented
/// for the given type, even if indirectly.
#[cfg(not(any(feature = "core2", feature = "std")))]
macro_rules! impl_write {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In dd8dd88:

Fine to wait for a followup PR, but all these macros should have invocations that "look like" the code that they generate. Rather than being lists of exprs.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mmm, yea, good point.

apoelstra
apoelstra previously approved these changes Sep 19, 2023
Copy link
Member

@apoelstra apoelstra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK baf0fc26c198d9d238661cfa2a4b09febb0bfdfe

@TheBlueMatt
Copy link
Member Author

Happy to respond to all the comments later today, or can do so in a followup, up to y'all.

@apoelstra
Copy link
Member

For my part I'd prefer a followup since this is such a big PR. But I'm fine either way.

tcharding
tcharding previously approved these changes Sep 19, 2023
Copy link
Member

@tcharding tcharding left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Happy to ack and do as a follow up. We should ensure this doesn't get released to crates.io with the underscore in the crate name. Also I'd like to change (later) the dependency style to use "package" before release.

ACK baf0fc26c198d9d238661cfa2a4b09febb0bfdfe

@tcharding
Copy link
Member

Needs rebase to get green on the MSRV ci job.

@tcharding
Copy link
Member

tcharding commented Sep 22, 2023

I was just going through the Rust API guidelines and all our crates trying to work out how we get to v1.0.0 I realised that because the new io crate includes core2 in the public API we cannot stabalize it.

What do we want to do about that? I assume core2 is never going to get to v1.0.0.

ref: https://rust-lang.github.io/api-guidelines/necessities.html

@apoelstra
Copy link
Member

@tcharding I think it's fine to be 1.0 with the caveat that if you enable an unstable feature then you'll get unstable stuff.

@tcharding
Copy link
Member

And mark "core2" as unstable somewhere you mean? Or just assume its implied because of the version number of core2?

@apoelstra
Copy link
Member

@tcharding yeah, we'd put it in the README or something. We could name the feature core2_unstable and use some tricks to make sure compiling with core2 directly doesn't work, but TBH that feels like overkill.

@tcharding
Copy link
Member

tcharding commented Sep 24, 2023

Cool, well this is good to go in then. Let me know @TheBlueMatt if you want help doing the docs and trivial clean up, happy to help but I don't want to tread on your toes.

EDIT: After rebase

@TheBlueMatt
Copy link
Member Author

Okay, squashed and rewrote the commit messages where relevant.

In the next commit, `std::io::Error` will be replaced with a
different type, and as a result `std::io::Error` must be referred
to explicitly.
In order to move towards our own I/O traits in the `rust-bitcoin`
ecosystem, we have to slowly replace our use of the `std` and
`core2` traits.

This is the final step in removing the explicit `core2` dependency
for I/O in no-std - replacing the `io::Error` type with our own.

Sadly the `std::io::Error` type requires `std::error::Error` as a
bound on the inner error, which is rather difficult to duplicate in
a way that allows for mapping to `std` and back.

To take a more general approach, we use bound on any `Debug`
instead.
tcharding
tcharding previously approved these changes Nov 7, 2023
Copy link
Member

@tcharding tcharding left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK d6b33bae4115ca4d86796e129579caad5441a4db

@tcharding
Copy link
Member

Thanks man, this is sick.

@tcharding
Copy link
Member

Need to remove the memchr pin from contrib/test.sh.

@TheBlueMatt
Copy link
Member Author

Done

Copy link
Member

@tcharding tcharding left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK add371d

Copy link
Member

@apoelstra apoelstra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK add371d

@tcharding tcharding changed the title Add a bitcoin_io crate Add a bitcoin-io crate Nov 17, 2023
@tcharding
Copy link
Member

@Kixunil can we merge this but not release the crate, and then get fix up your concerns before we release it? I'm super keen to get hacking on top of this.

@apoelstra
Copy link
Member

I'm also super keen to merge this before it needs a rebase :)

Copy link
Collaborator

@Kixunil Kixunil left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK add371d

While there are thing I hope will change, removing core2 is a huge improvement.

apoelstra added a commit that referenced this pull request Nov 19, 2023
add371d Remove `core2` dependency entirely (Matt Corallo)
b7dd16d [IO] Use our own io::Error type (Matt Corallo)
c95b593 Explicitly use `std::io::Error` when implementing `std` traits (Matt Corallo)
9e1cd37 Use `io::Error::get_ref()` over `std::error::Error::source()` (Matt Corallo)
3caaadf [IO] Replace the `io::Cursor` re-export with our own `Cursor` (Matt Corallo)
141343e [IO] Move to custom `Read` trait mirroring `std::io::Read` (Matt Corallo)
7395093 Stop relying on `Take`'s `by_ref` method (Matt Corallo)
2364e1a Stop relying on blanket Read impl for all &mut Read (Matt Corallo)
6aa7ccf [IO] Replace `std::io::Sink` usage with our own trivial impl (Matt Corallo)
7eb5d65 [IO] Provide a macro which implements `io::Write` for types (Matt Corallo)
ac678bb [IO] Move to custom `Write` trait mirroring `std::io::Write` (Matt Corallo)
5f2395c Add missing `?Sized` bounds to `io::Write` parameters (Matt Corallo)
2348449 Stop relying on `std::io::Write`'s `&mut Write` blanket impl (Matt Corallo)
5e02095 Use `io::sink` rather than our custom `EmptyWrite` utility (Matt Corallo)
a0ade88 [IO] Move io module into selected re-exports (Matt Corallo)
27c7c4e Add a `bitcoin_io` crate (Matt Corallo)

Pull request description:

  In order to support standard (de)serialization of structs, the
  `rust-bitcoin` ecosystem uses the standard `std::io::{Read,Write}`
  traits. This works great for environments with `std`, however sadly
  the `std::io` module has not yet been added to the `core` crate.

  Thus, in `no-std`, the `rust-bitcoin` ecosystem has historically
  used the `core2` crate to provide copies of the `std::io` module
  without any major dependencies. Sadly, its one dependency,
  `memchr`, recently broke our MSRV.

  Worse, because we didn't want to take on any excess dependencies
  for `std` builds, `rust-bitcoin` has had to have
  mutually-exclusive `std` and `no-std` builds. This breaks general
  assumptions about how features work in Rust, causing substantial
  pain for applications far downstream of `rust-bitcoin` crates.

  This is mostly done, I'm still finalizing the `io::Error` commit at the end to drop the `core2` required dep in no-std, but its getting there. Would love further feedback on the approach or code-level review on these first handful of commits.

ACKs for top commit:
  tcharding:
    ACK add371d
  apoelstra:
    ACK add371d
  Kixunil:
    ACK add371d

Tree-SHA512: 18698ea8b1b65108ee0f695d5062d2562c8df2f50bf85d93442648da3b35a4184a5d5d2a493aed0adaadc83f663f0cd2ac735c34941cc9a6fa58d826e548e091
@apoelstra
Copy link
Member

This was merged by me; Matt had nothing to do with it. It's just a github bug showing that he closed it.

@tcharding
Copy link
Member

tcharding commented Nov 20, 2023

The additiveness is still not fully addressed.

Is this still an issue? I lost the discussion in this long thread.

Also this seems to be missing BufRead which is more important trait for parsing from IO (which we admittedly don't use yet and we should). I guess I could accept it as a temporary solution but not something I'm entirely satisfied with.

Since I pushed to get this PR merged I feel like its my responsibility to do the follow ups, mentioning it first incase you have the BufRead stuff queued up already @Kixunil? (Or anyone else :)

EDIT: Oh it will probably be part of #2179

@Kixunil
Copy link
Collaborator

Kixunil commented Nov 23, 2023

@tcharding I'm still a bit sick so if you have the time to try rebasing #2179 feel free to go ahead! Even if I manage to do some work I will likely focus on other things for a bit.

@tcharding
Copy link
Member

Sure thing man, can do!

@tcharding
Copy link
Member

Hey man, the flurry of action on io is in preparation for me working out how to rebase #2179 - I have to add a stripped down version of BufRead that just serves our purposes, right? (As we do in the new io crate for the Write/Read traits.) I looked at the stuff in lgio but it seems too thorough so I started to hack something together using stdlib, am I on the right track?

@Kixunil
Copy link
Collaborator

Kixunil commented Nov 28, 2023

Yeah, use the same error type as for Read and add some simple BufRead trait. fill_buf and consume might be enough.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants