-
Notifications
You must be signed in to change notification settings - Fork 649
Replace curl
with reqwest
#1499
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
Merged
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
There's some minor differences in behavior here -- We're no longer setting the `Date` header, as clients aren't supposed to send it for requests without bodies (and even for post/put it's optional). We're also not manually setting the host header because there's literally no reason for us to be doing so.
This behavior was split into two functions so that one of the 5 places we're calling it from could intercept a 404 response code. Since the happy path otherwise is just calling `.json`, this felt really awkward to me. Instead I've opted to return `NotFound`, and downcast to it in the error path for that one place. I expected to just be able to call `Any::is`, but it turns out this method is an inherent method not a trait method (which makes sense, otherwise `Any` wouldn't be object safe). However, a side effect of that is that even though `Any` is a supertrait of `CargoError`, we can't call `Any::is` since `dyn CargoError` can't be cast to `dyn Any`. This may change at some point in the future (see rust-lang/rfcs#2035), but we have to duplicate the body of `Any::is` for now. Except we can't even just duplicate the body of `Any::is`, because the only trait method for `Any` is unstable so we have to duplicate that method, too.......... Other notable changes: - We're no longer sending `Proxy-Connection: Keep-Alive`. According to Uncyclopedia, this is "Implemented as a misunderstanding of the HTTP specifications. Common because of mistakes in implementations of early HTTP versions". Firefox recently removed it, as have recent versions of curl. - We will accept gzipped responses now. This is good. - We send an actual user agent, instead of "hello!"
Notable differences in behavior: - We now send "User-Agent: reqwest/0.9.1" to S3 - We no longer send "Proxy-Connection: Keep-Alive" (which was useless) - We now send "Accept-Encoding: gzip", which has no effect on these requests, as we do not receive any response body We are no longer manually specifying the `Date` header, since this is at best completely pointless, and at worse a latent bug waiting to crop up. We are still setting the `Date` header, as S3 requires either that header or `x-amz-date` when `Authorization` is set, since the date is part of the signature used. This commit has the largest effect from removing curl, as this was the oldest code using it, had accumulated the most cruft, and had an API most affected by its use. Going through this code was a fun jump through history. There were a few notable interesting things about the old code. The return type of `Transfer` appeared to be to work around some lifetime issues from taking a slice. Notably, `reqwest` actually requires a static lifetime on the request body (due to the API it wants to expose, and the fact that it's backed by tokio, it cannot deal with non-static lifetimes). Luckily, there were no callers that actually needed to pass a slice. My favorite part of this code though, was using a curl method that has no purpose other than to opt into `Expect: 100-continue`, and then explicitly overriding the `Expect:` header due to not understanding it's purpose. (For the record, `Expect: 100-continue` is used when sending the request body is "inappropriate or inefficient", and specifies that the server should reply with 100 CONTINUE before the client will send the request body). I'm actually surprised this interfered with the recorder. The HTTP/1.1 specification points out that older servers may not recognize the `Expect` header, and that clients should not wait an indefinite period of time before sending the request body.
And for our final replacement, we fix some code that never actually runs, and would fail if it did because running it requires credentials for two specific github users that none of us have! Hooray!
jtgeibel
approved these changes
Oct 12, 2018
bors: r+ |
bors-voyager bot
added a commit
that referenced
this pull request
Oct 12, 2018
1499: Replace `curl` with `reqwest` r=sgrif a=sgrif **The diff stats may look scary, but the vast majority of the added lines is because `reqwest` depends on `tokio`, which caused 448 lines to be added to `Cargo.lock`. The actual code changes are not particularly large** Why should we do this? --- `curl::Easy` is anything but. Its API is verbose, and consistently misunderstood in our codebase. These commits replace the `curl` crate with `reqwest`, which is designed to be a higher level HTTP client. It doesn't have every random nicety we might want (like setting `Date` on `POST`/`PUT` requests), but does have a ton of things we want built in like helpers for JSON request/response bodies that assumes the use of serde, and "turn 4xx/5xx response codes into an error" helper functions. At best the curl crate required some very verbose code, or confusing APIs to make it possible to work with stack data, but at worst it led to some code that was confusing or even dangerous. A few things that were weird or gave me pause while writing this were: - Setting the `Host` header manually, which is at best pointless and at worst a major latent bug waiting to strike. I suspect that the first instance of this was in the S3 code, which was done because `Host` is mentioned in the S3 docs for common headers for some reason, and the other places this was done were due to that code being copied from the S3 code - Specifically calling a CURL function that opted into `Expect: 100-Continue`, and then manually overriding the `Expect` header with a comment that it's not clear what it's purpose is - Setting the `Date` header on requests for no reason (which on requests with no body, is actually a violation of the HTTP/1.1 spec) - Choosing "hello!" as a User-Agent string for endpoints which require it What changes occur in our HTTP requests? ---- The `http-data` files are basically impossible to review, so here are the changes that were made: All requests have the following headers added: - `User-Agent: reqwest/0.9.1` - `Accept-Encoding: gzip` All requests had the following header removed: - `Proxy-Connection: Keep-Alive` - Uncyclopedia describes this header as "Implemented as a misunderstanding of the HTTP specifications. Common because of mistakes in implementations of early HTTP versions." It has been removed from Firefox and more recent versions of curl. Some requests had the following header removed: - `User-Agent: hello!` - The HTTP specifications states that all user agents SHOULD send this header. All requests that we make are a direct result of user interaction, making us a user agent (and it's pretty standard to expect this header even from non-user-agents like crawlers). Arguably we should be setting this to `crates.io`, but this is at least marginally useful unlike "hello!" which is not appropriate here. What notable changes happened in the code? --- For the most part, the code is just replacing manual stuff with built-in stuff. I've removed anything that set a `Date` or `User-Agent` header manually (unless `Date` was required, which is the case for S3, since the date is used as part of the auth signature). I've removed all cases that were manually setting a `Host` header, since if this ever differed from the URL we used to lookup the domain, it would be very bad. While making this change, a few refactorings occurred. The S3 code was taking a `file_length` parameter as a `u64`. This was expected to be the length of the bytes passed in, but... we have a slice so we can just ask it the length. Due to a combination of the API reqwest wants to expose, and the fact that its backed by tokio, it can only work with static data. This was only relevant for S3 uploads, but changing the relevant functions from taking `&[u8]` to `Vec<u8>` was a trivial change. The functions for interacting with the GitHub API were split into two pieces, one which made the request, and one which processed it. The only reason for this split was so that one consumer could intercept 404 responses and return `Ok(false)`. I've opted to merge all of this into a single function, and return `NotFound` for 404, which we can check for in that one place. I was expecting to use `Any::is` there, but due to the way its written, it required duplicating its implementation (see the commit message for that commit for more details) The only other point of note is that the changes to `GhUser::init` are not tested, since they only run if some files in this repo are not present, and for that code to run we'd need the passwords for two GitHub users that we don't have access to. I suspect we should probably just delete that code and panic if we don't have the file, but I haven't done so here. Co-authored-by: Sean Griffin <[email protected]>
Build succeeded |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
The diff stats may look scary, but the vast majority of the added lines is because
reqwest
depends ontokio
, which caused 448 lines to be added toCargo.lock
. The actual code changes are not particularly largeWhy should we do this?
curl::Easy
is anything but. Its API is verbose, and consistently misunderstood in our codebase. These commits replace thecurl
crate withreqwest
, which is designed to be a higher level HTTP client. It doesn't have every random nicety we might want (like settingDate
onPOST
/PUT
requests), but does have a ton of things we want built in like helpers for JSON request/response bodies that assumes the use of serde, and "turn 4xx/5xx response codes into an error" helper functions.At best the curl crate required some very verbose code, or confusing APIs to make it possible to work with stack data, but at worst it led to some code that was confusing or even dangerous. A few things that were weird or gave me pause while writing this were:
Host
header manually, which is at best pointless and at worst a major latent bug waiting to strike. I suspect that the first instance of this was in the S3 code, which was done becauseHost
is mentioned in the S3 docs for common headers for some reason, and the other places this was done were due to that code being copied from the S3 codeExpect: 100-Continue
, and then manually overriding theExpect
header with a comment that it's not clear what it's purpose isDate
header on requests for no reason (which on requests with no body, is actually a violation of the HTTP/1.1 spec)What changes occur in our HTTP requests?
The
http-data
files are basically impossible to review, so here are the changes that were made:All requests have the following headers added:
User-Agent: reqwest/0.9.1
Accept-Encoding: gzip
All requests had the following header removed:
Proxy-Connection: Keep-Alive
Some requests had the following header removed:
User-Agent: hello!
crates.io
, but this is at least marginally useful unlike "hello!" which is not appropriate here.What notable changes happened in the code?
For the most part, the code is just replacing manual stuff with built-in stuff. I've removed anything that set a
Date
orUser-Agent
header manually (unlessDate
was required, which is the case for S3, since the date is used as part of the auth signature). I've removed all cases that were manually setting aHost
header, since if this ever differed from the URL we used to lookup the domain, it would be very bad.While making this change, a few refactorings occurred.
The S3 code was taking a
file_length
parameter as au64
. This was expected to be the length of the bytes passed in, but... we have a slice so we can just ask it the length. Due to a combination of the API reqwest wants to expose, and the fact that its backed by tokio, it can only work with static data. This was only relevant for S3 uploads, but changing the relevant functions from taking&[u8]
toVec<u8>
was a trivial change.The functions for interacting with the GitHub API were split into two pieces, one which made the request, and one which processed it. The only reason for this split was so that one consumer could intercept 404 responses and return
Ok(false)
. I've opted to merge all of this into a single function, and returnNotFound
for 404, which we can check for in that one place. I was expecting to useAny::is
there, but due to the way its written, it required duplicating its implementation (see the commit message for that commit for more details)The only other point of note is that the changes to
GhUser::init
are not tested, since they only run if some files in this repo are not present, and for that code to run we'd need the passwords for two GitHub users that we don't have access to. I suspect we should probably just delete that code and panic if we don't have the file, but I haven't done so here.