-
Notifications
You must be signed in to change notification settings - Fork 61
Fix Host header for proxy client requests #128
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@@ -161,11 +161,14 @@ private function sendRequests(array $requests) | |||
$allRequests = array(); | |||
|
|||
foreach ($requests as $request) { | |||
$headers = $request->getHeaders(); | |||
// Force to re-create Host header for each proxy request | |||
unset($headers['Host']); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this seems unconditional both whether host is non-empty and of the type of request.
looking at the tests, i think you should only unset the host if its actually empty. if it is set to something meaningful, like the tests do, we should not unset it. the idea of the HOST is to represent the general web application domain name, not whatever ip or system name the machine running varnish happens to have. if you have 2 varnishes on 2 ips, both serving a.com and b.com, you want to invalidate a.com on all machines, not 168.192.1.1. can you explain what exactly goes wrong? or should we somehow guess a host name if none is specified during the ban? |
Exactly, we need the Host header for differentiating between application hostnames.
Why is this a problem? An empty Host header is perfectly fine. According to the specs every request must contain a Host header. So when we don’t care about the host (when no host is configured in the client), we include the header but just leave it empty. |
Thanks for the review, I'll update the PR this morning.
An empty host header, at least on latest version of PHP, will make curl trigger invalid requests (with no host) because value of the provided header takes over the one provided in the URL of the server (2 different curl options set by Guzzle). So yes, a host header must always be present, but curl automatically adds it from the provided URL if not present. What is happening is that when ban requests are queued, host is provided. This makes Guzzle fill the host header with an empty string (parsed from URL). This sounds OK as this is not the request that will be actually sent. However, when building the final request to send, headers are copied, including the empty host, which prevents Guzzle to re-create it properly. That's why I propose to remove it. I couldn't run the tests locally. What should I do for this? Thanks |
Which PHPUnit version are you running? Do the tests work if you upgrade? |
I was using v3.7.20, I know use v4.2.5, but I still have errors (connection refused). I guess I need to start a Varnish instance or something ? |
The VarnishTestCase should start a Varnish server for you, based on values provided in your phpunit.xml. |
0fc037c
to
47c1c54
Compare
Just updated the patch according to review remarks. |
OK, I think I found out what is the difference between the use case tested and what I'm using. In my code (will link my PR on eZ Publish shortly), I don't have a base_url (I don't know it). Hence I expect my BAN requests to be sent out to the Varnish servers directly. This is what the code in foreach ($this->servers as $server) {
$proxyRequest = $this->client->createRequest(
$request->getMethod(),
$server . $request->getResource(),
$headers
);
$allRequests[] = $proxyRequest;
} Here we create a new Guzzle request based on the previous one, but with a new URL, based on the reverse proxy server. If you have a base URL, I assume it will work even if the code is misleading, since your reverse proxy is supposed to be in front of your frontend server. So the request will be caught by the reverse proxy. However, this piece of code suggests that we send the BAN/PURGE request to the reverse proxy directly (new URL), which can't happen since we still have the previous I hope I'm clear enough 😜 |
As promised, here is the implementation in eZ Publish (still WIP) : ezsystems/ezpublish-kernel#1025 |
sorry for being unresponsive. i just read through all this and try to understand:
are you saying that when there is no base_url, the request that is queued with the call to i still don't see the use case for that, as the varnish configuration won't care about Host, only x-host. and i guess if you have a firewall that prevents random access from outside, you would rather not go through that anyways, as i would tell that firewall to not even accept any BAN request from untrusted sources. for purge, things do not look better, as your varnish configuration would need to translate the varnish system name to the website Host name. purge with Host 192.168.0.1 is not the same as purge with Host "example.com". hm, i hope i managed to explain what i don't understand. do you see where i think in the wrong way? i don't see much harm in the PR as you propose it, but also no benefit - but i want to understand what we are doing before merging ;-) |
hm, and re-reading the discussion: is the whole problem that guzzle/curl has a hickup when there is an empty Host header, even though varnish on the other end would not care about it? can you provide a test that fails without this PR? could we also put "42.net" as Host and the problem would be fixed too? |
Yes, when you don't have any When you use PURGE requests, it's completely different since then you must have a base URL for such requests to work properly.
Yes, I'll try to do that today.
No, then BAN requests will be sent to |
okay, i think we are getting there. so you say even though we create the request again, passing the varnish ip as target, curl uses the the tests we do in this bundle all run on the same machine so this is not visible. we could try the oposite however: invent a host name that is not defined. the request should go to the varnish ip even though the dns would not work. |
Yes, this is the issue. I'm currently working on a functional test for Varnish. Just trying to understand how I can run functional tests 😉 |
OK, I found that actually the request is sent by cURL as is, with empty host and full URL, but it's then up to the server receiving the request to accept this request or not. It seems that Varnish is always accepting it, so I cannot make it fail in a functional test.
So Apache is stricter than Varnish in that case. But anyway, we shouldn't have an empty host as this is against HTTP 1.1 specification. I'll try checking with a unit test instead. |
f74a562
to
9e1bc7d
Compare
Test added |
9e1bc7d
to
596a734
Compare
alright, now it makes sense to me :-) thanks for your patience. are you sure your fix is all we need? i think if you can not know the base url, purge/refresh won't work at all - depending on how you do your cache keys in the proxy. should we validate something somewhere to tell the user he is going to have a problem? or can we leave this for now as is? guzzle ending up with an empty Host header feels like an oddity of guzzle, btw. i would have expected it to not set a Host header at all in that case. |
@dbu AFAIK, my patch is all we need. We don't have the use case of simple PURGE requests in eZ, but anyway, if one wants to do that without base URL, FOSHttpCache already sends an exception. So I think we're good.
I have mixed feelings about this, since the culprit is clearly the ProxyClient which is wrongly setting a host. Maybe Guzzle could check if it's empty, but well... |
Oh and one more thing : If once merged you can tag a new maintenance release, it would be awesome 😃 |
@ddeboer i am ok to merge this and would consider it a bugfix. fine for you or do you have concerns? |
i will wait for feedback from the other david until tomorrow. sure, i will tag once merged. @lolautruche when you mention apache and the symfony proxy: do you have |
@dbu This is planned to contribute, yes. However it will probably not cover 100% of the scope, like tagging. |
It seems this PR has suffered from some misunderstanding. So just to clear things up:
Is this right? |
if i read lolatruche correctly here he says that curl sends a completely empty Host header which is not well received by apache. reading the spec, apache is probably wrong to refuse the request (or not consider himself a proxy). its a bit bizarre to say the Host can be empty but a proxy must fill in the Host if it was empty... but maybe actually curl gets confused and removes the Host header at some point - lolatruche did you ever dump the actual traffic that goes over the network (wireshark or such) to see what really happens? but either way, i think at this point we need to do this change. when we are exactly sure what happens, we should put an explanation in the code so we remember why exactly we do this. |
According to HTTP 1.1 specification (section 14.23), a But the request correctly arrives to the Apache Server (the error I mentioned is from Apache log), so it might wrongly sends a 400 error. This PR indeed fixes the problem to the source, removing the ambiguity of the empty host. |
@dbu No I didn't do traffic dump. Not sure how to use Wireshark 😊 |
I read this differently:
reading this, the empty Host does make some sense for BAN, as indeed we do not care about a single host (there is X-Host for the regular expression pattern). so really this is either a workaround for a curl bug not sending the empty Host header or an Apache bug not accepting the empty Host header. |
@@ -161,11 +161,16 @@ private function sendRequests(array $requests) | |||
$allRequests = array(); | |||
|
|||
foreach ($requests as $request) { | |||
$headers = $request->getHeaders()->toArray(); | |||
// Force to re-create Host header if empty. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done :-)
When using `ban` requests with Varnish proxy client, requests are queued and sent with `flush()`. Queued requests are registered without host, making Guzzle create an empty `Host` header. When flushing, Guzzle request is re-created starting with the original one (so without host), with all registered headers. The problem is that the empty `Host` header is also copied, and thus not re-created by Guzzle with the server URL. In the backend, cURL sees it and takes it into account, overriding passed URL silently. This causes requests with empty host being fired by Guzzle, which is kind of annoying 😃.
596a734
to
25a7a49
Compare
Fix Host header for proxy client requests
thanks a lot for your patience. tagging the bugfix version right now. |
Thanks! |
When using
ban
requests with Varnish proxy client, requests are queued and sent withflush()
. Queued requests are registered without host, making Guzzle create an emptyHost
header.When flushing, Guzzle request is re-created starting with the original one (so without host), with all registered headers. The problem is that the empty
Host
header is also copied, and thus not re-created by Guzzle with the server URL. In the backend, cURL sees it and takes it into account, overriding passed URL silently.This causes requests with empty host being fired by Guzzle, which is kind of annoying 😃.
This patch ensures
Host
header is removed before sending the real request.