Skip to content

Fix Host header for proxy client requests #128

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Oct 20, 2014

Conversation

lolautruche
Copy link
Contributor

When using ban requests with Varnish proxy client, requests are queued and sent with flush(). Queued requests are registered without host, making Guzzle create an empty Host header.

When flushing, Guzzle request is re-created starting with the original one (so without host), with all registered headers. The problem is that the empty Host header is also copied, and thus not re-created by Guzzle with the server URL. In the backend, cURL sees it and takes it into account, overriding passed URL silently.

This causes requests with empty host being fired by Guzzle, which is kind of annoying 😃.

This patch ensures Host header is removed before sending the real request.

@@ -161,11 +161,14 @@ private function sendRequests(array $requests)
$allRequests = array();

foreach ($requests as $request) {
$headers = $request->getHeaders();
// Force to re-create Host header for each proxy request
unset($headers['Host']);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this seems unconditional both whether host is non-empty and of the type of request.

@dbu
Copy link
Contributor

dbu commented Oct 7, 2014

looking at the tests, i think you should only unset the host if its actually empty. if it is set to something meaningful, like the tests do, we should not unset it.

the idea of the HOST is to represent the general web application domain name, not whatever ip or system name the machine running varnish happens to have. if you have 2 varnishes on 2 ips, both serving a.com and b.com, you want to invalidate a.com on all machines, not 168.192.1.1.

can you explain what exactly goes wrong? or should we somehow guess a host name if none is specified during the ban?

@ddeboer
Copy link
Member

ddeboer commented Oct 7, 2014

Exactly, we need the Host header for differentiating between application hostnames.

This causes requests with empty host being fired by Guzzle, which is kind of annoying

Why is this a problem? An empty Host header is perfectly fine. According to the specs every request must contain a Host header. So when we don’t care about the host (when no host is configured in the client), we include the header but just leave it empty.

@lolautruche
Copy link
Contributor Author

Thanks for the review, I'll update the PR this morning.

Why is this a problem? An empty Host header is perfectly fine. According to the specs every request must contain a Host header. So when we don’t care about the host (when no host is configured in the client), we include the header but just leave it empty.

An empty host header, at least on latest version of PHP, will make curl trigger invalid requests (with no host) because value of the provided header takes over the one provided in the URL of the server (2 different curl options set by Guzzle). So yes, a host header must always be present, but curl automatically adds it from the provided URL if not present.

What is happening is that when ban requests are queued, host is provided. This makes Guzzle fill the host header with an empty string (parsed from URL). This sounds OK as this is not the request that will be actually sent. However, when building the final request to send, headers are copied, including the empty host, which prevents Guzzle to re-create it properly. That's why I propose to remove it.

I couldn't run the tests locally. What should I do for this? Thanks

@ddeboer
Copy link
Member

ddeboer commented Oct 8, 2014

Which PHPUnit version are you running? Do the tests work if you upgrade?

@lolautruche
Copy link
Contributor Author

I was using v3.7.20, I know use v4.2.5, but I still have errors (connection refused). I guess I need to start a Varnish instance or something ?

@ddeboer
Copy link
Member

ddeboer commented Oct 8, 2014

The VarnishTestCase should start a Varnish server for you, based on values provided in your phpunit.xml.

@lolautruche
Copy link
Contributor Author

Just updated the patch according to review remarks.

@lolautruche
Copy link
Contributor Author

OK, I think I found out what is the difference between the use case tested and what I'm using. In my code (will link my PR on eZ Publish shortly), I don't have a base_url (I don't know it). Hence I expect my BAN requests to be sent out to the Varnish servers directly. This is what the code in sendRequests() suggests:

foreach ($this->servers as $server) {
    $proxyRequest = $this->client->createRequest(
        $request->getMethod(),
        $server . $request->getResource(),
        $headers
    );
    $allRequests[] = $proxyRequest;
}

Here we create a new Guzzle request based on the previous one, but with a new URL, based on the reverse proxy server. If you have a base URL, I assume it will work even if the code is misleading, since your reverse proxy is supposed to be in front of your frontend server. So the request will be caught by the reverse proxy. However, this piece of code suggests that we send the BAN/PURGE request to the reverse proxy directly (new URL), which can't happen since we still have the previous Host header, which will override the one in the ban/purge URL.

I hope I'm clear enough 😜

@lolautruche
Copy link
Contributor Author

As promised, here is the implementation in eZ Publish (still WIP) : ezsystems/ezpublish-kernel#1025

@lolautruche
Copy link
Contributor Author

Gentle ping @dbu @ddeboer :)

@dbu
Copy link
Contributor

dbu commented Oct 13, 2014

sorry for being unresponsive. i just read through all this and try to understand:

  • lets assume a network with varnishes on 192.168.0.1 and 192.168.0.2 and apache on 192.168.0.3. we send invalidation requests to both varnishes from the apache machine
  • we send the request to both reverse proxies separately. $server is 192.168.0.1 resp .2 (or some locally resolvable hostname)
  • for a BAN, the Host should actually not matter, as varnish looks at the x-host which contains the pattern of host names to BAN.

are you saying that when there is no base_url, the request that is queued with the call to ban has an empty Host header? and you "hack" around this in the sendRequest to have curl set it to the ip/machine name where the reverse proxy is running?

i still don't see the use case for that, as the varnish configuration won't care about Host, only x-host. and i guess if you have a firewall that prevents random access from outside, you would rather not go through that anyways, as i would tell that firewall to not even accept any BAN request from untrusted sources.

for purge, things do not look better, as your varnish configuration would need to translate the varnish system name to the website Host name. purge with Host 192.168.0.1 is not the same as purge with Host "example.com".

hm, i hope i managed to explain what i don't understand. do you see where i think in the wrong way? i don't see much harm in the PR as you propose it, but also no benefit - but i want to understand what we are doing before merging ;-)

@dbu
Copy link
Contributor

dbu commented Oct 13, 2014

hm, and re-reading the discussion: is the whole problem that guzzle/curl has a hickup when there is an empty Host header, even though varnish on the other end would not care about it? can you provide a test that fails without this PR? could we also put "42.net" as Host and the problem would be fixed too?

@lolautruche
Copy link
Contributor Author

are you saying that when there is no base_url, the request that is queued with the call to ban has an empty Host header? and you "hack" around this in the sendRequest to have curl set it to the ip/machine name where the reverse proxy is running?

Yes, when you don't have any base_url (which is not required when using BAN), requests are sent with an empty host. And even if you have one, BAN requests will be sent to the wrong hosts, since Guzzle/Curl will take the Host header in priority, not the server URL we're in the patched method. It's not a matter of Varnish not knowing which host to play with, it's all about sending the requests to the right host(s). Without this patch requests are sent by Apache to the base_url, which is wrong since in your example, only one of the 2 varnishes might be purged.

When you use PURGE requests, it's completely different since then you must have a base URL for such requests to work properly.

can you provide a test that fails without this PR

Yes, I'll try to do that today.

could we also put "42.net" as Host and the problem would be fixed too?

No, then BAN requests will be sent to 42.net, that's the whole problem 😉

@dbu
Copy link
Contributor

dbu commented Oct 14, 2014

okay, i think we are getting there. so you say even though we create the request again, passing the varnish ip as target, curl uses the Host header instead? this would be quite a problem, as then it would be impossible to target each varnish server individually with purge / refresh requests.

the tests we do in this bundle all run on the same machine so this is not visible. we could try the oposite however: invent a host name that is not defined. the request should go to the varnish ip even though the dns would not work.

@lolautruche
Copy link
Contributor Author

so you say even though we create the request again, passing the varnish ip as target, curl uses the Host header instead? this would be quite a problem, as then it would be impossible to target each varnish server individually with purge / refresh requests.

Yes, this is the issue.

I'm currently working on a functional test for Varnish. Just trying to understand how I can run functional tests 😉

@lolautruche
Copy link
Contributor Author

OK, I found that actually the request is sent by cURL as is, with empty host and full URL, but it's then up to the server receiving the request to accept this request or not. It seems that Varnish is always accepting it, so I cannot make it fail in a functional test.
BUT, if we use another server (e.g. Apache with Symfony reverse proxy, that I was testing for eZ), Apache receives the requests, but rejects it saying:

client sent HTTP/1.1 request without hostname (see RFC2616 section 14.23): /

So Apache is stricter than Varnish in that case. But anyway, we shouldn't have an empty host as this is against HTTP 1.1 specification.

I'll try checking with a unit test instead.

@lolautruche lolautruche force-pushed the fixHostHeader branch 2 times, most recently from f74a562 to 9e1bc7d Compare October 14, 2014 10:11
@lolautruche
Copy link
Contributor Author

Test added

@dbu
Copy link
Contributor

dbu commented Oct 14, 2014

alright, now it makes sense to me :-) thanks for your patience.

are you sure your fix is all we need? i think if you can not know the base url, purge/refresh won't work at all - depending on how you do your cache keys in the proxy. should we validate something somewhere to tell the user he is going to have a problem? or can we leave this for now as is?

guzzle ending up with an empty Host header feels like an oddity of guzzle, btw. i would have expected it to not set a Host header at all in that case.

@lolautruche
Copy link
Contributor Author

@dbu AFAIK, my patch is all we need. We don't have the use case of simple PURGE requests in eZ, but anyway, if one wants to do that without base URL, FOSHttpCache already sends an exception. So I think we're good.

guzzle ending up with an empty Host header feels like an oddity of guzzle, btw. i would have expected it to not set a Host header at all in that case.

I have mixed feelings about this, since the culprit is clearly the ProxyClient which is wrongly setting a host. Maybe Guzzle could check if it's empty, but well...

@lolautruche
Copy link
Contributor Author

Oh and one more thing : If once merged you can tag a new maintenance release, it would be awesome 😃

@dbu
Copy link
Contributor

dbu commented Oct 14, 2014

@ddeboer i am ok to merge this and would consider it a bugfix. fine for you or do you have concerns?

@dbu
Copy link
Contributor

dbu commented Oct 14, 2014

i will wait for feedback from the other david until tomorrow.

sure, i will tag once merged.

@lolautruche when you mention apache and the symfony proxy: do you have
something working that you could contribute ? see
FriendsOfSymfony/FOSHttpCacheBundle#26

@lolautruche
Copy link
Contributor Author

@dbu This is planned to contribute, yes. However it will probably not cover 100% of the scope, like tagging.

@ddeboer
Copy link
Member

ddeboer commented Oct 15, 2014

It seems this PR has suffered from some misunderstanding. So just to clear things up:

  • Sending empty Host headers is no problem (according to HTTP specs).
  • But the way Guzzle sets cURL options causes the Host header to disappear completely (so not an empty header but no Host header at all), which goes against HTTP specs.
  • This PR fixes the above problem.

Is this right?

@dbu
Copy link
Contributor

dbu commented Oct 15, 2014

if i read lolatruche correctly here he says that curl sends a completely empty Host header which is not well received by apache.

reading the spec, apache is probably wrong to refuse the request (or not consider himself a proxy). its a bit bizarre to say the Host can be empty but a proxy must fill in the Host if it was empty...

but maybe actually curl gets confused and removes the Host header at some point - lolatruche did you ever dump the actual traffic that goes over the network (wireshark or such) to see what really happens?

but either way, i think at this point we need to do this change. when we are exactly sure what happens, we should put an explanation in the code so we remember why exactly we do this.

@lolautruche
Copy link
Contributor Author

@ddeboer

  • Sending empty Host headers is no problem (according to HTTP specs).
  • But the way Guzzle sets cURL options causes the Host header to disappear completely (so not an empty header but no Host header at all), which goes against HTTP specs.

According to HTTP 1.1 specification (section 14.23), a Host header must always be present. It must be empty if the URI contains the host.

But the request correctly arrives to the Apache Server (the error I mentioned is from Apache log), so it might wrongly sends a 400 error.

This PR indeed fixes the problem to the source, removing the ambiguity of the empty host.

@lolautruche
Copy link
Contributor Author

@dbu No I didn't do traffic dump. Not sure how to use Wireshark 😊

@dbu
Copy link
Contributor

dbu commented Oct 15, 2014

It must be empty if the URI contains the host.

I read this differently:

If the requested URI does not include an Internet host
name for the service being requested, then the Host header field MUST
be given with an empty value.

reading this, the empty Host does make some sense for BAN, as indeed we do not care about a single host (there is X-Host for the regular expression pattern). so really this is either a workaround for a curl bug not sending the empty Host header or an Apache bug not accepting the empty Host header.

@@ -161,11 +161,16 @@ private function sendRequests(array $requests)
$allRequests = array();

foreach ($requests as $request) {
$headers = $request->getHeaders()->toArray();
// Force to re-create Host header if empty.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you please add here something like "... if empty, as Apache chokes on this. See #128 for discussion."

then i think we just merge. ok @ddeboer ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done :-)

When using `ban` requests with Varnish proxy client, requests are queued
and sent with `flush()`. Queued requests are registered without host,
making Guzzle create an empty `Host` header.

When flushing, Guzzle request is re-created starting
with the original one (so without host), with all registered headers.
The problem is that the empty `Host` header is also copied, and thus not
re-created by Guzzle with the server URL. In the backend, cURL sees it
and takes it into account, overriding passed URL silently.

This causes requests with empty host being fired by Guzzle, which is
kind of annoying 😃.
dbu added a commit that referenced this pull request Oct 20, 2014
Fix Host header for proxy client requests
@dbu dbu merged commit 9c65814 into FriendsOfSymfony:master Oct 20, 2014
@dbu
Copy link
Contributor

dbu commented Oct 20, 2014

thanks a lot for your patience. tagging the bugfix version right now.

@lolautruche lolautruche deleted the fixHostHeader branch October 20, 2014 16:43
@lolautruche
Copy link
Contributor Author

Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants