[squid-users] accelerated mode, feature request

From: Nathan Hand <nathanh@dont-contact.us>
Date: Tue, 29 May 2001 20:48:06 +1000

I have a problem with squid 2.4 serving as an accelerator for several sites.
I have multiple backend servers. Some of them have name based virtual hosts
but for the most part they have IP based virtual hosts sitting on IP aliases
or stand-alone servers. The servers all sit on a private network 10.0.0.0/24.

    server1: 10.0.0.1 hosting site1.com
    server2: 10.0.0.2 hosting site2.com
    server3: 10.0.0.3 hosting site3.com, site4.com, and site5.com

These sites are represented in the DNS as follows

    site1.com IN A 200.100.50.1
    site2.com IN A 200.100.50.2
    site3.com IN A 200.100.50.3
    site4.com IN CNAME site3.com
    site5.com IN CNAME site3.com

The accelerator has interface bindings for all three public addresses. As
an added complication these web servers are incredibly stupid and always
return the Host: header you supply. So if the cache accidentally requests

    GET http://10.0.0.3/directory HTTP/1.0
    Host: 10.0.0.3

Then the web server will return a 302 redirect to http://10.0.0.3/directory/
which the user can't get to. This can't be fixed on the web server due to
internal politics. The only way to deal with it is to make requests of the form

    GET http://10.0.0.3/directory HTTP/1.0
    Host: site3.com

Then the web server returns the 302 redirect to http://site3.com/directory/
and everything works. In case you were wondering the web server never checks
the Host header and trusts anything I pass it.
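
To make the redirect behaviour concrete, here is a tiny Python model of what the backend appears to do. It is only a sketch inferred from the two requests above; the function name directory_redirect is made up for illustration and is not part of any web server.

```python
def directory_redirect(host_header, path):
    """Model of the backend's 302 for a directory requested without a
    trailing slash: the Location is built from whatever Host was sent."""
    return "http://" + host_header + path + "/"


# Host set to the private address: the user is sent somewhere unreachable.
print(directory_redirect("10.0.0.3", "/directory"))
# Host set to the public name: the redirect is usable.
print(directory_redirect("site3.com", "/directory"))
```

So whatever ends up in the Host header of the request to the backend is what the user sees in the redirect.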

Ok, so I've constructed the following cache configuration for squid. I
started with ./configure --disable-internal-dns. I then built the /etc/hosts
file on the accelerator as follows with nsswitch.conf using files only.

    10.0.0.1 site1.com
    10.0.0.2 site2.com
    10.0.0.3 site3.com site4.com site5.com

Squid configuration is

    acl myservers dst 10.0.0.1 10.0.0.2 10.0.0.3
    acl http proto HTTP
    acl port80 port 80
    acl all src 0.0.0.0/0
    http_access allow myservers http port80
    http_access deny all

    httpd_accel_uses_host_header on
    httpd_accel_host site1.com
    httpd_accel_port 80

No redirector in this configuration. I'll get to that bit later.

First problem. Requests from older browsers that don't send Host headers
will always get content from http://site1.com/. For example, a connection
comes in on 200.100.50.2 with the following HTTP request.

    GET /

There is no Host header so squid substitutes site1.com. So the user sees
site1.com when they expected site2.com! This is normal behaviour for
name-based virtual hosts but site1.com and site2.com are IP-based virtual
hosts! I don't know if this is really a problem: how many browsers don't
send Host headers?

Second problem. A user requests http://200.100.50.1/ expecting to get
site1.com. The request comes in to the bound address 200.100.50.1 with the
HTTP request looking like

    GET / HTTP/1.0
    Host: 200.100.50.1

Squid gives precedence to the Host header and translates the request to

    GET http://200.100.50.1/ HTTP/1.0
    Host: 200.100.50.1

This request then fails: there's no web server running on the accelerator
address 200.100.50.1. There's no way (as near as I can tell) to translate
this request into GET http://site1.com/ HTTP/1.0.

The short version of the problem is that the /etc/hosts trick only works
when the Host header contains a name. If the Host header is missing or
contains an IP address then the request never maps to the right backend
server and it fails.
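
The three cases can be summarised with a small Python model. This is a simplified sketch of the behaviour described in this message, not Squid's actual code; HOSTS, ACCEL_HOST and backend_for are names invented for the example.

```python
# The /etc/hosts entries on the accelerator.
HOSTS = {
    "site1.com": "10.0.0.1",
    "site2.com": "10.0.0.2",
    "site3.com": "10.0.0.3",
    "site4.com": "10.0.0.3",
    "site5.com": "10.0.0.3",
}
ACCEL_HOST = "site1.com"  # httpd_accel_host


def backend_for(host_header):
    """Return (url_host, backend_ip). backend_ip is None when the host in
    the rebuilt URL does not map to any backend server."""
    # With httpd_accel_uses_host_header on, the Host header wins when it is
    # present; otherwise httpd_accel_host is substituted.
    url_host = host_header if host_header else ACCEL_HOST
    return url_host, HOSTS.get(url_host)


print(backend_for("site2.com"))     # name in Host header: reaches 10.0.0.2
print(backend_for(None))            # no Host header: always site1.com
print(backend_for("200.100.50.1"))  # public IP in Host header: no backend
```

Note that the address the request arrived on never enters into the decision, which is the root of both problems.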

I looked into using a redirector to fix this. I made the following changes
to the squid configuration.

    redirect_program squidguard
    redirect_rewrites_host_header on

The redirector then looks for the public IP addresses in the URL and
substitutes the appropriate site names. This means replicating the
information already in /etc/hosts.

    s@http://200.100.50.1/@http://site1.com/@
    s@http://200.100.50.2/@http://site2.com/@
    s@http://200.100.50.3/@http://site3.com/@
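
For anyone who would rather not run squidguard, the same substitutions can be sketched as a stand-alone redirector in Python. This assumes the usual redirector interface, where Squid writes one line per request with the URL as the first field and reads back one line containing the URL to use. The names PUBLIC_TO_SITE, rewrite and serve are invented for the example.

```python
import io

# Public accelerator address -> site name. This repeats what /etc/hosts
# already implies, which is the duplication complained about above.
PUBLIC_TO_SITE = {
    "200.100.50.1": "site1.com",
    "200.100.50.2": "site2.com",
    "200.100.50.3": "site3.com",
}


def rewrite(url):
    """Replace a public IP in the host part of an http URL with its name."""
    for ip, site in PUBLIC_TO_SITE.items():
        prefix = "http://" + ip + "/"
        if url.startswith(prefix):
            return "http://" + site + "/" + url[len(prefix):]
    return url  # anything else is passed through unchanged


def serve(inp, out):
    """Answer one request line at a time; the URL is the first field."""
    for line in inp:
        fields = line.split()
        out.write((rewrite(fields[0]) if fields else "") + "\n")
        out.flush()  # Squid waits for each answer, so never buffer


# Demonstration on an in-memory stream instead of a live Squid.
demo_in = io.StringIO("http://200.100.50.2/index.html 192.0.2.7/- - GET\n")
demo_out = io.StringIO()
serve(demo_in, demo_out)
print(demo_out.getvalue(), end="")
```

To use it as a real redirect_program, the script would call serve(sys.stdin, sys.stdout) instead of the in-memory demonstration.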

The redirected URL is then parsed by squid for the destination host and the
Host header in the request is rewritten to use the new destination. After
the redirect/rewrite the request finally falls for the /etc/hosts trick and
everything works. This system still fails if the browser doesn't send a Host
header and expects something other than http://site1.com/. This is pretty
close to what I want.

It seems that Squid needs some sort of method to map incoming requests to
backend servers. The map needs to be able to translate based upon the

    - IP address that the request came in on (IP-based virtual hosts)
    - Host header in the request (name-based virtual hosts)
    - URL in the request (redirections)

I can see how the 2nd and 3rd methods are doable, but as near as I can tell
there is no support for the 1st method. So I'd like to get some feedback on
a new directive. The idea is something like

    httpd_accel_translate \
        incoming_address host_header_regex \
        destination_server destination_host_header

I could then write

    httpd_accel_translate 200.100.50.1 * 10.0.0.1 site1.com
    httpd_accel_translate 200.100.50.2 * 10.0.0.2 site2.com
    httpd_accel_translate 200.100.50.3 site4.com 10.0.0.3 site4.com
    httpd_accel_translate 200.100.50.3 site5.com 10.0.0.3 site5.com
    httpd_accel_translate 200.100.50.3 * 10.0.0.3 site3.com

The first and second parameters are regular expressions, with a bare *
meaning match anything. The third parameter must be of the form host:port
where the port is optional. The fourth parameter is a string and optional.

The translation only occurs if the first and second parameters match the
incoming address and the host header in the request. The translation will
insert the third parameter into the host component of the URL in the request
and so the third parameter is effectively the IP address of the backend
server to contact. The translation will rewrite the Host header to be the
fourth parameter. Leaving the fourth parameter blank would leave the Host
header alone.
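
The intended semantics can be sketched in Python using the five example rules. Nothing like this exists in Squid; it is only a model of the proposal. It makes two assumptions that are not spelled out above: rules are tried in order and the first match wins, and a bare * also matches a missing Host header. The names RULES, matches and translate are invented for the example.

```python
import re

# (incoming_address, host_header_regex, destination_server,
#  destination_host_header or None), in configuration order.
RULES = [
    ("200.100.50.1", "*", "10.0.0.1", "site1.com"),
    ("200.100.50.2", "*", "10.0.0.2", "site2.com"),
    ("200.100.50.3", "site4.com", "10.0.0.3", "site4.com"),
    ("200.100.50.3", "site5.com", "10.0.0.3", "site5.com"),
    ("200.100.50.3", "*", "10.0.0.3", "site3.com"),
]


def matches(pattern, value):
    # Assumption: "*" matches anything, including a missing header; any
    # other pattern is a regular expression that must match the whole value.
    if pattern == "*":
        return True
    return value is not None and re.fullmatch(pattern, value) is not None


def translate(local_addr, host_header, path, rules=RULES):
    """Return (url, host_header) to send to the backend, or None if no rule
    matches. Assumption: the first matching rule wins."""
    for addr_pat, host_pat, dest, new_host in rules:
        if matches(addr_pat, local_addr) and matches(host_pat, host_header):
            # A blank fourth parameter leaves the Host header alone.
            return "http://" + dest + path, new_host or host_header
    return None


# No Host header, request arrived on 200.100.50.2: still reaches site2.com.
print(translate("200.100.50.2", None, "/"))
# Host header holds the public IP: rewritten to site1.com.
print(translate("200.100.50.1", "200.100.50.1", "/"))
# Name-based site behind the shared address.
print(translate("200.100.50.3", "site4.com", "/directory"))
```

Because the incoming address is part of the match, both failure cases described earlier get a sensible default per bound IP address.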

I believe this proposed directive handles any arrangement of name-based and
IP-based virtual hosts. In the example given above I have a default site for
each bound IP address. I don't need the /etc/hosts trick or a redirector. I
can map incoming IP-based virtual hosts into backend name-based virtual
hosts, or vice versa. I can "fix" my backend web servers - which don't
support the equivalent of ServerName in Apache - because I can override the
Host header.

Another benefit of httpd_accel_translate is that it deprecates the three
directives httpd_accel_single_host, httpd_accel_port and httpd_accel_host.
You can replace httpd_accel_single_host 10.0.0.1 using the following
translation.

    httpd_accel_translate * * 10.0.0.1 site1.com

And you can replace httpd_accel_host virtual with the following translation.

    httpd_accel_translate * * 10.0.0.1

And you can replace httpd_accel_port 8000 with the following translation.

    httpd_accel_translate * * 10.0.0.1:8000 site1.com

I haven't considered virtual ports via httpd_accel_port 0. I don't
understand the implications of this directive.

Now what I'd like to know is 1) have I missed an obvious and/or easier way
to do this, and 2) is there anything fundamentally wrong with the directive
I'm proposing. I realise there is already a strong argument against
introducing this new directive because I already have something which mostly
works.

An alternate solution is to extend the redirector specification to pass the
incoming IP address and the Host header to the redirector program. A
redirector would do all the work of matching and substituting. Squid would
honour the redirected URL and the redirected Host header information
returned by the redirector program. The redirect_rewrites_host_header
directive then becomes redundant and the /etc/hosts tricks would not be
necessary. As a person who has never contributed to squid and just uses it
occasionally, I prefer httpd_accel_translate.

-- 
The more I know about the WIN32 API the more I dislike it. It is complex and
for the most part poorly designed, inconsistent, and poorly documented.
                                                                - David Korn
Received on Tue May 29 2001 - 04:48:12 MDT