Re: [squid-users] Caching issue with http_port when running in transparent mode from Amos Jeffries on 2012-05-29 (squid-users)

From: Amos Jeffries <squid3_at_treenet.co.nz>
Date: Tue, 29 May 2012 19:07:03 +1200

On 29/05/2012 6:12 p.m., Hans Musil wrote:
> Amos Jeffries wrote:
>> On 29.05.2012 08:13, Eliezer Croitoru wrote:
>>> hey there Hans,
>>>
>>> Are you running Squid on the same machine as the gateway? (I wasn't
>>> sure about the DNAT.)
>>> Your problem is not directly related to Squid but to the way that TCP
>>> and browsers work.
>>> For every connection that the client browser uses there is a TCP window
>>> that stays alive for a period of time after the page was served.
>>> This causes all the connections that were served using port
>>> 3128 to still exist for, I think, 5 to 10 more minutes, or whatever
>>> your TCP stack settings are.
>>
>> While that is true for the TCP details, I think HTTP connection
>> behaviour is why that matters. For TCP timeout closure to start
>> happening, HTTP first has to stop using the connection.
>>
>> iptables NAT only affects SYN packets (ie new connections). So any
>> existing TCP connections made by HTTP WILL continue to operate
>> despite any changes to NAT rules.
>>
>> HTTP persistent connections, CONNECT tunnels and HTTP
>> "streaming"/large objects have no fixed lifetime and several minutes
>> for idle timeout. It is quite common to see client TCP connections
>> lasting whole hours or days with HTTP traffic flow throughout.
>>
>>>
>>> On 28/05/2012 22:34, Hans Musil wrote:
>>>> Hi,
>>>>
>>>> my box is running Debian Squeeze, which uses Squid version
>>>> 2.7.STABLE9, but my problem also seems to affect Squid version 3.1.
>>>>
>>>> These are the important lines from my squid.conf:
>>>>
>>>> http_port 3128 transparent
>>>> http_port 3129 transparent
>>>> url_rewrite_program /etc/squid/url_rewrite.php
>>>>
>>>>
>>>> First, I did configure my Linux iptables like this:
>>>>
>>>> # Generated by iptables-save v1.4.8 on Mon May 28 21:04:09 2012
>>>> *nat
>>>> :PREROUTING ACCEPT [0:0]
>>>> :POSTROUTING ACCEPT [0:0]
>>>> :OUTPUT ACCEPT [0:0]
>>>> -A PREROUTING -i eth1 -p tcp -m tcp --dport 80 -j DNAT --to-destination 10.17.0.1:3128
>>>> COMMIT
>>>>
>>>> and everything works fine.
>>>>
>>>> But when I change the redirect port in the iptables settings from
>>>> 3128 to 3129, Squid behaves strangely: my URL rewrite program is
>>>> still sent myport=3128, although there are definitely no more
>>>> requests on this port, only on 3129. This only affects HTTP
>>>> domains that have already been requested before, i.e. with
>>>> redirection to port 3128, and it works fine again when I do a
>>>> force-reload in my browser. Things also work again after waiting
>>>> a few minutes.
>>>>
>>>> I suppose there is some strange caching inside Squid that maps the
>>>> HTTP domain to an incoming port.
>>
>> No. There is only an active TCP connection. Multiple HTTP requests can
>> arrive on the connection long after you start sending unrelated new
>> connections+requests through other ports.
>>
>>
>> What your helper was passed is the details about the request Squid
>> received. It arrived on a TCP connection which was accepted through
>> Squid port 3128. The fact that you changed the kernel settings after
>> that connection was set up and operating is irrelevant.
>>
>>
>> URL-rewriting is a form of NAT on the URL, but with far worse
>> side-effects than IP-layer NAT, and is often a sign of major design
>> mistakes somewhere in the network. Why do you have to re-write in the
>> first place? Perhaps we could point you at a simpler, more
>> standards-compliant setup.
>>
>> Amos
>>
> Thanks Amos. This makes things even clearer. Actually, I'd say that my
> problem is solved with the help of both of you. But well, let's have a
> look at my design.
>
> My goal is to build an access control mechanism between my client
> machines and the internet. As long as a user has not yet logged in, his
> client box should be completely cut off from the internet, not only HTTP.
>
> The login is done through a web interface. This is where I redirect
> any web traffic, using URL rewriting. After the user has logged in, the
> client's HTTP packets will be DNATed to the other Squid port in order
> to be proxied normally. I need the HTTP proxy for logging my users'
> HTTP requests.
>
> Since the users' client machines are out of my control, it is
> important for me that they don't need any special configuration.
> That's why Squid must run in transparent mode.

Okay. As expected, a design problem. The huge problem with transparent
intercept is that the browser is 100% unaware that the proxy exists. As
far as it is concerned, the re-written splash page or redirect response
is the actual response from somebody else's domain name (google or your
bank, for example). It has zero reason to think that a new TCP connection
is needed for followup requests. Just because the server of that page
replied Connection: close is no reason to expect Squid to pass the
closure on to the client (quite the reverse, Squid will go out of its
way to keep client connections open and re-used).
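
As a rough illustration of that connection re-use (plain Python standard
library, not Squid code, and the names here are made up for the demo): one
client connection sends two requests, and the server sees both arrive from
the same client source port on the same listening port. Nothing that
happens to the NAT rules in between would change where the second request
lands.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# (client source port, server listening port) recorded for each request
seen = []


class Handler(BaseHTTPRequestHandler):
    # HTTP/1.1 makes connections persistent (keep-alive) by default
    protocol_version = "HTTP/1.1"

    def do_GET(self):
        seen.append((self.client_address[1], self.server.server_address[1]))
        body = b"ok"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo output quiet


# Port 0 asks the OS for any free port; this stands in for "port 3128".
server = HTTPServer(("127.0.0.1", 0), Handler)
threading.Thread(target=server.serve_forever, daemon=True).start()

# One TCP connection, two HTTP requests, like a browser re-using a socket.
conn = http.client.HTTPConnection("127.0.0.1", server.server_address[1])
for path in ("/first", "/second"):
    conn.request("GET", path)
    conn.getresponse().read()
conn.close()

server.shutdown()
server.server_close()

# Both entries are identical: same TCP connection, same accepting port.
print(seen[0] == seen[1])
```

The second request never goes through a new TCP handshake, which is why the
port Squid reports to the helper stays the same until the browser finally
opens a fresh connection.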

A deny_info redirect to the login page can do the bouncing without a URL
re-writer. To fit in with your existing config that would be:

  acl port3128 myportname 3128
  deny_info http://your-login.example.com/ port3128
  http_access deny port3128

The full details and some other tricks can be found at
http://wiki.squid-cache.org/ConfigExamples/Portal/Splash

This still hits the DNAT problems. I would suggest finding an
external_acl_type helper that accesses whatever database your login
script is recording client logins with, and using that as the ACL to
deny / bounce new clients to the login page. With that design you can
authorize a client on their initial request and continue using the
connection afterwards.
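
A minimal sketch of what such a helper could look like, in Python. This is
not the helper I mentioned, just an illustration under some assumptions:
Squid is configured to pass the client IP (%SRC) as the first field of each
request line, helper concurrency is off (so there is no channel-ID prefix),
and the "database" is simply a list of logged-in IPs, one per line. The
names load_logged_in, decide and serve are invented for the example.

```python
import io


def load_logged_in(lines):
    """Return the set of client IPs listed one per line.

    Blank lines and lines starting with '#' are ignored. A real setup
    would query whatever store the login script writes to (assumption).
    """
    return {
        ln.strip()
        for ln in lines
        if ln.strip() and not ln.lstrip().startswith("#")
    }


def decide(request_line, logged_in):
    """Answer one request line: OK if the client IP is logged in, else ERR."""
    fields = request_line.split()
    if not fields:
        return "ERR"
    return "OK" if fields[0] in logged_in else "ERR"


def serve(requests, replies, logged_in):
    """Read one request per line and write one reply per line.

    Each reply is flushed immediately, since Squid waits for the answer.
    """
    for line in requests:
        replies.write(decide(line, logged_in) + "\n")
        replies.flush()


# Demo with in-memory streams standing in for the helper's stdin/stdout.
sessions = load_logged_in(io.StringIO("# logged-in clients\n10.17.0.23\n"))
out = io.StringIO()
serve(io.StringIO("10.17.0.23\n10.17.0.99\n"), out, sessions)
print(out.getvalue(), end="")
```

To run it as a real helper you would call serve(sys.stdin, sys.stdout, ...)
instead of the in-memory demo, reload the session list as logins change,
and point an external_acl_type line with the %SRC format at the script. An
ERR answer is what lets http_access deny the request and deny_info bounce
the client to the login page.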

NP: I recently posted to the list a version of the external_acl_type
helper I use myself for exactly this type of portal setup.

Amos
Received on Tue May 29 2012 - 07:07:24 MDT
