Re: [squid-users] Caching issue with http_port when running in transparent mode

From: Hans Musil <hans.musil_at_gmx.de>
Date: Tue, 05 Jun 2012 21:04:19 +0200

-------- Original Message --------
> Date: Tue, 05 Jun 2012 19:54:12 +0200
> From: "Hans Musil" <hans.musil_at_gmx.de>
> To: Amos Jeffries <squid3_at_treenet.co.nz>, squid-users_at_squid-cache.org
> Subject: Re: [squid-users] Caching issue with http_port when running in transparent mode

> Amos Jeffries wrote:
>
> > On 29/05/2012 6:12 p.m., Hans Musil wrote:
> > > Amos Jeffries wrote:
> > >> On 29.05.2012 08:13, Eliezer Croitoru wrote:
> > >>> hey there Hans,
> > >>>
> > >>> Are you running Squid on the same machine as the gateway? (I wasn't
> > >>> sure about the DNAT.)
> > >>> Your problem is not directly related to Squid but to the way that
> > >>> TCP and browsers work.
> > >>> For every connection that the client browser uses there is a TCP
> > >>> window that stays alive for a period of time after the page was
> > >>> served. This causes all the connections that were served using port
> > >>> 3128 to still exist for, I think, 5 to 10 more minutes, or whatever
> > >>> your TCP stack settings are.
> > >>
> > >> While that is true for the TCP details I think HTTP connection
> > >> behaviour is why that matters. For the TCP timeouts closure to start
> > >> happening HTTP has to first stop using the connection.
> > >>
> > >> iptables NAT only affects SYN packets (ie new connections). So any
> > >> existing TCP connections made by HTTP WILL continue to operate
> > >> despite any changes to NAT rules.
> > >>
> > >> HTTP persistent connections, CONNECT tunnels and HTTP
> > >> "streaming"/large objects have no fixed lifetime and several minutes
> > >> for idle timeout. It is quite common to see client TCP connections
> > >> lasting whole hours or days with HTTP traffic flow throughout.
> > >>
> > >>>
> > >>> On 28/05/2012 22:34, Hans Musil wrote:
> > >>>> Hi,
> > >>>>
> > >>>> my box is running on Debian Squeeze, which uses SQUID version
> > >>>> 2.7.STABLE9, but my problem also seems to affect SQUID version 3.1.
> > >>>>
> > >>>> These are the important lines from my squid.conf:
> > >>>>
> > >>>> http_port 3128 transparent
> > >>>> http_port 3129 transparent
> > >>>> url_rewrite_program /etc/squid/url_rewrite.php
> > >>>>
> > >>>>
> > >>>> First, I did configure my Linux iptables like this:
> > >>>>
> > >>>> # Generated by iptables-save v1.4.8 on Mon May 28 21:04:09 2012
> > >>>> *nat
> > >>>> :PREROUTING ACCEPT [0:0]
> > >>>> :POSTROUTING ACCEPT [0:0]
> > >>>> :OUTPUT ACCEPT [0:0]
> > >>>> -A PREROUTING -i eth1 -p tcp -m tcp --dport 80 -j DNAT --to-destination 10.17.0.1:3128
> > >>>> COMMIT
> > >>>>
> > >>>> and everything works fine.
> > >>>>
> > >>>> But when I change the redirect port in the iptables settings from
> > >>>> 3128 to 3129, Squid behaves strangely: my URL rewrite program is
> > >>>> still sent myport=3128, although there are definitely no more
> > >>>> requests on this port, only on 3129. This only affects HTTP
> > >>>> domains that have already been requested before, i.e. with
> > >>>> redirection to port 3128, and it works fine again when I do a
> > >>>> force-reload in my browser. Also, things sort themselves out after
> > >>>> waiting a few minutes.
> > >>>>
> > >>>> I suppose there is some strange caching inside Squid that maps the
> > >>>> HTTP domain to an incoming port.
> > >>
> > >> No. There is only an active TCP connection. Multiple HTTP requests can
> > >> arrive on the connection long after you start sending unrelated new
> > >> connections+requests through other ports.
> > >>
> > >>
> > >> What your helper was passed is the details about the request Squid
> > >> received. It arrived on a TCP connection which was accepted through
> > >> Squid port 3128. The fact that you changed the kernel settings after
> > >> that connection was setup and operating is irrelevant.
> > >>
> > >>
> > >> URL-rewriting is a form of NAT on the URL, but with far worse
> > >> side-effects than IP-layer NAT and is often a sign of major design
> > >> mistakes somewhere in the network. Why do you have to re-write in the
> > >> first place? Perhaps we could point you at a simpler, more
> > >> standards-compliant setup.
> > >>
> > >> Amos
> > >>
> > > Thanks Amos. This makes things even clearer. Actually, I'd say that my
> > > problem is solved with the help of both of you. But well, let's have a
> > > look at my design.
> > >
> > > My goal is to build an access control mechanism for my client
> > > machines' access to the internet. As long as a user has not yet logged
> > > in, his client box should be completely cut off from the internet, not
> > > only from HTTP.
> > >
> > > The login is done through a web interface. This is where the URL
> > > rewriter redirects any web traffic. After the user has logged in, the
> > > client's HTTP packets will be DNATed to the other Squid port in order
> > > to be proxied normally. I need the HTTP proxy for logging my users'
> > > HTTP requests.
> > >
> > > Since the users' client machines are out of my control, it is
> > > important for me that they don't need any special configuration.
> > > That's why Squid must run in transparent mode.
> >
> > Okay. As expected a design problem. The huge problem with transparent
> > intercept is that the browser is 100% unaware that the proxy exists. As
> > far as it is concerned the re-written splash page or redirect response
> > is the actual response to somebody else's domain name (google or your
> > bank for example). It has zero reason to think that a new TCP connection
> > is needed for followup requests. Just because the server of that page
> > replied Connection:close is no reason to expect Squid to pass the
> > closure on to the client (quite the reverse, Squid will go out of its
> > way to keep client connections open and re-used).
> >
> >
> > To fit in with your existing config that would be:
> >
> > acl port3128 myportname 3128
> > deny_info http://your-login.example.com/ port3128
> > http_access deny port3128
> >
> > The full details and some other tricks can be found at
> > http://wiki.squid-cache.org/ConfigExamples/Portal/Splash
> >
> > This still hits the DNAT problems. I would suggest finding an
> > external_acl_type helper that accesses whatever database your login
> > script is recording client logins with. Using that as the ACL to deny /
> > bounce new clients to the login page. With that design you can authorize
> > a client on their initial request and continue using the connection
> > afterwards.
> >
> > NP: I recently posted to the list a version of the external_acl_type
> > helper I use myself for exactly this type of portal setup.
> >
> > Amos
>
> Amos, I'm back. Thanks for your last posting.
>
> Your trick with acl, deny_info and http_access was a big help.
>
> As far as I understand, the external_acl_type helper needs to decide every
> few seconds whether a client is logged in or not. With some hundreds of
> clients, this means hundreds of database lookups per second. That's what I
> wanted to avoid by flipping the squid port when a user logs in or out,
> respectively. This way, I only have one iptables rule instead of multiple DB
> lookups.
>
> As for the DNAT problem, I am considering simply running "conntrack -D"
> with appropriate -s and -d options from my login/logout script.
>
> Hans
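
The conntrack idea above could look roughly like the following. This is only a sketch, assuming the login/logout script runs with enough privileges to call the conntrack tool and already knows the client's address. The function names and the example address are hypothetical; the -D, -s, -p and --dport options are the ones conntrack itself provides. Whether browsers then reconnect cleanly is something that would need testing.

```python
import ipaddress
import subprocess


def conntrack_delete_cmd(client_ip, dport=80):
    """Build the argv that deletes a client's tracked TCP flows to dport."""
    # Validate first so a malformed value from the web form never reaches argv.
    ip = ipaddress.ip_address(client_ip)
    return ["conntrack", "-D", "-s", str(ip), "-p", "tcp", "--dport", str(dport)]


def flush_client(client_ip):
    """Run the deletion; needs root and the conntrack tool installed."""
    # Return the exit status to the caller instead of raising on failure.
    return subprocess.call(conntrack_delete_cmd(client_ip))
```

The login script would call flush_client("10.17.0.23") right after swapping the iptables rule, so that the next request from that client has to open a new connection and hit the new DNAT target.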

Oops, another problem: Amos, your solution looks fine, but there is one catch. My login/logout script needs to know the client's IP, but it only sees my Squid's IP. I know there is the format tag %i, but this would require the non-stable version 3.2. Any better idea?
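
For comparison, with the external_acl_type route discussed above the helper does get the client address, because Squid can pass %SRC to it. A minimal sketch, assuming a non-concurrent helper (one request line in, one OK/ERR line out) and a hypothetical in-memory set of logged-in addresses standing in for the real login database:

```python
import sys


def decide(line, logged_in):
    """Return the helper reply for one request line from Squid."""
    fields = line.split()
    # With a format of just %SRC the first field is the client address.
    if fields and fields[0] in logged_in:
        return "OK"
    return "ERR"


def main(logged_in):
    # Squid writes one request per line and waits for one reply per line.
    for line in sys.stdin:
        sys.stdout.write(decide(line, logged_in) + "\n")
        sys.stdout.flush()  # replies must not sit in a buffer
```

A real helper would call main() with data from the login database. On the Squid side this might be wired up with something like "external_acl_type loggedin ttl=60 %SRC /usr/local/bin/helper.py" (name and path are made up here). The ttl= option makes Squid cache each answer, which may keep the lookup rate well below one per request.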

Hans

Received on Tue Jun 05 2012 - 19:04:35 MDT
