Odd interaction between local and parent web caches

From: Mike Brudenell <pmb1@dont-contact.us>
Date: Thu, 01 Jun 2000 10:36:27 +0100

Greetings -

We have been seeing some problems since reconfiguring our Local Web Cache
(LWC) at the start of this week to route more requests through our parent
caches at the National Web Cache (NWC).

I have been doing some testing and am now thoroughly bemused as to what is
(or isn't!) happening. If you can suggest anything I'd be very grateful!

The change we made to our LWC configuration was to send POST requests, and
"query" requests (ie, those whose URLs contain "cgi-bin" or "?"), on to the
NWC instead of having the LWC handle them itself by connecting to the origin
server directly.
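
To make the rule concrete, here is a minimal Python sketch of the test
described above. It is an illustration only: the function name is made up,
and Squid itself expresses this with ACLs in squid.conf, not with code
like this.

```python
def is_query_request(method: str, url: str) -> bool:
    """Return True for the requests now sent on to the NWC.

    Hypothetical helper: it mirrors the rule described in the text,
    not Squid's own ACL matching.
    """
    # POST requests are always included.
    if method.upper() == "POST":
        return True
    # "Query" requests: the URL contains "cgi-bin" or a "?".
    return "cgi-bin" in url or "?" in url
```

For example, a GET for http://www.hotmail.com/ does not match, whereas a GET
for http://lc1.law5.hotmail.passport.com/cgi-bin/login does.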

Since the change we have been observing extremely long delays -- of the
order of 2-5 minutes -- when requesting such items. One example is
submitting information using SpamCop's form, at:
        http://spamcop.net/
Another example is in logging in to Hotmail:
        http://www.hotmail.com

Here are various scenarios I have tried to date...

Config #1 -- Browser goes direct
--------------------------------
If I configure my browser (IE 5 for Mac) to bypass all the proxies and
issue all requests direct to the origin server then responses are timely:
logging in to Hotmail takes only a few seconds.

Config #2 -- Browser -> LWC; LWC -> queries direct to origin servers
--------------------------------------------------------------------
If I configure our LWC to route queries (POSTs, and URLs containing
"cgi-bin" or "?") direct to the origin servers, and my browser to use the
LWC, then logging in to Hotmail again takes only seconds.

Config #3 -- Browser -> NWC
---------------------------
If I configure my browser to use the NWC machine as its proxy then again
logging in to Hotmail takes only seconds.

Config #4 -- Browser -> LWC; LWC -> queries to NWC
--------------------------------------------------
HOWEVER... If I configure my browser to use the LWC, and have the LWC
configured to route queries through to the NWC ("prefer_direct" is "off",
and the "always_direct" rule which used to route queries direct to origin
servers is commented out) then there is a problem.

Specifically, logging in to Hotmail (when it works at all) takes almost
exactly 2 minutes. This figure comes from both my stopwatch and Squid's
access.log. Occasionally the request hangs for longer (5 minutes) and then
comes back not with a logged-in Hotmail session but with an error from the
NWC (lifetime of connection exceeded).

Perhaps significantly, the 2 minute "successful" response is logged in the
access.log as a DIRECT connection (ie, our LWC gave up waiting for the NWC
and failed over to going direct). In contrast the 5 minute "unsuccessful"
request is logged as "FIRST_UP_PARENT" or similar, suggesting our LWC got a
bit further and thought that the NWC was going to return something (but it
didn't?).

Sample extracts from access.log for a 5-minute "failed" login to Hotmail
(lines wrapped to assist legibility):

        ... Request www.hotmail.com ...

959850285.345 418 144.32.128.9 TCP_MISS/302 678
    GET http://www.hotmail.com/ -
    FIRST_UP_PARENT/york0.sites.wwwcache.ja.net text/html
959850405.636 120271 144.32.128.9 TCP_MISS/200 4306
              ^^^^^^ !!!!!
    GET http://lc1.law5.hotmail.passport.com/cgi-bin/login -
    DIRECT/lc1.law5.hotmail.passport.com text/html

        ... Enter password and press Return ...

959850444.603 917 144.32.128.9 TCP_MISS/000 2707
    CONNECT lc4.law5.hotmail.passport.com:443 -
    DIRECT/lc4.law5.hotmail.passport.com -
959850448.715 3801 144.32.128.9 TCP_MISS/302 1033
    GET http://lw9fd.law9.hotmail.msn.com/cgi-bin/sbox? -
    FIRST_UP_PARENT/york0.sites.wwwcache.ja.net text/html
959850748.056 299253 144.32.128.9 TCP_MISS/000 0
              ^^^^^^ !!!!!
    GET http://lw9fd.law9.hotmail.msn.com/cgi-bin/HoTMaiL? -
    FIRST_UP_PARENT/york0.sites.wwwcache.ja.net -
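
For anyone wanting to pick such entries out of a larger log, the following is
a rough Python sketch that reads entries in Squid's native access.log format
and lists the slow ones. The second column (marked with carets above) is the
elapsed time in milliseconds. The helper names and the 60-second threshold
are assumptions made for illustration, and the sample lines are two of the
entries above joined back onto single lines.

```python
from typing import NamedTuple


class Entry(NamedTuple):
    timestamp: float   # seconds since the Unix epoch
    elapsed_ms: int    # time Squid spent on the request, in milliseconds
    result: str        # e.g. "TCP_MISS/200"
    method: str
    url: str
    hierarchy: str     # e.g. "DIRECT" or "FIRST_UP_PARENT"
    peer: str          # host the request was forwarded to


def parse_entry(line: str) -> Entry:
    """Parse one (unwrapped) line of Squid's native access.log format."""
    fields = line.split()
    # Field 8 looks like "HIERARCHY_CODE/hostname".
    hierarchy, _, peer = fields[8].partition("/")
    return Entry(float(fields[0]), int(fields[1]), fields[3],
                 fields[5], fields[6], hierarchy, peer)


def slow_entries(lines, threshold_ms=60_000):
    """Return the entries that took at least threshold_ms milliseconds."""
    entries = [parse_entry(line) for line in lines if line.strip()]
    return [e for e in entries if e.elapsed_ms >= threshold_ms]


# Two entries from the extract above, joined back onto single lines.
SAMPLE = [
    "959850405.636 120271 144.32.128.9 TCP_MISS/200 4306 "
    "GET http://lc1.law5.hotmail.passport.com/cgi-bin/login - "
    "DIRECT/lc1.law5.hotmail.passport.com text/html",
    "959850448.715 3801 144.32.128.9 TCP_MISS/302 1033 "
    "GET http://lw9fd.law9.hotmail.msn.com/cgi-bin/sbox? - "
    "FIRST_UP_PARENT/york0.sites.wwwcache.ja.net text/html",
]

for entry in slow_entries(SAMPLE):
    print(f"{entry.elapsed_ms / 1000:.1f}s {entry.hierarchy} {entry.url}")
```

Only the first sample entry (the DIRECT one, at 120271 ms) passes the
threshold; the 3801 ms request via the parent does not.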

These tests make me think that in isolation browser + our LWC + direct
queries works fine, as does browser + NWC.

However there appears to be "something" which causes a problem when our LWC
tries to route POSTs or query URLs through the NWC.

We are running Squid 2.2STABLE5 (the "vanilla" distribution; ie, we don't
have any of Henrik Nordstrom's patches applied) as our LWC. I believe the
NWC are using 2.2STABLE5 + Henrik's patches of 13-Jan-2000 (ie,
hno.20000103).

We have:

  * Our two NWC parents marked as being "no-query" (we are using cache
    digests only with them at present because their overloaded links are
    causing packet loss and hence problems with the ICP over UDP stuff);

  * CONNECT (https:) requests going direct from our LWC to the origin
    servers ("always_direct allow");

  * "prefer_direct off" to prefer sending other stuff through the parent
    caches over connecting direct to the origin servers.
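
As a summary of the intent behind these settings, here is a small Python
sketch of the order in which the LWC is meant to try next hops. It models
only the policy as described in this message, not Squid's actual
peer-selection code; the function name and the "parent"/"origin" labels are
invented for the example.

```python
from typing import List


def forwarding_order(method: str) -> List[str]:
    """Order in which the LWC is meant to try next hops for a request.

    Hypothetical model of the settings listed above, not Squid's algorithm.
    """
    if method.upper() == "CONNECT":
        # "always_direct allow" for CONNECT: origin server only.
        return ["origin"]
    # "prefer_direct off": try a parent cache first, with a direct
    # connection to the origin server as the fallback.
    return ["parent", "origin"]
```

The DIRECT entry logged after a 2 minute wait is consistent with the second
case: the parent was tried first and the direct connection used only as the
fallback.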

So... is anyone familiar with this problem please? Is there anything I can
do to our existing setup to get routing POST and query URLs through the NWC
working (eg, "do this in your configuration", "apply Henrik's patches",
"upgrade to Squid 2.3", etc)?

With many thanks for any help you can give,

Mike Brudenell

-- 
The Computing Service, University of York, Heslington, York YO10 5DD, UK
Tel:+44-1904-433811  FAX:+44-1904-433740
                                 Web: http://www-users.york.ac.uk/~pmb1/
* Unsolicited commercial e-mail is NOT welcome at this e-mail address. *
Received on Thu Jun 01 2000 - 03:40:22 MDT