Re: Squid 2.2.STABLE2 & choice of parent [another patch] from John Line on 1999-05-08 (squid-dev)

From: John Line <webadm@dont-contact.us>
Date: Sat, 8 May 1999 17:27:28 +0100 (BST)

Henrik Nordstrom wrote:
> > The counts above also reinforce my feeling that RTT estimates should
> > influence routing even for requests that get ICP timeouts from all peers
> > (assuming that's not because all peers are down... :-) - the FIRST_UP_PARENT
> > choice was cam0.sites but that parent is clearly being avoided by the
> > ICP-based routing and it seems to have problems at present - continual
> > stream of TCP connection failed (with occasional succeeded), etc. Being
> > first in the list does not mean it's a good (or even reasonable) choice.
>
> Attached is a patch designed to select the parent with lowest
> statistical RTT when ICP times out.
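
For readers following the thread, the idea can be sketched roughly as
below. This is only an illustration, not the attached patch: the struct,
its fields and the function name are all hypothetical, and Squid's real
peer structures differ. It assumes an RTT estimate of 0 means "no data yet".

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified per-parent statistics; not Squid's own struct. */
struct parent_stat {
    const char *name;
    int is_up;    /* non-zero if the parent is currently believed alive */
    int rtt_ms;   /* running RTT estimate in ms; 0 means no estimate yet */
};

/*
 * Return the index of the live parent with the lowest known RTT estimate,
 * or -1 if no live parent has an estimate (the caller could then fall
 * back to a first-up-parent style choice).
 */
static int
lowest_rtt_up_parent(const struct parent_stat *parents, int n)
{
    int i;
    int best = -1;

    assert(parents != NULL || n == 0);
    for (i = 0; i < n; i++) {
        if (!parents[i].is_up || parents[i].rtt_ms <= 0)
            continue;                 /* dead, or nothing to compare */
        if (best < 0 || parents[i].rtt_ms < parents[best].rtt_ms)
            best = i;                 /* strictly lower RTT wins; ties keep list order */
    }
    return best;
}
```

Returning -1 rather than index 0 keeps "no usable estimate" distinct from
"the first parent happens to be fastest", which is the distinction at issue
in this thread.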

I've now looked at the cases where I still got TIMEOUT_FIRST_UP_PARENT and
they appear to be due to the chosen FIRST_PARENT_MISS peer rejecting Squid's
connection (reason unknown, it was the fastest parent for most requests :-)
resulting in it dropping back to FIRST_UP_PARENT.

That prompts two followup questions:

(1) Why is it logged as TIMEOUT_FIRST_UP_PARENT? That's misleading, since
it wasn't chosen due to ICP timeout... (I'm not sure what it should be
logged as instead, though; FIRST_UP_PARENT would tend to imply it was the
first-choice route, when it wasn't; perhaps TIMEOUT_ has the most
appropriate meaning.) The more general question, I suppose, with Squid
retrying failed retrievals automatically, is whether the log is intended to
distinguish requests that were routed in a particular way because a
preferred retrieval option failed, or whether they're intended to be logged
as if the routing that was actually used had been the first choice.

(2) Is there a check somewhere to avoid the same parent being chosen as
FIRST_PARENT_MISS peer and also as the fallback FIRST_UP_PARENT peer? Not a
major problem, I suppose, unless a *lot* of requests were routed in that way
and the peer was unresponsive, so that the requests would take twice as long
to fail (assuming the problem affected all requests; it's possible the first
would fail and the second succeed, if the peer status was fluctuating
rapidly, e.g. close to running out of file descriptors for peer sockets).
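
To make the kind of check I mean concrete, here is a minimal sketch. It is
not taken from the Squid source; the struct and function name are made up
for illustration. The fallback scan simply skips the peer that has already
been tried and failed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified peer record; not Squid's actual struct. */
struct peer_info {
    const char *name;
    int is_up;    /* non-zero if the peer is currently believed alive */
};

/*
 * Return the index of the first peer that is up and is not the peer
 * already tried (index `tried`, or -1 if nothing has been tried yet).
 * Returns -1 if no other live peer exists.
 */
static int
first_up_parent_excluding(const struct peer_info *peers, int n, int tried)
{
    int i;

    assert(peers != NULL || n == 0);
    for (i = 0; i < n; i++) {
        if (i == tried)
            continue;             /* don't retry the peer that just failed */
        if (peers[i].is_up)
            return i;
    }
    return -1;
}
```

With a check like this, a request whose first-choice peer refused the
connection would either move to a different parent or fail straight away,
rather than spending a second attempt on the same peer.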

John

-- 
University of Cambridge WWW manager account (usually John Line)
Send general WWW-related enquiries to webmaster@ucs.cam.ac.uk

Received on Tue Jul 29 2003 - 13:15:58 MDT