Re: Squid 2.2.STABLE2 & choice of parent [another patch]

From: John Line <webadm@dont-contact.us>
Date: Sat, 8 May 1999 16:37:29 +0100 (BST)

Henrik Nordstrom wrote:
>
> > The counts above also reinforce my feeling that RTT estimates should
> > influence routing even for requests that get ICP timeouts from all peers
> > (assuming that's not because all peers are down... :-) - the FIRST_UP_PARENT
> > choice was cam0.sites but that parent is clearly being avoided by the
> > ICP-based routing and it seems to have problems at present - continual
> > stream of TCP connection failed (with occasional succeeded), etc. Being
> > first in the list does not mean it's a good (or even reasonable) choice.
>
> Attached is a patch designed to select the parent with lowest
> statistical RTT when ICP times out.

Thank you! That seems to work - I see a few TIMEOUT_FIRST_UP_PARENT entries
soon after startup, but once RTT information has been accumulated they
more-or-less disappear, with only TIMEOUT_FIRST_PARENT_MISS for the ICP
timeouts. A few TIMEOUT_FIRST_UP_PARENT entries do appear intermittently,
though. I'll let you know if I manage to spot any pattern to when they
happen!
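
For illustration only, the fallback described above (prefer the live parent with the lowest accumulated RTT once ICP has timed out, and fall back to the first live parent while no RTT data exists yet) can be sketched as below. This is not Henrik's patch and not Squid source code: the `Peer` record, its field names and `choose_parent_on_timeout` are hypothetical names made up for the sketch.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Peer:
    """Hypothetical stand-in for a configured parent cache."""
    name: str
    up: bool                  # False once the peer is considered dead
    rtt_ms: Optional[float]   # None until some ICP replies have been timed


def choose_parent_on_timeout(peers: Sequence[Peer]) -> Optional[Peer]:
    """Pick a parent after every ICP query has timed out.

    Assumed policy: among live parents, take the one with the lowest
    recorded RTT; with no RTT data at all, take the first live parent
    in configuration order; with no live parent, return None.
    """
    alive = [p for p in peers if p.up]
    if not alive:
        return None
    measured = [p for p in alive if p.rtt_ms is not None]
    if measured:
        return min(measured, key=lambda p: p.rtt_ms)
    return alive[0]
```

Under this policy the first-live-parent branch is only reached while no parent has an RTT estimate, which would be consistent with such choices clustering soon after startup.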

The parent and sibling log entries (for the last 2000 such requests) are

1437 FIRST_PARENT_MISS/cam1.sites.wwwcache.ja.net
 275 FIRST_PARENT_MISS/cam2.sites.wwwcache.ja.net
  10 FIRST_UP_PARENT/cam0.sites.wwwcache.ja.net
  48 FIRST_UP_PARENT/cam1.sites.wwwcache.ja.net
  11 PARENT_HIT/cam0.sites.wwwcache.ja.net
  40 PARENT_HIT/cam1.sites.wwwcache.ja.net
  52 PARENT_HIT/cam2.sites.wwwcache.ja.net
   4 SIBLING_HIT/wwwcache.damtp.cam.ac.uk
  41 TIMEOUT_FIRST_PARENT_MISS/cam0.sites.wwwcache.ja.net
  77 TIMEOUT_FIRST_PARENT_MISS/cam1.sites.wwwcache.ja.net
   5 TIMEOUT_FIRST_UP_PARENT/cam0.sites.wwwcache.ja.net
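
A tally like the one above can be produced with a few lines of script. The following is a minimal sketch, assuming Squid's native access.log layout in which the ninth whitespace-separated field holds the hierarchy code and peer name (for example FIRST_PARENT_MISS/cam1.sites.wwwcache.ja.net). The function name `tally_peer_choices` and its `last_n` parameter are made up for the example.

```python
from collections import Counter, deque
from typing import Iterable


def tally_peer_choices(lines: Iterable[str], last_n: int = 2000) -> Counter:
    """Count hierarchy-code/peer pairs for the last `last_n` peer requests.

    Only entries whose hierarchy code mentions PARENT or SIBLING are kept,
    so requests that went direct (or were served locally) are ignored.
    """
    recent = deque(maxlen=last_n)   # keeps only the newest last_n entries
    for line in lines:
        fields = line.split()
        if len(fields) < 10:        # not a complete native-format line
            continue
        hierarchy = fields[8]       # e.g. "PARENT_HIT/cam2.sites..."
        code = hierarchy.split("/", 1)[0]
        if "PARENT" in code or "SIBLING" in code:
            recent.append(hierarchy)
    return Counter(recent)


def format_tally(counts: Counter) -> str:
    """Render the counts sorted by name, one 'count name' pair per line."""
    return "\n".join(f"{n:4d} {name}" for name, n in sorted(counts.items()))
```

Typical use would be `print(format_tally(tally_peer_choices(open("access.log"))))`, with the log path adjusted to the local installation.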

I've got more logging enabled (thanks to your point below), and will see
later on if I can find any obvious explanation of the
TIMEOUT_FIRST_UP_PARENT entries (whether reasonable or apparently wrong).

> > A few other miscellaneous points that arose in this morning's testing:
> >
> > (1) The first couple of times I attempted to start the recompiled Squid 2.2
> > with the patch, startup failed with an "Arithmetic Exception" message
> > appearing in the middle of the startup script's output.
>
> That is usually divide by zero.. is it a fatal error on your platform,
> or did Squid continue?

I'm still using a locally modified RunCache script (with -N option to
Squid), and that attempted to restart it - so it crashed. :-)

> The easiest way to find those startup errors is to start the program
> from inside a debugger (preferably gdb).
> gdb squid
> gdb> r -CNd1

If it was repeatable, yes... it happened several times before I abandoned
that attempt and restarted it later without any problems. I'll let you know
if I manage to get any further information about that problem.

> > (2) I've mentioned before that adding an extra debug option sometimes seems
> > to cause all cache.log output to cease, and it was happening again, though
>
> I think the squid.conf syntax requires you to write all debug options on
> a single line. Later lines override the previous completely.

Sigh... that explanation didn't occur to me, and since I didn't know what
debugging output to expect, I didn't realise I was getting output for only one
directive. The example in the sample squid.conf has only one definition (so
does not show by example how to have multiple options), and the comments
don't say how to do it, either. Since some directives are cumulative
(indeed, for access control you have to use multiple directives to specify
alternatives, since ACLs on the same directive are ANDed...), I think the
documentation (of which the sample config is the most visible) needs clear
indications of such syntactic details!
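
As a sketch of the single-line form Henrik describes, every section,level pair goes on one debug_options line. The section numbers here are only an illustration (44 and 15 are assumed to be the peer selection and neighbor routines sections; check the debug section list shipped with the source before relying on them):

```
# squid.conf: all debug settings on ONE line.
# ALL,1 sets the baseline; the later pairs raise individual sections.
debug_options ALL,1 44,3 15,2

# Per Henrik's explanation, splitting these across several
# debug_options lines would leave only the last line in effect.
```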

> > My guess (without checking the source code) is that something's not allowing
> > enough time for a clean shutdown.
>
> Hmm.. your list included swap.state, and swap.state is always closed on
> shutdown. The same is true for the other log files.

Implying it shouldn't happen in the way the log showed, I presume...? Once
again, if I see any pattern, I'll let you know.

                                John

-- 
University of Cambridge WWW manager account (usually John Line)
Send general WWW-related enquiries to webmaster@ucs.cam.ac.uk
Received on Tue Jul 29 2003 - 13:15:58 MDT