Re: Helpers idea from Dancer Vesperman on 2000-09-14 (squid-dev)

From: Dancer Vesperman <dancer@dont-contact.us>
Date: Fri, 15 Sep 2000 12:16:07 +1100

Robert Collins wrote:

> > -----Original Message-----
> > From: dancer@zeor.simegen.com [mailto:dancer@zeor.simegen.com]
> > Sent: Friday, 15 September 2000 9:55 AM
> > To: Robert Collins; squid-dev@squid-cache.org
> > Subject: Re: Helpers idea
> >
> >
> > Robert Collins wrote:
> >
> > > I don't know if this is on the to-do list, but I'm thinking
> > of coding up
> > > self-adjusting helper counts.
> > >
> > > something along the lines of
> > >
> > > n is the number of running helper of a given type
> > > when new jobs are submitted, if the queue length for n is
> > more than 2 * n,
> > > and no helpers have been spawned for n/2 jobs spawn n/10
> > (rounded up) new
> > > helpers.
> > > if the queue length stays at zero for n/2 jobs kill n/10 helpers.
> > >
> > > Yes I know magic numbers are bad - I'll have them in the conf..
> > >
> > > Comments/already done/bad idea?
> > >
> > > Rob
> >
> > I'm just experimenting with skipping accept() if the queue
> > length is over a
> > arbitrary number.
> >
> > D
> >
> >
>
> I presume you mean you do stop accepting new http requests?
> There are two essential problems with that
> 1. Existing persistent connections can still make requests. In the event
> of a request, a helper may be called (i.e. a DNS lookup may be needed
> (->so the dns helper gets called). (And yes I know about internal dns
> :-]). So a fixed limit on helpers is still problematical.

Thought about that. In our case, however, our redirectors are completely
and utterly CPU bound (and, in fact, to avoid sinusoidal asymptotic
scheduling, we've switched scheduler algorithms for those, so that they are
not preemptable until completion. This significantly improves overall
machine performance - We have either two or four CPUs in our boxen,
depending on the box)

If our redirector queue is running at less than say 500 then that's alright
(the average service time on our redirectors is sub 1ms, and peaks around
3ms........unless the redirector queue grows to 2000+ at which point, the
redirectors begin to slow down considerably). However, because of the
relationship (non-linear, and complex) between outstanding requests waiting
for redirector service and the actual redirector service time, it seems
logical to attempt to suppress new connections at some given point...say
1000 or so.

> 2. If you don't accept() then the user may time out if the (whatever
> helper queue you measure on) queue length doesn't drop fast enough.
> Network problems that affect one particular helper may not affect all
> helpers, so I think it is unreasonable to affect all users. (DNS is a
> good example here. If your dns server is down, new name lookups will
> fail, and a queue may build up. But http should still be working fine.
> Redirectors are also a good example. What if all but one redirector
> process has hung? Redirections are still happening, but one will never
> complete, and the queue will be getting pretty long.

We've sort of cheated a little here. One small part of the redirection
process may involve a DNS lookup, however, we have convinced squid to
handle the DNS lookup asynchronously in advance of the synchronous
redirector request, so that the result is always cached, and waiting. All
other operations are completely CPU bound (lots of rule-matching). Our
redirectors don't hang at all, ever, so we've never come up against the
hung process problem.

I figure if we spin a little on the number of accepts, we'll manage
alright. Also our histograms show us that we rarely ever get as many as
three requests on a single connection, and only occasionally do we get two.

> I think a better approach than skipping accept() would be stop listening
> in the event that everything has crashed and burnt.. that way there will
> be no time out, the users machine will know immediately they can't get
> thru and their .pac script can switch proxies (or whatever their fall
> back is - ring the admin etc etc)

We've got around....Ummm...don't quote me...142,000 users per proxy? Not
all at the same time, of course, but that's the allocation. We almost never
lose a box, but they _do_ bog down rather a lot at peak, with incoming
requests exceeding the rate that we can handle them. What my idea is is to
slow down acceptance of new connections, which will have a direct impact on
the browsing patterns of the end users, and keep things within some
theoretical limit. However, with some new layer-4 switch hardware that
we've gotten, your idea may be good. unlistening would give us instant
failover on the layer-4 hardware. The problem then would be....what happens
if all the units hit their limit as load cascades across the farm?

>
>
> Rob
Received on Thu Sep 14 2000 - 19:15:10 MDT

This archive was generated by hypermail pre-2.1.9 : Tue Dec 09 2003 - 16:12:37 MST