Re: [PATCH] Bug 2680: ** helper errors after -k rotate

From: Robert Collins <robertc_at_robertcollins.net>
Date: Thu, 16 Jul 2009 11:13:14 +1000

On Thu, 2009-07-16 at 12:44 +1200, Amos Jeffries wrote:
>
>
> Well, as I'm seeing it now. The rotate/reconfigure case is are where
> you
> both argue its reasonable to have more than N helpers running.

Yes. Trivial example of problems: assume that the user configures one
stateful helper. They then change the command line and reconfigure.
We must start a new helper or its stderr goes to the wrong log
indefinitely.
We are not meant to send new requests to the old helper (we're trying to
shut it down)
If we do not permit multiple helpers, then we cannot start the new
helper until the old one completes, and as that is [modulo timeouts]
under client control, we could be waiting for an arbitrary time with all
new requests getting queued up waiting for the helper to shutdown.
That could be several seconds.

Note that with NTLM this is realistic - there's no need for multiple
helpers as a single helper can serve many concurrent requests.

> The hidden bugs I've seen are the ntlm helper infinite reservation
> bug. It
> shows up clearly when squid is checking the number of helpers running.
> It
> shows up rarely and only under high-load when squid is allowed to open
> more
> helpers to fill the gap each time they run low. Then crashes claiming
> too
> many helpers are running, nothing to indicate the reservation hangs
> that
> caused it.

So two things:
 - theres no reason that the overload support code can't hard-limit the
   number of *non-shutting-down-helpers*. (reconfigure marks all old
   helpers as being queued to shutdown). So you can fix/address that
   DOS case without altering the reconfigure case.
 - The above sounds like a bug that isn't related to reconfigure.

> Also the sudden memory swapping issues mentioned when either the above
> or a
> rotate/reconfigure hits. OpenWRT and OLPC are the limited-memory cases
> here. If OpenWRT gets bloated memory the network effects are
> unpredictable
> leading to loss.

If someone is running hundreds of helpers on openwrt/olpc then things
are broken already :). I'd really suggest that such environments
pipeline through a single helper rather than many concurrent helpers.
Such platorms are single core and you'll get better usage of memory
doing many requests in a single helper than one request each to many
helpers.

> We would need to document that during rotate 200% of the allocated
> helpers
> are running, and that at any other time 150% of the helpers may be
> running.
> But I don't see a real need to do it this way other than "its easy"

I don't see what the 150% is about - that sounds unconnected to
reconfigure ;). Perhaps the helper limit you added is just failing to
ignore a 'shutting_down' helper?

> Would it be reasonable to do this for a clean bounded rollover:
> send a flag to shutdown indicating a shutdown schedule of M helpers,
> then
> a restart back to N, then another M helpers to shutdown. This would
> all be
> scheduled from the shutdown loop to keep track of which helpers are in
> which set. Without the flag shutdown skips the restart attempts and
> does
> the current behavior.
> So that the rollover happens async and cleanly, but we never have no
> excess
> helpers running?

No. That makes reconfigure _more_ of a critical section. reconfigure
needs to create a new config, activate for all new requests, and then
let refcounting take care of letting the old config (and all the things
it used) go away).

-Rob

Received on Thu Jul 16 2009 - 01:13:20 MDT

This archive was generated by hypermail 2.2.0 : Thu Jul 16 2009 - 12:00:05 MDT