Re: [PATCH] Bug 2680: ** helper errors after -k rotate

From: Amos Jeffries <squid3_at_treenet.co.nz>
Date: Thu, 16 Jul 2009 12:44:39 +1200

On Thu, 16 Jul 2009 09:15:22 +1000, Robert Collins
<robertc_at_robertcollins.net> wrote:
> On Thu, 2009-07-16 at 01:19 +1200, Amos Jeffries wrote:
>
>
>> As Henrik said,
>> people with large memory-hog helpers have issues when Squid allocates
>> more than N bunches of their carefully tuned available memory to its
>> helpers. This is also important in low-memory systems requiring auth.
>>
>> It's simple: the 'start N' call now checks the number of running helpers
>> before blindly starting new ones, making Squid actually follow its
>> numerous children=N settings.
>>
>>
>> I'm fine with reverting it in 3.1. But this is a nasty mix of sync and
>> async operations that does need cleaning up in 3.2. It's semi-hiding
>> about 4 bugs in a helpers and auth.
>
> I'm not sure it was hiding bugs - as Henrik also said, we sync
> *initiate* async shutdown of helpers, and startup new helpers. Similarly
> ACL processing of in-flight requests used to be refcounted, so the old
> config applies to existing requests, and new requests get new policies
> applied to them (because of the pointer into the acl chain that requests
> hold). Existing requests that move to new ACL chains get the new config,
> and it all works.
>
> As far as memory goes, I think we should document that when you
> reconfigure squid, *all* settings are duplicated until the transition is
> complete, so if you ask for 100 helpers, up to 200 will be held open per
> configuration-that-is-active.

Well, as I'm seeing it now, the rotate/reconfigure case is the one where
you both argue it's reasonable to have more than N helpers running.

One of the hidden bugs I've seen is the NTLM helper infinite-reservation
bug. It shows up clearly when Squid checks the number of helpers running.
When Squid is instead allowed to open more helpers to fill the gap each time
they run low, it shows up only rarely and under high load: Squid then
crashes claiming too many helpers are running, with nothing to indicate the
reservation hangs that caused it.
There are also the sudden memory-swapping issues mentioned earlier, hit when
either the above or a rotate/reconfigure occurs. OpenWRT and OLPC are the
limited-memory cases here. If Squid's memory bloats on OpenWRT the network
effects are unpredictable, leading to loss.

We would need to document that during rotate 200% of the configured helpers
are running, and that at any other time 150% of the helpers may be running.
But I don't see a real need to do it this way other than "it's easy".

Would it be reasonable to do this instead, for a clean bounded rollover:
 pass a flag to shutdown indicating a schedule of M helpers to shut down,
then a restart back up to N, then another M helpers to shut down, and so on.
This would all be scheduled from the shutdown loop, which keeps track of
which helpers are in which set. Without the flag, shutdown skips the restart
attempts and gives the current behavior.
That way the rollover happens async and cleanly, but we never have excess
helpers running.

Amos
Received on Thu Jul 16 2009 - 00:44:44 MDT
