Re: Multiple issues in Squid-3.2.3 SMP + rock + aufs + a bit of load

From: Alex Rousskov <rousskov_at_measurement-factory.com>
Date: Mon, 03 Dec 2012 15:53:14 -0700

On 12/03/2012 02:01 PM, Henrik Nordström wrote:
> Mon 2012-12-03 at 09:07 -0700, Alex Rousskov wrote:

>> I was going to agree with that wholeheartedly, but then I thought about
>> signals. If we visit each non-waiting engine once, then we will only
>> process signals once per main loop step. Is that OK?
>
> Not 100% OK, but not too bad. It causes at most a 1 second delay in
> processing signals, and the only signals I know of that are not comm
> related (and also have comm events) are squid -k signals.

True (although the current small maximum select(2) delay is a result of
bugs elsewhere in the code and should not really be there). If we can do
the right thing here easily, we should (instead of adding more future
problems). And I think we can.

>> I am worried that this will result in 50% or more zero-delay select(2)
>> calls for nearly all Squids because there is usually some AsyncCall
>> activity per I/O loop (but possibly no timed-events).

> So need to make sure AsyncCall is drained last in RunOnce.

Ah, I see another problem with the "each non-waiting engine runs once"
approach. One engine may create work for other engines or for itself,
including async calls and "run this now" lightweight events. This means
we should really continue to use the sawActivity loop so that all
lightweight events are processed before I/O wait starts. This design is
more complex, but it is actually "more correct".

We just need to make sure that heavy events (certain timed events and
signals) interrupt the sawActivity loop. This will fix the issue you are
facing without introducing new problems.

The new condition for the sawActivity loop should be:

    while(sawActivity && !sawHeavyEvent)

EventLoop::runOnce() will set sawHeavyEvent member to false before
looping. EventLoop::checkEngine() will set sawHeavyEvent member to true
when an engine returns AsyncEngine::EVENT_HEAVY. The event engine will
return AsyncEngine::EVENT_HEAVY when it encounters an event that
warrants ending the current main loop iteration ASAP.
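
To make the control flow concrete, here is a minimal standalone sketch of
that loop condition. It is not the attached patch and not Squid code: the
class names, the EngineResult values, and the ScriptedEngine test helper
are all assumptions made for illustration. Only sawActivity,
sawHeavyEvent, runOnce(), checkEngine(), and EVENT_HEAVY come from the
description above.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical result codes; only EVENT_HEAVY is named in this email.
enum EngineResult { EVENT_IDLE, EVENT_ACTIVE, EVENT_HEAVY };

// Simplified stand-in for an engine; the real AsyncEngine differs.
struct Engine {
    virtual ~Engine() {}
    virtual EngineResult checkEvents() = 0;
};

class LoopSketch {
public:
    void registerEngine(Engine *e) { engines.push_back(e); }

    // One main loop step. Returns how many passes over the engines were
    // made before the step ended (the I/O wait itself is omitted here).
    int runOnce() {
        sawHeavyEvent = false; // reset before looping
        int passes = 0;
        do {
            sawActivity = false;
            for (Engine *e : engines)
                checkEngine(*e);
            ++passes;
        } while (sawActivity && !sawHeavyEvent);
        return passes;
    }

    bool sawHeavy() const { return sawHeavyEvent; }

private:
    void checkEngine(Engine &e) {
        const EngineResult r = e.checkEvents();
        if (r != EVENT_IDLE)
            sawActivity = true;
        if (r == EVENT_HEAVY)
            sawHeavyEvent = true; // ends the sawActivity loop after this pass
    }

    std::vector<Engine *> engines;
    bool sawActivity = false;
    bool sawHeavyEvent = false;
};

// Test helper (hypothetical): replays a fixed list of results, then idles.
struct ScriptedEngine : Engine {
    explicit ScriptedEngine(std::vector<EngineResult> s) : script(std::move(s)) {}
    EngineResult checkEvents() override {
        return next < script.size() ? script[next++] : EVENT_IDLE;
    }
    std::vector<EngineResult> script;
    std::size_t next = 0;
};
```

In this sketch, an engine that keeps reporting light activity keeps the
loop spinning until it goes idle, while a single EVENT_HEAVY result ends
the step at the end of the current pass even if more work is pending.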

The attached untested patch implements the above. I am sure it can be
improved further. For example, the loop_delay after a heavy event should
probably be set to zero (because there are unprocessed events waiting
and also in anticipation of more heavy events to come -- we are too busy
to wait!).

> Is it sufficient to call dispatchCalls() or does one need to loop over
> it until no activity remains?

One call is sufficient -- the async call queue drains itself completely,
including any calls scheduled during the draining process itself, but
the dispatchCalls() call should remain inside the sawActivity loop
(because events create calls create "now" events create calls create
"now" events ...).

Does the attached patch make sense to you? Does it solve the "I/O
starvation during rebuild" problem you found?
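
On the dispatchCalls() question, the "drains itself completely" behavior
can be illustrated with a small standalone sketch. This is not Squid's
actual queue: CallQueueSketch, schedule(), and fire() are hypothetical
names, and std::function stands in for a real async call object.

```cpp
#include <cassert>
#include <functional>
#include <queue>
#include <utility>

// Hypothetical self-draining call queue, for illustration only.
class CallQueueSketch {
public:
    void schedule(std::function<void()> call) { calls.push(std::move(call)); }

    // Fires queued calls until the queue is empty, including any calls
    // scheduled by the calls being fired. Returns true if at least one
    // call ran, which a caller could treat as "saw activity".
    bool fire() {
        const bool hadCalls = !calls.empty();
        while (!calls.empty()) {
            std::function<void()> call = std::move(calls.front());
            calls.pop();
            call(); // may schedule more calls; the loop picks them up
        }
        return hadCalls;
    }

private:
    std::queue<std::function<void()>> calls;
};
```

A single fire() here runs a call that was scheduled by another call during
the same drain, so no inner loop around it is needed. The outer
sawActivity loop is still required because other engines may add calls
after the drain finishes.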

HTH,

Alex.

Received on Mon Dec 03 2012 - 22:53:30 MST
