Re: Multiple issues in Squid-3.2.3 SMP + rock + aufs + a bit of load

From: Henrik Nordström <henrik_at_henriknordstrom.net>
Date: Mon, 03 Dec 2012 09:44:03 +0100

Sun 2012-12-02 at 22:27 -0700, Alex Rousskov wrote:

> My understanding is that the bug is in Squid store rebuilding(?) code
> that assumes that Squid will have time to do some I/O before the next
> timed event is fired, but actually there is a lot of non-I/O work to do,

In that case the bug is in the handling of events added by eventAdd(), not
in the store rebuilding code.

The store rebuilding does

        eventAdd("storeRebuild", RebuildStep, rb, 0.01, 1);

which says two things:

delay = 0.01: avoid saturating things too much.

weight = 1: this is a heavy job, so slow down event.cc processing when it
is seen. Only one heavy job should be scheduled per event loop.
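
To make the intent concrete, here is a rough sketch of how a scheduler
honours those two arguments. Illustrative only; the type and member names
below are made up and do not match the actual event.cc code:

    /* Sketch only -- not the real src/event.cc implementation. */
    #include <queue>
    #include <vector>
    #include <ctime>

    typedef void EVH(void *);      // Squid-style event handler signature

    struct TimedEvent {
        const char *name;
        EVH *handler;
        void *arg;
        double due;                // absolute time the event becomes ready
        int weight;                // 1 = heavy job, 0 = cheap housekeeping
    };

    struct Later {
        bool operator()(const TimedEvent &a, const TimedEvent &b) const {
            return a.due > b.due;  // earliest due time first
        }
    };

    class SketchScheduler {
        std::priority_queue<TimedEvent, std::vector<TimedEvent>, Later> queue;

        static double now() { return (double)std::time(NULL); }

    public:
        /* eventAdd("storeRebuild", RebuildStep, rb, 0.01, 1) maps to
         * delay = 0.01 (not before now + 10 ms) and weight = 1 (heavy). */
        void add(const char *name, EVH *handler, void *arg, double delay, int weight) {
            TimedEvent ev = { name, handler, arg, now() + delay, weight };
            queue.push(ev);
        }

        /* One pass per event loop: fire events that are due, but stop once
         * the accumulated weight reaches 1, so at most one heavy job runs
         * per loop. The due check uses the time the pass started, so events
         * added while firing have to wait for the next loop. */
        void checkEvents() {
            const double entered = now();
            int weightDone = 0;
            while (!queue.empty() && queue.top().due <= entered && weightDone < 1) {
                TimedEvent ev = queue.top();
                queue.pop();
                weightDone += ev.weight;
                ev.handler(ev.arg);
            }
        }
    };

The point is that a weight = 1 event is meant to be rate limited to one per
event loop, and is never supposed to fire earlier than its delay.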

> and the next rebuild step event becomes "ready" before any I/O can
> happen. That event schedules another event that, again, fires before we
> can get to I/Os. In this context, "I/O" is regular select-based I/O
> orchestrated by the waitingEngine and not direct disk reading or writing
> that Store code might do during rebuild.

Events added by eventAdd() must not fire until the next event loop takes
place.

> I doubt this should be fixed by allowing I/O into the "process current
> non-I/O events" engines loop. The problem is not in the lack of I/Os,
> but in unrealistic scheduling expectations by non-I/O code (or lack
> of API to schedule those rebuild events on a realistic schedule).

It's not a lack of API, only the event loop not scheduling things in a
sensible manner.

> > Fixed by this change. I think it's correct but review please.
>
> The change does not really fix the bug if I understand the bug
> correctly. It just masks it. Squid would violate the intended "wait x
> seconds before every rebuild step" schedule even more after this change.

No.

> More importantly, I am concerned that this change may inject new I/O
> callbacks between old async calls if we still have some I/O callbacks
> that are not done using async calls API. That is likely to create
> hard-to-find bugs because the order of events (in a broad sense) will be
> disrupted.

But yes, this may happen and is not good.

> 1) When entering the non-I/O engines loop, remember "now" and only
> process timed events that are ready at that remembered time. Do not
> advance that time value until we exit the loop and let I/O engine run.
> This does not change the structure of the engines loop. Some events will
> still be late, but there will be no disruptive injections of new I/O
> activity that might not use async calls. This is probably a
> smaller/easier change, but it also masks the true problem.

event.cc actually does this (and always has), but it gets broken by
EventLoop::runOnce() doing this:

    do {
        // generate calls and events
        typedef engine_vector::iterator EVI;
        for (EVI i = engines.begin(); i != engines.end(); ++i) {
            if (*i != waitingEngine)
                checkEngine(*i, false);
        }

        // dispatch calls accumulated so far
        sawActivity = dispatchCalls();
        if (sawActivity)
            runOnceResult = false;
    } while (sawActivity);

    if (waitingEngine != NULL)
        checkEngine(waitingEngine, true);

This polls the non-comm event engines until they are completely idle, with
nothing left to do.

I think the sawActivity loop should go away, polling each event engine
once per loop. It's the responsibility of each event engine to run as
many events as it needs per loop.

Looking at the code, I think I see why this sawActivity loop got added
around the non-comm event engines. There is an AsyncCalls-induced delay in
processing the events fired by checkEngine(), which makes it fail to
properly set the required loop_delay. We need to extend the event engine
API with a call that returns the minimum loop delay each engine needs after
dispatchCalls() has been run, allowing it to account for newly added
events.
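
Roughly the direction I mean, as a sketch only -- the timeTillNextEvent()
name and this trimmed-down AsyncEngine interface are made up for
illustration, not a proposal for the final signature:

    /* Sketch of the suggested API extension; illustrative only. */
    class AsyncEngine {
    public:
        virtual ~AsyncEngine() {}

        // existing style of call: poll the engine, generating calls/events
        virtual int checkEvents(int timeout) = 0;

        /* proposed addition: how long (in seconds) the event loop may
         * sleep before this engine needs to run again. Queried after
         * dispatchCalls(), so events added by the dispatched calls are
         * already visible to the engine. */
        virtual double timeTillNextEvent() const = 0;
    };

EventLoop::runOnce() would then, after dispatchCalls(), take the minimum of
what every engine reports and use that as loop_delay instead of whatever
checkEngine() guessed earlier.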

An interim solution, without changing the event engine API to include such
a call, is to drop loop_delay to 0 whenever there is any activity,
disregarding what the event engine said in that case. This causes a slight
CPU penalty under lower loads, as the event loop fires more often than
needed, but it is a very easy change. I.e. something like this
(completely untested):

=== modified file 'src/EventLoop.cc'
--- src/EventLoop.cc 2012-02-05 06:09:46 +0000
+++ src/EventLoop.cc 2012-12-03 08:42:39 +0000
@@ -107,19 +107,26 @@
     if (!waitingEngine && !engines.empty())
         waitingEngine = engines.back();
 
-    do {
-        // generate calls and events
-        typedef engine_vector::iterator EVI;
-        for (EVI i = engines.begin(); i != engines.end(); ++i) {
-            if (*i != waitingEngine)
+    // generate calls and events
+    typedef engine_vector::iterator EVI;
+    for (EVI i = engines.begin(); i != engines.end(); ++i) {
+        if (*i != waitingEngine)
                 checkEngine(*i, false);
-        }
+    }
 
-        // dispatch calls accumulated so far
-        sawActivity = dispatchCalls();
-        if (sawActivity)
-            runOnceResult = false;
-    } while (sawActivity);
+    // dispatch calls accumulated so far
+    sawActivity = dispatchCalls();
+    if (sawActivity) {
+        runOnceResult = false;
+        /* Delay is unknown if there were events triggered. Schedule the next
+         * event loop immediately so it can calculate the delay properly.
+         * TODO: Maybe replace this with polling each engine for its
+         * requested delay here instead of trying to do so in checkEngine();
+         * checkEngine() can't, as the delay needs to be calculated after
+         * dispatchCalls().
+         */
+        loop_delay = 0;
+    }
 
     if (waitingEngine != NULL)
         checkEngine(waitingEngine, true);

Regards
Henrik
Received on Mon Dec 03 2012 - 08:44:10 MST
