Re: Multiple issues in Squid-3.2.3 SMP + rock + aufs + a bit of load

From: Henrik Nordström <henrik_at_henriknordstrom.net>
Date: Wed, 05 Dec 2012 01:03:21 +0100

tis 2012-12-04 klockan 08:39 -0700 skrev Alex Rousskov:

> There are several ways to interpret the designer intent when looking at
> undocumented code. I cannot say whether all of the currently remaining
> zero-delay events use your interpretation, but I am certain that I have
> added zero-delay events in the past that used a comm-independent
> interpretation ("get me out of this deeply nested set of calls but
> resume work ASAP"). And, IMO, that is actually the right definition for
> the reasons discussed in my response to Amos email.

I can only speak for the Squid-2 code base on this, and the intents
behind event.c[c] event handling.

> Agreed. Both patches support that AFAICT.

What I ended up with is a combination of both.

I kept sawActivity loop for now, pending review of zero-delay events and
store callbacks. I still think the sawActivity loop is undesired and not
needed, but it's not the bug we are working on right now.

But your patch also needs timeTillNextEvent to calculate the loop delay,
or the timed events starve because loop_delay is calculated wrongly
(it does not take into account events added "just now").

Note: there is a slight but critical error in my timeTillNextEvent patch
as posted: loop_delay < requested should be loop_delay > requested.
Other than that it seems to work.
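For illustration, the clamping being discussed could look roughly like
the sketch below. This is only my reading of the thread, not actual
Squid source: the names timeTillNextEvent and loop_delay/requested come
from the discussion, but the event-queue representation and the function
signature are assumptions.

```cpp
#include <algorithm>
#include <queue>
#include <vector>

// A pending timed event; only the due time matters for this sketch.
struct TimedEvent {
    double when;    // absolute time the event is due
    bool operator>(const TimedEvent &o) const { return when > o.when; }
};

// Min-heap of pending timed events, soonest first (hypothetical; Squid's
// real event queue in event.cc is organized differently).
static std::priority_queue<TimedEvent, std::vector<TimedEvent>,
                           std::greater<TimedEvent>> events;

// Return how long the main loop may sleep before the next timed event is
// due: never negative, and never longer than the requested loop delay.
double timeTillNextEvent(double now, double requested)
{
    if (events.empty())
        return requested;
    const double until = events.top().when - now;
    if (until < 0.0)
        return 0.0;    // an event is already overdue; do not sleep
    // The fix noted above: clamp only when the requested delay EXCEEDS
    // the time until the next event (">", not "<").
    return std::min(requested, until);
}
```

With this clamp, events scheduled "just now" (including zero-delay ones)
shorten the loop delay instead of being starved by a stale loop_delay
value computed before they were added.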

And there were a couple of other problems seen today:

*) UFS store rebuilding crashes if there are cache collisions, with
non-UFS stores competing for the same object:
assertion failed: ufs/ufscommon.cc:709: "sde"
I have a patch for this.

*) Some undiagnosed crash:
assertion failed: comm.cc:1259: "isOpen(fd)"
I have seen this randomly before, but now it hit all the workers at
about the same time, +/- a few seconds. This was also seen randomly
before the event loop rework.

*) Shortly after the above, once the workers had restarted and rebuilt
their caches, they all started to spew
Worker I/O push queue overflow: ipcIo1.13951w6
and calmed down after some minutes. It continued for ~10 min, I think.
I/O load on the server was marginal.

Regards
Henrik
Received on Wed Dec 05 2012 - 00:03:27 MST
