Re: Multiple issues in Squid-3.2.3 SMP + rock + aufs + a bit of load

From: Alex Rousskov <rousskov_at_measurement-factory.com>
Date: Sun, 02 Dec 2012 22:27:57 -0700

On 12/02/2012 06:46 PM, Henrik Nordström wrote:
> Fri 2012-11-30 at 15:38 -0700, Alex Rousskov wrote:
>> On 11/30/2012 02:25 PM, Henrik Nordström wrote:
>>
>>> We should look into why it is at all needed. From what I can understand
>>> it should not be needed.
>>
>> Agreed. Please do that if you can.
>
> Found it. The event loop gets saturated with timed events during the
> rebuild and never executes comm events.

My understanding is that the bug is in the Squid store rebuilding(?) code,
which assumes that Squid will have time to do some I/O before the next
timed event fires. In reality there is a lot of non-I/O work to do, and
the next rebuild step event becomes "ready" before any I/O can happen.
That event schedules another event that, again, fires before we can get
to I/Os. In this context, "I/O" means the regular select-based I/O
orchestrated by the waitingEngine, not the direct disk reading or writing
that Store code might do during the rebuild.
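
To illustrate the pattern, here is a simplified sketch (the names below
are made up for illustration; this is not the actual Store rebuild code).
Each step does a chunk of non-I/O work and then re-arms itself through
eventAdd() with a tiny delay, so by the time the async call queue has
been drained the next step is already "ready" again:

#include "event.h"   // eventAdd() and the EVH callback type

// Illustrative stand-in for the real rebuild state object.
struct RebuildState {
    bool done() const;
    void loadSomeEntries();   // non-I/O work: scanning the swap log, etc.
};

static void
rebuildStep(void *data)
{
    RebuildState *rb = static_cast<RebuildState *>(data);

    rb->loadSomeEntries();

    // Re-arm with a near-zero delay. If this work plus the queued async
    // calls take longer than that delay, the event is "ready" again
    // before the waiting engine ever gets a chance to block in select().
    if (!rb->done())
        eventAdd("rebuildStep", rebuildStep, rb, 0.01, 1);
}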

I doubt this should be fixed by allowing I/O into the "process current
non-I/O events" engines loop. The problem is not a lack of I/Os, but
unrealistic scheduling expectations in the non-I/O code (or the lack of
an API to schedule those rebuild events on a realistic schedule).

> There is a little premature
> optimization there skipping the primary engine while checking all
> others, causing comm to starve to death.

It was not an optimization IIRC. The idea was that Squid should process
all non-I/O events that are ready "now" _before_ generating new events
triggered by new I/O. We also wanted the select loop to run once per
main loop iteration, with a timeout, as opposed to twice.
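
For reference, the loop in question looks roughly like this (paraphrased
from memory and heavily simplified; do not treat it as verbatim 3.2
source):

bool
EventLoop::runOnce()
{
    bool sawActivity = false;
    AsyncEngine *waitingEngine = primaryEngine;   // simplified selection

    do {
        // poll every engine except the waiting one, without blocking;
        // ready timed events fire here and may queue more async calls
        typedef engine_vector::iterator EVI;
        for (EVI i = engines.begin(); i != engines.end(); ++i) {
            if (*i != waitingEngine)
                checkEngine(*i, false);
        }

        // drain the async call queue; keep going while there is work
        sawActivity = dispatchCalls();
    } while (sawActivity);

    // only now may the select-based waiting engine block, with a timeout
    if (waitingEngine != NULL)
        checkEngine(waitingEngine, true);

    dispatchCalls();   // calls produced by the I/O engine
    return true;       // simplified; the real code tracks activity
}

With the rebuild step re-arming itself faster than the do/while can
drain it, sawActivity stays true and checkEngine(waitingEngine, true) is
never reached, which matches the starvation described above.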

> Fixed by this change. I think it's correct but review please.

The change does not really fix the bug if I understand the bug
correctly. It just masks it. Squid would violate the intended "wait x
seconds before every rebuild step" schedule even more after this change.

More importantly, I am concerned that this change may inject new I/O
callbacks between old async calls if we still have some I/O callbacks
that do not go through the async calls API. That is likely to create
hard-to-find bugs because the order of events (in a broad sense) will be
disrupted.

For example, the connection closing code is written under the assumption
that the very last async call scheduled by the closing code will indeed
be the last async call related to the corresponding descriptor. After
this change, this might no longer be true because some I/O processing
(for a different FD) may register new async calls for the same [closing]
FD. In the past, we struggled a lot to avoid such cases!

I can think of a couple of good approaches to solve this:

1) When entering the non-I/O engines loop, remember "now" and only
process timed events that are ready at that remembered time. Do not
advance that time value until we exit the loop and let the I/O engine
run. This does not change the structure of the engines loop. Some events
will still be late, but there will be no disruptive injection of new I/O
activity that might not use async calls. This is probably a
smaller/easier change, but it also masks the true problem.
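
A minimal, self-contained illustration of that latching idea (this is
not Squid's EventScheduler; the names and structure here are invented to
show just the "remembered time" part):

#include <functional>
#include <queue>
#include <vector>

struct TimedEvent {
    double when;                      // absolute due time, in seconds
    std::function<void()> callback;
    bool operator>(const TimedEvent &other) const { return when > other.when; }
};

class TimedEngine {
    std::priority_queue<TimedEvent, std::vector<TimedEvent>,
                        std::greater<TimedEvent> > queue_;

public:
    void schedule(double when, const std::function<void()> &cb) {
        queue_.push(TimedEvent{when, cb});
    }

    // loopEntryTime is read from the clock once per engines-loop pass by
    // the caller and is NOT advanced inside the pass, so events scheduled
    // during the pass have to wait until the I/O engine has had its turn.
    void checkEvents(double loopEntryTime) {
        while (!queue_.empty() && queue_.top().when <= loopEntryTime) {
            TimedEvent ev = queue_.top();
            queue_.pop();
            ev.callback();   // may schedule() again, but for the next pass
        }
    }
};

The real engine would also have to keep reporting the delay until the
next due event so that the waiting engine's select() timeout stays
correct; that part is omitted here.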

2) Allow [Store] to schedule events "ASAP but after the next I/O loop".
This will fix the schedule as the scheduled events will no longer be
"late". This solution preserves the engines loop structure but requires
an additional/new eventAdd API.
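
As a sketch of what #2 could look like (the class and method names below
are invented for illustration; no such API exists in Squid today), the
new call would park the work until the waiting engine has run once and
then release it as ordinary "run ASAP" work:

#include <functional>
#include <vector>

// Hypothetical holding area for "ASAP, but after the next I/O loop" work.
class AfterNextIoQueue {
    std::vector<std::function<void()> > parked_;

public:
    // would back the new eventAdd-style call proposed above
    void add(const std::function<void()> &cb) {
        parked_.push_back(cb);
    }

    // called by the main loop right after the waiting engine returns;
    // the parked work becomes immediate work for the next pass of the
    // non-I/O engines
    void release(std::vector<std::function<void()> > &readyQueue) {
        readyQueue.insert(readyQueue.end(), parked_.begin(), parked_.end());
        parked_.clear();
    }
};

This keeps the engines loop structure intact: a rebuild step parked here
cannot become "ready" again until comm I/O has had at least one turn, so
the intended pacing is restored rather than merely tolerated.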

I think #2 is a cleaner solution to the problem at hand but perhaps #1
and #2 can be combined to prevent similar bugs caused by other code
(that this fix will not rewrite to use the new "after next I/O" API from
#2).

What do you think?

> diff -ru squid-3.2.3/src/EventLoop.cc squid-3.2.3-dpl/src/EventLoop.cc
> --- squid-3.2.3/src/EventLoop.cc 2012-10-20 15:39:49.000000000 +0300
> +++ squid-3.2.3-dpl/src/EventLoop.cc 2012-12-03 01:52:25.381523614 +0200
> @@ -111,7 +111,6 @@
>          // generate calls and events
>          typedef engine_vector::iterator EVI;
>          for (EVI i = engines.begin(); i != engines.end(); ++i) {
> -            if (*i != waitingEngine)
> -                checkEngine(*i, false);
> +            checkEngine(*i, false);
>          }
>

> With this
> * original timeouts works fine
> * rebuild is now done in background as expected

Yes, but other things will probably break in subtle ways unless we are
certain that the waitingEngine (i.e., the select loop) no longer triggers
callbacks that do not use async calls and, hence, cut in front of the
async call queue. This change also results in waitingEngine being called
twice in a row (once without a timeout and once with a timeout IIRC),
which feels like a bad idea from an efficiency point of view.

> Also in testing this I observed that fast ACLs do not do inverse of last
> rule on no match. An explicit allow/deny all rule is needed. Don't think
> this is intentional.

Definitely not! There are some serious known ACL processing bugs, but
this is not one of them.

Thank you,

Alex.