squid-3 - cleanup of select_* stuff.

From: Robert Collins <robertc@dont-contact.us>
Date: Sat, 22 Apr 2006 22:23:06 +1000

I'd like to (slowly - I'm expecting about 4 hours a week on this) clean
up the select loop in squid to allow things like using a poller,
libevent, or completion ports. I've been using Twisted a bit over the
last couple of years, and their event loop - their reactor - has a lot
in common with the ACE reactor/proactor pair of patterns.
http://www.cs.wustl.edu/~schmidt/PDF/reactor-rules.pdf
http://www.cs.wustl.edu/~schmidt/PDF/proactor.pdf

Our current structure sits partway between a reactor and a proactor -
we have multiple reactor-style objects: the async I/O thread code for
disk I/O, and the single-instance select loop.

Using the language of the proactor pattern: right now we have one
thread for each reactor-like thing - one thread for the async disk
engine when it's in use, and one thread (the main thread) for the
comms queue. This is a little ugly because it's asymmetrical: there's
nothing intrinsically special about sockets that should make them the
reactor in the main thread. I think it's ugly because it presupposes
that our efficiency on sockets will be better than our efficiency on
disk. So I'd like to propose that we tweak our code so that we no
longer assume comms requests happen in the main thread.

One way of doing this is to have a dispatcher instance for each type
of event we can dispatch, and loop over them in the main loop. (This
is a trivial tweak to what we have today.) And if we give each async
activity that occurs an object to represent it (e.g. a select loop, a
poll loop, a completion-port loop), then every async-operating code
path has such an object.
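
Roughly, the two interfaces might look like this (a sketch only - the
names AsyncEngine and CompletionDispatcher are illustrative, not a
committed API):

class AsyncEngine
{
public:
    virtual ~AsyncEngine() {}

    /* collect completed events: polled engines do their select/poll
     * here; OS-backed engines just harvest finished operations */
    virtual void checkEvents() = 0;
};

class CompletionDispatcher
{
public:
    virtual ~CompletionDispatcher() {}

    /* invoke the callbacks for events that have completed */
    virtual void dispatch() = 0;
};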

Now, since we would like not to busy-wait, we need to pass a non-zero
timeout to any select/poll-style call, but blocking there will cause
latency when other async activity occurs concurrently. So each async
engine will have a method on it which can be called to 'cheaply'
notify it of activity that is occurring elsewhere (e.g. the threaded
async disk engine can inform the current primary engine that a disk
I/O has completed).
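
For a select-based engine, one cheap way to implement that
notification is the classic self-pipe trick. A sketch, with kick() as
a hypothetical method name:

#include <unistd.h>

class SelectEngine : public AsyncEngine
{
public:
    SelectEngine() { pipe(wakeupFds); /* error handling elided */ }

    /* safe to call from another thread: the written byte makes the
     * pipe readable, so a select() blocked in checkEvents() returns
     * immediately */
    void kick() { char c = 0; write(wakeupFds[1], &c, 1); }

    virtual void checkEvents()
    {
        /* include wakeupFds[0] in the read set alongside the
         * sockets; drain the pipe when it fires */
    }

private:
    int wakeupFds[2];
};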

And because we don't know a priori whether an async engine is
OS-backed (e.g. completion ports with overlapped I/O) or polled, there
needs to be a poll() or checkEvents() or similar method called on each
engine once per loop iteration.

Our main loop can then become something like this:

while (!finished) {
  /* run the callbacks for events that have already completed */
  for (dispatchers::iterator i = dispatchers.begin();
       i != dispatchers.end(); ++i) {
    i->dispatch();
  }
  /* ask each engine to collect newly completed events */
  for (engines::iterator i = engines.begin();
       i != engines.end(); ++i) {
    i->checkEvents();
  }
}
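
(Here dispatchers and engines would just be plain containers - say a
std::vector of CompletionDispatcher pointers and one of AsyncEngine
pointers - which is what makes adding and removing engines at runtime
a simple list operation.)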

This will have the following benefits:
 - We can properly support overlapped I/O on Windows.
 - We can change the engines in use at runtime - just keep an engine
   in the engines list until all the pending events on it have
   completed, then remove it.
 - Management of the main loop and reconfiguration becomes
   conceptually clearer - we move the special-casing out of the main
   loop and into specific event handlers. (Processing signals like
   'please reconfigure' becomes just another event that can be
   dispatched - one possible shape for that is sketched below.)
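
A minimal sketch of that, reusing the illustrative
CompletionDispatcher interface from above (hupHandler,
reconfigurePending and reconfigureAll() are all hypothetical names):

#include <csignal>

static volatile sig_atomic_t reconfigurePending = 0;

extern "C" void hupHandler(int) { reconfigurePending = 1; }

void reconfigureAll(); /* hypothetical: reread squid.conf etc. */

class SignalDispatcher : public CompletionDispatcher
{
public:
    /* turns an async signal into an ordinary dispatched event */
    virtual void dispatch()
    {
        if (reconfigurePending) {
            reconfigurePending = 0;
            reconfigureAll();
        }
    }
};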

I think we want a single completion dispatcher class for each group of
events that occur asynchronously: a dispatcher that knows how to
dispatch socket events, one that knows how to dispatch disk events,
one that knows how to dispatch timer events, and one that knows how to
dispatch informational signals.

How does this sound in principle? If it sounds OK, I'll start doing a
series of small (a few hours each) patches heading in this direction.
One of the reasons I want to do this is to make it possible to write a
test harness that can exercise callback-requiring code, by giving
tests a trivial, controllable event loop that can be invoked by hand -
something like the sketch below.
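
A sketch of such a test, again assuming the illustrative
CompletionDispatcher interface:

#include <cassert>
#include <vector>

class StubDispatcher : public CompletionDispatcher
{
public:
    StubDispatcher() : calls(0) {}
    virtual void dispatch() { ++calls; }
    int calls; /* how many times the loop dispatched us */
};

void testOneIteration()
{
    std::vector<CompletionDispatcher *> dispatchers;
    StubDispatcher stub;
    dispatchers.push_back(&stub);

    /* hand-crank exactly one iteration of the main loop: no
     * select(), no timers, fully deterministic */
    for (std::vector<CompletionDispatcher *>::iterator i =
             dispatchers.begin(); i != dispatchers.end(); ++i)
        (*i)->dispatch();

    assert(stub.calls == 1);
}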

Rob

-- 
GPG key available at: <http://www.robertcollins.net/keys.txt>.
