Re: Async I/O

From: Alex Rousskov <>
Date: Tue, 15 Sep 1998 23:35:19 -0600 (MDT)

On Wed, 16 Sep 1998, Henrik Nordstrom wrote:

> I have a hard time believing the request structure pool in async_io.c
> dried out in your tests. See below.

Believe it. I should have saved a log full of messages (added by me) saying
that free_list is null.

> Looking at the code yet another time I see that aioRead/aioWrite is not
> very well done. There is an attempt at handling multiple IOs for one FD, but
> if both an aioRead and an aioWrite are issued on the same fd then the result
> is unpredictable. But then again, it should not happen...

Looks like I should just shut up and wait until all my points come
across. :)

> I did some testing, and it looks like I thought it would. Squid never
> issues two IO operations on the same FD. What I did discover (and should have
> noted earlier) is that async_io.c request structures are eaten by finished
> closes until the main thread notices that the close is completed, but this
> should not be a problem unless you are running with a huge number of threads
> compared to the number of file descriptors (SQUID_MAX_FD).

I think what you may be missing is that request structures are also eaten by
_pending_ requests. With out-of-the-box Squid, I can drain the request pool
with virtually any number of threads (small or large). Just give me a fast
network and a slow disk.

> only if the async_io.c requests pool is empty, or if there already is an
> active operation on this FD, both of which should not happen.

The first one used to happen when the disk could not keep up with the network.
You can probably simulate that by inserting sleep() calls in thread_loop or
some of the _do_ functions. (We got the effect by stressing Squid with a
cachable miss load on 100 Mbps Ethernet.)

> > - diskHandleWriteComplete() seems to call commSetSelect() if
> > /* another block is queued */
> True. This is most likely not the right thing to do here. I actually did
> believe this called aioWrite again, but it doesn't.

You cannot just call aioWrite again; you have to _wait_ somewhere (or enqueue
the request and process the queue when the IO is finished).

> I have no idea why the code is written like it is, async or not.

Must be The Real Life intrusion.

> > Yes, this would be the hardest part along with updating the offsets. Simple
> > but not efficient if done in the main thread. Tricky but efficient if done by
> > other threads.
> I do not agree here. I would say both simple and efficient if done in
> the main thread.
> * Main thread knows about which writes are being made, and when they
> complete.
> About the only thing we need to add locking to is the object data.

The problem is that communication between main thread and children may eat
all the benefits of combining IOs. Locking the object data does not appeal to
me either.

> * If done by the threads then more locking would be needed, making it
> both inefficient and complex.

If so, yes. But I still _hope_ there is a cheap, virtually-no-locking
solution. I do not have one, though.

> I think queue limits are a better measure than pending swapouts, but
> obviously not the raw queue-length values. A more appropriate value
> is probably (max?) queue length average for some period of time

From the queue-length traces I saw, I would say that would not work (unless
you are OK with thousands of requests queued while the variation _may_ be
relatively small). Averages work only when the variance is finite and
relatively small. However, I do not have hard proof that your suggestion
will not work, of course.

> I wouldn't say it is very complicated. It is 2 passes through the
> available disks until a suitable disk is found. If none is found then
> there is no suitable disk available.

Complexity (here) to me is not the amount of CPU work (that's cost), but
rather my ability to predict the side effects of a given algorithm. At
first glance, I cannot tell what side effects (if any) your algorithm has,
and I have seen huge effects from (good or bad) load balancing.
So I will step aside and let others decide...