Re: Async I/O

From: Alex Rousskov <rousskov@dont-contact.us>
Date: Mon, 14 Sep 1998 00:35:40 -0600 (MDT)

On Mon, 14 Sep 1998, Henrik Nordstrom wrote:

> Now I understand what you were referring to. This is if the I/O queue
> grows too large for async_io.c to handle (more than SQUID_MAXFD pending
> I/O operations in total). It is not when there is more than one
> outstanding I/O operation for the same FD.

Both, unfortunately. There are two "if (wrong) return EWOULDBLOCK" statements
in aioRead/Write, I think.
 
> Note that the "queue" in async_io.c is only a pool of free request
> structures, not limited to any particular file descriptor.
> If you are seeing this pool dry out then there is certainly a large
> problem somewhere.

The pool "dried out" almost immediately under load in our tests.
Unfortunately, it took me a while to discover that... The "large" problem is
the one I described in previous e-mails: fast-network / slow disk. We
partially solved it by introducing configurable swap-out limits.
 
> > > And I do not believe select is currently used on async-io disk
> > > operations..
> >
> > See above. Please correct me if I am wrong.
>
> See above.

See above. :)
Two cases: the fixed-size pool of request structures is empty OR two I/Os
got submitted for the same FD. (Both refusal paths are sketched below.)
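
To make that concrete, here is roughly how I read the two refusal paths.
This is a from-memory sketch, not the actual async_io.c code; the pool
layout and all names are simplified stand-ins:

/* Sketch only, not the real async_io.c.  It illustrates the two
 * conditions that make aioRead/aioWrite give up with EWOULDBLOCK:
 * an exhausted request pool, and a second operation on the same FD. */
#include <errno.h>

#define SQUID_MAXFD 1024                /* assumed; compile-time in Squid */

typedef struct _aio_req {
    int fd;
    struct _aio_req *next;
} aio_req;

static aio_req pool[SQUID_MAXFD];       /* the fixed-size request pool */
static aio_req *free_list;              /* free requests, built at startup */
static char fd_busy[SQUID_MAXFD];       /* non-zero if an op is pending */

int
aio_submit_sketch(int fd)
{
    if (free_list == NULL)              /* case 1: pool dried out */
        return EWOULDBLOCK;
    if (fd_busy[fd])                    /* case 2: second I/O on one FD */
        return EWOULDBLOCK;
    fd_busy[fd] = 1;
    /* pop a request off free_list and hand it to an I/O thread here */
    return 0;
}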

> > How does it "wait" then??
>
> By having write() queued in disk.c:file_write, and never more than one
> pending read() per fd by Squid design.
>
> When a write() is completed, diskHandleWriteComplete is called, which
> reschedules an aioWrite call. No select() here.

Here is how I see the select being called:
- diskHandleWrite() calls aioWrite() with diskHandleWriteComplete() as a
  callback
- aioWrite() may call diskHandleWriteComplete() with EWOULDBLOCK
- diskHandleWriteComplete() seems to call commSetSelect() if
  /* another block is queued */

Again, there are a lot of "if"s in between, so I could easily miss something.
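
To show exactly which path I mean, here is a self-contained toy of the
control flow. None of these names are the real disk.c/comm.c signatures;
the only point is where an EWOULDBLOCK from the async layer ends up
re-entering select():

/* Toy control flow only; stand-ins for disk.c / async_io.c / comm.c. */
#include <errno.h>
#include <stdio.h>

static void handle_write(int fd);

static void
fake_commSetSelect(int fd, void (*handler)(int))
{
    /* stands in for commSetSelect(): the fd goes back to the main-thread
     * select() loop, which is the sync path we are trying to avoid */
    printf("fd %d handed back to select()\n", fd);
    (void)handler;
}

static int
fake_aioWrite(int fd)
{
    (void)fd;
    return EWOULDBLOCK;     /* the async layer refuses the request */
}

static void
write_complete(int fd, int errflag)
{
    if (errflag == EWOULDBLOCK) {
        fake_commSetSelect(fd, handle_write);
        return;
    }
    /* normal completion: schedule the next queued block, if any */
}

static void
handle_write(int fd)
{
    write_complete(fd, fake_aioWrite(fd));
}

int
main(void)
{
    handle_write(42);
    return 0;
}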

> How do you define perfect? ;)

"being entirely without fault or defect, satisfying all requirements, and
corresponding to an ideal standard or abstract concept"!
 
> I think the hardest part is to ensure that the memory is not reused
> until the call is completed. The current async-io code copies all data
> between the main thread and I/O threads, avoiding any locking of memory
> buffers.

Yes, this would be the hardest part along with updating the offsets. Simple
but not efficient if done in the main thread. Tricky but efficient if done by
other threads.
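
For what it is worth, the zero-copy variant could look something like the
toy below: pin the caller's buffer with a reference count and let the I/O
thread drop it once write() has returned. The names are made up; the point
is only that nothing gets reused or freed while a thread still holds a
reference:

/* Toy buffer pinning; not Squid code.  The current async_io.c memcpy()s
 * into its own buffer instead, which avoids this machinery entirely. */
#include <stdlib.h>

typedef struct {
    char *data;
    size_t len;
    int refcount;           /* one ref for the caller, one per pending I/O */
} io_buf;

static void
io_buf_ref(io_buf *b)
{
    b->refcount++;
}

static void
io_buf_unref(io_buf *b)
{
    if (--b->refcount == 0) {
        free(b->data);
        free(b);
    }
}

/* main thread: submit without copying, but take a reference first */
static void
submit_write(io_buf *b)
{
    io_buf_ref(b);
    /* enqueue b for an I/O thread here */
}

/* I/O thread: drop the reference only after write() has returned */
static void
io_done(io_buf *b)
{
    io_buf_unref(b);
}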
 
> In my tests I gained more from having a huge number of threads than from
> queueing several operations on one thread, but this was without the
> disk bypass issue, and on a platform where context switches are quite
> cheap.

Interesting.
 
> > Agree 100%.
> > Moreover, if possible we should avoid all sync-IOs when async-io is enabled.
> > The current code is happy to call sync-ios if something goes a bit wrong.
>
> Are you refering to the select() issue here, or something else?

Select() plus close() calls in many aio* functions executed by the main
thread. There might be others that I overlooked.
 
> Today all swapin/outs are done in 8K chunks and I still believe that
> there is a gain in having larger I/O chunks for async-io.

Yes, ideally we should write everything that is available to a thread in one
IO.
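
A toy version of what I mean, assuming the per-FD write queue is a simple
linked list of blocks (the structures are made up, not the real disk.c
ones):

/* Sketch: coalesce everything queued for an fd into one buffer and submit
 * it as a single aioWrite(), instead of one 8K write per queued block. */
#include <stdlib.h>
#include <string.h>

typedef struct qblock {
    char *buf;
    size_t len;
    struct qblock *next;
} qblock;

static char *
coalesce_queue(qblock *head, size_t *total_out)
{
    qblock *q;
    size_t total = 0, off = 0;
    char *big;

    for (q = head; q; q = q->next)
        total += q->len;
    if (total == 0 || (big = malloc(total)) == NULL)
        return NULL;
    for (q = head; q; q = q->next) {
        memcpy(big + off, q->buf, q->len);
        off += q->len;
    }
    *total_out = total;
    return big;             /* hand this to a single aioWrite() */
}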
 
> No, we don't. Saturated is when the disk's response time is too high.
> This is most easily measured in the number of outstanding operations
> (on threads + queued for threads).

OK. I have implemented the queue limits, but found that they do not work very
well, probably because information about requests-to-come and the actual queue
length cannot be synchronized well. The queue keeps fluctuating and going
off-limits a *lot*. We (and the user) might be better off configuring Squid
in terms of outstanding swap-ins/swap-outs (not individual requests). I have
implemented the limit on swap-outs and found that it works well, much better
than the limit on queue length. The two approaches could be combined, though.
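
The swap-out limit itself is almost trivial; something like the sketch
below, where the limit is the configurable knob. The names are hypothetical,
not the ones in my actual patch:

/* Sketch of the swap-out limit: refuse to start a new swap-out while too
 * many are already in flight, regardless of how long the I/O queue is. */
static int swapouts_in_flight = 0;
static int swapout_limit = 64;          /* the configurable knob */

static int
may_start_swapout(void)
{
    return swapouts_in_flight < swapout_limit;
}

/* increment swapouts_in_flight when an object starts swapping out,
 * decrement when its last block has been written */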
 
> To maintain priorities between operations you need queues. To actually
> saturate the disk you only need a number of threads banging at it. The
> balance between number of threads and queue size is a matter of CPU
> usage and how the threads interact with the I/O queues.

Agree. Note that our task is not just to saturate the disk, but to get the
most out of it. The latter can be accomplished by introducing priority queues
and saturating the disk with the "right" requests. The notion of "right"
depends on the user's priorities and, thus, should be semi-configurable.
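
As a strawman, "right requests first" could be as simple as a two-level
queue drained high-priority first. Purely illustrative, not existing Squid
code:

/* Toy two-level disk request queue: serve requests a client is blocked on
 * before background work such as swap-outs. */
typedef struct dreq {
    struct dreq *next;
} dreq;

typedef struct {
    dreq *high;             /* e.g. swap-ins a client is waiting for */
    dreq *low;              /* e.g. swap-outs that can wait */
} prio_queue;

static dreq *
next_request(prio_queue *q)
{
    dreq *r;
    if (q->high) {
        r = q->high;
        q->high = r->next;
    } else if (q->low) {
        r = q->low;
        q->low = r->next;
    } else {
        r = NULL;
    }
    return r;
}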
 
> If we then define when a disk is available then we get three cases:
> 1) There is an idle thread
> 2) The wait queue for the disk is not too large
> 3) The disk is saturated (too large a queue)
>
> then this can be used to get a much more even distribution that extends
> to all disks when load grows, regardless of space distribution.
> a) the disk that has the most free space and an idle thread
> b) the disk that has the most free space and is not saturated
> (or some weighting between (a) and (b) )
> c) bypass disk

Sounds a bit complicated to me, but we can try it. Just note that there is no
up-to-date information on the actual queue length that will not change while
you are making your decisions and queueing requests (see the problem with the
queue-length limit above).
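
Still, the policy itself is easy to write down; something like the toy
below, where the per-dir state is obviously fake and the hard part in
practice is keeping "saturated" and "idle thread" up to date:

/* Toy version of the a/b/c selection; the arrays stand in for whatever
 * per-swapdir bookkeeping we would actually maintain. */
#define NDIRS 4

static int  idle_thread[NDIRS] = {0, 1, 0, 0};   /* an I/O thread is idle */
static int  saturated[NDIRS]   = {0, 0, 1, 0};   /* wait queue too large */
static long free_kb[NDIRS]     = {100, 400, 900, 250};

static int
pick_swap_dir(void)
{
    int best_idle = -1, best_ok = -1, i;

    for (i = 0; i < NDIRS; i++) {
        if (idle_thread[i] && (best_idle < 0 || free_kb[i] > free_kb[best_idle]))
            best_idle = i;                      /* case (a) */
        if (!saturated[i] && (best_ok < 0 || free_kb[i] > free_kb[best_ok]))
            best_ok = i;                        /* case (b) */
    }
    if (best_idle >= 0)
        return best_idle;
    if (best_ok >= 0)
        return best_ok;
    return -1;                                  /* case (c): bypass the disk */
}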
 
> That patch should be seen as a statement. I do hate the client
> interactions with pump.c.

Then let us all pray that pump.c is the last Squid module you hate! :)

Alex.