Re: 2.3.DEVEL2 now on the web site

From: Arjan de Vet <Arjan.deVet@dont-contact.us>
Date: Wed, 4 Aug 1999 07:50:25 +0200 (CEST)

In article <37A7555B.7C357162@hem.passagen.se> you write:

>> One of the most significant changes is with Async IO. I'm not sure how
>> robust it is. It certainly didn't work well under FreeBSD, but I tend
>> to blame that on either FreeBSD pthreads bug, or an incompatibility
>> with Squid's threading.
>
>Does your FreeBSD box have kernel-based pthreads? The ones I have looked
>at only had userspace pthreads, which is of no use for async-io. I have

FreeBSD has only userspace pthreads at the moment. They never worked
really well with Squid, but I recently saw some messages on the cvs
commit mailing list about new bugfixes; I haven't tried those yet.

>seen some references to a project porting Linux Threads to FreeBSD, but
>I could find very little information on it (the project seemed to have
>stalled),

I tried it some time ago; as far as I could see it worked OK with Squid.
The URL is http://lt.tar.com/.

FreeBSD does, however, have an experimental implementation of POSIX
1003.1B AIO/LIO. Christopher Sedore is currently improving that
implementation; see the message below. He made a comparison between the
select()-based approach and an AIO-based approach, and the results look
interesting, although I have no idea whether his test accurately
reflects the I/O behavior of an application like Squid.

Arjan

-----------------------------------------------------------------------------
From: cmsedore@mailbox.syr.edu (Christopher Sedore)
Subject: async io and sockets update
Date: 29 Apr 1999 07:15:21 +0200
Message-ID: <Pine.SOL.3.95.990428202157.17938A-100000@rodan.syr.edu>
Reply-To: Christopher Sedore <cmsedore@mailbox.syr.edu>

I've mostly finished what I set out to do with the kernel aio routines.
Below is a summary:

1. I've added a new system call, aio_waitcomplete(struct aiocb **cb,
struct timespec *tv). This system call causes a process to sleep until
the next async io op completes or the timeout expires. When the
operation completes, a pointer to the userland aiocb is placed in cb.
This makes fire-and-forget async io programming both possible and easy.
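
For illustration, here is a rough userland sketch of the fire-and-forget
pattern this enables, assuming the aio_waitcomplete() call described
above; the file name, buffer size and timeout are placeholders, not part
of the original test code:

#include <sys/types.h>
#include <aio.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

int
main(void)
{
    struct aiocb cb;
    struct aiocb *done;
    struct timespec ts = { 5, 0 };      /* give up after 5 seconds */
    char buf[512];
    ssize_t n;
    int fd;

    if ((fd = open("/etc/motd", O_RDONLY)) == -1) {  /* placeholder file */
        perror("open");
        exit(1);
    }

    memset(&cb, 0, sizeof(cb));
    cb.aio_fildes = fd;
    cb.aio_buf = buf;
    cb.aio_nbytes = sizeof(buf);

    if (aio_read(&cb) == -1) {          /* queue the read and forget it */
        perror("aio_read");
        exit(1);
    }

    /*
     * Sleep until the next async io op completes or the timeout
     * expires; a pointer to the completed userland aiocb is placed
     * in 'done', so no per-request bookkeeping is needed.
     */
    if ((n = aio_waitcomplete(&done, &ts)) == -1) {
        perror("aio_waitcomplete");
        exit(1);
    }
    printf("aiocb %p completed, %ld bytes\n", (void *)done, (long)n);
    return (0);
}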

2. I've changed the way that async operations on sockets get handled.
        a. Sockets are checked to see if the operations will complete
           immediately. If not, they are placed on a separate queue and
           processed when upcalled by the sowakeup routine.
        b. When upcalled as writeable, all pending writes are moved to the
           regular io queue to be processed.
        c. When upcalled as readable, reads are executed in the upcall
           routine as long as the socket stays readable.

3. I believe I fixed a bug in aio_process that would allow it to try to
execute operations on descriptors that have been closed, causing a panic.

Notes:

Ideally, operations on sockets that would complete immediately should be
executed during the aio_read system call, and the results made ready to be
picked up later.

Benefits:

The old aio code passed socket operations on to the aio daemons
immediately, causing them to block (sbwait). Once the maximum number of
aiods were blocked, no more operations would progress until one of the
aiods could complete an operation.

This methodology can be significantly faster than using select() to poll
sockets. A simple test program showed that before optimization 2c above,
the async io routines would only be faster than select() once about 37
descriptors were being monitored. With optimization 2c, async io is
faster for all the testing I did (I did not test with fewer than 10
descriptors).

The performance difference (again with a simple test program) between aio
and select() for reading looks something like this:

            select()              aio_read()/aio_waitcomplete()
num fds     kb/s       secs      kb/s       secs
 10         26315      19        35714      14
 20         20833      24        35714      14
 30         17241      29        33333      15
 40         14285      35        33333      15
 50         12195      41        33333      15
 60         10416      48        33333      15
 70          9259      54        31250      16
 80          8196      61        33333      15
 90          7575      66        31250      16
100          6944      72        33333      15

select() continues to trail off up to 250 descriptors, while aio shows no
significant degradation. Note that using aio_suspend instead of
aio_waitcomplete would probably be non-trivially slower than
aio_waitcomplete, but still faster than select() on large numbers of
descriptors (though it might not be much faster, depending on the order
in which operations completed vs the order of the pointers to them passed
into aio_suspend).
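
To illustrate the difference, here is a rough sketch (helper name and
arguments are mine) of the aio_suspend() pattern: unlike
aio_waitcomplete(), it only says that something on the list is done, so
the caller still has to scan the list with aio_error()/aio_return() to
find the operation that finished.

#include <aio.h>
#include <errno.h>
#include <stddef.h>

/*
 * Wait for one of 'nent' outstanding operations to finish and return
 * its completion status; '*which' is set to its index in 'list'.
 * The linear scan after aio_suspend() returns is the part that gets
 * expensive with many outstanding operations.
 */
static ssize_t
wait_one(struct aiocb *list[], int nent, int *which)
{
    int i;

    if (aio_suspend((const struct aiocb * const *)list, nent, NULL) == -1)
        return (-1);
    for (i = 0; i < nent; i++) {
        if (list[i] != NULL && aio_error(list[i]) != EINPROGRESS) {
            *which = i;
            return (aio_return(list[i]));
        }
    }
    errno = EAGAIN;                     /* nothing actually finished */
    return (-1);
}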

The test program simply creates the requested number of descriptors using
socketpair(), and either places an outstanding aio_read on each or puts
each in an fd_set for select(). Descriptors are then chosen at random()
out of this set and written to. aio_waitcomplete() or select() is used to
get the completed aio_read aiocb (or the fd to read), and then the
aio_read is issued again (or the fd_set is reset). The tests above were
done with 1000000 writes of 512 bytes each, and a corresponding read of
1000000 buffers of 512 bytes each.
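
As a rough reconstruction of the aio side of that test (descriptor count,
buffer size and variable names are illustrative, and it assumes the new
aio_waitcomplete() call):

#include <sys/types.h>
#include <sys/socket.h>
#include <aio.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define NFDS    100                     /* number of socketpairs */
#define NWRITES 1000000                 /* writes of 512 bytes each */
#define BUFSZ   512

int
main(void)
{
    int sp[NFDS][2];
    struct aiocb cbs[NFDS];
    static char rbuf[NFDS][BUFSZ];
    char wbuf[BUFSZ];
    struct aiocb *done;
    long i;
    int victim;

    memset(wbuf, 'x', sizeof(wbuf));

    /* One socketpair per "descriptor", with an outstanding aio_read each. */
    for (i = 0; i < NFDS; i++) {
        if (socketpair(AF_UNIX, SOCK_STREAM, 0, sp[i]) == -1) {
            perror("socketpair");
            exit(1);
        }
        memset(&cbs[i], 0, sizeof(cbs[i]));
        cbs[i].aio_fildes = sp[i][0];
        cbs[i].aio_buf = rbuf[i];
        cbs[i].aio_nbytes = BUFSZ;
        if (aio_read(&cbs[i]) == -1) {
            perror("aio_read");
            exit(1);
        }
    }

    for (i = 0; i < NWRITES; i++) {
        /* Write to a randomly chosen peer... */
        victim = random() % NFDS;
        if (write(sp[victim][1], wbuf, BUFSZ) != BUFSZ) {
            perror("write");
            exit(1);
        }
        /* ...pick up whichever aio_read completed... */
        if (aio_waitcomplete(&done, NULL) == -1) {
            perror("aio_waitcomplete");
            exit(1);
        }
        /* ...and re-issue the aio_read on that descriptor. */
        if (aio_read(done) == -1) {
            perror("aio_read (re-issue)");
            exit(1);
        }
    }
    return (0);
}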

One remaining problem with the aio code is that aio operations won't
"cross over" to other kernel threads, because they are based on the procs
that issue them, rather than the file descriptor itself. I may
investigate creating a variation of NT's io completion ports to enable
async io with kernel threads.
           
I don't think that the modifications are too invasive. There are numerous
mods to kern/vfs_aio.c, some mods to uipc_socket.c and uipc_socket2.c, and
small changes to sys/aio.h and sys/socketvar.h (plus the syscall
addition). I hope to do some more tweaking and see if I can get someone
to look it over with an eye to committing some or all of it.

-Chris