RE: small patch for async writes

From: Henrik Nordstrom <hno@dont-contact.us>
Date: Tue, 15 Mar 2005 04:40:56 +0100 (CET)

On Tue, 15 Mar 2005, Steven Wilton wrote:

> Just to check that I understand correctly: if there was an error doing a
> write once close_request is set to 1, the object will (most likely) still
> be incomplete on disk when squid thinks the object is completely written
> to disk.

No, in that case the I/O callback will tell the core the write failed and
the object gets discarded. Any disk clients currently reading from the
object may, however, get a short object.

The problem I was talking about in the blocking case is when this has
recursed, which can easily happen initially as writes get queued up while
the file is being opened:

    aufsWriteDone()
    ->storeAufsKickWriteQueue()
      ->aufsWrite()
        ->file_write()
          ->aufsWriteDone(error)
            ->storeAufsIOCallback(error)
              this call frees the storeIOState
    <-~~~~~~~~
    if (close_request)
      storeAufsIOCallback(ok)

As you can see above, when this call chain unwinds, aufsWrite ends up
accessing a freed storeIOState in its last few lines (close_request etc.),
and may also make a false second I/O callback indicating success. It also
calls storeAufsKickWriteQueue() on the freed I/O state.
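
To make the hazard concrete, here is a minimal self-contained C sketch of
the pattern, together with one possible guard (a deferred-free lock in the
spirit of Squid's cbdataLock()/cbdataValid()). The simplified signatures
and the always-failing file_write() are mine for illustration, not the
actual Squid code:

    #include <stdio.h>
    #include <stdlib.h>

    typedef struct {
        int close_request;
        int callback_done;   /* stand-in for cbdata validity */
        int locks;           /* deferred-free guard */
    } storeIOState;

    static void storeAufsIOCallback(storeIOState *sio, int errflag)
    {
        printf("I/O callback, errflag=%d\n", errflag);
        sio->callback_done = 1;    /* the core is done with sio */
        if (sio->locks == 0)
            free(sio);             /* defer the free while locked */
    }

    static void aufsWriteDone(storeIOState *sio, int errflag)
    {
        if (errflag)
            storeAufsIOCallback(sio, errflag);   /* may free sio */
        /* on success this would kick the next queued write, which
         * is where the recursion in the trace above comes from */
    }

    /* With ASYNC_WRITE not enabled, a failing write completes
     * synchronously: the callback runs before file_write returns. */
    static void file_write(storeIOState *sio,
                           void (*done)(storeIOState *, int))
    {
        done(sio, -1);    /* simulate "disk full" */
    }

    static void aufsWrite(storeIOState *sio)
    {
        sio->locks++;                    /* keep sio alive across the call */
        file_write(sio, aufsWriteDone);
        sio->locks--;
        if (sio->callback_done) {
            if (sio->locks == 0)         /* outermost frame does the free */
                free(sio);
            return;                      /* do not touch close_request or
                                            issue a second "ok" callback */
        }
        if (sio->close_request)
            storeAufsIOCallback(sio, 0);
    }

    int main(void)
    {
        storeIOState *sio = calloc(1, sizeof(*sio));
        sio->close_request = 1;
        aufsWrite(sio);    /* without the guard: use-after-free here */
        return 0;
    }

Without the locks/callback_done bookkeeping this is exactly the trace
above: the inner error callback frees sio, and aufsWrite then reads
close_request from freed memory and issues a second "success" callback.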

Conditions to trigger this issue:

   - ASYNC_WRITE not enabled (the default)

   - A storeAufsCreate aioOpen() call takes a relatively long time to
complete (saturated system)

   - The object is fetched relatively quickly, making a queue of writes
build up waiting for the create to finish (fast server serving an
object larger than 4KB)

   - Disk almost full or another I/O error condition, making the now
queued writes fail part-way into the queue (not on the first block)

> The reason for doing this work is that I was doing tests and getting around
> 6 megabytes/sec for cache misses, and 700kilobytes/sec for cache hits using
> diskd. I tried enabling async operations (using aio_read and aio_write in
> glibc) and an internal read cache in diskd to see if that would improve
> performance, with no success. Changing to aufs caused the cache hits to
> download at 7 megabytes/sec, without affecting the other load on the
> machine. Is it normal to see such a big performance increase on cache hits
> using aufs compared to diskd?

Depends on your request load.

diskd in its current shape will, if I am not mistaken, have trouble with
the speed of cache hits when the request load is in the lower range, due
to Squid returning to the network I/O loop and waiting between blocks of
data. aufs had the same problem some time ago, and iirc it resulted in an
isolated cache hit speed of ca 700 Kbyte/s. You can test whether this is
the case by making sure there is a high-speed cache miss running in
parallel; if the problem is the same you should see a significantly
better hit speed.
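
(Back-of-the-envelope, with assumed rather than measured numbers: if an
otherwise idle I/O loop delivers roughly one 4 Kbyte store page per pass,
and an idle pass takes on the order of 5 ms, that caps an isolated hit at

    4 Kbyte / ~5 ms  =  ca 800 Kbyte/s

which is right around the figure above. A busy miss in parallel keeps the
loop spinning, so the hit is no longer paced by the idle poll interval.)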

aufs can also push the drive somewhat harder than diskd, simply because
it can execute more than one operation concurrently; I have been able to
fully saturate the hardware using aufs. But this does not make any
difference when only processing a handful of requests.
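
For illustration, a toy sketch of the thread-pool idea behind aufs (a
fixed set of worker threads pulling blocking I/O requests off a shared
queue). This is my own simplification, not the real src/fs/aufs code:

    #include <pthread.h>
    #include <stdio.h>
    #include <unistd.h>

    #define NWORKERS 4
    #define NREQS    16

    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
    static pthread_cond_t  cond = PTHREAD_COND_INITIALIZER;
    static int queue[NREQS];
    static int head = 0, tail = 0, done = 0;

    static void *worker(void *arg)
    {
        (void)arg;
        for (;;) {
            pthread_mutex_lock(&lock);
            while (head == tail && !done)
                pthread_cond_wait(&cond, &lock);
            if (head == tail && done) {
                pthread_mutex_unlock(&lock);
                return NULL;
            }
            int req = queue[head++ % NREQS];
            pthread_mutex_unlock(&lock);
            /* the blocking read()/write() would happen here; while
             * this thread sleeps in the kernel the other workers
             * keep more operations outstanding on the drive */
            usleep(1000);
            printf("completed request %d\n", req);
        }
    }

    int main(void)
    {
        pthread_t tid[NWORKERS];
        int i;
        for (i = 0; i < NWORKERS; i++)
            pthread_create(&tid[i], NULL, worker, NULL);
        for (i = 0; i < NREQS; i++) {
            pthread_mutex_lock(&lock);
            queue[tail++ % NREQS] = i;    /* enqueue a request */
            pthread_cond_signal(&cond);
            pthread_mutex_unlock(&lock);
        }
        pthread_mutex_lock(&lock);
        done = 1;                         /* no more work coming */
        pthread_cond_broadcast(&cond);
        pthread_mutex_unlock(&lock);
        for (i = 0; i < NWORKERS; i++)
            pthread_join(tid[i], NULL);
        return 0;
    }

(diskd, with its single helper process per cache_dir, can only have one
operation in flight per disk at a time.)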

So I am not aware of any huge differences between the two under a normal
proxy workload in the ranges where async disk I/O is needed, but my
opinion is that aufs should be somewhat faster, though not by much.

But to be honest I haven't used diskd very much. With aufs being partly
my baby, and it running very well on Linux, I am kind of biased toward
using it. diskd is mostly of interest to FreeBSD folks, as aufs won't
work that well there.

On a related note: the CPU consumption of aufs can be reduced quite
significantly compared with what we have today, primarily on cache misses
but in theory also on cache hits (see the bug report).

Also, what kills hard drive performance is mainly seeks, and any request
pattern involving largish files will give significantly better results
than a request pattern made up of mostly randomly accessed small files.
This is assuming the data set doesn't fit in memory.
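
Rough numbers to illustrate (my assumptions, typical for drives of that
era, not measurements): a drive managing ca 100 random seeks/s on 8 Kbyte
objects moves well under 1 Mbyte/s, while the same drive streams a large
file sequentially at 30-60 Mbyte/s:

    random:      ~100 seeks/s * 8 Kbyte  =  ca 0.8 Mbyte/s
    sequential:  no seeks                =  30-60 Mbyte/s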

Regards
Henrik