COSS questions

From: Adrian Chadd <adrian@dont-contact.us>
Date: Wed, 3 May 2006 21:46:32 +0800

Hiya,

Just a warning: this work was done on Squid-2.5. I've checked
the COSS code in Squid-3 and, as far as I can tell, nothing to fix this
has been checked in. I'd love to be proved wrong though. :)

I've noticed over the last few years that the COSS code, although
fast, is a tad unstable. It occasionally causes swapin failures,
which I've narrowed down to the following two conditions:

* An object is opened in the stripe immediately before the current
  stripe, which then fills up and causes the next stripe to be deleted.
  If IO takes too long, and it does under my kinds of test loads,
  the aio_read() from the disk position occasionally happens /after/
  the membuf representing its new contents is blitted to disk.
  (This one, as you'd guess, is pretty freakishly rare, and only really
  happens under extreme disk loads.)
* This one happens more often, during object reallocation.
  Squid asks COSS for an object; COSS notices it's not in an in-memory
  membuf and schedules a read + reallocate. A subsequent request for
  the object comes in before its data can be read from the old position
  and blitted into the membuf, so this next hit is a 'memory hit'
  and is copied from the membuf (which, unfortunately, contains lots of
  NULs).
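
To make the two conditions concrete, here is a minimal sketch in C of
one way they could be guarded against. It is not taken from the Squid
source: the stripe and disk_read types, the generation counter, the
fill_pending flag and every function name are assumptions made up for
illustration. The idea is that a queued disk read remembers which
generation of the stripe it was aimed at, and that a membuf region
reserved for a relocated object is not served until its bytes arrive.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified model; none of these names come from Squid. */
enum { MEMBUF_SIZE = 64 };

typedef struct {
    char     membuf[MEMBUF_SIZE]; /* in-memory copy of the stripe */
    unsigned generation;          /* bumped whenever the disk stripe is overwritten */
    size_t   obj_off, obj_len;    /* the single relocated object in this sketch */
    bool     fill_pending;        /* true until the old copy has been read in */
} stripe;

/* Condition 1: a queued disk read remembers the stripe generation it targets. */
typedef struct { unsigned generation; } disk_read;

static disk_read disk_read_queue(const stripe *s) {
    disk_read r = { s->generation };
    return r;
}

/* Called when a new membuf is written over this stripe's disk area. */
static void stripe_overwritten(stripe *s) { s->generation++; }

/* On completion, trust the bytes only if no overwrite happened since queueing. */
static bool disk_read_valid(const disk_read *r, const stripe *s) {
    return r->generation == s->generation;
}

/* Condition 2: reserve space for an object whose bytes have not arrived yet. */
static void relocate_begin(stripe *s, size_t off, size_t len) {
    assert(off <= MEMBUF_SIZE && len <= MEMBUF_SIZE - off);
    s->obj_off = off;
    s->obj_len = len;
    s->fill_pending = true;
}

/* Called when the read from the old disk position completes. */
static void relocate_finish(stripe *s, const char *src) {
    memcpy(s->membuf + s->obj_off, src, s->obj_len);
    s->fill_pending = false;
}

/* Memory-hit path: returns bytes copied, or -1 if the caller must wait. */
static long membuf_read(const stripe *s, char *dst, size_t cap) {
    if (s->fill_pending)
        return -1; /* refuse to serve the still-zeroed region */
    size_t n = s->obj_len < cap ? s->obj_len : cap;
    memcpy(dst, s->membuf + s->obj_off, n);
    return (long)n;
}
```

In this sketch a stale read is reported as invalid rather than
retried, and a second request during relocation gets -1 instead of a
buffer of NULs. What the caller should do next (queue the request,
retry, or treat it as a miss) is left open.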

Now, neither of these is particularly show-stopping, but I'd like
to fix 'em. I really do like the COSS model; it makes object
replacement really damned cheap, much cheaper than the cost of writing
popular objects back to the cache. The two showstoppers for deploying
it are the two problems above and some rebuild-from-dirty code.
It really does perform damned well compared to ufs/aufs/diskd for small
objects (< 64k).

So, does anyone have any suggestions?

Adrian
Received on Wed May 03 2006 - 08:49:54 MDT
