Proposed COSS patch

From: Steven <swilton@dont-contact.us>
Date: Tue, 1 Aug 2006 19:52:01 +0800 (WST)

As mentioned yesterday, I've been doing a fair amount of work on COSS over
the past few weeks. Attached is a patch that I've come up with that fixes
a couple of bugs, and adds a couple of configuration options to COSS.

The disk load and I/O wait times reported by the kernel have dropped to
10% of their original values after converting from 3 * reiserfs+aufs
cache_dirs to 2 * COSS (direct partition access) + 1 *
reiserfs+aufs. (Interestingly enough, the load on the reiser partition
has dropped by a similar amount, since it is now only caching
larger objects). We are seeing slightly reduced hit rates compared to
having all *ufs cache_dirs, but I'm pretty sure this is related to the
size of the cache (22Gb total cache size).

We are running this patch on a couple of our production servers, and it
appears to be stable so far. Once it's been running for a few more weeks
and on a larger number of caches (including caches with 160Gb cache size)
I'll post back with updates on the hit rates and disk loads.

I'll go through a quick rundown of what it does:

 - Fix for bug 1680:
http://www.squid-cache.org/bugs/show_bug.cgi?id=1680

When COSS reads in all objects currently stored in the cache, it needs to
store the length of the data + headers (and not just the size of the
data). The last chunk of the patch to store_dir_coss.c fixes this problem.
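
In rough terms the rebuild needs to do something like the following; the
struct and field names here are only illustrative, not the actual
identifiers in store_dir_coss.c:

/* Illustrative sketch only: when replaying the objects already on disk,
 * the stored length must cover the swap header as well as the body,
 * otherwise the offset of each subsequent object is computed short. */

#include <stddef.h>

struct coss_rebuild_entry {     /* invented name for the sketch */
    size_t hdr_len;             /* length of the on-disk swap header */
    size_t data_len;            /* length of the object body */
};

static size_t
coss_entry_disk_len(const struct coss_rebuild_entry *e)
{
    /* The buggy code effectively used e->data_len on its own. */
    return e->hdr_len + e->data_len;
}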

 - Fix for race condition under high load
This bug isn't filed in bugzilla, but I encountered it in testing.

When relocating data onto the head of the disk, it was possible for the
destination buffer to disappear from memory if the client cancelled their
request before the data was read from disk. This caused an
assert() error.

The addition of storeCossMemBufLockPending() and
storeCossMemBufUnlockPending(), along with the extensions to the pending
struct, is one solution to this problem that appears to work.
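
Roughly, the idea is to pin the destination membuf while a read into it is
still outstanding. A minimal sketch (only the two function names above come
from the patch; the fields and the free() at the end are invented and
simplified for illustration):

#include <assert.h>
#include <stdlib.h>

typedef struct _CossMemBuf {
    int pending_locks;   /* invented: relocation reads still targeting this buffer */
    int written;         /* invented: stripe already flushed to disk */
    /* ... stripe offset, data, etc. ... */
} CossMemBuf;

static void
storeCossMemBufLockPending(CossMemBuf *t)
{
    /* Taken when a disk read that will copy into this buffer is queued. */
    t->pending_locks++;
}

static void
storeCossMemBufUnlockPending(CossMemBuf *t)
{
    assert(t->pending_locks > 0);
    t->pending_locks--;
    /* The buffer may only go away once no reads are pending; previously a
     * cancelled client request could let it be freed while a relocation
     * read was still in flight, tripping an assert(). */
    if (t->pending_locks == 0 && t->written)
        free(t);
}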

 - New features (overwrite-percent, membufs and max-stripe-waste)
The first 2 options allow us to fill the COSS partitions to >90% of
the available space (compared to 56% without the patch). This provided
a large improvement in hit rates on our caches (at the expense of breaking
the LRU algorithm slightly).

The third option allows us to store larger objects (probably up to
512k) in the COSS cache_dir without losing large segments of disk space
to stripe alignment overflows.
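
For example, a cache_dir line using the new options could look something
like this (the device, size and option values are purely illustrative, not
a recommended configuration):

  cache_dir coss /dev/sda5 20000 max-size=524288 overwrite-percent=95 membufs=10 max-stripe-waste=32768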

I'm currently testing a few different configurations to work out what
gets the best hit rate, and I'll update once I have some data.

 - Reworking of the load calculation
* this does touch code outside of the COSS tree *

We needed to rework the load calculation to avoid COSS starving one disk
of new objects relative to another. As soon as we lifted the load above
0, the AUFS partition started being preferred far too heavily, so I moved
all the load metrics into src/defines.h where they can be compared easily. I also
added this information to the data sent to cachemgr.
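
Roughly, the goal is for every cache_dir type to report its load on one
shared scale so the least-load selection can compare them fairly; a sketch
(the macro name and scaling below are invented for illustration, not what
is actually in defines.h):

/* Invented sketch: all fs types scale their load into the same range. */

#define MAX_LOAD_SKETCH 1000    /* invented; a shared ceiling defined once */

static int
coss_load_sketch(int pending_ops, int max_ops)
{
    /* Scale the current queue depth into the shared 0..MAX_LOAD_SKETCH range
     * so it can be compared directly against an aufs dir's load value. */
    int load = pending_ops * MAX_LOAD_SKETCH / max_ops;
    return load > MAX_LOAD_SKETCH ? MAX_LOAD_SKETCH : load;
}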

Any thoughts or comments would be appreciated.

thanks

Steven
