COSS work update

From: Adrian Chadd <adrian@dont-contact.us>
Date: Sat, 13 May 2006 11:37:44 +0800

Hi everyone,

I've made progress with my COSS work:

* COSS now seems to work fine with 64-bit file offsets; I'm doing all of
  my testing with 10 gigabyte stripes and I'll test with larger ones once
  the code is stable

* I've fixed the object reading race condition: this needed me to flesh
  out the internals a little to look like a 'real' filesystem. A read operation
  now creates a 'readop', which is either completed immediately or put on
  hold pending the completion of another read (the object relocate read).
  This seems to work fine.

* I've removed the xmalloc/xfree stuff Eric did for object reads; all object
  reads are now copied straight out of a membuf. I had to change the membuf
  semantics to allow membufs to hang around after a write has completed.

* Little fixes here and there
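
To illustrate the 'readop' idea from the list above, here is a minimal
sketch in C. It is not the actual COSS code: the names coss_obj, readop,
readop_submit and relocate_complete are made up for illustration, and the
real disk I/O is reduced to setting a flag. It only shows the control flow:
a read completes at once unless a relocate read is in flight, in which case
it is queued and completed when the relocate finishes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical read operation; 'done' stands in for the real completion. */
typedef struct readop {
    int done;
    struct readop *next;
} readop;

/* Hypothetical per-object state: a relocate flag plus a FIFO of held reads. */
typedef struct {
    int relocating;
    readop *head;
    readop *tail;
} coss_obj;

static void obj_init(coss_obj *o)
{
    o->relocating = 0;
    o->head = NULL;
    o->tail = NULL;
}

/* Returns 1 if the read completed immediately, 0 if it was put on hold. */
static int readop_submit(coss_obj *o, readop *op)
{
    assert(o != NULL && op != NULL);
    op->next = NULL;
    if (!o->relocating) {
        op->done = 1;
        return 1;
    }
    /* A relocate read is pending: append to the hold queue. */
    op->done = 0;
    if (o->tail != NULL)
        o->tail->next = op;
    else
        o->head = op;
    o->tail = op;
    return 0;
}

/* Called when the relocate read finishes; completes held reads in order
 * and returns how many were completed. */
static int relocate_complete(coss_obj *o)
{
    int n = 0;
    o->relocating = 0;
    while (o->head != NULL) {
        readop *op = o->head;
        o->head = op->next;
        op->next = NULL;
        op->done = 1;
        n++;
    }
    o->tail = NULL;
    return n;
}
```

Holding later reads behind the relocate read means they can be served from
the relocated copy rather than racing with it.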

What's left:

* The 'other' race condition - object reads from disk being scheduled from an
  area that is just about to be written over - needs to be fixed. This doesn't
  happen often at all but I'd prefer it to be fixed for completeness.

* Fix dirty/clean rebuilds to work properly. My local tree writes the object
  size out, if known, when writing out the TLV swap metadata. I've hacked
  up a little utility which lets me read a COSS stripe - since stripes now
  begin at fixed multiples in the COSS file I can parse the store a stripe
  at a time. This gives me some hope in figuring out how to complete a
  COSS dirty rebuild.

* POSIX AIO scheduling: the default 128 pending aiops get used up really
  quickly when the cache is full; they're almost always going to be read
  events. Something 'better' needs to be dreamt up to schedule the
  AIO calls. I'll leave this until after I've fixed everything else; this
  way I can ask for testers and I won't have to worry (too much, I hope!)
  about fixing random crashes.

* COSS only caches objects that specify their size up-front; this doesn't
  happen very often. I haven't yet done any analysis to get exact numbers
  but I think the store layer might need a little tweaking to allow some
  kind of "delayed" swap. Again I have no idea if this has been implemented
  in Squid-3 but I'll do a proof of concept in my local 2.5 tree, test it
  out and provide feedback here so we can all discuss it.

Adrian
Received on Fri May 12 2006 - 21:39:34 MDT
