COSS outline

From: Adrian Chadd <adrian@dont-contact.us>
Date: Tue, 21 Aug 2001 09:29:59 -0600

Hi guys,

Here are the basic COSS notes I just committed to the diskio branch
on sourceforge. Hopefully it fills in some gaps.

Adrian

----- Forwarded message from Adrian Chadd <adrian@ywing.creative.net.au> -----

Date: Tue, 21 Aug 2001 17:24:45 +0200 (CEST)
From: Adrian Chadd <adrian@ywing.creative.net.au>
To: adrian@squid-cache.org

COSS notes

Adrian Chadd <adrian@creative.net.au>

$Id: coss-notes.txt,v 1.1.2.2 2001/08/21 15:22:38 adri Exp $

COSS is a Cyclic Object Storage System originally designed by
Eric Stern <estern@logisense.com>. I have extended the idea and
worked it into the current framework.

In these notes I'll discuss the current implementation of COSS
and note where the implementation differs from Eric's original
idea and why the design changes were made.

COSS basics
-----------

COSS works with a single file. Eventually the file may actually be
a raw disk device, but since squid doesn't cache the disk reads
in memory the OS buffer cache will need to be employed for reasonable
performance. For the purposes of this discussion the COSS storage
device will be referred to as a file.

The file is divided into stripes. Each stripe has a fixed size and
a fixed position in the file. The stripe size is a compile-time option.

As objects are written to a COSS stripe, their place is pre-reserved
and data is copied into a memory copy of the stripe. Because of this,
the object size must be known before it can be stored in a COSS
filesystem. (Hence the max-size requirement with a coss cache_dir.)

When a stripe is filled, the stripe is written to disk, and a new
memory stripe is created.
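
The reserve-then-copy behaviour described above can be sketched roughly
as follows. This is only an illustration, not the COSS code: the class
name, the tiny stripe size, and the choice to flush when the next object
no longer fits are all assumptions made for the example.

```python
import io

STRIPE_SIZE = 16  # hypothetical; tiny so the example is easy to follow


class StripeWriter:
    """Toy model: reserve space, copy into a memory stripe, flush when full."""

    def __init__(self, backing, stripe_size=STRIPE_SIZE):
        self.backing = backing              # file-like stand-in for the COSS file
        self.stripe_size = stripe_size
        self.stripe_index = 0               # fixed-position stripe now in memory
        self.buf = bytearray(stripe_size)   # memory copy of the current stripe
        self.used = 0                       # bytes reserved so far

    def store(self, data):
        size = len(data)                    # size must be known up front
        if size > self.stripe_size:
            raise ValueError("object larger than a stripe")
        if self.used + size > self.stripe_size:
            self.flush()                    # assumption: flush when it won't fit
        offset = self.stripe_index * self.stripe_size + self.used
        self.buf[self.used:self.used + size] = data   # copy into memory stripe
        self.used += size
        return offset                       # where the object lives in the file

    def flush(self):
        # Write the whole fixed-size stripe at its fixed position.
        self.backing.seek(self.stripe_index * self.stripe_size)
        self.backing.write(bytes(self.buf))
        self.stripe_index += 1
        self.buf = bytearray(self.stripe_size)
        self.used = 0
```

Wrap-around at the end of the file is deliberately left out of this
sketch.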

When objects are read back from the COSS file, they can either come
from a stripe in memory (the current one, or one being written),
or from the disk. If the object is still in a memory stripe, then
it is copied from memory rather than read from disk.

If an object is read from disk, it is re-written to the head of
the current stripe (just as if it were a new object). This is required
for correct operation of the replacement policy, detailed below.
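
A rough sketch of this read path, under stated assumptions: the
function name, the dictionary of in-memory stripes, and the rewrite
callback are hypothetical and only stand in for whatever the real code
does.

```python
import io


def read_object(offset, size, mem_stripes, stripe_size, disk, rewrite):
    """Return (data, offset_after_read) for an object.

    mem_stripes maps stripe index -> bytearray for stripes still in memory.
    rewrite(data) stores the data again as a new object and returns its
    new offset. All names here are hypothetical.
    """
    index, start = divmod(offset, stripe_size)
    stripe = mem_stripes.get(index)
    if stripe is not None:
        # Still in a memory stripe: copy from memory, no disk read.
        return bytes(stripe[start:start + size]), offset
    disk.seek(offset)
    data = disk.read(size)
    # Read from disk: re-write it into the current stripe like a new object.
    return data, rewrite(data)


# Usage: stripe 0 is on "disk", stripe 1 is still in memory.
disk = io.BytesIO(b"0123456789abcdef")
mem_stripes = {1: bytearray(b"XYZ" + bytes(13))}
rewritten = []


def rewrite(data):
    rewritten.append(data)
    return 16 + 3      # hypothetical: next free byte in memory stripe 1


from_memory = read_object(16, 3, mem_stripes, 16, disk, rewrite)
from_disk = read_object(4, 4, mem_stripes, 16, disk, rewrite)
```

Only the second call touches the disk, and only that call hands the
object to the rewrite callback.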

When the entire COSS file is full, the current stripe again becomes the
first stripe in the file, and the objects in that stripe are released.
The objects on disk are kept in strict LRU order, matching the
replacement policy LRU that links the StoreEntry's together, so
releasing them simply involves walking the tail of the LRU and freeing
entries until we hit an entry in the next stripe.
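
The tail walk can be illustrated with a small sketch. It assumes an
ordered mapping whose oldest entry comes first and whose values are file
offsets; the function name and data layout are made up for the example
and are not taken from the squid code.

```python
from collections import OrderedDict


def release_stripe(lru, stripe_index, stripe_size):
    """Free entries from the LRU tail until one lies outside the stripe."""
    released = []
    while lru:
        key, offset = next(iter(lru.items()))   # oldest entry = LRU tail
        if offset // stripe_size != stripe_index:
            break                               # entry in the next stripe: stop
        lru.popitem(last=False)                 # drop the oldest entry
        released.append(key)
    return released


# Usage: with 16-byte stripes, "a" and "b" live in stripe 0.
lru = OrderedDict([("a", 0), ("b", 10), ("c", 16), ("d", 30)])
freed = release_stripe(lru, 0, 16)
```

Because disk order and LRU order agree, the walk never has to search:
it stops at the first entry that belongs to another stripe.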

COSS implementation details
---------------------------

* The stripe size is fixed. In the original COSS code, Eric optimised
  this a little by allowing the stripes to be truncated to not
  waste disk space at the end of the stripe. This was removed
  to simplify the allocation code slightly and make things easier
  when the store log and checksums are combined in the stripe
  for faster rebuilds.

* COSS currently copies object memory around WAY too much. This needs
  to be fixed eventually.

* It would be nice if the storeRead() interface were a little smarter
  and allowed the filesystem to return as much of an object as possible.
  This would be good for COSS since the read from disk could be simplified
  to use a single OS read() call - this would work really well for
  the object types COSS is designed to cache.

* The original COSS code used file_read() and file_write() for disk IO.
  The file_* routines were initially used to implement async disk IO,
  and Eric probably wrote some async disk code for Windows.
  I've written a very simple async_io.c module which uses POSIX
  AIO to implement the async IO. POSIX AIO is well-suited to the
  disk IO COSS performs.
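
As an illustration of the storeRead() point above, fetching a whole
object with a single positional read might look like the sketch below.
It uses a plain synchronous os.pread() (Unix only), not POSIX AIO, and
the function name is hypothetical.

```python
import os
import tempfile


def read_whole_object(fd, offset, size):
    """Fetch an object with one positional read instead of several."""
    data = os.pread(fd, size, offset)   # a single read-style system call
    if len(data) != size:
        # A short read is treated as an error in this sketch.
        raise IOError("short read")
    return data


# Usage: a 6-byte object stored at offset 6 of a scratch file.
with tempfile.TemporaryFile() as f:
    f.write(b"headerOBJECTtrailer")
    f.flush()
    obj = read_whole_object(f.fileno(), 6, 6)
```

This only works when the object's offset and size are known before the
read is issued.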

COSS direction
--------------

Eventually, when more of squid is rewritten, I'm going to replace
the replacement policy with something a little more flexible.
A shortcut would be to use a slab allocator and have one slab per
stripe for the StoreEntry's. When it comes time to replace a stripe,
you can just treat the stripe as an array. This would not work
well in the current squid codebase, but it would work well in the
planned rewrite. This would also allow alternate replacement policies
to be used, and it would cut down the storage requirements per
StoreEntry by two pointers (8 bytes on the i386).
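
A very loose sketch of the one-slab-per-stripe idea, assuming each slab
can be modelled as a plain array of entries. The class and method names
are invented for illustration, and a real slab allocator would manage
raw memory rather than Python lists.

```python
class StripeSlabs:
    """Toy model: one array ("slab") of entries per stripe."""

    def __init__(self, n_stripes):
        self.slabs = [[] for _ in range(n_stripes)]

    def add(self, stripe_index, entry):
        # Entries are grouped by the stripe that holds their data.
        self.slabs[stripe_index].append(entry)

    def replace_stripe(self, stripe_index):
        # Replacing a stripe is just taking its array; no LRU links needed.
        released = self.slabs[stripe_index]
        self.slabs[stripe_index] = []
        return released


# Usage: two entries in stripe 0, one in stripe 1.
slabs = StripeSlabs(2)
slabs.add(0, "entry-a")
slabs.add(0, "entry-b")
slabs.add(1, "entry-c")
gone = slabs.replace_stripe(0)
```

Since entries are located by stripe, no per-entry list pointers are
needed, which is where the two-pointer saving would come from.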

----- End forwarded message -----
Received on Tue Aug 21 2001 - 09:29:59 MDT
