[RFC] cache architecture

From: Amos Jeffries <squid3_at_treenet.co.nz>
Date: Tue, 24 Jan 2012 15:24:10 +1300

This is just a discussion at present, aimed at a checkup and possibly a
long-term re-design of the overall architecture for the store logic. The
list of SHOULD DOs etc. will therefore contain things Squid already does.

This post is prompted by
http://bugs.squid-cache.org/show_bug.cgi?id=3441 and by ongoing hints
of user frustration on the help lists and elsewhere.

Cutting to the chase:

  Squid's existing methods of startup cache loading and error recovery
are slow, with side effects impacting bandwidth and end-user experience
in various annoying ways. The swap.state mechanism speeds loading up
enormously compared to a DIRTY scan, but in some cases it is still too
slow.

Ideal Architecture:

Squid starts with the assumption that it is not caching. Cache spaces
are brought online as soon as possible, with priority given to the
faster types, but they are loaded asynchronously from startup in a
plug-n-play design.

1) Requests are able to be processed at all times, but storage ability
will vary independently of Squid's operational status (see the sketch
below).
+ minimal downtime before the first request is accepted and responded to
- all or some caching benefits are lost at times
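A minimal sketch of what that decoupling implies at lookup time; all
names here are hypothetical stand-ins, not actual Squid code:

    struct CacheKey;
    class StoreEntry;
    bool anyStoreOnline();                    // hypothetical: any store ready?
    StoreEntry *findEntry(const CacheKey &);  // hypothetical index lookup

    // When no store is online yet, the request is simply a MISS and is
    // forwarded upstream rather than delayed waiting for storage.
    StoreEntry *lookup(const CacheKey &key)
    {
        if (!anyStoreOnline())
            return nullptr;       // caching offline: MISS, go upstream
        return findEntry(key);    // normal cache lookup
    }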

2) cache_mem shall be enabled by default and be the first among all
caches (example config below)
+ reduces the bandwidth impact from (1) if it comes up before the first
request
+ could also be set up asynchronously while Squid is already operating
(keeps the pro from (1) while minimising the con)
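For illustration, the default-on memory cache amounts to something like
this in squid.conf (the sizes are placeholders, not recommendations):

    # memory cache enabled by default, sized explicitly
    cache_mem 256 MB
    # cap on individual objects held in memory
    maximum_object_size_in_memory 512 KB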

3) possibly multiple cache_mem (placement sketch below): a traditional
non-shared cache_mem, a shared memory space, and an in-transit
unstructured space.
+ non-shared cache_mem allows larger objects than are possible with the
shared memory.
+ a separate in-transit area allows collapsed forwarding to occur for
incomplete but cacheable objects
   note that private and otherwise non-shareable in-transit objects are
a separate thing not mentioned here.
- maybe complex to implement, and long-term plans to allow paging
mem_node pieces of large files should make the shared/non-shared split
obsolete.
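A rough sketch of how an object might be routed between the three
areas; this is purely illustrative, with hypothetical names and
thresholds:

    #include <cstddef>

    // Hypothetical routing between the three memory areas.
    enum class MemArea { Local, Shared, Transit };

    MemArea placeFor(bool complete, bool shareable,
                     std::size_t size, std::size_t sharedObjectMax)
    {
        if (!complete)
            return MemArea::Transit;  // still arriving: collapsed forwarding
        if (shareable && size <= sharedObjectMax)
            return MemArea::Shared;   // fits the SMP-shared space
        return MemArea::Local;        // larger objects stay non-shared
    }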

4) config load/reload at some point enables a cache_dir
+ being async means we are not delaying the first response waiting for
potentially long, slow disk processes to complete (see the sketch below)
- creates a high MISS ratio during the wait for these to be available
- adds CPU and async event queue load on top of active traffic loads,
possibly slowing both traffic and cache availability
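One way to picture the async loading is a self-rescheduling event that
validates a small batch of entries per step, so traffic keeps flowing
between steps. RebuildState and its methods are hypothetical; the
eventAdd() declaration is a simplified version of Squid's event.h:

    // Sketch of one incremental rebuild step; not the real rebuild code.
    class RebuildState {
    public:
        void scanSomeEntries(int n);  // validate up to n entries
        bool done() const;
        void dirOnline();             // flip this cache_dir to available
    };

    typedef void EVH(void *);
    void eventAdd(const char *name, EVH *func, void *arg,
                  double when, int weight);   // simplified from event.h

    static void rebuildStep(void *data)
    {
        RebuildState *rb = static_cast<RebuildState *>(data);
        rb->scanSomeEntries(50);      // small batch, then yield
        if (!rb->done())
            eventAdd("rebuildStep", rebuildStep, rb, 0.0, 1); // reschedule
        else
            rb->dirOnline();          // scanning complete
    }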

5) cache_dir maintains distinct (read,add,delete) states for itself
+ this allows read-only (1,0,0) caches and read-and-retain (1,1,0)
caches (sketched below)
+ also allows old storage areas to be gracefully deprecated using
(1,0,1), with the decreasing object count visibly reporting the
progress of migration.
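As a sketch, the triple could be as simple as three capability flags
consulted by the store; names are hypothetical, not Squid code:

    // Hypothetical per-cache_dir capability flags.
    struct DirCapabilities {
        bool canRead;     // serve HITs from this dir
        bool canAdd;      // admit new objects into this dir
        bool canDelete;   // let replacement/purge remove objects
    };

    // read-only:       {true, false, false}   (1,0,0)
    // read-and-retain: {true, true,  false}   (1,1,0)
    // drain/deprecate: {true, false, true }   (1,0,1)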

6) cache_dir structure maintains a "current" and a "max" available
fileno setting.
     current always starts at 0 and rises toward max; max is whatever
swap.state, a hard-coded value, or another appropriate source tells
Squid it should be.
+ allows scans to start with caches set to full access, but limits the
area of access to the range of already-scanned fileno between 0 and
current (see the sketch below)
+ allows any number of scan algorithms beyond CLEAN/DIRTY while
minimising user-visible impact
+ allows algorithms to be switched while processing
+ allows growing or shrinking cache spaces in real-time
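A sketch of that fileno window; sfileno is Squid's swap file number
type, the rest is hypothetical:

    #include <algorithm>
    #include <cstdint>

    typedef int32_t sfileno;   // as in Squid

    // Hypothetical scan window over a cache_dir's fileno space.
    class DirScanWindow {
        sfileno current = 0;   // everything below this is validated
        sfileno max;           // capacity from swap.state, config, etc.
    public:
        explicit DirScanWindow(sfileno cap) : max(cap) {}
        bool accessible(sfileno f) const { return f >= 0 && f < current; }
        void advance(sfileno n) { current = std::min<sfileno>(current + n, max); }
    };

Growing or shrinking the cache in real time then reduces to adjusting
max while the scan keeps moving current.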

7) cache_dir scan must account for corruption of individual files, of
the index entries, and of any metadata construct like swap.state (a
validation sketch follows)
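For example, a per-entry validation step during the scan might look
like this; all names are hypothetical:

    #include <cstddef>

    struct IndexRecord { std::size_t objectSize; };
    struct OnDiskFile  { bool exists; bool readable; std::size_t size; };

    enum class Verdict { Ok, Corrupt };

    // The index record, the metadata and the on-disk file must all
    // agree before the entry is accepted.
    Verdict validateEntry(const IndexRecord &idx, const OnDiskFile &file)
    {
        if (!file.exists || !file.readable)
            return Verdict::Corrupt;   // damaged or missing cache file
        if (idx.objectSize != file.size)
            return Verdict::Corrupt;   // swap.state disagrees with disk
        return Verdict::Ok;            // safe to expose via "current"
    }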

8) cache_dir scan should account for externally added files, regardless
of the CLEAN/DIRTY algorithm being used.
    by this I mean check for and handle (accept or erase) cache_dir
entries not accounted for by swap.state or equivalent metadata (see the
sketch below).
+ allows reporting what action was taken about the extra files, be it
erase or import, and any related errors.
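A sketch of that reconciliation pass; importEntry() and eraseFile()
stand in for whatever accept/erase policy applies, and Squid proper
would report via debugs() rather than std::cerr:

    #include <cstdint>
    #include <iostream>
    #include <set>
    #include <vector>

    typedef int32_t sfileno;           // as in Squid

    bool importEntry(sfileno f);       // hypothetical accept policy
    void eraseFile(sfileno f);         // hypothetical erase + report

    // Cross-check the on-disk files against the swap.state index.
    void reconcileForeignFiles(const std::set<sfileno> &indexed,
                               const std::vector<sfileno> &onDisk)
    {
        for (const sfileno f : onDisk) {
            if (indexed.count(f))
                continue;              // accounted for by swap.state
            if (importEntry(f))        // try to adopt the foreign file
                std::cerr << "imported foreign fileno " << f << "\n";
            else
                eraseFile(f);          // reject it, and report the erase
        }
    }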

Anything else?

Amos