distributed caching

From: Roger Venning <r.venning@dont-contact.us>
Date: Fri, 23 Mar 2001 23:41:45 -0500

I've been thinking just a little about distributed caching again. Some of
you might have seen the Central Squid Server concept that SE Net of
Adelaide supported work on (http://www.senet.com.au/css). This was
essentially a centralised cache digest aggregation point, queryable via
ICP. I'm not sure whether the cost of a separate, well memory-resourced
box is worth the benefits of cache digests (although of course memory has
now dropped below $1AU per MB... I'm young, but I can remember when even
disk was more expensive than that).

Essentially, for a loose confederation of organisations that are prepared
to act as siblings, the problems are, in my (largely uninformed,
_corrections and additions desired_) opinion:

o the benefits of having a large distributed cache are largely negated by
the fact that no-one is prepared to run in 'proxy-only' mode, and so all
caches drift towards a state where they hold the same objects

o ICP traffic between siblings grows as n^2, although multicast helps by
roughly halving this (they all still have to reply, right?) - see the
first sketch after this list. Unreachable peers impose performance
penalties on your own clients (admittedly minimised). Slow peers
continually impact performance. If your siblings aren't running
well-dimensioned links... And of course, how many people have got
multicast going?

o Cache digests solve most of the above problems, but become outdated and
suffer accuracy issues due to the tradeoff between update interval,
digest size and bandwidth savings - see the second sketch after this
list.
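To make the n^2 point concrete, here is a back-of-envelope sketch of ICP
message volume per request in a full mesh of n siblings, with and without
multicast (illustrative formulas only, not measurements):

  # Rough ICP message count per cache miss in a full sibling mesh of n.
  def unicast_icp(n):
      # one ICP_QUERY to each of the n-1 siblings, plus one reply from each
      return 2 * (n - 1)

  def multicast_icp(n):
      # a single multicast ICP_QUERY reaches every sibling, but each of
      # the n-1 siblings still replies individually, hence "halving" at best
      return 1 + (n - 1)

  for n in (2, 5, 10, 20):
      print(n, unicast_icp(n), multicast_icp(n))

Summed over all n caches querying each other, the unicast case grows as
roughly 2n(n-1) messages per round, which is the n^2 growth mentioned
above.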
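And on the digest side: cache digests are essentially Bloom filters
exchanged periodically, so a smaller digest or a longer update interval
means more false hits (wasted sibling fetches) and more misses on objects
cached or evicted since the last exchange. A toy sketch, with parameters
chosen for illustration rather than taken from Squid:

  # Toy Bloom-filter "cache digest" showing the size/accuracy tradeoff.
  import hashlib

  class Digest:
      def __init__(self, bits=8192, hashes=4):
          self.bits = bits
          self.hashes = hashes
          self.bitmap = bytearray(bits // 8)

      def _positions(self, url):
          for i in range(self.hashes):
              h = hashlib.md5(("%d:%s" % (i, url)).encode()).digest()
              yield int.from_bytes(h[:4], "big") % self.bits

      def add(self, url):
          for p in self._positions(url):
              self.bitmap[p // 8] |= 1 << (p % 8)

      def maybe_contains(self, url):
          # false positives are possible (a wasted sibling fetch); misses
          # also occur simply because the digest is a stale snapshot
          return all(self.bitmap[p // 8] & (1 << (p % 8))
                     for p in self._positions(url))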

In order to overcome the first problem, I think that a method of running
a cache in an intermediate state, somewhere between 'proxy-only' and the
normal 'cache everything cacheable' mode, might be useful. I suggest that
this could be done by using past popularity as an indication of future
popularity, so that 'highly popular' objects migrate into multiple
positions in a distributed cache, while unpopular objects are left on a
single cache.

This could be done by keeping popularity state, by 'inferring' it from
last access time, or stochastically(?) by simply assigning a 'proxy-only
probability' - but the number of requests for a single object will
normally be too low for this last idea to work very successfully, as far
as I can tell. A rough sketch of both variants follows.
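A minimal sketch of what such an intermediate admission policy might look
like, assuming a per-URL request counter and a configurable threshold
(both hypothetical - none of this is existing Squid behaviour):

  # Sketch of the proposed intermediate mode: when an object is fetched
  # via (or on behalf of) a sibling, only keep a local copy if it looks
  # popular enough. Counter and thresholds are made up for illustration.
  import random

  class AdmissionPolicy:
      def __init__(self, popularity_threshold=3, proxy_only_probability=0.8):
          self.hits = {}                    # per-URL request counter
          self.threshold = popularity_threshold
          self.p_proxy_only = proxy_only_probability

      def should_store(self, url, stochastic=False):
          self.hits[url] = self.hits.get(url, 0) + 1
          if stochastic:
              # 'proxy-only probability': store with probability 1 - p,
              # keeping no per-object state at all
              return random.random() > self.p_proxy_only
          # popularity-based: replicate only once the object has been
          # requested enough times that a second copy seems worthwhile
          return self.hits[url] >= self.threshold

The stochastic variant is attractive only because it needs no per-object
state; as noted above, per-object request counts are usually too small
for it to pick out the genuinely popular objects.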

I think there are elements of the Central Squid Server (CSS) that attack
the last two points, especially the fact that CSSs could themselves be
formed into a hierarchy, so a CSS could be kept regionally (rough sketch
below). The 'proxy-only probability' idea could be implemented
separately.
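For what it's worth, the kind of regional/parent CSS arrangement I have
in mind could look something like the toy lookup below (purely
illustrative - a real CSS would be answering ICP queries against an
aggregated digest, not holding a dict in memory):

  # Two-level Central Squid Server lookup: ask the regional aggregation
  # point first, then fall back to its parent. All names are hypothetical.
  class CSS:
      def __init__(self, known_objects, parent=None):
          self.known = known_objects      # {url: name of peer holding it}
          self.parent = parent

      def lookup(self, url):
          if url in self.known:
              return self.known[url]
          if self.parent is not None:
              return self.parent.lookup(url)
          return None                     # unknown everywhere: go to origin

  national = CSS({"http://example.org/big.iso": "cache-sydney"})
  regional = CSS({"http://example.org/page.html": "cache-adelaide"},
                 parent=national)
  print(regional.lookup("http://example.org/big.iso"))   # -> cache-sydney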

Finally, all of ICP, cache digests and the CSS are based around
HTTP/1.0-style objects, as recognised by HTCP. Does anyone have estimates
of what percentage of objects cannot be located by ICP? (Does this make
sense?)

Roger.

 
-------------------------------------------------------------
Roger Venning \ Do not go gentle into that good night
Melbourne \ Rage, rage against the dying of the light.
Australia <r.venning@bipond.com> Dylan Thomas