Re: Cache Digests vs ICP

From: Alex Rousskov <rousskov@dont-contact.us>
Date: Fri, 24 Apr 1998 10:37:25 -0600 (MDT)

On Fri, 24 Apr 1998, Andres Kroonmaa wrote:

> I'd also propose a sort of central index server.

1) I think a central server is a good idea for small, tightly coupled cache
meshes with excellent network connectivity between caches, a single managing
authority, and cheap bandwidth. In all other situations, "centralized" ICP
would introduce significant delays (even if only one or two ICP queries are
sent), would probably be hard to manage, and would waste bandwidth.

2) The idea that ICP with a central index server will give 100% up-to-date
information is, IMHO, far from reality. There will always be false hits and
false misses regardless of the indexing scheme in place (e.g., a peer can
purge an object moments after the index has reported it as a hit). The
current ICP implementation has them. Making ICP more complex by including
more expiration information would reduce the number of false requests, but
would not eliminate them. I think a robust caching scheme must adapt to an
uncertain environment rather than try to synchronize everything.

> Overall, this
> - would make the main squid process much simpler compared to the currently
> proposed digest exchange method.

Not much. ICP, Cache Digests, HTCP, and other peer selection modules are
relatively simple and small compared to the rest of Squid. (And you would
still need some of them to support a central index server.)

> Also, much more lightweight, strict, AND more up-to-date.

Not sure how you define the first two, but I really doubt one can improve
synchronization _significantly_ over the current levels of ICP and Cache
Digests.

> - will make the main squid process more static, that is, move much of the
> ICP handling to ICP servers, thus more reliable, and will let us focus on
> performance issues more deeply.

Again, the current ICP module is relatively simple; there is not much to save
by replacing it with a "simple ICP" module. Also, centralized solutions are
rarely more reliable than distributed ones.

> I can see use for hierarchies that have 100-200+ cooperating caches, using
> central ICP server(s) (as they may filter incoming updates based on
> timestamps, too many caches holding the same objects, remote cache metrics,
> preferences, etc.) and still being resource-efficient.
> But I cannot see any way to peer the same number of caches using digests
> implemented in the main squid processes. Mirroring the indexes of 100-200
> caches in every peering box makes too much redundant data and makes it too
> complex to keep current.

It depends, I guess. If you have 200 peers, they are probably all very small.
Thus, to "digest" each peer, one would need just a few KB of RAM; a few MB
total. Not a big deal (see the sketch below). Again, I agree that a central
server might work in such a situation, but I am not sure it would work
significantly better than Cache Digests.
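
For a rough sense of scale, here is a back-of-the-envelope sketch. The
per-peer object count is a made-up assumption, and 5 bits per entry is only
the ballpark density the current Cache Digest code aims for:

/* Back-of-the-envelope sizing for mirroring 200 peer digests.
 * Hypothetical assumptions for illustration: each small peer holds
 * about 10,000 objects, and a digest costs about 5 bits per object. */
#include <stdio.h>

int main(void)
{
    const int peers = 200;
    const long objects_per_peer = 10000L;  /* assumption */
    const int bits_per_entry = 5;          /* assumption */

    long per_digest = objects_per_peer * bits_per_entry / 8;  /* bytes */
    long total = per_digest * peers;

    printf("per-peer digest: ~%ld KB\n", per_digest / 1024);
    printf("total for %d peers: ~%.1f MB\n",
           peers, total / (1024.0 * 1024.0));
    return 0;
}

That prints ~6 KB per peer and ~1.2 MB total; bigger peers or denser digests
grow this linearly, but it stays in the low-MB range for small caches.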

> Also, digests with their high false-positive rates make it almost
> impossible to establish peerings between competing ISPs, unless squid can
> cope with miss-denies; but then the digests lose much of their value, as
> tcp-miss connects waste bandwidth and time.

The current implementation of Cache Digests has a very low false hit ratio
(usually below 5%). The overhead of a false hit is not very high if
persistent connections between peers are working. We are working on
decreasing the false hit ratio a bit more, but, again, there is a practical
limit of a few percent that one cannot overcome in a distributed environment,
I guess. And, yes, Squid must/will handle "miss-denies" _regardless_ of the
peer selection algorithm in use.
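
To see where that practical limit comes from: a Cache Digest is essentially a
Bloom filter, so the floor on false hits can be estimated from the standard
false-positive formula. A minimal sketch, assuming 4 hash functions and a
range of digest densities (illustrative values, not necessarily Squid's
actual parameters; the observed false hit ratio also depends on the request
stream and on digest staleness):

/* Estimate the false-positive rate of a Bloom filter, the data
 * structure behind Cache Digests: p = (1 - e^(-k/c))^k, where c is
 * bits per entry (m/n) and k is the number of hash functions.
 * The parameter values below are illustrative assumptions only. */
#include <math.h>
#include <stdio.h>

static double false_positive_rate(double bits_per_entry, int k)
{
    return pow(1.0 - exp(-k / bits_per_entry), k);
}

int main(void)
{
    const int k = 4;  /* hash functions per key (assumption) */
    double c;

    for (c = 4.0; c <= 8.0; c += 1.0)
        printf("%.0f bits/entry, k=%d: ~%.1f%% false positives\n",
               c, k, 100.0 * false_positive_rate(c, k));
    return 0;
}

At 5-8 bits per entry the filter alone leaves roughly 2-9% false positives,
which is consistent with the few-percent limit above; you can buy a lower
rate only by spending more bits per entry.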

> IMHO, we'd wish to implement digests or ICP central lists in separate daemon
> and develop it somewhat separately from the main squid...

Agree. However, I would suggest narrowing the scope of the central server
application first (e.g., large meshes of small, close caches connected by a
fast network with a single managing authority). In the long run, it may help
to design and build a better product than a "general" index server.

$0.02

Alex.
