Re: Cache Digests vs ICP from Andres Kroonmaa on 1998-04-28 (squid-dev)

From: Andres Kroonmaa <andre@dont-contact.us>
Date: Tue, 28 Apr 1998 22:08:55 +0300 (EETDST)

On 24 Apr 98, at 10:37, Alex Rousskov <rousskov@nlanr.net> wrote:
> On Fri, 24 Apr 1998, Andres Kroonmaa wrote:
>
> > I'd also propose sort of central index server.
>
> 1) I think a central server is a good idea for small tightly coupled cache
> meshes with excellent network connectivity between caches, single managing
> authority, and cheap bandwidth. In all other situations, "centralized" ICP
> would introduce significant delays (even if only one or two ICP queries are
> sent), is probably hard to manage, and it wastes bandwidth.

Agreed, and that's exactly what I meant by including peering index servers.
One index server should service only "well-connected" caches, and for
"all other situations" those index servers could exchange either ICP messages
or digests.

> 2) The idea that ICP with a central index server will give 100% up-to-date
> information is, IMHO, far from reality. There will always be false hits and false
> misses regardless of the indexing scheme in place. Current ICP implementation
> has them. If you make ICP more complex by including more expire information,
> this would reduce the number of false requests, but will not eliminate them.
> I think a robust caching scheme must adapt to an uncertain environment rather
> than trying to synchronize everything.

The current false-hit rate is below 0.5%, and that's almost OK. Having caches
send notices about new objects as they arrive gives us real-time information
about cached objects. Of course, ICP messages can be lost, leaving the index
slightly out of sync, but IMHO that is still much less staleness than you get
from exchanging digests periodically.
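
As a rough illustration of the notice-driven index described above, here is a
minimal Python sketch. The class and method names (CentralIndex, notify_added,
notify_removed, lookup) are invented for this example and are not part of
squid or ICP; a real server would receive the notices over the network.

```python
from collections import defaultdict


class CentralIndex:
    """Hypothetical in-memory index: URL -> set of cache names holding it."""

    def __init__(self):
        self._holders = defaultdict(set)

    def notify_added(self, cache, url):
        # A cache announces that it has just stored `url`.
        self._holders[url].add(cache)

    def notify_removed(self, cache, url):
        # A cache announces that it has dropped `url`.
        holders = self._holders.get(url)
        if holders is not None:
            holders.discard(cache)
            if not holders:
                del self._holders[url]

    def lookup(self, url):
        # Caches believed to hold `url`; stale if a notice was lost on the way.
        return sorted(self._holders.get(url, ()))


index = CentralIndex()
index.notify_added("cache-a", "http://example.com/")
index.notify_added("cache-b", "http://example.com/")
index.notify_removed("cache-a", "http://example.com/")
hit = index.lookup("http://example.com/")
miss = index.lookup("http://example.org/")
```

If the notify_removed notice for cache-a had been lost, lookup would still list
cache-a, which is the "slightly out of sync" case mentioned above.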

> > Overall, this
> > - would make the main squid process much simpler compared to the currently
> > proposed digest exchange method.
>
> Not much. ICP, Cache Digests, HTCP, and other peer selection modules are
> relatively simple and small compared to the rest of Squid. (And you will still
> need some of them to support a central index server).

Wait a minute: squid as a client of some index server needs only the parts that
query the server and make use of the responses. That's almost all there is to
it, and almost all of it is in there already.
Having it all inside squid makes squid responsible for creating the digests,
maintaining them, expiring, re-requesting, sorting, etc. Is that so trivial?
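
To make the maintenance burden concrete, here is a toy Bloom-filter digest in
Python. Cache Digests are Bloom-filter based, but everything specific here is
an assumption made for illustration: the Digest class, the 1024-bit size, the
four hash positions, and the use of SHA-256 are not taken from squid's code.

```python
import hashlib


class Digest:
    """Toy Bloom-filter digest; sizes and hashing are illustrative assumptions."""

    def __init__(self, bits=1024, hashes=4):
        # `bits` should be a multiple of 8; `hashes` at most 8 (SHA-256 is 32 bytes).
        self.bits = bits
        self.hashes = hashes
        self.array = bytearray(bits // 8)

    def _positions(self, key):
        # Derive `hashes` bit positions from one SHA-256 digest of the key.
        raw = hashlib.sha256(key.encode("utf-8")).digest()
        for i in range(self.hashes):
            chunk = raw[4 * i:4 * i + 4]
            yield int.from_bytes(chunk, "big") % self.bits

    def add(self, key):
        for pos in self._positions(key):
            self.array[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key):
        # True can be a false positive; False is always correct.
        return all(self.array[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(key))


digest = Digest()
digest.add("GET http://example.com/")
present = digest.might_contain("GET http://example.com/")
```

Note that a plain Bloom filter has no delete operation: once objects expire,
the digest has to be rebuilt and fetched again by peers, which is part of the
upkeep listed above.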

> > Also, digests with their high false-positive rate make it almost impossible to
> > establish peerings between competing ISPs, unless squid can cope with miss-denies,
> > but then digests lose much of their value, as tcp-miss connects waste bandwidth
> > and time.
>
> Current implementation of Cache Digests has very low false hit ratio (usually
> below 5%). The overhead of a false hit is not very high if persistent
> connections between peers are working. We are working on decreasing false hit
> ratio a bit more, but, again, there is a practical limit of a few percent that
> one cannot overcome in a distributed environment, I guess. And, yes, Squid
> must/will handle "miss-denies" _regardless_ of the peer selection algorithm in
> use.

That sounds good.

> > IMHO, we'd wish to implement digests or ICP central lists in separate daemon
> > and develop it somewhat separately from the main squid...
>
> Agree. However, I would suggest narrowing the scope of the central server
> application first (e.g. large meshes of small close caches connected by a fast
> network with a single managing authority). In the long run, it may help to
> design and build a better product compared to a "general" index server.

Agreed 100%. An ICP index server would be good for close caches under the same
administration. ICP index servers of different administrations should peer
via either ICP or digests, whichever proves more appropriate. In any case,
any single cache box queries only one (the closest) index server and thus need
not worry about mixing data from different peers; that's a job for the index
server.

What makes me prefer ICP is the possibility of sending ICP messages back to
caches, like "drop your copy of this url, it's old". Cache boxes can come and
go, notifying the index server as they do, and you can have a strict mapping
between URL and index, which digests can't give you. Besides, moving cache-mesh
coordination out of squid makes it relatively easy to change index server
algorithms almost on the fly.
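
A minimal Python sketch of the "drop your copy" idea follows. The Cache and
IndexServer classes and the drop/invalidate calls are hypothetical names for
this example; they are not claimed to be existing ICP opcodes or squid
functions, and direct method calls stand in for network messages.

```python
class Cache:
    """Hypothetical cache box that accepts 'drop' notices from its index server."""

    def __init__(self, name):
        self.name = name
        self.store = {}

    def drop(self, url):
        # Discard the local copy if there is one.
        self.store.pop(url, None)


class IndexServer:
    """Exact URL -> holders mapping, so a stale URL can be targeted precisely."""

    def __init__(self):
        self.caches = {}
        self.holders = {}

    def join(self, cache):
        self.caches[cache.name] = cache

    def leave(self, name):
        # Forget a departing cache and every entry that pointed at it.
        self.caches.pop(name, None)
        for names in self.holders.values():
            names.discard(name)

    def record(self, name, url):
        self.holders.setdefault(url, set()).add(name)

    def invalidate(self, url):
        # Tell every known holder to drop its copy, then forget the URL.
        for name in self.holders.pop(url, set()):
            cache = self.caches.get(name)
            if cache is not None:
                cache.drop(url)


server = IndexServer()
a = Cache("cache-a")
server.join(a)
a.store["http://example.com/"] = b"old body"
server.record("cache-a", "http://example.com/")
server.invalidate("http://example.com/")
```

Because the mapping is keyed by the exact URL, the server knows which boxes to
notify; a digest only answers "probably present" and cannot be walked this way.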

----------------------------------------------------------------------
  Andres Kroonmaa mail: andre@online.ee
  Network Manager
  Organization: MicroLink Online Tel: 6308 909
  Tallinn, Sakala 19 Pho: +372 6308 909
  Estonia, EE0001 http://www.online.ee Fax: +372 6308 901
----------------------------------------------------------------------

Received on Tue Jul 29 2003 - 13:15:48 MDT
