Re: Cache Digests from Stephen Baxter on 1998-09-11 (squid-users)

From: Stephen Baxter <steve@dont-contact.us>
Date: Sat, 12 Sep 1998 09:55:37 +0930 (CST)

> > Wouldn't it be better for the digest to be used as a really good guess
> > mechanism for ICP? An object is looked up in all of the digests and
> > found to be possibly in squid1, squid5 and squid7 - not bad if there are
> > 20 caches in the mesh - heaps less ICP.
>
> This is already implemented. We scan all the peers and select those with a
> HIT reported by their digest. Then we apply time measurements from NetDB to
> select the best peer if several had HITs in their digests.

I was thinking of using ICP to make sure. That would greatly reduce false
hits, to the point where the most common remaining cause would be a squid
mesh whose members use different refresh patterns.
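
The digest-then-ICP idea above can be sketched roughly as follows. This is
only an illustration, not Squid code: the Digest class is a toy Bloom filter
(the bit count, the MD5 split into four hash values and every name here are
assumptions made for the example), and icp_query is a stand-in for a real
ICP round trip.

```python
import hashlib


class Digest:
    """Toy Bloom-filter digest of the URLs a peer holds (hypothetical format)."""

    def __init__(self, bits=1024, hashes=4):
        self.bits = bits
        self.hashes = hashes  # at most 4: one per 32-bit slice of the MD5
        self.array = bytearray(bits // 8)

    def _positions(self, url):
        md5 = hashlib.md5(url.encode()).digest()
        for i in range(self.hashes):
            yield int.from_bytes(md5[4 * i:4 * i + 4], "big") % self.bits

    def add(self, url):
        for p in self._positions(url):
            self.array[p // 8] |= 1 << (p % 8)

    def may_contain(self, url):
        # True means "probably cached"; a Bloom filter can give false hits.
        return all(self.array[p // 8] & (1 << (p % 8))
                   for p in self._positions(url))


def candidate_peers(url, digests):
    """Peers whose digest reports a possible hit for the URL."""
    return [name for name, d in digests.items() if d.may_contain(url)]


def confirm_with_icp(url, candidates, icp_query):
    """Send ICP only to the digest candidates and keep those that confirm."""
    return [peer for peer in candidates if icp_query(peer, url)]


# Example mesh of 20 peers; three digests list the object.
digests = {f"squid{i}": Digest() for i in range(1, 21)}
url = "http://example.com/index.html"
for name in ("squid1", "squid5", "squid7"):
    digests[name].add(url)


def fake_icp(peer, url):
    # Stand-in ICP reply: pretend squid5 has since expired the object.
    return peer != "squid5"


candidates = candidate_peers(url, digests)
confirmed = confirm_with_icp(url, candidates, fake_icp)
```

With 20 peers and three digest hits, only three ICP queries go out instead
of twenty, and the stale digest entry (squid5 in this example) is caught by
the ICP reply before any fetch is attempted.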

> > Instead of using a lossy mechanism such as digests for absolute resolution
> > of an object location, let ICP kick in and finish the job for you. The
> > result is fewer and better targeted ICP packets.
>
> Why would you want to double-check digest's guess? In most cases, you end up
> with the same result, and you do not get 100% insurance for false hits
> anyway. Plus you pay for ICP round trip time (at least!) which we were
> actually trying to avoid...

I see the problem with ICP as being its sheer volume, which is why it does
not scale very well. On one of our squids we have:

525490 recorded accesses, of which 391033 are ICP_QUERY. This tends to
overwhelm the cache with work done for other people - we have 9 siblings.
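
To put those two counters in proportion, a quick calculation (the variable
names are just for this example; the numbers are the ones quoted above):

```python
total_accesses = 525490   # all recorded accesses on this squid
icp_queries = 391033      # of which ICP_QUERY

other_accesses = total_accesses - icp_queries
icp_share = icp_queries / total_accesses
icp_per_other = icp_queries / other_accesses

print(f"ICP_QUERY share of all accesses: {icp_share:.1%}")
print(f"ICP queries per non-ICP access:  {icp_per_other:.1f}")
```

So roughly three out of every four logged accesses are ICP queries from
neighbours rather than anything else.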

We are approaching our work from the point of view that the LAN (same-ISP
squid peering and Internet Exchange) is always fast and costs little or
nothing, while the WAN (peering between ISPs in other regions or between
Internet Exchanges) is typically fast but not all that cheap to use!

> > Just an idea - it is the same way we are implementing the smart neighbour
> > - in order to get a real hit on the location of the object.
> > This would instantly remove the need to mod squid for false hits !
>
> There is NO algorithm that guarantees the absence of false hits in a
> distributed environment. Some algorithms have better false ratios, some worse,
> that's all. Squid must handle false hits. Also note that a false-hit-like
> situation arises when your peer is temporarily down or canceled your request
> for some reason.

OK

> I strongly believe that instead of designing complex and heavy bullet-proof
> algorithms, we should use lightweight techniques and then handle 1-5% of
> exceptional situations in a robust way. Let's optimize for the common case.
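
Handling the exceptional cases "in a robust way" could look something like
the sketch below. It is an assumption about the general shape, not a
description of what Squid actually does: fetch_from_peer and fetch_direct
are hypothetical callables, and a peer that is down or drops the request is
treated the same as a false hit.

```python
def fetch(url, peers, fetch_from_peer, fetch_direct):
    """Try the digest-selected peers in order; go direct if all of them fail."""
    for peer in peers:
        try:
            body = fetch_from_peer(peer, url)
        except (TimeoutError, ConnectionError):
            continue  # peer down or request dropped: same handling as a false hit
        if body is not None:
            return body, peer
        # body is None: the peer did not have the object after all (false hit)
    return fetch_direct(url), "DIRECT"


def fake_peer(peer, url):
    # Stand-in for a real peer fetch, used only to exercise the fallback.
    if peer == "squid1":
        raise TimeoutError("peer down")
    if peer == "squid5":
        return None  # false hit
    return b"cached body"


def fake_direct(url):
    return b"origin body"


url = "http://example.com/index.html"
hit = fetch(url, ["squid1", "squid5", "squid7"], fake_peer, fake_direct)
miss = fetch(url, ["squid1", "squid5"], fake_peer, fake_direct)
```

The common case (a real hit on the first peer) costs nothing extra; only the
false hit or dead peer pays for a second attempt.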

On digests (we haven't tried them yet): does anybody have any feedback on
the CPU power and time needed to generate a digest, and on the best time
interval between digest updates?

------------------------------------------------------------------
Stephen Baxter              SE Network Access/Big Networks Australia
phone : +61 8 8221 5221     222 Grote Street
fax   : +61 8 8221 5220     Adelaide 5000, Australia
Received on Fri Sep 11 1998 - 16:27:33 MDT
