Re: [squid-users] Automatic StoreID ? from Alex Rousskov on 2014-03-11 (squid-users)

From: Alex Rousskov <rousskov_at_measurement-factory.com>
Date: Tue, 11 Mar 2014 13:43:19 -0600

On 03/11/2014 01:18 PM, Nikolai Gorchilov wrote:
> On Tue, Mar 11, 2014 at 6:10 PM, Alex Rousskov wrote:
>> On 03/11/2014 08:05 AM, Omid Kosari wrote:
>>> Is it possible for Squid to automatically find every similar object based on
>>> something like md5 of objects and serve them to clients without needing a
>>> custom DB?

>> No, because clients do not tell Squid what checksum they are looking
>> for.

>> It is possible to avoid caching duplicate content, but that only allows you
>> to handle cache hits more efficiently. It does not help with cache
>> misses (when the URL requested by the client has not been seen before).

> Actually, two commercial vendors - PeerApp and ThunderCache - claim
> their products do not use URLs to identify the objects, thus they
> don't have to maintain a StoreID-like de-duplication database manually.
>
> Any ideas how they do it?

Most likely they do not, and you are simply being misled by their
marketing claims. In general, it is not possible to ignore the request
URL and still produce the right response (think about it!). They
probably do not store duplicate cache objects, but, as discussed above,
that is far from the "automatic StoreID" functionality that the original
poster is asking about.

In other words, there are at least two de-duplication layers:

* The higher-level one is based on URLs and essentially requires manual
URL mapping. It helps turn cache misses into hits.

* The lower-level one is based on checksums and can be automated. It
helps spend less cache space to serve cache hits. Some commercial
products have implemented this lower-level optimization.
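
To make the difference between the two layers concrete, here is a rough
Python sketch. It is only an in-memory model, not Squid code and not how
any commercial product is known to work. The names store_key, URL_RULES,
and TwoLayerCache, the example.com mirror URLs, and the choice of SHA-256
as the checksum are all assumptions made up for illustration.

```python
import hashlib
import re

# Higher-level layer: hypothetical, manually maintained URL-mapping rules.
# Each rule rewrites a family of mirror URLs to one canonical store key.
URL_RULES = [
    (re.compile(r"^http://[a-z0-9]+\.mirror\.example\.com/(.+)$"),
     r"http://mirror.example.com.internal/\1"),
]


def store_key(url):
    """Return the canonical key for a URL, or the URL itself if no rule matches."""
    for pattern, template in URL_RULES:
        match = pattern.match(url)
        if match:
            return match.expand(template)
    return url


class TwoLayerCache:
    def __init__(self):
        self.index = {}  # store key -> checksum of the cached body
        self.blobs = {}  # checksum -> body, kept once per distinct content

    def lookup(self, url):
        """Hit or miss is decided from the URL alone; no checksum is known yet."""
        checksum = self.index.get(store_key(url))
        return None if checksum is None else self.blobs[checksum]

    def store(self, url, body):
        """Lower-level layer: identical bodies share a single stored copy."""
        checksum = hashlib.sha256(body).hexdigest()
        self.blobs.setdefault(checksum, body)
        self.index[store_key(url)] = checksum
```

In this model, storing a body under http://a.mirror.example.com/file.iso
makes a later request for http://b.mirror.example.com/file.iso a hit,
because the manual rule maps both to the same key. Storing the same bytes
under an unrelated URL adds an index entry but no second copy of the body.
A request for a third, never-seen URL is still a miss even if the content
is already cached, because the checksum is unknown until the response has
been fetched.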

Cheers,

Alex.
(*) where cache hit and miss are determined based on the original URL.
Received on Tue Mar 11 2014 - 19:43:25 MDT
