Re: [squid-users] Duplicate files, content distribution networks

From: Amos Jeffries <squid3_at_treenet.co.nz>
Date: Thu, 14 Jun 2012 22:33:38 +1200

On 14/06/2012 8:53 p.m., Jack Bates wrote:
> On 18/05/12 05:55 AM, Eliezer Croitoru wrote:
>> On 18/05/2012 10:33, Jack Bates wrote:
>>> Are there any resources in Squid core or in the Squid community to help
>>> cache duplicate files? Squid is very useful for building content
>>> distribution networks, but how does Squid handle duplicate files from
>>> content distribution networks when it is used as a forward proxy?
>>>
>>> This is important to us because many download sites present users with a
>>> simple download button that doesn't always send them to the same mirror.
>>> Some users are redirected to mirrors that are already cached while other
>>> users are redirected to mirrors that aren't. We use a caching proxy in a
>>> rural village here in Rwanda to improve internet access, but users often
>>> can't predict whether a download will take seconds or hours, which is
>>> frustrating.
>>>
>>> How does Squid handle files distributed from mirrors? Do you know of any
>>> resources concerning forward proxies and download mirrors?
>> Squid 2.7 has the store_url_rewrite option, which does what you need.
>> SourceForge is one nice example of a CDN serving file downloads from
>> mirrors. You can also use the cache_peer option to run a newer Squid as
>> your main proxy and route only what you need, such as specific domains,
>> through the older version.
>
> Thanks very much for pointing out the store_url_rewrite option,
> Eliezer. Does it require the proxy administrator to manually configure
> the list of download mirrors?
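[For reference, a minimal store_url_rewrite setup on Squid 2.7 might look
like the sketch below. The helper path, child count, and SourceForge mirror
pattern are illustrative assumptions, and as the sketch suggests, the
administrator does supply the mirror patterns, via the helper and ACLs.

    # squid.conf (Squid 2.7 only; paths and the mirror pattern are examples)
    storeurl_rewrite_program /usr/local/bin/store_url_rewrite
    storeurl_rewrite_children 5
    acl sf_mirrors dstdomain .dl.sourceforge.net
    storeurl_access allow sf_mirrors
    storeurl_access deny all

    #!/usr/bin/env python
    # Illustrative helper: collapse every *.dl.sourceforge.net mirror
    # hostname to one canonical key so all mirrors share one cache entry.
    # Squid sends one request per line on stdin, URL first; the helper
    # replies with the URL to use as the cache (store) key.
    import re
    import sys

    MIRROR = re.compile(r'^http://[A-Za-z0-9.-]+\.dl\.sourceforge\.net/')

    for line in sys.stdin:
        fields = line.split()
        url = fields[0] if fields else ''
        print(MIRROR.sub('http://dl.sourceforge.net.squid-internal/', url))
        sys.stdout.flush()
]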
>
> Does anyone in the Squid community have thoughts on exploiting
> Metalink [1] to address caching duplicate files from content
> distribution networks?
>
> The approach I am pursuing is to exploit RFC 6249, Metalink/HTTP:
> Mirrors and Hashes. Given a response with a "Location: ..." header and
> at least one "Link: <...>; rel=duplicate" header, the proxy looks up
> the URLs in these headers in the cache. If the "Location: ..." URL
> isn't already cached but a "Link: <...>; rel=duplicate" URL is, then
> the proxy rewrites the "Location: ..." header with the cached URL.
> This should redirect clients to a mirror that is already cached.
>
> Thoughts?
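
[To make the Metalink idea concrete, here is a minimal sketch in Python
(not Squid or ATS code; the header layout and the is_cached() callback are
assumptions) of the Location-rewrite step described above:

    # Given a redirect whose Location is uncached but which advertises
    # cached mirrors via "Link: <...>; rel=duplicate", rewrite Location.
    def rewrite_location(headers, is_cached):
        # headers: dict mapping header name -> list of values
        location = headers.get("Location", [None])[0]
        if not location or is_cached(location):
            return                      # no redirect, or target already cached
        for link in headers.get("Link", []):
            url_part, _, params = link.partition(";")
            if "rel=duplicate" in params:
                mirror = url_part.strip().strip("<>")
                if is_cached(mirror):   # a cached duplicate: send clients there
                    headers["Location"] = [mirror]
                    return

    # Example: the Location target is uncached, but a Link mirror is.
    headers = {
        "Location": ["http://mirror-a.example/file.iso"],
        "Link": ["<http://mirror-b.example/file.iso>; rel=duplicate"],
    }
    rewrite_location(headers, lambda url: "mirror-b" in url)
    print(headers["Location"])  # ['http://mirror-b.example/file.iso']
]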

Well, since our very own Henrik Nordstrom is one of the authors, I'd say
there have been thoughts about it in the Squid community :-)

>
> Another idea is to exploit RFC 3230, Instance Digests in HTTP. Given a
> response with a "Location: ..." header and a "Digest: ..." header, if
> the "Location: ..." URL isn't already cached then the proxy checks the
> cache for content with a matching digest and rewrites the "Location:
> ..." header with the cached URL if found
>
> I am working on a proof of concept plugin for Apache Traffic Server as
> part of the Google Summer of Code. The code is up on GitHub [2]
>
> If this is a reasonable approach, would it be difficult to build
> something similar for Squid?

Please contact Alex Rousskov at measurement-factory.com; he was
organising a project to develop Digest handling and de-duplication a
while back.

Amos
Received on Thu Jun 14 2012 - 10:33:51 MDT
