Re: [squid-users] caching data for thousands of nodes in a compute cluster from Pablo García on 2007-06-11 (squid-users)

From: Pablo García <malevo@dont-contact.us>
Date: Mon, 11 Jun 2007 17:57:53 -0300

Dave, if you can configure a farm of squid nodes plus a set of web
servers acting as origin servers, you could put a load balancer
appliance (Citrix NetScaler, F5, Cisco, etc.) in front of the squids.
With a URLHASH-based algorithm it would always send requests for the
same URL to the same squid, and if that squid crashed, you would take
only one cache miss for that URL.
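The URL-hash idea above can be sketched in Python. How the NetScaler, F5 or Cisco appliances actually hash URLs is not described in this thread, so this sketch assumes rendezvous (highest-random-weight) hashing, which is similar in spirit to squid's CARP peer selection; the function names, node names and URLs are hypothetical. It shows that when one squid drops out, only the URLs that had been mapped to it move, each to a new squid where it misses once.

```python
import hashlib


def _score(node: str, url: str) -> int:
    # Hash the (node, url) pair; for a given URL the node with the highest score wins.
    digest = hashlib.sha256(f"{node}|{url}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def pick_squid(url: str, nodes: list[str]) -> str:
    """Return the squid responsible for url (rendezvous hashing, an assumption)."""
    if not nodes:
        raise ValueError("no squid nodes available")
    return max(nodes, key=lambda n: _score(n, url))


if __name__ == "__main__":
    squids = [f"squid{i:02d}" for i in range(8)]  # hypothetical node names
    urls = [f"http://origin.example/data/file{i}" for i in range(1000)]

    before = {u: pick_squid(u, squids) for u in urls}
    failed = "squid03"
    survivors = [s for s in squids if s != failed]
    after = {u: pick_squid(u, survivors) for u in urls}

    moved = [u for u in urls if before[u] != after[u]]
    # Only URLs that lived on the failed squid are remapped; each costs one miss.
    assert all(before[u] == failed for u in moved)
    print(f"{len(moved)} of {len(urls)} URLs moved after {failed} failed")
```

A plain `hash(url) % len(nodes)` scheme would instead remap most URLs whenever the number of squids changed, which is why a rendezvous-style hash is assumed here.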

Regards, Pablo

On 6/11/07, Dave Dykstra <dwd@fnal.gov> wrote:
> Hi,
>
> I have been thinking about the problem of quickly distributing objects
> to thousands of jobs in a compute cluster (for high-energy physics). We
> have multiple applications that need to distribute the same data to many
> different jobs: some distribute hundreds of megabytes to thousands of
> jobs, and others distribute gigabytes of data to hundreds of jobs. It
> quickly becomes impractical to serve all the data from just a few nodes
> running squid, so I am thinking about running squid on every node,
> especially as the number of CPU cores per node increases.
> The problem then is how to determine which peer to get data from. As
> far as I can tell, none of the peer-selection methods currently supported
> by squid would work very well with thousands of squids, especially since
> a few of them would often be out of service, which would make them hard
> to configure statically. Am I right about
> that? It seems to me that it would work better if there were a couple
> of nodes that could dynamically keep track of which nodes had which
> objects (over a certain size), and could direct requests to other nodes
> that had the objects or were in the process of getting them. It's quite
> similar to the approach that peer-to-peer systems like BitTorrent use,
> although I haven't found any existing implementation that would be
> suitable for this application, and I think extending squid is probably
> the better approach.
>
> - Dave Dykstra
>
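The tracker arrangement described in the quoted message could look roughly like the following Python sketch. It is a hypothetical design, not an existing squid feature or API: the `Tracker` class, its method names and the 10 MiB size threshold are all assumptions. Squids report when they start and finish fetching a large object, and a requesting squid asks the tracker for a live peer, preferring one with a complete copy, then one still fetching, and otherwise going to the origin server.

```python
from __future__ import annotations

import random
from dataclasses import dataclass, field


@dataclass
class Tracker:
    """Hypothetical directory: which squid nodes hold which large objects."""

    min_size: int = 10 * 1024 * 1024  # assumption: only track objects >= 10 MiB
    complete: dict[str, set[str]] = field(default_factory=dict)
    fetching: dict[str, set[str]] = field(default_factory=dict)
    down: set[str] = field(default_factory=set)

    def start_fetch(self, url: str, node: str, size: int) -> None:
        # Small objects are not tracked; squids fetch them directly.
        if size >= self.min_size:
            self.fetching.setdefault(url, set()).add(node)

    def finish_fetch(self, url: str, node: str) -> None:
        # Promote a node from "in progress" to "has a complete copy".
        if node in self.fetching.get(url, set()):
            self.fetching[url].discard(node)
            self.complete.setdefault(url, set()).add(node)

    def mark_down(self, node: str) -> None:
        self.down.add(node)

    def mark_up(self, node: str) -> None:
        self.down.discard(node)

    def locate(self, url: str, requester: str) -> str | None:
        """Pick a live peer with a full copy, else one mid-fetch, else None (origin)."""
        for table in (self.complete, self.fetching):
            peers = [n for n in table.get(url, ()) if n != requester and n not in self.down]
            if peers:
                return random.choice(peers)  # spread load across holders
        return None
```

A real deployment would also need at least two replicated trackers (the message suggests "a couple of nodes") and expiry of entries for squids that evict objects or disappear without notice; both are left out of this sketch.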
Received on Mon Jun 11 2007 - 14:57:59 MDT
