Re: [squid-users] caching data for thousands of nodes in a compute cluster

From: <squid3@dont-contact.us>
Date: Tue, 26 Jun 2007 09:58:12 +1200 (NZST)

> Trying again, having got no response. Any reaction to my questions?
>
> - Dave
>
> On Tue, Jun 12, 2007 at 11:42:42AM -0500, Dave Dykstra wrote:
>> On Tue, Jun 12, 2007 at 12:19:26AM +0200, Henrik Nordstrom wrote:
>> > Mon 2007-06-11 at 15:17 -0500, Dave Dykstra wrote:
>> >
>> > > of jobs. It quickly becomes impractical to distribute all the data
>> > > from just a few nodes running squid, so I am thinking about running
>> > > squid on every node, especially as the number of CPU cores per node
>> > > increases. The problem then is how to determine which peer to get
>> > > data from.
>> >
>> > Multicast ICP sounds like it could be a reasonable option there.
>> >
>> > Regards
>> > Henrik
>>
>> I considered that, but wouldn't multicasted ICP queries tend to get many
>> hundreds of replies (on average, half the total number of squids)? It
>> would only use the first response it got back, but it doesn't seem very
>> efficient of network or compute resources to throw away all the others.
>> Do you know of other people who have used multicast ICP for this type of
>> application?
>>
>> The multicast TTL could help a little but probably not much. I expect
>> the servers are usually organized in smaller groups, with better network
>> connectivity within each group, but it isn't practical to ask the system
>> administrators to tell us which servers are in which group so everything
>> has to be automatic. They're very likely all on the same large subnet
>> with the switches sorting out the routing, so it isn't clear that
>> anything at squid's level would be able to tell how far away servers are
>> other than by small differences in response time, or more likely
>> throughput of large transfers. I also don't think we can really expect
>> to know the names of all the peers in order to list them in
>> "multicast-responder".
>>
>> - Dave
>

There are some neighbour-discovery features of IPv6 that offer options in
this area. The drawbacks there are:
  The host network between squids MUST be able to handle IPv6 traffic
properly, and with the current squid that means dual-stack Linux in some
form.
  It hasn't been written or even experimented with yet, AFAIK, so some
sponsorship would be needed to get me or someone else working on it sooner
than a few years from now.
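
On the reply-volume concern quoted above, a rough back-of-the-envelope
sketch in Python may help put numbers on it. It takes Dave's estimate
(replies from about half the squids per multicast query) as given rather
than measured, and the function name, parameters, and the query rate are
all hypothetical, chosen only for illustration:

```python
def icp_reply_load(num_squids, reply_fraction=0.5, queries_per_sec_per_node=1.0):
    """Estimate ICP reply traffic for multicast queries in a cluster.

    All inputs are assumptions for illustration:
      num_squids               -- squids in the multicast group
      reply_fraction           -- share of the other squids answering a query
                                  (0.5 is the estimate from the quoted mail)
      queries_per_sec_per_node -- ICP queries each squid sends per second

    Returns (replies_per_query, cluster_replies_per_sec).
    """
    # Every squid except the one asking is a potential responder.
    replies_per_query = (num_squids - 1) * reply_fraction
    # Each squid in the cluster issues its own queries.
    cluster_replies_per_sec = (
        num_squids * queries_per_sec_per_node * replies_per_query
    )
    return replies_per_query, cluster_replies_per_sec


if __name__ == "__main__":
    per_query, per_sec = icp_reply_load(2001)
    print(f"replies per query: {per_query:.0f}")
    print(f"cluster-wide replies per second: {per_sec:.0f}")
```

Under these assumptions the per-query reply count grows linearly with the
number of squids and the cluster-wide total grows quadratically, which is
why discarding all but the first reply looks wasteful at this scale.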

Amos
PS. Yes folks, the squid3-ipv6 branch is in beta testing now.
Received on Mon Jun 25 2007 - 15:58:16 MDT