Re: [squid-users] caching data for thousands of nodes in a compute cluster

From: Henrik Nordstrom <henrik@dont-contact.us>
Date: Mon, 25 Jun 2007 23:57:21 +0200

On Mon 2007-06-25 at 15:02 -0500, Dave Dykstra wrote:
> Trying again, having got no response. Any reaction to my questions?

Sorry, your question got lost..

> > I considered that, but wouldn't multicasted ICP queries tend to get many
> > hundreds of replies (on average, half the total number of squids)?

Right, so multicast ICP is not so good when there are very many Squids.

You could modify the ICP code to respond only to HITs on multicast
queries. This would cut down the number of responses considerably.
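
To illustrate the HIT-only idea, here is a minimal sketch of the decision a responder would make. This is not Squid's actual code: should_send_reply and its query_was_multicast parameter are hypothetical names, and only the opcode values are taken from the ICP specification (RFC 2186).

```python
# Opcode values as defined in RFC 2186 (ICP version 2).
ICP_OP_HIT = 2
ICP_OP_MISS = 3


def should_send_reply(opcode: int, query_was_multicast: bool) -> bool:
    """Decide whether to answer an ICP query (hypothetical helper).

    Unicast queries are always answered. Multicast queries are answered
    only when the reply would be a HIT, so misses stay silent.
    """
    if query_was_multicast:
        return opcode == ICP_OP_HIT
    return True
```

With a rule like this the querying Squid can no longer count replies to know when everyone has answered, so it would presumably have to treat silence as a miss once the ICP timeout expires.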

Another option is to build a hierarchy, grouping the Squids into smaller
clusters, with only a selected few managing the wider connections.

It's hard to get this fully dynamic however. Some configuration will be
needed to build the hierarchy.

You'll probably have to extend Squid a bit to get what you want running
smoothly, with multicast ICP being one possible component for discovering
the nearby nodes and handling exchanges between them. But I am not familiar
with your network topology or how the cluster nodes are connected together,
so it's just a guess.

This kind of setup could also benefit a lot from intra-array CARP. Once
the cluster members are known, CARP can be used to route the requests in a
quite efficient manner if the network is reasonably flat.
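
A rough sketch of the routing idea, assuming the member list is already known: every node hashes the URL against each member and sends the request to the highest-scoring one, so all nodes agree on the owner without exchanging queries. Note that this uses plain highest-random-weight hashing with SHA-256 as a stand-in; it is not the actual CARP hash function or its load-factor weighting, and score and pick_member are hypothetical names.

```python
import hashlib


def score(member: str, url: str) -> int:
    """Combine member name and URL into a deterministic 64-bit score."""
    digest = hashlib.sha256(f"{member}|{url}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def pick_member(members, url):
    """Return the member that should handle this URL.

    The result depends only on the set of members, not on their order,
    and removing a member only remaps the URLs that member owned.
    """
    if not members:
        raise ValueError("no cluster members known")
    return max(members, key=lambda m: score(m, url))
```

For example, pick_member(["node001", "node002", "node003"], url) gives the same answer on every node as long as they all see the same member list.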

If the network is more WAN-like, with significantly different levels of
connectivity between the nodes, then a more grouped layout may be needed,
building a hierarchy on top of the network topology.

Is there some kind of cluster node management, keeping track of which
nodes exist and mapping this to the network topology? Or does everything
need to be discovered on the fly by each node?

Regards
Henrik

Received on Mon Jun 25 2007 - 15:57:28 MDT
