Re: [squid-users] cache tree structure

From: Henrik Nordstrom <hno@dont-contact.us>
Date: Thu, 30 May 2002 00:09:20 +0200

Joe Cooper wrote:

> No, Henrik has suggested that for a number of WANs, it does make sense
> to use a parent at each WAN connected location, with siblings beneath
> them.

Sort of, but I find your use of parent/sibling a bit confusing.

The parent should be on the other side of the WAN connection, closer to
the Internet.

Sibling relations are only likely to be of value, I think, if you have
multiple caches sitting next to each other on the same side of a WAN
connection.

Sibling relations should be viewed separately from parent/child
relations. The two serve different purposes.

Parent/child relations apply when you have caches located along the
natural request path (i.e. along your WAN links towards the Internet). A
parent/child relation is constructed by configuring the child to use the
parent as a parent. Contrary to real life, the parent has no knowledge of
who its children are. In most parent/child relations there is little or
no value in using ICP. Most of the time it is more efficient to skip
ICP altogether for parent/child relations, and leave it up to the
parent to figure out how best to get the object.
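
A minimal squid.conf sketch of such a child, assuming a hypothetical
parent named parent.example.com listening on port 3128. The ICP port is
set to 0 and no-query is given so that no ICP is sent to the parent:

    # On the child cache. Host name and port are placeholders.
    cache_peer parent.example.com parent 3128 0 no-query default

    # Optional, and an assumption about local policy: never go direct,
    # always forward requests through the parent.
    never_direct allow all

Nothing on the parent names the child. The parent only has to permit
the child's address in its ordinary http_access rules.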

Sibling relations apply when you have two caches where you think it may
be beneficial to make them share their cache content, and a network
topology where the latency and overhead of ICP round trips between the
caches are low compared to the benefit of cache sharing. A sibling
relation requires ICP.
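
A sketch of one side of a sibling pair, assuming a hypothetical
neighbour cache2.example.com with HTTP on port 3128 and ICP on port
3130. The names and the address are placeholders, and the other cache
would need the mirror image of this configuration:

    # On cache1.
    icp_port 3130
    cache_peer cache2.example.com sibling 3128 3130 proxy-only

    # Let the sibling send ICP queries and fetch hits from this cache.
    # The http_access line must come before any final "deny all".
    acl siblings src 192.0.2.2/32
    icp_access allow siblings
    http_access allow siblings

The proxy-only option is a choice made for this sketch, not a
requirement: it keeps cache1 from storing a second copy of objects it
fetches from the sibling.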

Some reasonable applications of sibling relations are

 * Cache sharing within a tightly coupled cluster of caches located at
the same place.

 * Cache sharing over a good WAN link between two locations, each having
a separate Internet connection and possibly a tree of child caches below
it, where the cost of using the WAN link is considerably less than
getting the content from the Internet.

 * Same as the above WAN case, but on a higher level like "country" if
there are differentiated costs for international and national traffic, or
if the national connectivity is excellent and international connectivity
very poor. However, cache relations between companies/organizations are
tricky from an administrative and political/policy point of view. For
example, your peer will be able to tell in detail what your users are
looking at. So even where such a relation makes sense from a technical
point of view, in practice it is often not possible to reach the
agreements needed to set it up properly.

> So the ideal hit ratio is a One-Parent-Many-Siblings model, because the
> parent will always hit, if the item was cacheable (assuming a large
> enough cache), but the siblings can offload some of the work and bring
> the content a little closer to users. However, this doesn't work so
> well when the sibling interaction is across WAN links because the
> latency is too high. So in this case one has a parent at each site,
> with any number of siblings. The parent then handles the WAN link, and
> caches everything that comes down to the local site.

Quite often the same hit ratio can be achieved with a
One-Parent-Many-Children model. The parent (which may consist of a
cluster of more than one cache server) sees all traffic from all
children, so if sized properly it will have a copy of almost all content
cached by the child caches.

The child caches cache what their users request, taking some of the load
off the parent and the WAN link.
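
Where the parent is a cluster, each child might be configured along
these lines. This is only a sketch: both parent names are placeholders,
and round-robin is just one way of spreading requests over the cluster
members.

    # On each child cache. No ICP is sent to either parent.
    cache_peer parent1.example.com parent 3128 0 no-query round-robin
    cache_peer parent2.example.com parent 3128 0 no-query round-robin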

> So, it depends on your environment. One smallish LAN only needs one
> parent with a few siblings. A big, LANs+WAN environment needs more
> distribution of data to bring cache content closer to users.

In most designs I have done, a flat structure has been selected. If
more than one cache server is needed, then distribute different
destinations to different caches, possibly with sibling relations among
the caches to deal with misdirected requests.
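
As a sketch of the sibling part only, assume two hypothetical caches
where cache-b is the one responsible for .org destinations. How clients
are steered to the right cache is outside this example. On cache-a, the
sibling is then only consulted for requests that should have gone to
cache-b in the first place:

    # On cache-a. Host name and the .org split are placeholders.
    cache_peer cache-b.example.com sibling 3128 3130 proxy-only

    # Only ask cache-b about the destinations it is responsible for.
    acl to_org dstdomain .org
    cache_peer_access cache-b.example.com allow to_org
    cache_peer_access cache-b.example.com deny all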

If there are WAN links involved, then placing child caches on the side
closer to the users may make sense, but it depends on the user base and
on how much of the WAN link is used for Internet traffic.

Regards
Henrik
Received on Wed May 29 2002 - 16:51:28 MDT