Re: Squid-FS

From: Dancer <dancer@dont-contact.us>
Date: Fri, 24 Apr 1998 09:28:10 +1000

Henrik Nordstrom wrote:
>
> Dancer wrote:
>
> > It might be worth considering as a disk storage module, however. For
> > many cache-setups disk-access is not going to be a noticeable
> > bottleneck. For example, for Brisnet, I simply asked for "the slowest,
> > largest HD I can get for $X". Even if we triple our link capacity, a
> > slow IDE is going to keep pace for the foreseeable future.
>
> I'll second that, but in a more general way. Squid needs to be
> redesigned in a modular way with well defined interfaces between the
> modules. Not only for a storage module.

Amen to that. It'd be nice to be able to plug in
storage/expiration/authentication componentry.
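
Just to make the storage case concrete (purely a sketch of the sort of
thing I mean, not anything resembling the real Squid internals): a
storage backend could boil down to a table of function pointers that the
core only ever calls through, one table per backend:

  /* Hypothetical sketch of a pluggable storage module; every name and
   * signature here is invented for illustration only. */
  #include <stddef.h>
  #include <stdio.h>

  typedef struct storage_module {
      const char *name;
      int  (*open)(void *cfg);   /* bring the store online */
      int  (*get)(const char *key, void *buf, size_t len);
      int  (*put)(const char *key, const void *buf, size_t len);
      void (*expire)(const char *key);
      void (*close)(void);
  } storage_module;

  /* A do-nothing backend, just to show the shape of a module. */
  static int null_open(void *cfg) { (void)cfg; return 0; }
  static int null_get(const char *k, void *b, size_t n)
  { (void)k; (void)b; (void)n; return -1; }
  static int null_put(const char *k, const void *b, size_t n)
  { (void)k; (void)b; (void)n; return 0; }
  static void null_expire(const char *k) { (void)k; }
  static void null_close(void) { }

  static const storage_module null_store = {
      "null", null_open, null_get, null_put, null_expire, null_close
  };

  int main(void)
  {
      /* The core would pick a module by name from squid.conf and then
       * only ever touch it through the table. */
      const storage_module *store = &null_store;
      store->open(NULL);
      store->put("http://example.com/", "x", 1);
      printf("using storage module: %s\n", store->name);
      store->close();
      return 0;
  }

Expiration and authentication could hang off similar tables.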

> > We noticed this on the schnet setup recently, where we were so
> close to the wall on file-descriptors that minor network jitters
> on the Australian backbone caused us to max out. Just a slight
> > fluctuation was all it took.
>
> I don't regard that as an argument for snappy hit-response, but rather as
> a sign of a major design error in the cache hierarchy, misconfiguration
> or not the right OS for the job. Squid tries to not accept connections
> it can't handle, but if the OS imposes limitations unknown to Squid bad
> things may happen. The recent Squid versions should self-adjust if it
> does occur.

Oh, it's not intended as an argument for snappy hit-response, just a
curious datum. Squid _did_ in fact stay within its limits, but hitting
the wall so early in the trial was unexpected (except by me, but
nobody listened at the time). It caused a little consternation.

> If you can't select an OS that has at least a huge margin in active
> filedescriptors/sockets, then you need to design a cache hierarchy that
> gives you this. If a heavily used link goes down the number of active
> connections may sky-rocket, resulting in a denial-of-service.
>
> The basic Mirror-Image design seems like a feasible way to implement a
> hierarchy. The frontend caches are the ones that contact the origin
> servers, effectively distributing the load in a scalable way, and only
> inter-cache traffic goes through the backend servers. But I don't agree on
> the idea of "Terabyte-Servers".
>
> My current idea of what a good cache design looks like (farm model):
> * Any number of frontend caches that do the actual work.
> * 1 (or 2 to provide redundancy) "grand centrals" that keep track of
> the whole cache.
> * The grand central keeps track of local servers as well, signalling "go
> direct" for local resources when peering with remote caches/farms.
> * Frontend caches send updates to the grand central using cache digests.
> * Frontend caches query the grand central using a variation of ICP or
> a similar protocol, to determine whether an object is already cached.
> * Clients use a PAC with a primary frontend cache based on location
> (source IP) and fallback frontends if the primary dies/fails.
>
> This design should scale in every dimension, no matter what kind of
> hardware you build with (more powerful == fewer boxes, less powerful ==
> more boxes).
>
> The service continues to function even if the grand central dies. The
> impact is a lower hit ratio, as each frontend cache then runs
> stand-alone without knowledge of the other caches.
>
> Peering between caches is done at the grand-central level, using cache
> digests.
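
For the client side of that, the PAC you're describing could be about as
simple as this (hostnames and netmasks are invented for the example):

  function FindProxyForURL(url, host) {
      // Pick the primary frontend by where the client sits (source IP),
      // with the other frontend as fallback and direct as a last resort.
      if (isInNet(myIpAddress(), "10.1.0.0", "255.255.0.0"))
          return "PROXY frontend1.example.net:3128; " +
                 "PROXY frontend2.example.net:3128; DIRECT";
      return "PROXY frontend2.example.net:3128; " +
             "PROXY frontend1.example.net:3128; DIRECT";
  }

The browser walks that list left to right, so the fallback frontends only
come into play when the primary dies or stops answering.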

I've been playing with theoretical large cache-farm designs on paper,
for future setups, and (so far) a three-layer design is the best I've
been able to come up with. At least, I don't see any serious cons.

* Layer 1: Talks to the net. Fetches from origin servers. Small cache.
* Layer 2: Main cache boxes. Large cache.
* Layer 3: Talks to groups of customers. Customers are grouped by
philosophy, pattern of use, or physical region. Small cache.

Layer 3 machines don't have a sibling relationship. They're all children
of layer 2, which are siblings of one another and children of layer 1.

If I've thought this through correctly, most objects should accumulate
on the disks in layer 2. Layer 1 just fetches them in, and then goes off
to grab the next thing. Small, frequently used items would aggregate on
the layer-3 machines (with an appropriate policy).
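
In squid.conf terms it would be something along these lines on the
layer-3 and layer-2 boxes (hostnames are made up, and I'm using the
cache_peer spelling from the newer Squids; older ones call it cache_host,
if I remember right):

  # Layer-3 box: parents in layer 2 only, no siblings at this level
  cache_peer layer2-a.example.net parent 3128 3130
  cache_peer layer2-b.example.net parent 3128 3130

  # Layer-2 box (layer2-a): sibling of the other layer-2 caches,
  # child of the layer-1 fetcher
  cache_peer layer2-b.example.net sibling 3128 3130
  cache_peer layer1.example.net parent 3128 3130

A layer-1 box would have no parents at all and just go direct to the
origin servers.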

D

-- 
-----BEGIN GEEK CODE BLOCK-----
Version: 3.1
GAT d- s++: a C++++$ UL++++B+++S+++C++H++U++V+++$ P+++$ L+++ E-
W+++(--)$ N++ w++$>--- t+ 5++ X+() R+ tv b++++ DI+++ e- h-@ 
------END GEEK CODE BLOCK------