Re: Ideas for the future of Squid

From: <dancer@dont-contact.us>
Date: Sun, 26 Mar 2000 01:00:54 +0000

I agree. Recently (you've been wondering where I've been lately,
haven't you?) we've had some problems with some very complex code.
After spending months band-aiding and tracking faults only to have
another pop out the following day, I told the team to scrap the whole
system, and I sat down and designed a new one from scratch along these
lines: multiple cooperative processes, each tolerant of failure in
adjacent components (just retry, and after a second or two the broken
component will be restarted).
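
The retry side really is that simple. A rough C sketch of the kind of
call wrapper I mean (the fd/transport setup is hypothetical, it assumes
SIGPIPE is ignored, and a real version would also reconnect to the
respawned peer):

    #include <errno.h>
    #include <unistd.h>

    /* Retry a send to another component, sleeping between attempts
     * so the master process has time to respawn a dead peer.  'fd'
     * is a pipe or socket to the component. */
    static int
    send_with_retry(int fd, const void *req, size_t len)
    {
        int tries;
        for (tries = 0; tries < 5; tries++) {
            ssize_t n = write(fd, req, len);
            if (n == (ssize_t)len)
                return 0;               /* delivered */
            if (n < 0 && errno != EPIPE && errno != ECONNRESET
                && errno != EINTR)
                return -1;              /* not peer death; give up */
            sleep(2);                   /* wait out the restart */
        }
        return -1;                      /* really gone; fail the request */
    }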

The modular, cooperative design allowed us to write different
components in different languages (I had to draft a couple of Perl
programmers to get enough manpower to rewrite the project in a week)
and let us test components separately. Now, after three months, we've
yet to see any
trouble with it. We had exactly two bugs, both found on the first day of
testing.

Ah... except... the scheduler (at least under Linux 2.0, and probably
under 2.2) can display asymptotic, sinusoidal behaviour when some of
your components are entirely CPU-bound but have other components
relying on them. It's an important thing to watch out for. I can
explain more about this gotcha if anyone's interested, but it should
be obvious.

D

Henrik Nordstrom wrote:

> As you all probably know Squid has a number of performance bottlenecks.
> Disk I/O is one and is currently being addressed, but there obviously
> are other major bottlenecks as well, presumably in the
> networking/select/poll part, and in the amount of data copying and
> header parsing done.
>
> The memory footprint/disk size ratio is currently also way too large,
> making memory a major investment for any large cache. Memory is
> also quite hard to scale up once the system is installed, making system
> growth quite painful.
>
> Stability is also a problem. Today a single fault in any part of the
> code can bring the whole process down, forcing a quite lengthy restart
> and breaking all ongoing/pending requests.
>
> Squid consists of a number of major function components:
>
> * The client interface accepting requests from the clients
>
> * The object database, keeping track of the current cache contents
>
> * Different protocol modules for retrieving contents
>
> * Storage policy
>
> * On-disk storage/retrieval
>
> * Access control
>
> Around these functions there is also a large library of supporting code:
>
> * DNS resolving / caching
>
> * Redirectors
>
> * Proxy authentication verification
>
> * Memory handling
>
> * and lots more
>
> I think this should be divided into a number of "independent" processes:
>
> * A number of networking processes accepting client requests, doing
> access control and fetching content from other servers.
>
> * A number of disk processes (one per fs/spindle)
>
> * A process for DNS, proxy auth caching, long term client statistics
> (for delay pools) and other shared services.
>
> * A master process monitoring all the parts, restarting components as
> necessary.
>
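
This is pretty much what our master process does. A stripped-down C
sketch of that last part (the component entry point is a stand-in for
real work, and production code would also rate-limit restarts):

    #include <stdio.h>
    #include <sys/types.h>
    #include <sys/wait.h>
    #include <unistd.h>

    #define NCOMPONENTS 4

    static pid_t pids[NCOMPONENTS];

    /* Stand-in for the real component code. */
    static void
    run_component(int which)
    {
        (void)which;
        pause();
    }

    static void
    spawn(int i)
    {
        pid_t pid = fork();
        if (pid == 0) {
            run_component(i);       /* child: become the component */
            _exit(1);               /* should never return */
        }
        pids[i] = pid;
    }

    int
    main(void)
    {
        int i, status;
        pid_t dead;

        for (i = 0; i < NCOMPONENTS; i++)
            spawn(i);

        /* Reap dead children and restart them after a short pause. */
        for (;;) {
            dead = wait(&status);
            if (dead < 0)
                continue;
            for (i = 0; i < NCOMPONENTS; i++) {
                if (pids[i] == dead) {
                    fprintf(stderr, "component %d died, restarting\n", i);
                    sleep(1);       /* the pause the retry logic relies on */
                    spawn(i);
                    break;
                }
            }
        }
    }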
> The main problem with having multiple processes is how to make
> efficient inter-process calls, and to do this we probably have to
> make a large sacrifice in portability. Not all UNIXes are capable of
> efficient inter-process communication at the level required, and most
> require some tuning. However, if layered properly we might be able to
> provide full portability at the cost of some performance on platforms
> with limited IPC capabilities.
>
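
Agreed on the layering. One way it could look, very roughly (the
framing and the names here are mine, purely illustrative, not a
proposal for the actual API):

    #include <sys/types.h>
    #include <sys/socket.h>
    #include <unistd.h>

    /* Portable default: a stream socketpair between two processes.
     * Platforms with fancier primitives (doors, shared-memory rings)
     * could plug in behind the same interface. */
    int
    ipc_channel(int fds[2])
    {
        return socketpair(AF_UNIX, SOCK_STREAM, 0, fds);
    }

    /* Length-prefix each message so one logical call travels as one
     * unit, whatever the transport underneath. */
    int
    ipc_send(int fd, const void *buf, size_t len)
    {
        unsigned char hdr[4];
        hdr[0] = (unsigned char)(len >> 24);
        hdr[1] = (unsigned char)(len >> 16);
        hdr[2] = (unsigned char)(len >> 8);
        hdr[3] = (unsigned char)len;
        if (write(fd, hdr, 4) != 4)
            return -1;
        if (write(fd, buf, len) != (ssize_t)len)
            return -1;
        return 0;
    }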
> The object database I'd like to see distributed to the disk processes,
> where each process maintains the database for the objects it has, with
> only a rough estimate (i.e. like a cache digest) collected centrally.
>
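
For the central summary, something Bloom-filter-shaped would do, which
is essentially what a cache digest is. A toy version (sizes and hash
choice are arbitrary; zero the struct before use):

    #include <string.h>

    /* Each disk process fills one of these in and ships it to the
     * central process.  A hit there means "probably on that spindle",
     * a miss means "definitely not". */

    #define DIGEST_BITS (1 << 20)

    typedef struct {
        unsigned char bits[DIGEST_BITS / 8];
    } digest_t;

    static unsigned long
    hash_key(const char *key, unsigned long seed)
    {
        unsigned long h = seed;
        while (*key)
            h = h * 33 + (unsigned char)*key++;
        return h % DIGEST_BITS;
    }

    void
    digest_init(digest_t *d)
    {
        memset(d->bits, 0, sizeof(d->bits));
    }

    void
    digest_add(digest_t *d, const char *key)
    {
        unsigned long h1 = hash_key(key, 5381);
        unsigned long h2 = hash_key(key, 17);
        d->bits[h1 / 8] |= 1 << (h1 % 8);
        d->bits[h2 / 8] |= 1 << (h2 % 8);
    }

    int
    digest_maybe_has(const digest_t *d, const char *key)
    {
        unsigned long h1 = hash_key(key, 5381);
        unsigned long h2 = hash_key(key, 17);
        return (d->bits[h1 / 8] & (1 << (h1 % 8)))
            && (d->bits[h2 / 8] & (1 << (h2 % 8)));
    }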
> Any IPC should be carefully planned and performed at a macro level,
> with operations as large as feasible and with proper error recovery
> in case one of the components fails. If a networking process fails,
> only the requests currently processed by that process should be
> affected; similarly, if a disk process fails, only the requests
> currently served from that disk process should be affected.
>
> For DNS/proxy_auth/whatever else some limited distributed caching in the
> networking processes might be required to cut down on the number of IPC
> calls, but the bulk of these caches should be managed centrally.
>
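
That local lookaside cache can be tiny and dumb: check it first, fall
back to an IPC call to the central process on a miss. For example
(sketch only; sizes, the TTL and the overwrite-on-collision policy
are all arbitrary):

    #include <string.h>
    #include <time.h>

    #define CACHE_SLOTS 1024
    #define CACHE_TTL   60          /* seconds */

    typedef struct {
        char   key[64];
        char   value[64];
        time_t expires;
    } cache_slot_t;

    static cache_slot_t slots[CACHE_SLOTS];

    static unsigned
    slot_for(const char *key)
    {
        unsigned h = 5381;
        while (*key)
            h = h * 33 + (unsigned char)*key++;
        return h % CACHE_SLOTS;
    }

    /* NULL means a miss: the caller does the IPC call, then puts
     * the answer back with cache_put(). */
    const char *
    cache_get(const char *key)
    {
        cache_slot_t *s = &slots[slot_for(key)];
        if (s->expires > time(NULL) && strcmp(s->key, key) == 0)
            return s->value;
        return NULL;
    }

    void
    cache_put(const char *key, const char *value)
    {
        cache_slot_t *s = &slots[slot_for(key)];
        strncpy(s->key, key, sizeof(s->key) - 1);
        s->key[sizeof(s->key) - 1] = '\0';
        strncpy(s->value, value, sizeof(s->value) - 1);
        s->value[sizeof(s->value) - 1] = '\0';
        s->expires = time(NULL) + CACHE_TTL;
    }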
> This requires a number of major transitions in the code design. For
> example, there will be no globally available StoreEntry structure to
> connect things together.
>
> Am I on to something here, or am I completely out dreaming?
>
> /Henrik
Received on Sat Mar 25 2000 - 18:01:12 MST