Re: ideas from Adrian Chadd on 2000-06-12 (squid-dev)

From: Adrian Chadd <adrian@dont-contact.us>
Date: Tue, 13 Jun 2000 07:58:21 +0800

On Mon, Jun 12, 2000, Henrik Nordstrom wrote:
> Andres Kroonmaa wrote:
>
> > I wouldn't hope that threads could add anything to reliability.
> > Not to argue with you, but trying to be eagerly helpful ;)
>
> Haven't claimed it would. Threads for SMP support is Adrian's idea. I
> doubt it will work on many of the target platforms of Squid.. (well, it
> will work on the same platforms as async-io currently works, but I have
> doubts for the *BSD family)

FreeBSD currently has linuxthreads, which will do the same as threads
do under Linux (i.e. clone() processes with shared data and text but separate
stacks), so it could be made to work under FreeBSD. I'm not sure NetBSD
and OpenBSD have the same abilities, however. FreeBSD will (one day)
have more intelligent hybrid kernel/user mode threads which work in
SMP but don't incur the overhead of processes as threads. That said,
I would like to minimise the number of threads where possible.

> > So for real parallelism we are forced to use kernel-threads. Of course
> > we can expect much better concurrency and together with that more work
> > done in the same time. But we'd have to deal with all the headaches of
> > concurrent threads, or add bottlenecks ourselves.
>
> Of course. On systems not supporting kernel threads we have to use
> processes to accomplish SMP support.

I would think that systems that support SMP today would support SMP-aware
threading of some sort. Even if it's a kludge.

> > In fact, on the contrary, having lots of specifics to threads might make
> > code even harder to debug, and by definition, if any thread of control
> > can bring the whole process down, there isn't much difference whether
> > you try to write code that can tolerate some errors or try to write
> > code without any errors ;)
>
> On that I do not agree. There are lots of things which can be done to
> write code that tolerates errors, however that is not what we are
> discussing in the threads discussion.
>
> Some ideas were in my notes on a multiprocess design (not threads within
> one process), but automatic crash recovery is only the tip of the
> iceberg in writing fault tolerant code.

I like the idea of automatic crash recovery. Traditionally, I'd look at
something outside squid (one thing I had in mind was a 'conditional ipfw'
statement in FreeBSD where the transparent redirection wouldn't happen
if nothing was bound to the target port). If you have the storage manager
as a separate process and it fails, all the existing connections that go
through the connection manager would fail unless you put in some rather
interesting logic to try and reconnect the servlets to the clientlets
(to use my description of the internals) or reschedule new server requests.
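
To make the "conditional" part concrete, here is a minimal user-space sketch
of the check such a rule would depend on: only redirect if something is
actually accepting connections on the target port. This is Python, not
anything ipfw itself provides. port_is_bound and should_redirect are
hypothetical names, and the 0.5 second timeout is an arbitrary assumption.

```python
import socket


def port_is_bound(host: str, port: int, timeout: float = 0.5) -> bool:
    """Return True if a TCP connection to (host, port) is accepted.

    Assumption: "something is bound to the port" is approximated by
    "a connect attempt succeeds", which is what a watchdog outside the
    proxy can observe without any help from the kernel.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        probe.settimeout(timeout)
        # connect_ex returns 0 on success and an errno value otherwise.
        return probe.connect_ex((host, port)) == 0


def should_redirect(host: str, port: int) -> bool:
    """Hypothetical policy hook: redirect traffic only while the proxy is up."""
    return port_is_bound(host, port)
```

A watchdog could call should_redirect() periodically and add or remove the
redirection rule accordingly. A real in-kernel conditional would avoid the
polling delay.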

I can be swayed with some pseudo-code descriptions .. :)

> > In terms of splitting squid into separate tasks (processes/threads),
> > we should very clearly think about _why_ we would want some task be
> > separated, what it gives us when separated and at what price.
>
> There has been a great deal of thought on that.

And there will be much more thought on this.

> Ok. Maybe I should repeat the process split suggestion I made:
>
> 1. Network I/O processes. One per CPU. Each handling lots of concurrent
> connections. All request forwarding for a single client connection takes
> place in a single process.
>
> 2. Storage processes, one per cache_dir (disk spindle). Takes care of
> reading/writing to the disk and manages a definite index of that
> directory. These can in turn be multithreaded a la async-io if so
> desired.
>
> 3. A master process, keeping an eye on everything and making sure
> everything is up and running.
>
> 4. Other helper processes as needed.
>
> The above might differ slightly from the previous two process descriptions,
> however the basic ideas are the same.
>
> The store index is shared between the storage and network via compact
> hints, for example cache digests. Sharing could be done using for
> example shared memory or memory mapped data depending on the OS and
> taste. How does not matter for the design, only that the regions are
> there with one writer and multiple readers and not sensitive to race
> conditions.
>
> The idea behind this multi-process design is that each unit is self
> contained for the operations it performs. If a network I/O process dies
> and restarts then only the requests currently being processed by that
> process are affected.
>
>
> The exact details of the ICP/RPC mechanisms between the various parts
> remain to be spelled out. Before that work is started we need
> to agree on the basic principles of having a multi-process design.
>
> Adrian is discussing a different design based on threads, but the split
> is along similar lines. The main difference is in how the store index is
> maintained, and the communication mechanisms for communicating between
> the various parts. Also the threaded design does not provide the
> distributed crash recovery of the multi-process design.

I'll agree there. The reason I am going for threads rather than multiprocess
is simply for SMP. In the non-SMP build you wouldn't have any thread
primitives. I know various people's opinions for and against using threads.

Ok, Henrik. Let's go for a hybrid for the time being: multiprocess
as you've put it above, with the clientlet/servlet model I've proposed
inside the connection manager. I would like to push for some
pseudo-code to describe the ideas, rather than lots of English. :-)
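
As a starting point for the multiprocess side, here is a rough sketch of the
master process from the quoted list: it starts each worker and restarts any
that have exited. The Supervisor class, the worker names and the idea of
describing each worker by a command line are all assumptions made for
illustration. Nothing here reflects existing Squid code.

```python
import subprocess
import sys


class Supervisor:
    """Master process: start every worker, restart the ones that exit."""

    def __init__(self, commands):
        self.commands = dict(commands)  # worker name -> argv list
        self.procs = {}                 # worker name -> Popen handle
        self.restarts = {name: 0 for name in self.commands}

    def start_all(self):
        for name, argv in self.commands.items():
            self.procs[name] = subprocess.Popen(argv)

    def poll_once(self):
        """Restart exited workers and return the names that were restarted."""
        restarted = []
        for name, proc in list(self.procs.items()):
            if proc.poll() is not None:  # the worker has exited
                self.procs[name] = subprocess.Popen(self.commands[name])
                self.restarts[name] += 1
                restarted.append(name)
        return restarted

    def stop_all(self):
        for proc in self.procs.values():
            if proc.poll() is None:
                proc.terminate()
            proc.wait()


if __name__ == "__main__":
    # Hypothetical workers: one that crashes at once, one that keeps running.
    sup = Supervisor({
        "netio-0": [sys.executable, "-c", "import sys; sys.exit(1)"],
        "store-0": [sys.executable, "-c", "import time; time.sleep(5)"],
    })
    sup.start_all()
    sup.procs["netio-0"].wait()
    print(sup.poll_once())
    sup.stop_all()
```

Only the crashed worker is replaced; the other keeps running, which is the
isolation property the multi-process design is after. Reconnecting clients
of the dead worker is a separate problem this sketch does not touch.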

Issues I see right now:

* If separate processes are used for each CPU for SMP performance, how
  do you handle multiple clients across separate processes talking to
  the storage manager, which then talks to one (or two, depending on whether
  the object needs revalidation) servers?

* If you have separate processes for different IO types (say you have an HTTP
  module, a RealMedia module, an FTP module...), how do you efficiently stream
  between them?

I still like the idea of a monolithic process for various reasons, but
I can be swayed with pseudo-code. I will attempt to throw together
some pseudo-code describing my idea tomorrow. Henrik and anyone else,
please do the same. I think this will help us all in sorting out a final
design that we can start coding on.

Adrian

-- 
Adrian Chadd			Build a man a fire, and he's warm for the
<adrian@creative.net.au>	rest of the evening. Set a man on fire and
				he's warm for the rest of his life.

Received on Mon Jun 12 2000 - 17:58:28 MDT