Re: ideas

From: Andres Kroonmaa <andre@dont-contact.us>
Date: Mon, 12 Jun 2000 21:48:17 +0200

On 12 Jun 2000, at 11:53, Henrik Nordstrom <hno@hem.passagen.se> wrote:
> > Ok. You're talking about separate processes here, right? I'm talking
> > about having it all as one process with a single comm/callback thread
> > per CPU. I guess if designed right the modules could support either
> > IPC or callback notification depending upon your orientation, but
> > my motivation for keeping it all in one process as much as possible
> > stems from my desire to have this compile relatively cleanly under
> > DOS at some stage.
>
> Right, but it might also be implemented using separate threads in one
> process. Does not matter much for the interface designs (only the
> "IPC"/"RPC" mechanism is different..), but it matters a lot for
> reliability in case of software errors which there will always be some
> errors. I cannot make 100% faultfree code, and I am pretty sure neither
> can you. I can however try to build a software system which can handle
> common faults in a reasonable manner without servere impact on the
> function, and I do beleive it can be done without a too large impact on
> overall performance.

 I wouldn't expect threads to add anything to reliability by themselves.
 Not to argue with you, just trying to be eagerly helpful ;)

 There are many different types of "threads", and Squid's current design
 is definitely and clearly "threaded". Each request is handled from start
 to end by what is effectively a thread: the codepath between two
 setSelect() calls is equivalent to a thread's slice of execution. Every
 request gets its own such thread, and each thread is suspended in the
 comm_select loop while waiting for IO; that loop is, basically, where
 the threads are scheduled onto the CPU. All code is reused as with
 reentrant threads, and the only data identifying a thread is the request
 structure (or StoreEntry). We just never call this kind of code
 "threaded", to avoid confusion with system-supported thread
 implementations.
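
 To make that concrete, here is a minimal sketch of the callback-per-fd
 pattern I mean; the names and signatures are simplified stand-ins, not
 Squid's real setSelect()/comm_select() code:

    /* Minimal sketch of the callback-per-fd pattern; simplified names. */
    #include <sys/select.h>
    #include <stddef.h>

    typedef void PF(int fd, void *data);        /* handler type */

    static PF   *read_handler[FD_SETSIZE];      /* one pending callback per fd */
    static void *read_data[FD_SETSIZE];

    /* "Suspend" the current request: remember where to continue when fd is ready. */
    static void setSelect(int fd, PF *handler, void *data)
    {
        read_handler[fd] = handler;
        read_data[fd] = data;
    }

    /* The scheduler: each pass resumes every "thread" whose fd became ready. */
    static void comm_select_loop(void)
    {
        for (;;) {
            fd_set rfds;
            int fd, maxfd = -1;
            FD_ZERO(&rfds);
            for (fd = 0; fd < FD_SETSIZE; fd++)
                if (read_handler[fd]) {
                    FD_SET(fd, &rfds);
                    if (fd > maxfd) maxfd = fd;
                }
            if (select(maxfd + 1, &rfds, NULL, NULL, NULL) <= 0)
                continue;
            for (fd = 0; fd <= maxfd; fd++)
                if (read_handler[fd] && FD_ISSET(fd, &rfds)) {
                    PF *h = read_handler[fd];
                    read_handler[fd] = NULL;    /* one-shot: handler re-registers */
                    h(fd, read_data[fd]);       /* run until the next setSelect() */
                }
        }
    }

 Each handler runs to its next setSelect() and returns; that is the whole
 "context switch".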

 The current scheme is very similar to what we'd get if we used user-level
 threads and wrote Squid in a fully threaded manner. The only difference
 is that today Squid is limited to a single thread executing at any one
 time, while a real threaded design could potentially let each thread
 progress on a separate CPU (though that ability depends somewhat on the
 OS thread implementation). As a benefit, compared to a fully threaded
 design, Squid's current code has absolutely predictable execution order,
 versus a completely unpredictable order with threads (unless you enforce
 ordering by other means).

 What is quite important is that user-level threads are not able to enter
 the kernel concurrently: the kernel sees all user-level threads of a
 process as a single thread, and most thread libraries wrap syscalls in
 mutexes (or worse: _all_ syscalls in a _single_ mutex). This means that
 any call into the system can hit a locked mutex and force a reschedule
 of all threads, so all system calls are serialised and the threads can
 only run concurrently between calls into the kernel. We would merely add
 the overhead of switching between threads and gain very little on
 average. This isn't quite what we want.
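
 Purely as a hypothetical illustration (not any particular library's
 code, and using a pthread mutex only as shorthand for the library's
 internal lock), the wrapper such a library puts around a syscall ends up
 looking like this:

    /* Hypothetical sketch: how a user-level threads library can serialise
     * system calls behind one mutex. */
    #include <pthread.h>
    #include <unistd.h>

    static pthread_mutex_t syscall_lock = PTHREAD_MUTEX_INITIALIZER;

    /* Every wrapped syscall funnels through the same lock, so only one
     * user-level thread is inside the kernel at a time; a thread that
     * finds the lock held is descheduled and another thread is switched in. */
    ssize_t wrapped_read(int fd, void *buf, size_t len)
    {
        ssize_t n;
        pthread_mutex_lock(&syscall_lock);   /* may block -> reschedule */
        n = read(fd, buf, len);
        pthread_mutex_unlock(&syscall_lock);
        return n;
    }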

 So for real parallelism we are forced to use kernel threads. With those
 we can expect much better concurrency and, with it, more work done in
 the same amount of time. But we'd have to deal with all the headaches of
 concurrent threads, or add bottlenecks ourselves.

 Writing a classically threaded Squid would most probably need a total
 rewrite, which is most probably not desired. This leaves us with adding
 threads where they give the most benefit. The current async-io code is
 one good example: there threads are used for good and are, in a way,
 unavoidable if we want a non-blocking Squid.
 By using kernel threads we pretty much unavoidably face inter-thread
 communication and synchronisation, which adds overhead roughly in
 proportion to the number of running threads. There are several reasons
 for that, and without digging into how threads are actually implemented
 the cost is hard to see; if not accounted for, it may leave a false
 sense of scalability.
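
 As a rough sketch of the async-io pattern (only loosely following
 Squid's aiops.c, with invented names): the main thread queues a blocking
 disk operation for a worker and collects the result later. Every
 hand-off is a lock/signal pair, which is exactly the per-request
 synchronisation cost mentioned above.

    #include <pthread.h>
    #include <sys/types.h>
    #include <unistd.h>

    struct aio_req {
        int fd;
        void *buf;
        size_t len;
        ssize_t result;
        int done;
        struct aio_req *next;
    };

    static struct aio_req *queue_head;
    static pthread_mutex_t queue_lock = PTHREAD_MUTEX_INITIALIZER;
    static pthread_cond_t  queue_cond = PTHREAD_COND_INITIALIZER;

    /* Main (comm_select) thread: enqueue the request and return at once. */
    void aio_read_enqueue(struct aio_req *r)
    {
        pthread_mutex_lock(&queue_lock);
        r->next = queue_head;
        queue_head = r;
        pthread_cond_signal(&queue_cond);  /* per-request synchronisation cost */
        pthread_mutex_unlock(&queue_lock);
    }

    /* Worker thread: block on the queue, do the blocking read, mark it done. */
    void *aio_worker(void *arg)
    {
        (void)arg;
        for (;;) {
            struct aio_req *r;
            pthread_mutex_lock(&queue_lock);
            while (queue_head == NULL)
                pthread_cond_wait(&queue_cond, &queue_lock);
            r = queue_head;
            queue_head = r->next;
            pthread_mutex_unlock(&queue_lock);
            r->result = read(r->fd, r->buf, r->len);  /* may block; fine here */
            r->done = 1;                              /* main loop collects this */
        }
        return NULL;
    }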

 I don't think threads can in any way help us write fault-tolerant code,
 although I'd agree that using threads could help us follow the flow of
 the code better, and perhaps avoid several hard-to-track errors.
 In fact, on the contrary, having lots of thread-specific details might
 make the code even harder to debug; and by definition, if any thread of
 control can bring the whole process down, there isn't much difference
 between trying to write code that tolerates some errors and trying to
 write code without any errors ;)

 Actually, it is possible to rewrite Squid so that it looks exactly like
 fully threaded code while not using a single thread library. All it
 takes is to wrap every system call that could block, implement stack
 save/restore for each such call, and then do the actual blocking in
 comm_select. Later, when the socket is ready, instead of calling
 callbacks we simply restore the stack and return to the caller, until it
 shortly blocks again. In fact this is exactly what user-level threads
 do, just hidden from the programmer. Later on, we could implement those
 wrapped calls in several different ways, such as kernel threads or
 separate processes, depending on OS support.
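
 A minimal sketch of that wrapping idea, using ucontext for the stack
 save/restore (names invented for illustration, setup of the request
 context with makecontext omitted, and only one suspended request shown):

    #include <ucontext.h>
    #include <unistd.h>

    static ucontext_t scheduler_ctx;   /* the comm_select loop's context */
    static ucontext_t request_ctx;     /* one suspended request, for simplicity */
    static int waiting_fd = -1;        /* fd the suspended request waits on */

    /* Looks like an ordinary blocking read to the caller, but instead of
     * blocking in the kernel it parks the request and runs the scheduler. */
    ssize_t threaded_read(int fd, void *buf, size_t len)
    {
        waiting_fd = fd;
        swapcontext(&request_ctx, &scheduler_ctx);  /* save our stack, yield */
        /* ...we resume here once comm_select saw fd become readable... */
        return read(fd, buf, len);                  /* should not block for long now */
    }

    /* Called from the comm_select loop when waiting_fd becomes ready. */
    void resume_request(void)
    {
        swapcontext(&scheduler_ctx, &request_ctx);  /* restore the saved stack */
    }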

 In terms of fault-tolerance, the building block can only be a process
 (you can't really cope with SIGSEGV in any reasonable way within a
 thread). If we can split several of Squid's tasks into separate,
 self-sufficient processes, then we can get some fault-tolerance. But
 definitely at a price, most probably in performance.
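
 As a minimal sketch of that building block (run_worker() here is a
 hypothetical self-sufficient task), a supervisor process forks the
 worker and restarts it if it dies on a signal such as SIGSEGV, much in
 the spirit of what Squid's parent process already does for a crashed
 child:

    #include <sys/types.h>
    #include <sys/wait.h>
    #include <unistd.h>

    extern void run_worker(void);   /* hypothetical self-sufficient task */

    void supervise(void)
    {
        for (;;) {
            int status;
            pid_t pid = fork();
            if (pid == 0) {
                run_worker();       /* child: a crash here kills only this process */
                _exit(0);
            }
            if (pid < 0)
                return;             /* fork failed; give up in this sketch */
            waitpid(pid, &status, 0);
            if (WIFEXITED(status) && WEXITSTATUS(status) == 0)
                return;             /* clean shutdown */
            /* otherwise (e.g. killed by SIGSEGV): loop and restart the worker */
        }
    }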

 In terms of splitting Squid into separate tasks (processes/threads), we
 should think very clearly about _why_ we would want a given task to be
 separate, what its separation gives us, and at what price.

 For example, if we had a separate thread for every client session, it
 would spend most of its time blocked in either the client-write or the
 server-read. So for every packet coming from the server, the system has
 to schedule the thread to run, and all it does is copy one packet from
 the server socket to the client socket before blocking again. The
 overhead of switching kernel threads grows until it dominates the CPU
 time at high loads with many concurrent sessions. While splitting tasks,
 we should always keep such things in mind.
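
 Purely as an illustration of that cost, a thread-per-session copy loop
 would look something like the sketch below; note that each wakeup does
 only a tiny amount of work before blocking again:

    #include <sys/types.h>
    #include <unistd.h>

    void *session_thread(void *arg)
    {
        int *fds = arg;             /* fds[0] = server socket, fds[1] = client socket */
        char buf[4096];
        ssize_t n;
        /* Each iteration: block, get scheduled, move one buffer, block again. */
        while ((n = read(fds[0], buf, sizeof(buf))) > 0) {
            if (write(fds[1], buf, n) != n)
                break;              /* client gone or short write; keep the sketch simple */
        }
        return NULL;
    }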

 In general, it isn't very useful to split a task out into a separate
 thread if executing that task takes very little time and/or the overhead
 of scheduling the task is comparable to its execution time. Using
 async-io just to read and write network sockets, for instance, makes
 very little sense.

 At the same time, the ICP server, for example, could be very effective
 as a separate thread. It is pretty much independent of any other Squid
 task and runs very fast. The reason a separate thread may be useful here
 is that we can avoid regularly polling the ICP socket and make it fully
 asynchronous. ICP traffic is also relatively small, so waking the ICP
 thread for a single packet isn't much overhead. We could likewise use a
 separate thread for the HTTP accept socket.
 But making a separate thread that only issues lookups in the Store index
 database may be expensive, because the current lookups are very fast
 with no overhead, and adding a layer of separation (a thread) can add
 noticeable overhead.
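
 A sketch of such an ICP thread, blocking in recvfrom() instead of being
 polled from the main loop (icp_handle_packet() is a placeholder name, as
 is the way the socket is passed in):

    #include <sys/types.h>
    #include <sys/socket.h>
    #include <netinet/in.h>

    extern void icp_handle_packet(const char *buf, int len,
                                  const struct sockaddr_in *from);

    void *icp_thread(void *arg)
    {
        int sock = *(int *)arg;     /* the already-bound ICP UDP socket */
        char buf[2048];
        for (;;) {
            struct sockaddr_in from;
            socklen_t fromlen = sizeof(from);
            ssize_t n = recvfrom(sock, buf, sizeof(buf), 0,
                                 (struct sockaddr *)&from, &fromlen);
            if (n <= 0)
                continue;           /* ignore errors in this sketch */
            icp_handle_packet(buf, (int)n, &from);  /* fast, independent work */
        }
        return NULL;
    }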

------------------------------------
 Andres Kroonmaa <andre@online.ee>
 Network Development Manager
 Delfi Online
 Tel: 6501 731, Fax: 6501 708
 Pärnu mnt. 158, Tallinn,
 11317 Estonia