Re: memory-mapped files in Squid from John Dilley on 1999-01-26 (squid-dev)

From: John Dilley <jad@dont-contact.us>
Date: Tue, 26 Jan 1999 09:30:08 -0800

> > By async reads do you mean threads? Are we already talking about threads
> > as a portable solution? That would be cool of course.
>
> The current async-io implementation can't be regarded as very portable,
> but I believe threads are, if used in the way they are intended to be
> used, without excessive file descriptor sharing and other strange things
> which are bad for both portability and stability.

Apologies for being slightly off the disk/IO topic, but I have
some comments & questions about using threads. In my view, the way
threads are intended to be used leads to a thread-per-connection model.
In this model, when a request first arrives, it (concretely, its socket
descriptor) is dispatched to a thread that handles the request from
start to finish.
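A minimal sketch of the thread-per-connection model, assuming POSIX threads and blocking BSD sockets (the post names neither). The names `handle_connection`, `dispatch_connection` and `serve_forever` are hypothetical, and the echo reply only stands in for real request handling; the point is that each request runs top to bottom in a single thread.

```c
#include <pthread.h>
#include <stdlib.h>
#include <sys/socket.h>
#include <unistd.h>

/* Handles one connection from start to finish in its own thread. The
   blocking read() stalls only this thread, not the whole server. Echoing
   the bytes back is a placeholder for real request handling. */
static void *handle_connection(void *arg)
{
    int fd = *(int *)arg;
    char buf[512];
    ssize_t n;

    free(arg);
    n = read(fd, buf, sizeof buf);
    if (n > 0 && write(fd, buf, (size_t)n) != n) {
        /* short or failed write: a real server would retry or log */
    }
    close(fd);
    return NULL;
}

/* Hands an accepted descriptor to a new detached thread.
   Returns 0 on success, -1 if the thread could not be created. */
int dispatch_connection(int fd)
{
    pthread_t tid;
    pthread_attr_t attr;
    int rc;
    int *arg = malloc(sizeof *arg);

    if (arg == NULL)
        return -1;
    *arg = fd;
    pthread_attr_init(&attr);
    pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED);
    rc = pthread_create(&tid, &attr, handle_connection, arg);
    pthread_attr_destroy(&attr);
    if (rc != 0) {
        free(arg);
        return -1;
    }
    return 0;
}

/* Accept loop: every new connection gets its own thread. */
void serve_forever(int listen_fd)
{
    for (;;) {
        int fd = accept(listen_fd, NULL, NULL);
        if (fd < 0)
            continue;
        if (dispatch_connection(fd) != 0)
            close(fd);
    }
}
```

`serve_forever` expects a socket that is already bound and listening. Because every blocking call inside `handle_connection` parks only its own thread, the control flow for one request stays linear.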

One nice thing about this model is its programming simplicity.
It's much easier to follow one thread of control through the code than
to read select/callback/dispatch code. You do have to deal with reentrancy
issues, but select has similar issues, just in a different shape.
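For contrast, a rough sketch of the select/callback style, assuming plain `select()` and a hypothetical per-descriptor handler table (`set_read_handler`, `run_select_once`, `read_line_cb`). It illustrates the general pattern and is not Squid's actual comm code. Reading a single line turns into a chain of callbacks, and the progress a thread would keep in local variables has to live in a struct that survives between them.

```c
#include <stddef.h>
#include <sys/select.h>
#include <unistd.h>

/* Hypothetical callback type: invoked once when fd becomes readable. */
typedef void (*read_cb)(int fd, void *state);

static read_cb handlers[FD_SETSIZE];
static void *handler_state[FD_SETSIZE];

/* Register interest in one readability event on fd. */
void set_read_handler(int fd, read_cb cb, void *state)
{
    handlers[fd] = cb;
    handler_state[fd] = state;
}

/* One pass of the event loop: wait for readable descriptors, then run each
   pending callback. A callback is cleared before it runs, so it must
   re-register to hear about fd again. Returns callbacks run, or -1. */
int run_select_once(void)
{
    fd_set rd;
    int fd, maxfd = -1, ran = 0;

    FD_ZERO(&rd);
    for (fd = 0; fd < FD_SETSIZE; fd++)
        if (handlers[fd] != NULL) {
            FD_SET(fd, &rd);
            maxfd = fd;
        }
    if (maxfd < 0)
        return 0;
    if (select(maxfd + 1, &rd, NULL, NULL, NULL) < 0)
        return -1;
    for (fd = 0; fd <= maxfd; fd++) {
        read_cb cb;
        if (!FD_ISSET(fd, &rd) || handlers[fd] == NULL)
            continue;
        cb = handlers[fd];
        handlers[fd] = NULL;
        cb(fd, handler_state[fd]);
        ran++;
    }
    return ran;
}

/* Example request state: what a thread would keep in local variables
   must instead live in a struct that survives between callbacks. */
struct line_state {
    char buf[64];
    size_t len;
    int done;
};

/* Reads one byte per callback; stops at '\n', EOF/error, or a full buffer. */
void read_line_cb(int fd, void *arg)
{
    struct line_state *s = arg;
    char c;

    if (s->len + 1 >= sizeof s->buf || read(fd, &c, 1) <= 0 || c == '\n') {
        s->buf[s->len] = '\0';
        s->done = 1;
        return;                              /* finished: do not re-register */
    }
    s->buf[s->len++] = c;
    set_read_handler(fd, read_line_cb, s);   /* resume in a later pass */
}
```

The reentrancy concern shows up here as ordering: a callback may run while other callbacks have half-finished work parked in their state structs, so shared data must be consistent at every callback boundary rather than at lock boundaries.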

The disadvantage is that each thread consumes memory for its
stack, private state, etc. For a web proxy cache workload you often
have many concurrent connections each waiting on remote I/O. Under this
workload a thread-per-connection approach might be excessively expensive
in memory and thread context-switch overhead. (I say "might" because I
have not actually characterized it, but we were looking into a similar
application architecture with threads and rejected the
thread-per-connection model for this reason.)
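To see how the per-thread stack cost scales, the sketch below asks POSIX `pthread_attr_getstacksize` for the default thread stack size and multiplies it by a connection count. It is an illustration, not a measurement: the helper names are hypothetical, and the figure covers reserved stack space only.

```c
#include <pthread.h>
#include <stddef.h>

/* Returns the stack size a thread created with default attributes would
   get on this system, or 0 if it cannot be determined. */
size_t default_thread_stack_size(void)
{
    pthread_attr_t attr;
    size_t size = 0;

    if (pthread_attr_init(&attr) != 0)
        return 0;
    if (pthread_attr_getstacksize(&attr, &size) != 0)
        size = 0;
    pthread_attr_destroy(&attr);
    return size;
}

/* Stack address space reserved when `connections` threads are alive at
   once. This counts stacks only: per-thread private state, guard pages and
   kernel bookkeeping come on top, while stack pages are usually committed
   lazily, so resident memory can be much lower than this figure. */
unsigned long long stack_reservation_bytes(unsigned long connections,
                                           size_t stack_size)
{
    return (unsigned long long)connections * stack_size;
}
```

If the default stack is 8 MiB (common on Linux with glibc, but system-dependent), a hypothetical 1,000 concurrent connections would reserve 1000 × 8,388,608 = 8,388,608,000 bytes, about 7.8 GiB, of address space for stacks alone. `pthread_attr_setstacksize` can shrink each reservation, at the risk of overflowing the stack in deep call chains.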

Another model for utilizing threads is to dispatch requests from
a select loop to a thread pool where each thread does a unit of work and
then returns to the pool. With this model the Squid architecture would
not change greatly: comm_select (or examine_select) would dispatch
requests to handlers in the next available thread. With an appropriate
implementation of threads this would allow Squid to make use of multiple
processors, but Squid's workload is I/O limited rather than CPU limited,
so it's not clear multiprocessor support would help much.
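A minimal sketch of a pool fed from a select loop, assuming POSIX threads and C11 atomics: a mutex- and condition-variable-protected FIFO of jobs, a fixed set of workers that each run one unit of work and return for the next, and a select pass that queues ready descriptors instead of calling their handlers inline. All names (`pool_start`, `pool_submit`, `select_and_dispatch`, `count_handler`, ...) and the 16-thread cap are hypothetical; this is not how `comm_select` is actually written.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>
#include <sys/select.h>
#include <unistd.h>

/* A handler the select loop would otherwise have called inline. */
typedef void (*handler_fn)(int fd, void *data);
struct job { handler_fn fn; int fd; void *data; struct job *next; };

struct pool {
    pthread_mutex_t lock;
    pthread_cond_t nonempty;
    struct job *head, *tail;       /* FIFO of pending units of work */
    int shutting_down, nthreads;
    pthread_t threads[16];         /* arbitrary cap for this sketch */
};

/* Run queued jobs one at a time; exit once shut down and the queue is empty. */
static void *worker(void *arg)
{
    struct pool *p = arg;
    struct job *j;
    for (;;) {
        pthread_mutex_lock(&p->lock);
        while (p->head == NULL && !p->shutting_down)
            pthread_cond_wait(&p->nonempty, &p->lock);
        if ((j = p->head) != NULL && (p->head = j->next) == NULL)
            p->tail = NULL;
        pthread_mutex_unlock(&p->lock);
        if (j == NULL)
            return NULL;
        j->fn(j->fd, j->data);     /* the unit of work, outside the lock */
        free(j);
    }
}

/* Start up to nthreads (max 16) workers; 0 if any started. Pair with pool_shutdown. */
int pool_start(struct pool *p, int nthreads)
{
    p->head = p->tail = NULL;
    p->shutting_down = p->nthreads = 0;
    pthread_mutex_init(&p->lock, NULL);
    pthread_cond_init(&p->nonempty, NULL);
    while (p->nthreads < nthreads && p->nthreads < 16 &&
           pthread_create(&p->threads[p->nthreads], NULL, worker, p) == 0)
        p->nthreads++;
    return p->nthreads > 0 ? 0 : -1;
}

/* Queues one unit of work and wakes an idle worker. */
int pool_submit(struct pool *p, handler_fn fn, int fd, void *data)
{
    struct job *j = malloc(sizeof *j);
    if (j == NULL)
        return -1;
    j->fn = fn; j->fd = fd; j->data = data; j->next = NULL;
    pthread_mutex_lock(&p->lock);
    if (p->tail != NULL) p->tail->next = j; else p->head = j;
    p->tail = j;
    pthread_cond_signal(&p->nonempty);
    pthread_mutex_unlock(&p->lock);
    return 0;
}

/* Lets the workers drain the queue, then joins them. */
void pool_shutdown(struct pool *p)
{
    pthread_mutex_lock(&p->lock);
    p->shutting_down = 1;
    pthread_cond_broadcast(&p->nonempty);
    pthread_mutex_unlock(&p->lock);
    for (int i = 0; i < p->nthreads; i++)
        pthread_join(p->threads[i], NULL);
    pthread_cond_destroy(&p->nonempty);
    pthread_mutex_destroy(&p->lock);
}

/* One select() pass that queues each ready fd instead of running its
   handler inline; the fd leaves *watch so it is not queued twice. */
int select_and_dispatch(struct pool *p, fd_set *watch, int maxfd,
                        handler_fn fn, void *data)
{
    fd_set ready = *watch;
    int n = select(maxfd + 1, &ready, NULL, NULL, NULL);
    for (int fd = 0; n > 0 && fd <= maxfd; fd++)
        if (FD_ISSET(fd, &ready)) {
            FD_CLR(fd, watch);
            pool_submit(p, fn, fd, data);
        }
    return n;
}

/* Example handler: consumes one byte from fd (if valid), counts the call. */
void count_handler(int fd, void *data)
{
    char c;
    if (fd < 0 || read(fd, &c, 1) == 1)
        atomic_fetch_add((atomic_int *)data, 1);
}
```

The sketch leaves out one real cost: once a descriptor is handed to a worker, the select loop must not watch it again until the handler finishes, and re-arming it safely needs its own synchronization between the worker and the loop, which is one form of the reentrancy issues mentioned earlier.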

So while in general I like distributed computing with threads, I
don't think it's a very good match for Squid's system requirements. The
current event/dispatch model fits better with the workload. The best
way to improve Squid performance would be (IMHO) to focus on disk IO
(the main "thread" of this conversation :-) and to improve select()...

-- jad --
John Dilley <jad@hpl.hp.com>
Received on Tue Jul 29 2003 - 13:15:56 MDT
