Re: Architecture Overview

From: Adrian Chadd <adrian_at_freebsd.org>
Date: Tue, 26 Aug 2008 10:17:02 +0800

Since this is a sort of union between what I have been working towards
and what Amos has had conceptually in his head, I'll throw in my 2c.

I've been working towards breaking out the core code to the point
where the disk, comm and HTTP-related code is separate from src/.

The first pass is just a TCP socket data proxy - read data from one
end, write it to the other. At the moment there's one object ("ssl
tunnel", since I borrowed the code from src) which handles both TCP
sockets.
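
As a rough illustration of that first pass (not the actual Squid code),
here is a minimal one-directional copy loop. The name pump() and the
use of blocking sockets are assumptions made for brevity; a real proxy
would run both directions and use non-blocking IO.

```python
import socket

def pump(src, dst, bufsize=4096):
    """Copy bytes from src to dst until src reports end-of-stream.

    Hypothetical helper: blocking and one-directional only.
    Returns the number of bytes copied.
    """
    total = 0
    while True:
        chunk = src.recv(bufsize)
        if not chunk:          # empty read means the peer closed its side
            break
        dst.sendall(chunk)     # sendall loops until the whole chunk is written
        total += len(chunk)
    return total
```

Calling pump() once per direction (for example from two threads) gives
the "read from one end, write to the other" behaviour described above.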

The next pass (the useful "data pump") is a low-cost stream wrapper
around a connection endpoint (TCP socket, SCTP stream, etc). It has a
message exchange API (events or callbacks, doesn't really matter at
this stage) which ties together some endpoint and some data
source/sink.

There are already examples of these elsewhere which produce a
symmetric API for data exchange such that you can create two nodes,
link them together, and have them exchange data as a TCP proxy.
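
A minimal sketch of what such a symmetric API might look like, using
callbacks. The class name Node and its methods are made up for
illustration; the endpoint is reduced to a "write" callable so the
example stays self-contained.

```python
class Node:
    """Hypothetical node wrapping one endpoint behind a symmetric API."""

    def __init__(self, write):
        self._write = write    # callable that writes bytes to this node's endpoint
        self._peer = None

    def link(self, other):
        # Linking is symmetric: each node becomes the other's peer.
        self._peer, other._peer = other, self

    def on_endpoint_data(self, data):
        """Called when this node's endpoint has produced data."""
        if self._peer is not None:
            self._peer.deliver(data)

    def deliver(self, data):
        """Called by the peer node; pushes data out of this node's endpoint."""
        self._write(data)
```

Because both nodes expose the same two calls, either one can act as
source or sink, and a proxy is just two linked nodes.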

The next pass is using this to develop and benchmark copy-free network
IO, concurrency/SMP, OS tuning and modification and such.

The above is the "data pump" from the discussion.

The next pass is to break out the HTTP related code and build a
message-based HTTP request and reply object. Same design goals as
above - be able to glue a request and a reply object instance together
to build a proxied HTTP connection. This should handle all the various
cases needed to be fully HTTP/1.1 compliant - the big difference from
the "current" HTTP code is handling two-way messages for Expect
flow control and for transfer-encoded (TE'd) request/reply bodies.
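
To make the "two-way messages" point concrete, here is a small sketch
of Expect: 100-continue handled as a message exchange between two glued
objects. All class and function names are hypothetical, and delivery is
synchronous purely to keep the example short.

```python
from dataclasses import dataclass

@dataclass
class Headers:
    fields: dict

@dataclass
class Continue:          # interim "go ahead", flowing back toward the requester
    pass

@dataclass
class Body:
    data: bytes

@dataclass
class End:
    pass

def expects_continue(fields):
    return fields.get("Expect", "").lower() == "100-continue"

class Requester:
    def __init__(self, fields, body):
        self.fields, self.body = fields, body
        self.peer = None
        self.waiting = False

    def start(self):
        # Set the flag before sending headers: the peer may answer at once.
        self.waiting = expects_continue(self.fields)
        self.peer.receive(Headers(self.fields))
        if not expects_continue(self.fields):
            self._send_body()

    def receive(self, msg):
        if isinstance(msg, Continue) and self.waiting:
            self.waiting = False
            self._send_body()

    def _send_body(self):
        self.peer.receive(Body(self.body))
        self.peer.receive(End())

class Responder:
    def __init__(self, accept=True):
        self.accept = accept
        self.peer = None
        self.log = []        # names of the messages seen, in order

    def receive(self, msg):
        self.log.append(type(msg).__name__)
        if isinstance(msg, Headers) and expects_continue(msg.fields) and self.accept:
            self.peer.receive(Continue())

def glue(a, b):
    a.peer, b.peer = b, a
```

The body only flows once a Continue message has travelled in the
opposite direction, which is the part a one-way pipeline cannot express.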

After that, the majority of the Squid processing becomes modules in
the request/reply pipeline. Request routing is a module which takes
queued HTTP requests, runs some business logic over them (eg URL
matching rules) and creates HTTP request objects to next-hops with the
relevant stuff. Once it's done its bit it gets out of the data exchange
path. ACL lookups become a module or modules (with some ACL type stuff
done in the TCP connection layer where appropriate - say, blocking
requests before they are even parsed.) A "cache" is a module or series
of modules which either create HTTP requests to the upstream or
instances of some cache object to feed the reply data from. Collapsed
forwarding could even be a cut-down module caching only the results of
a request long enough to satisfy existing pending requests, then
tossing the data (so effectively a 0-second cache). ICAP and other
protocol modules can "sit" in the request/reply data pipeline and do
whatever they wish to the messages as they flow.
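
A rough sketch of the module idea, heavily simplified: each module
either answers a request or declines, and collapsed forwarding is shown
as a "0-second cache" that keeps a result only long enough to hand it
to the requests already waiting. Every name here is hypothetical, and
whole messages are passed synchronously rather than streamed.

```python
from dataclasses import dataclass

@dataclass
class Request:
    url: str

@dataclass
class Reply:
    status: int
    body: bytes = b""

def acl_module(blocked_prefixes):
    def handle(req):
        if any(req.url.startswith(p) for p in blocked_prefixes):
            return Reply(403, b"denied")
        return None                      # decline: let the next module look
    return handle

def cache_module(store):
    def handle(req):
        body = store.get(req.url)
        return Reply(200, body) if body is not None else None
    return handle

def upstream_module(fetch):
    def handle(req):
        return Reply(200, fetch(req.url))
    return handle

def run_pipeline(modules, req):
    """Offer the request to each module in turn; first reply wins."""
    for module in modules:
        reply = module(req)
        if reply is not None:
            return reply
    return Reply(502, b"no module produced a reply")

class CollapsedForwarding:
    """Share one in-flight fetch among all requests for the same URL."""

    def __init__(self):
        self.pending = {}                # url -> callbacks awaiting the reply

    def request(self, url, deliver, start_fetch):
        waiters = self.pending.get(url)
        if waiters is not None:
            waiters.append(deliver)      # join the fetch already in flight
            return
        self.pending[url] = [deliver]
        start_fetch(url)

    def complete(self, url, reply):
        # Hand the reply to every waiter, then forget it.
        for deliver in self.pending.pop(url, []):
            deliver(reply)
```

Ordering the list passed to run_pipeline() is where policy lives in
this sketch: ACLs first, then cache, then upstream.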

My aim was to get to the point where I've got a generic-ish
message-based HTTP request/reply class outside of src/ which I can use
to "glue" together HTTP proxy connections and use that as a platform
for exploring concurrency and performance. I'd then "shoehorn" it into
httpState / ConnStateData / clientHttpRequest enough so that the Squid
code starts using these objects. At -this- point I think I'll have
enough experience and information to put forward a design with some
credible backing for whatever the future codebase will look like.

Adrian
Received on Tue Aug 26 2008 - 02:17:06 MDT