Re: Architecture Overview

From: Alex Rousskov <rousskov_at_measurement-factory.com>
Date: Mon, 25 Aug 2008 22:52:35 -0600

On Tue, 2008-08-26 at 10:17 +0800, Adrian Chadd wrote:
> Since this is a sort of union between what I have been working towards
> and what Amos has had conceptually in his head, I'll throw in my 2c.
>
> I've been working towards breaking out the core code to the point
> where the disk, comm and http related code is separate from src/.

IIRC, we have discussed the "separate from src/" idea before. I am not
going to repeat those old arguments.

> The first pass is just a TCP socket data proxy - read data from one
> end, write it to the other. At the moment there's one object ("ssl
> tunnel", since I borrowed the code from src) which implements both TCP
> sockets.
>
> The next pass (the useful "data pump") is a low-cost stream wrapper
> between a connection endpoint (TCP socket, SCTP stream, etc) which has
> a message exchange API (events or callbacks, doesn't really matter at
> this stage) which ties together some endpoint and some data
> source/sink.

Sorry, I cannot parse that. The Data Pump is a stream between a
connection endpoint and ...?

> There are already examples of these elsewhere which produce a
> symmetric API for data exchange such that you can create two nodes,
> link them together, and have them exchange data as a TCP proxy.

Does the "exchange data" imply that Amos' Data Pump pumps raw data
(opaque bytes)? Does the Pump pass any meta-information about the data?
Is the Pump unidirectional?
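
To make those questions concrete, here is the sort of symmetric node
API I imagine being described. Every name below is made up; this is a
sketch of the shape, not a proposal:

#include <cstddef>

// Hypothetical pump node; two of these get linked together to form a
// TCP proxy. Derived classes would wrap a concrete endpoint.
class PumpNode {
public:
    PumpNode(): peer(NULL) {}
    virtual ~PumpNode() {}

    // Link two nodes so that bytes read by one are written by the other.
    void linkTo(PumpNode *other) { peer = other; }

    // Called by our peer with opaque bytes to write to our endpoint.
    // Note that nothing here carries meta-information about the data,
    // and the link is symmetric: each direction mirrors the other.
    virtual void sendData(const char *buf, size_t len) = 0;

protected:
    // Called when our endpoint (TCP socket, SCTP stream, ...) reads
    // data; we simply hand the buffer to our peer.
    void dataArrived(const char *buf, size_t len) {
        if (peer)
            peer->sendData(buf, len);
    }

private:
    PumpNode *peer;
};

If that is roughly the intended shape, then my questions above are
really about what, if anything, travels alongside the buffer.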

> The next pass is using this to develop and benchmark copy-free network
> IO, concurrency/SMP, OS tuning and modification and such.
>
> The above is the "data pump" from the discussion.
>
> The next pass is to break out the HTTP related code and build a
> message-based HTTP request and reply object.

What do you mean by a "message-based HTTP request and reply object"? Are
these two objects sending messages to each other? Or a single object,
like an HTTP transaction, representing both directions of the info exchange
(request and reply)? Or is it just about storing headers and such?
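
To illustrate why I am asking, here are two quite different shapes
that phrase could mean; all names are made up:

#include <map>
#include <string>

// "Message" in the storage sense: parsed headers plus a body.
struct HttpMsg {
    std::map<std::string, std::string> headers;
    std::string body; // a real design would stream this, of course
};

// Interpretation A: a single object representing the whole
// transaction, i.e. both directions of the exchange.
struct HttpTransaction {
    HttpMsg request;
    HttpMsg reply;
};

// Interpretation B: two peer objects that send each other messages
// (events or callbacks) as the exchange progresses.
class HttpSide {
public:
    virtual ~HttpSide() {}
    virtual void noteMsgPart(const HttpMsg &part) = 0; // peer callback
};

The two interpretations lead to rather different designs, which is why
I want to pin the terminology down.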

> Same design goals as
> above - be able to glue a request and a reply object instance together
> to build a proxied HTTP connection. This should handle all the various
> cases needed to be fully HTTP/1.1 compliant - the big thing different
> to the "current" HTTP code is handling two-way messages for expect
> flow-control and for TE'd request/reply bodies.
>
> After that, the majority of the Squid processing becomes modules in
> the request/reply pipeline. Request routing is a module which takes
> queued HTTP requests, runs some business logic over them (eg URL
> matching rules) and creates HTTP request objects to next-hops with the
> relevant stuff. Once it's done its bit, it gets out of the data exchange
> path. ACL lookups become a module or modules (with some ACL type stuff
> done in the TCP connection layer where appropriate - say, blocking
> requests before they are even parsed). A "cache" is a module or series
> of modules which either create HTTP requests to the upstream or
> instances of some cache object to feed the reply data from. collapsed
> forwarding could even be a cut-down module caching only the results of
> a request long enough to satisfy existing pending requests, then
> tossing the data (so effectively a 0-second cache). ICAP and other
> protocol modules can "sit" in the request/reply data pipeline and do
> whatever they wish to the messages as they flow.

IMO, pipeline design is too rigid for message adaptation and possibly
even for general HTTP/1.1 proxying. There are too many exceptions where
the "pipe" has to be turned around, plugged, split, merged, etc. I think
we should avoid that pattern as the Architectural base. Simple pipes are
good for connecting two endpoints but a complex pipe is a poor
foundation for flexible message processing. We may have discussed this
already, so my apologies if I am repeating stuff.

Whether overall HTTP transaction processing is a "pipe with bumps in the
middle" or a "query processing engine" dealing with several individual
(but related in various ways) tasks is a significant Architectural
decision.
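
A rough sketch, with made-up names, of the two shapes I have in mind:

#include <cstddef>
#include <list>

// Rigid pipe: a stage can only push bytes downstream. Expect
// flow-control, an ICAP "abort and replace the reply", or a
// collapsed-forwarding hit all need data or control to flow the other
// way, which this shape cannot express without ad hoc back-channels.
class PipeStage {
public:
    virtual ~PipeStage() {}
    virtual void pushDownstream(const char *buf, size_t len) = 0;
};

// Engine: a transaction is a set of related tasks scheduled around
// shared state; tasks may start, suspend, or restart one another in
// whatever order the protocol demands.
class XactState; // shared request/reply state, details omitted

class Task {
public:
    virtual ~Task() {}
    virtual void run(XactState &xact) = 0; // may queue further Tasks
};

class Engine {
public:
    void queue(Task *t) { pending.push_back(t); }
private:
    std::list<Task*> pending;
};

The engine shape costs more up front, but it does not have to be bent
out of shape every time a message needs special handling.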

> My aim was to get to the point where I've got a generic-ish
> message-based HTTP request/reply class outside of src/ which I can use
> to "glue" together HTTP proxy connections and use that as a platform
> for exploring concurrency and performance. I'd then "shoehorn" it into
> httpState / ConnStateData / clientHttpRequest enough so that the Squid
> code starts using these objects. At -this- point I think I'll have
> enough experience and information to put forward a design with some
> credible backing for whatever the future codebase will look like.

I think you are taking a big (and mostly unnecessary) risk: whatever
code you end up with may be too difficult to merge back with Squid code,
no matter what the merits of the new design are. It is essentially a
"lets rewrite Squid from scratch using these great libraries" project,
which is likely to meet a significant resistance. Of course, it might
happen that your future libraries are just better versions of future
Squid modules and then we can just use the best parts and avoid another
rewrite.

$0.02,

Alex.