Re: Architecture Overview

From: Adrian Chadd <adrian_at_freebsd.org>
Date: Tue, 26 Aug 2008 14:12:59 +0800

2008/8/26 Alex Rousskov <rousskov_at_measurement-factory.com>:

> IIRC, we have discussed the "separate from src/" idea before. I am not
> going to repeat those old arguments.

Well, code reorganisation of some sort is going to happen one way or
another. I've done it the way I think makes sense for Squid-2; the
results can be seen by simply looking at my work.

>> The first pass is just a TCP socket data proxy - read data from one
>> end, write it to the other. At the moment there's one object ("ssl
>> tunnel", since I borrowed the code from src) which implements both TCP
>> sockets.
>>
>> The next pass (the useful "data pump") is a low-cost stream wrapper
>> between a connection endpoint (TCP socket, SCTP stream, etc) which has
>> a message exchange API (events or callbacks, doesn't really matter at
>> this stage) which ties together some endpoint and some data
>> source/sink.
>
> Sorry, I cannot parse that. The Data Pump is a stream between a
> connection endpoint and ...?

IIRC, the "data pump" in Amos' design is a pipeline between two FDs,
say, with whatever modules in between handling protocol-related magic.

In my design the "data pump" is what you get when you glue together
two stream objects.

>> There are already examples of these elsewhere which produce a
>> symmetric API for data exchange such that you can create two nodes,
>> link them together, and have them exchange data as a TCP proxy.
>
> Does the "exchange data" imply that Amos' Data Pump pumps raw data
> (opaque bytes)? Does the Pump pass any meta-information about the data?
> Is the Pump unidirectional?

The pump in both instances is bidirectional. Amos can flesh out more
of his pump ideas; we differed a bit on how we abstracted it.

>> The next pass is using this to develop and benchmark copy-free network
>> IO, concurrency/SMP, OS tuning and modification and such.
>>
>> The above is the "data pump" from the discussion.
>>
>> The next pass is to break out the HTTP related code and build a
>> message-based HTTP request and reply object.
>
> What do you mean by a "message-based HTTP request and reply object"? Are
> these two objects sending messages to each other? Or a single object,
> like HTTP transaction, representing both directions of the info exchange
> (request and reply)? Or is it just about storing headers and such?

Hm, that came out a bit wrong. Try "HttpClient" and "HttpServer": an
HttpServer would process a request and then generate messages to
whatever its peer is (conceptually, that'd be some kind of "request
router"). This request router would create some object instance that
handles incoming connections and generates HttpServer instances; the
HttpServer would generate HttpRequests and throw them at its peer (ie,
the "router"); the router could then create HttpClient instances which
connect to next-hops, pass them the request and, if successful, set up
the pipeline. The rest of the "session" would run between the
HttpServer and HttpClient.

> IMO, pipeline design is too rigid for message adaptation and possibly
> even for general HTTP/1.1 proxying. There are too many exceptions where
> the "pipe" has to be turned around, plugged, split, merged, etc. I think
> we should avoid that pattern as the Architectural base. Simple pipes are
> good for connecting two endpoints but a complex pipe is a poor
> foundation for flexible message processing. We may have discussed that
> already so my apologies if I am repeating stuff.

Right; the problem at the moment is that I've not really got any way
of gauging how suitable or unsuitable this method is, but other
projects (eg lighttpd) have a callback-driven pipeline which seems to
work well. I don't know how flexible it is or how it would handle the
complicated situations you envisage.

> Whether overall HTTP transaction processing is a "pipe with bumps in the
> middle" or a "query processing engine" dealing with several individual
> (but related in various ways) tasks is a significant Architectural
> decision.

Yup, and as much as that really needs to be fleshed out a little more
w/ long-term goals, I'd like to look at what can be built and tested
now that'll be both flexible enough to use in almost any situation
-and- integrable into something usable by end-users so they can be
involved.

> I think you are taking a big (and mostly unnecessary) risk: whatever
> code you end up with may be too difficult to merge back with Squid code,
> no matter what the merits of the new design are. It is essentially a
> "lets rewrite Squid from scratch using these great libraries" project,
> which is likely to meet a significant resistance. Of course, it might
> happen that your future libraries are just better versions of future
> Squid modules and then we can just use the best parts and avoid another
> rewrite.

.. which those at the meetup would've heard me saying is my ideal
end-goal - I'd like to have enough "stuff" written that it can be
re-integrated into the Squid-2 codebase piecemeal. Everything up
to the initial HTTP client/server split should be easy to integrate
back - I'm doing it right now in Cacheboy. The HTTP client/server
message-exchange code would just be a separate set of libraries which
reuse the existing Squid core stuff. I'd use them to flesh out
ideas long before I try integrating them into Squid or providing some
more concrete ideas on where things should head.

Adrian
Received on Tue Aug 26 2008 - 06:13:04 MDT