Re: Architecture Overview

From: Amos Jeffries <squid3_at_treenet.co.nz>
Date: Tue, 26 Aug 2008 12:15:00 +1200 (NZST)

>
> On Tue, 2008-08-26 at 02:23 +1200, Amos Jeffries wrote:
>> Okay, got the pretty-picture drawn up.
>>
>> NP: this is drawn up as a high-level flow from my accumulated view of
>> all our work to date and where it's heading. That includes Adrian's
>> Squid-2 work and where I see it most efficiently mapping into Squid-3.
>>
>> It should be very similar to how Squid currently works, with a few
>> major differences that we have all discussed and planned around already.
>
> Thank you for working on this picture. I am not quite sure I interpret
> it correctly, but I do see a few distinct objects there: Data Pump, HTTP
> Parser, and Store. This is more or less clear, at least at this high
> level.
>
> It is not clear to me whether the other blobs such as Protocol
> Processing and Protocol Handling are flows, objects, or something else.
> I am also not sure whether the arrows represent passing message data,
> passing processing responsibility, or something else. Do different
> colors and blob shapes mean something?
>
> If we want an architecture picture, I think it would be great if we can
> formulate it in terms of objects and flows among them. This should make
> roles and boundaries much more clear.

Okay. The clouds mark the areas where I'm uncertain of the distinct
contents, since I don't know everything about Squid yet.

 - The protocol processing cloud is modules such as FTP, Gopher, HTTP,
HTTPS(?). Each is separate, but has a 1-1 relationship with the request:
a flow handled by a protocol 'manager' object. Forwarding Logic looks at
the request just enough to decide where to shove it and passes it on to
one of these.

 - The second cloud is the ACL handling and redirectors. We came up with
some ideas at AusMeet that put all of that behind a single flow-manager
object. The efficiency of that still needs to be checked.

 - Arrows are callback/responsibility flow.
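
To make the first point concrete, here is a minimal sketch of Forwarding
Logic looking only at the URL scheme and handing the request to a
per-protocol manager. Every name in it (ProtocolManager, ForwardingLogic,
registerScheme, route) is made up for illustration; none of this is
existing Squid code.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical interface: one manager object per protocol module.
struct ProtocolManager {
    virtual ~ProtocolManager() = default;
    virtual std::string name() const = 0;
};

struct HttpManager : ProtocolManager {
    std::string name() const override { return "http"; }
};

struct FtpManager : ProtocolManager {
    std::string name() const override { return "ftp"; }
};

// Hypothetical forwarding logic: inspects just enough to pick a manager.
class ForwardingLogic {
public:
    void registerScheme(const std::string &scheme,
                        std::unique_ptr<ProtocolManager> manager) {
        assert(manager != nullptr);
        managers_[scheme] = std::move(manager);
    }

    // Looks only at the URL scheme; returns nullptr if no module handles it.
    ProtocolManager *route(const std::string &url) const {
        const auto sep = url.find("://");
        if (sep == std::string::npos)
            return nullptr;
        const auto it = managers_.find(url.substr(0, sep));
        return it == managers_.end() ? nullptr : it->second.get();
    }

private:
    std::map<std::string, std::unique_ptr<ProtocolManager>> managers_;
};
```

A protocol that is compiled out would simply never be registered, so
route() returns nullptr for it.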

>
>
>> 1) Data Pump side-band. This is a processing-free 'pump' for any
>> requests which do not actually need to go through Squid's twisted
>> logics. With all logics compiled out it should be equivalent to a NOP.
>> But can be used by things such as adaptation components as an IO
>> abstraction.
>
> What data does the Data Pump pump? Message bodies? What are the valid
> ends of a pump? Can there be many Pumps per HTTP transaction? Does the
> Pump communicate any metadata to the other side?

Data pump moves bytes, from A to B. IO level provides all the hooks for it
to do so. A and B could be sockets, buffers, pipes, handles, whatever gets
micro-designed.

As for many pumps per transaction: Ideally 1 (zero-copy), realistically 2
(client-side, and server-side). Content-adaptation may need more to pump
bytes out to the ICAP helper and back etc.
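
As a rough sketch of the "bytes from A to B" idea, assuming A and B are
hidden behind two small interfaces: ByteSource, ByteSink, the in-memory
StringSource/StringSink and pump() are all hypothetical names, not Squid
APIs. This version copies through one intermediate buffer, so it shows the
shape of the interface rather than the zero-copy goal.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical endpoints; real ones might wrap sockets, pipes or handles.
struct ByteSource {
    virtual ~ByteSource() = default;
    // Copies up to max bytes into buf; returns 0 once drained.
    virtual std::size_t read(char *buf, std::size_t max) = 0;
};

struct ByteSink {
    virtual ~ByteSink() = default;
    virtual void write(const char *buf, std::size_t len) = 0;
};

// In-memory stand-ins so the sketch runs without any real IO.
struct StringSource : ByteSource {
    std::string data;
    std::size_t pos = 0;
    explicit StringSource(std::string d) : data(std::move(d)) {}
    std::size_t read(char *buf, std::size_t max) override {
        const std::size_t n = std::min(max, data.size() - pos);
        data.copy(buf, n, pos);
        pos += n;
        return n;
    }
};

struct StringSink : ByteSink {
    std::string data;
    void write(const char *buf, std::size_t len) override {
        data.append(buf, len);
    }
};

// Moves bytes from A to B without looking at them; returns the total moved.
std::size_t pump(ByteSource &from, ByteSink &to, std::size_t chunk = 4096) {
    assert(chunk > 0);
    std::string buf(chunk, '\0');
    std::size_t total = 0;
    while (const std::size_t n = from.read(&buf[0], buf.size())) {
        to.write(buf.data(), n);
        total += n;
    }
    return total;
}
```

With this shape, the client-side and server-side pumps would be two calls
with different endpoints, and an adaptation component would add its own
pair.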

>
> If adaptation components can use Data Pump as an I/O abstraction, should
> not all other high-level components processing the transaction do the
> same so that high-level I/O code could be reused among all the
> components?

Yes. The exception being quick forwarding logic which may handle accept()
before bootstrapping it into a protocol manager or a 'tunnel' pump.

>
> The NOP equivalence mentioned above confuses me. Do you mean that the
> pump does not copy data if it does not have to?

Yes. As close to zero-copy as reasonably possible.

>
>> 2) Client Facing IO is unwound from all processing logics. It's simply a
>> raw input layer to accept connections and interface to the clients.
>
> The "external" side of the Client Facing IO blob is socket API and such,
> right? What is the Squid-side interface of the Client Facing IO blob? A
> collection of portable socket-level routines? Some kind of a Transaction
> object?

Something. I'm not going into implementation details. I'm thinking the TCP
listening sockets themselves.

>
> Is limiting the number of accepted connections a "processing logic" or
> "Client Facing IO" logic?

Limiting accepted connections? Why would we want to do that?

delay_pools becomes a governor feature that slows the data pump. ACLs stay
as assists to the forwarding logic, used on an if-needed basis.
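
One way such a governor could look, assuming a simple token bucket that
the pump asks before moving each chunk. The Governor class and its
refill()/allow() methods are hypothetical, and this is not how the current
delay_pools code is written.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical governor: a token bucket measured in bytes.
class Governor {
public:
    Governor(std::size_t bytesPerSecond, std::size_t burst)
        : rate_(bytesPerSecond), burst_(burst), tokens_(burst) {
        assert(burst > 0);
    }

    // Adds the allowance earned over elapsedSeconds, capped at the burst size.
    void refill(std::size_t elapsedSeconds) {
        tokens_ = std::min(burst_, tokens_ + rate_ * elapsedSeconds);
    }

    // Returns how many of the wanted bytes the pump may move right now.
    std::size_t allow(std::size_t wanted) {
        const std::size_t granted = std::min(wanted, tokens_);
        tokens_ -= granted;
        return granted;
    }

private:
    std::size_t rate_;
    std::size_t burst_;
    std::size_t tokens_;
};
```

A pump without a governor attached would skip the allow() call entirely,
which keeps the ungoverned path free of extra work.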

>
>> 3) Server Facing IO is likewise unwound from processing logics AND from
>> client IO logics. Though in reality it may share lowest level socket
>> code.
>>
>> 4) Processing components are distinct. Each is fully optional and causes
>> no run-time delay if not enabled.
>
> What decides which processing components are enabled for a given
> transaction? Do processing components interact with each other or a
> central "authority"? What is their input and output? Can you give a few
> examples of components?

Forwarding Logic. Or possibly an ACL/helper flow manager. How it's coded
defines what's done. Presently there is quite a chain of processing.

We talked of a registry object which is given, via squid.conf, a list and
order for ACLs, redirectors, etc. That would make the detailed state
fiddling sit behind the single manager API.
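
A minimal sketch of that registry idea, assuming the configured order
arrives as a plain list of step names. Request, Step, StepRegistry and the
step names used with it are invented for illustration; how squid.conf
would actually express the list is left open.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical request state touched by the processing steps.
struct Request {
    std::string url;
    bool denied = false;
    std::vector<std::string> trace;  // records which steps ran, in order
};

using Step = std::function<void(Request &)>;

// Hypothetical registry: modules register steps, configuration picks the order.
class StepRegistry {
public:
    void add(const std::string &name, Step step) {
        assert(step != nullptr);
        known_[name] = std::move(step);
    }

    // Builds the chain from configured names; unknown names are skipped.
    void configure(const std::vector<std::string> &order) {
        chain_.clear();
        for (const auto &name : order) {
            const auto it = known_.find(name);
            if (it != known_.end())
                chain_.push_back(it->second);
        }
    }

    // Runs the steps linearly, stopping once a step denies the request.
    void run(Request &request) const {
        for (const auto &step : chain_) {
            if (request.denied)
                break;
            step(request);
        }
    }

private:
    std::map<std::string, Step> known_;
    std::vector<Step> chain_;
};
```

Callers only ever see run(); which steps exist and in what order they fire
stays behind the one manager API.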

>
> The text on the picture seems to imply that there can be only one
> Processing Component active for a given transaction, which worries me,
> but perhaps I just do not understand what kind of Components you are
> describing here.

The finer details may run in parallel within a module. But the high level
processing sequence for any single request needs to be linear (or at least
representable in a linear fashion) to be understandable.

>
>> 5) Stores are an optional extra, if the configuration calls for caching.
>> But not needed for basic operations.
>
> Is there a single global index of stored responses? If yes, is it
> enabled only when caching is enabled?

That would be an implementation detail inside the Store module, top left
of the picture.

IMO there should be a global API for storage. Whether that API loops
through a single index or a set of per-Cache ones is a detail choice.
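
To illustrate, a small sketch of one global lookup call hiding a set of
per-cache indexes. Store, Cache, addCache() and find() are hypothetical
names here, and the string keys and values stand in for whatever the real
entries would be.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <vector>

// Hypothetical storage area with its own private index.
struct Cache {
    std::map<std::string, std::string> index;
};

// Hypothetical global storage API.
class Store {
public:
    Cache &addCache() {
        caches_.push_back(std::unique_ptr<Cache>(new Cache));
        return *caches_.back();
    }

    // Callers see one lookup; the loop over per-cache indexes is hidden.
    // Returns nullptr when no cache holds the key.
    const std::string *find(const std::string &key) const {
        for (const auto &cache : caches_) {
            const auto it = cache->index.find(key);
            if (it != cache->index.end())
                return &it->second;
        }
        return nullptr;
    }

private:
    std::vector<std::unique_ptr<Cache>> caches_;
};
```

Swapping the loop for a single shared index would change only the inside
of find(), not its callers. With no caches configured, find() always
returns nullptr.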

>
> Do you consider request merging a form of caching?

I consider it a flow design issue. If forwarding wants to take a request
and point it at an already filled buffer (from store, from live stream, or
from /dev/zero), that's its business.

>
> You have not mentioned the Control Logic blob. Is it pretty much the
> same as the "Processing Components" blob? What does it control?

Security and State. It's currently in Squid as ACLs, redirector hooks, XFF
etc.

We still need to flesh out the best possible flows there, and to iterate
the exact modules needed into the picture.

>
>> If we all agree and work towards this type of model, and things are kept
>> modular and isolated at the highest levels, I don't see the future
>> integration of either squid branch or CacheBoy as being a big task.
>
> I think we would need a more detailed or precise architecture
> description to be able to "work towards it" or, more precisely, to
> identify code that does not satisfy the architectural constraints.
> Otherwise, everybody will be claiming to conform to the Architecture
> principles but there will be no improvement as far as merging Squid2 or
> external code into Squid3.

Agreed. That detailing is what we are starting now.
The two orange clouds need to be fleshed out into named components, then
on to slightly finer details.

>
> BTW, the text descriptions you gave above appear much more useful than
> the picture itself. Perhaps we can define the main objects and flows
> better and then redraw the picture to match the descriptions? Should
> this go into a wiki?
>

If you can't find any flaws with that highest-level flow design, we can
wiki the progress so far and start iterating down to API definitions and
TODO lists.

Amos
Received on Tue Aug 26 2008 - 00:15:05 MDT
