Re: Architecture Overview from Alex Rousskov on 2008-08-25 (squid-dev)

From: Alex Rousskov <rousskov_at_measurement-factory.com>
Date: Mon, 25 Aug 2008 23:42:20 -0600

On Tue, 2008-08-26 at 12:15 +1200, Amos Jeffries wrote:
> >
> > On Tue, 2008-08-26 at 02:23 +1200, Amos Jeffries wrote:
> >> Okay, got the pretty-picture drawn up.
> >>
> >> NP: this is drawn up as a high-level flow from my accumulated view of
> >> all our work to date and where it's heading. That includes Adrian's
> >> Squid-2 work and where I see it most efficiently mapping into Squid-3.
> >>
> >> It should be very similar to how squid currently works. With a few major
> >> differences that we have all spoken and planned things around already.
> >
> > Thank you for working on this picture. I am not quite sure I interpret
> > it correctly, but I do see a few distinct objects there: Data Pump, HTTP
> > Parser, and Store. This is more or less clear, at least at this high
> > level.
> >
> > It is not clear to me whether the other blobs such as Protocol
> > Processing and Protocol Handling are flows, objects, or something else.
> > I am also not sure whether the arrows represent passing message data,
> > passing processing responsibility, or something else. Do different
> > colors and blob shapes mean something?
> >
> > If we want an architecture picture, I think it would be great if we can
> > formulate it in terms of objects and flows among them. This should make
> > roles and boundaries much more clear.
>
> Okay. The clouds are where I'm uncertain of the distinct content not
> knowing everything about Squid yet.
>
> - The protocol processing cloud is modules such as FTP, Gopher, HTTP,
> HTTPS?. Each being separate, but performing a 1-1 relationship with the
> request. A flow handled by a protocol 'manager' object.

Is the HTTP protocol processing module a single class implementing both
client- and server-side processing?

> Forwarding Logic looks at the request just enough to decide where to shove
> it and passes it on to one of these.

Does it stay in the loop to, say, try another forwarding path? Does
forwarding logic know about caching?

> - The second cloud is the ACL handling, redirectors. We came up with some
> ideas at AusMeet that make all that a single object flow manager.
> Efficiency of that still needs to be checked.

Will those ideas be documented/discussed? Or is the current plan to test
performance first?

> - Arrows are callback/responsibility flow.

Callbacks and "responsibility" can be rather different things and can
"flow" in different directions. Perhaps the arrows can be removed (for
now) to avoid false implications?

> >> 1) Data Pump side-band. This is a processing-free 'pump' for any
> >> requests which do not actually need to go through Squid's twisted
> >> logics. With all logics compiled out it should be equivalent to a NOP.
> >> But can be used by things such as adaptation components as an IO
> >> abstraction.
> >
> > What data does the Data Pump pump? Message bodies? What are the valid
> > ends of a pump? Can there be many Pumps per HTTP transaction? Does the
> > Pump communicate any metadata to the other side?
>
> Data pump moves bytes, from A to B. IO level provides all the hooks for it
> to do so. A and B could be sockets, buffers, pipes, handles, whatever gets
> micro-designed.

A pipe moves something from A to B. Is Data Pump a pipe? Pipes connect
two ends. Pumps have a single end that produces/generates/provides
something. You can put something into a pipe and get it on the other
end. You can only get something from a pump.
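
To make the pipe/pump distinction above concrete, here is a minimal
sketch in Python. It is not Squid code: the class names Pipe and Pump
and their put()/get() methods are made up for illustration only.

```python
from collections import deque


class Pipe:
    """Two ends: a producer writes into one, a consumer reads from the other."""

    def __init__(self):
        self._queue = deque()

    def put(self, chunk: bytes) -> None:
        self._queue.append(chunk)

    def get(self) -> bytes:
        # Returns b"" when nothing is queued.
        return self._queue.popleft() if self._queue else b""


class Pump:
    """One end: the pump produces data itself; callers can only take from it."""

    def __init__(self, source):
        self._source = iter(source)

    def get(self) -> bytes:
        # Returns b"" once the source is exhausted.
        return next(self._source, b"")


pipe = Pipe()
pipe.put(b"GET / HTTP/1.1\r\n")          # something goes in at one end...
pump = Pump([b"chunk-1", b"chunk-2"])    # ...while a pump has no input end
```

In this sketch a Pipe has both put() and get(), while a Pump exposes
only get(). Which of the two the Data Pump blob is meant to be is
exactly the open question.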

If Data Pump is a pipe, please note that the current pipes are slaves
(they are being told what to do). Are you proposing active pipes that
use some kind of unified I/O APIs to suck data from one end and push it
into the other?

Does Data Pump/Pipe store/buffer the bytes to give the other end a
chance to get ready for consumption?
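
One possible answer to the buffering question, sketched in Python under
the assumption that the pipe holds a bounded number of bytes and refuses
the rest. BufferedPipe, capacity, write() and read() are hypothetical
names, not an existing Squid API.

```python
from collections import deque


class BufferedPipe:
    """Hypothetical pipe that holds up to `capacity` bytes for a slow consumer."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self._chunks = deque()
        self._size = 0

    def write(self, data: bytes) -> int:
        """Accept as many bytes as fit; return the count accepted."""
        room = self.capacity - self._size
        accepted = data[:room]
        if accepted:
            self._chunks.append(accepted)
            self._size += len(accepted)
        return len(accepted)

    def read(self) -> bytes:
        """Return everything buffered so far and free the space."""
        data = b"".join(self._chunks)
        self._chunks.clear()
        self._size = 0
        return data


pipe = BufferedPipe(capacity=8)
first = pipe.write(b"0123456789")   # only 8 bytes fit
second = pipe.write(b"x")           # buffer is full, nothing is accepted
drained = pipe.read()               # consumer catches up
third = pipe.write(b"x")            # space is available again
```

The short write is how the producing end learns that the consuming end
is not ready. A pipe with no buffer at all would have to make the
producer wait on every byte.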

> As for many pumps per transaction: Ideally 1 (zero-copy), realistically 2
> (client-side, and server-side).

I do not understand how a transaction can have one pump or even one pipe
(unless the pipe is bi-directional). Is Data Pump a bidirectional pipe
that can shovel bytes in both directions?

I apologize for so many questions, but the picture does not really
define these things and without knowing what the blobs are, how can one
evaluate the Architecture or one's compliance with it?

> Content-adaptation may need more to pump
> bytes out to the ICAP helper and back etc.

> > If adaptation components can use Data Pump as an I/O abstraction, should
> > not all other high-level components processing the transaction do the
> > same so that high-level I/O code could be reused among all the
> > components?
>
> Yes. The exception being quick forwarding logic which may handle accept()
> before bootstrapping it into a protocol manager or a 'tunnel' pump.
>
> >
> > The NOP equivalence mentioned above confuses me. Do you mean that the
> > pump does not copy data if it does not have to?
>
> Yes. As close to zero-copy as reasonably possible.
>
> >
> >> 2) Client Facing IO is unwound from all processing logics. It's simply a
> >> raw input layer to accept connections and interface to the clients.
> >
> > The "external" side of the Client Facing IO blob is socket API and such,
> > right? What is the Squid-side interface of the Client Facing IO blob? A
> > collection of portable socket-level routines? Some kind of a Transaction
> > object?
>
> Something. I'm not going into implementation details. I'm thinking the TCP
> listening sockets themselves.

I am not asking about implementation details. I am asking about
high-level interfaces of the blobs on the picture. Without that
knowledge, it is difficult to understand how the blobs are connected and
what they send to each other.

> > Is limiting the number of accepted connections a "processing logic" or
> > "Client Facing IO" logic?
>
> Limiting accepted connections? Why would we want to do that?

Because we are running out of resources and do not want to accept more
responsibility until we deal with what we already have? But this is not
critical at this point; there are much bigger questions, so let's ignore
this one.

> delay_pools moves to a governor feature slowing the data pump. ACLs stay
> as forwarding logic assists on an if-needed basis.
>
>
> >
> >> 3) Server Facing IO is likewise unwound from processing logics AND from
> >> client IO logics. Though in reality it may share lowest level socket
> >> code.
> >>
> >> 4) Processing components are distinct. Each is fully optional and causes
> >> no run-time delay if not enabled.
> >
> > What decides which processing components are enabled for a given
> > transaction? Do processing components interact with each other or a
> > central "authority"? What is their input and output? Can you give a few
> > examples of components?
>
> Forwarding Logic. Or possibly an ACL/helper flow manager. How it's coded
> defines what's done. Presently there is quite a chain of processing.

> We talked of a registry object which was squid.conf given a list and order
> for ACL, redirectors, etc. That would make the detailed state fiddling
> sit behind the single manager API.

Many processing decisions are not static so I doubt a registry object
driven by squid.conf can handle this. In fact, I suspect no single
object can handle this complexity so the responsibility to enable
processing components would have to be spread around processing
components (and forwarder), which makes things like a "single pipe with
bumps" design difficult to implement.
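
A rough Python sketch of the difference, assuming a fixed list of
components taken from configuration. Transaction, Component,
run_static() and run_dynamic() are illustrative names only; nothing
here reflects actual Squid classes or squid.conf syntax.

```python
from dataclasses import dataclass, field


@dataclass
class Transaction:
    method: str
    url: str
    applied: list = field(default_factory=list)


class Component:
    """Hypothetical processing component that decides for itself whether to run."""

    def __init__(self, name, wants):
        self.name = name
        self.wants = wants  # predicate evaluated per transaction, at run time

    def process(self, tx: Transaction) -> None:
        tx.applied.append(self.name)


def run_static(order, tx):
    """Static registry: every configured component runs, in configured order."""
    for component in order:
        component.process(tx)


def run_dynamic(order, tx):
    """Each component inspects the transaction before accepting responsibility."""
    for component in order:
        if component.wants(tx):
            component.process(tx)


components = [
    Component("acl", lambda tx: True),
    Component("redirector", lambda tx: tx.method == "GET"),
    Component("adaptation", lambda tx: tx.url.endswith(".html")),
]

static_tx = Transaction("POST", "http://example.com/upload")
run_static(components, static_tx)      # all three components run

dynamic_tx = Transaction("POST", "http://example.com/upload")
run_dynamic(components, dynamic_tx)    # only "acl" accepts this transaction
```

Even in this toy version the per-transaction decisions live in the
components rather than in the registry, which is the spreading of
responsibility described above.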

> > The text on the picture seems to imply that there can be only one
> > Processing Component active for a given transaction, which worries me,
> > but perhaps I just do not understand what kind of Components you are
> > describing here.
>
> The finer details may run in parallel within a module. But the high level
> processing sequence for any single request needs to be linear (or at least
> representable in a linear fashion) to be understandable.

I am not sure I agree, but let's wait until there is a processing
sequence on the picture.

> >> 5) Stores are an optional extra, if the configuration calls for caching.
> >> But not needed for basic operations.
> >
> > Is there a single global index of stored responses? If yes, is it
> > enabled only when caching is enabled?
>
> That would be an implementation detail inside the Store module top left
> of the picture.
>
> IMO there should be a global API for storage. Whether that API loops
> through a single index or a set of per-Cache ones is a detail choice.

Sure, but it is important to decide whether the global index (or
equivalent store interface) exists when caching is disabled. Currently,
you get an index whether Squid (or a given caching scheme) needs it or
not. Do you propose that there is no such index?

> > Do you consider request merging a form of caching?
>
> I consider it a flow design issue. If forwarding wants to take a request
> and point it at an already filled buffer (from store, from live stream, or
> from /dev/zero) that's its business.

How will it find a "filled buffer" to merge with if there is no index?
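
To illustrate why merging seems to need some kind of lookup structure,
here is a minimal Python sketch of an index of in-flight responses
keyed by URL. InFlightIndex, find_or_start() and finish() are
hypothetical names, and keying by URL alone is a simplifying
assumption.

```python
class InFlightIndex:
    """Hypothetical index of responses being fetched, keyed by request URL."""

    def __init__(self):
        self._by_url = {}

    def find_or_start(self, url: str):
        """Return (buffer, merged); merged is True if an existing fetch was joined."""
        buffer = self._by_url.get(url)
        if buffer is not None:
            return buffer, True
        buffer = bytearray()
        self._by_url[url] = buffer
        return buffer, False

    def finish(self, url: str) -> None:
        """Forget the entry once the fetch completes."""
        self._by_url.pop(url, None)


index = InFlightIndex()
buf_a, merged_a = index.find_or_start("http://example.com/big.iso")
buf_b, merged_b = index.find_or_start("http://example.com/big.iso")  # joins buf_a
index.finish("http://example.com/big.iso")
buf_c, merged_c = index.find_or_start("http://example.com/big.iso")  # fresh fetch
```

Without something playing the role of _by_url, the second request has
no way to discover the first one's buffer.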

> >> If we all agree and work towards this type of model and things are kept
> >> modular isolated to the highest levels. I don't see the future
> >> integration of either squid branch or CacheBoy as being a big task.
> >
> > I think we would need a more detailed or precise architecture
> > description to be able to "work towards it" or, more precisely, to
> > identify code that does not satisfy the architectural constraints.
> > Otherwise, everybody will be claiming to conform to the Architecture
> > principles but there will be no improvement as far as merging Squid2 or
> > external code into Squid3.
>
> Agreed. That detailing is what we are starting now.
> The two orange clouds need to be fleshed out into named components, then
> on to slightly finer details.

All current blobs need better description/definition, IMO. A few more
blobs may need to be added. The next step would be to define the flows
(i.e., which blob talks to which and what they send to each other).

> > BTW, the text descriptions you gave above appear much more useful than
> > the picture itself. Perhaps we can define the main objects and flows
> > better and then redraw the picture to match the descriptions? Should
> > this go into a wiki?

> If you can't find any flaws with that highest level flow design. We can
> wiki the progress so far and start iterating down to API definitions and
> TODO lists.

It is too early to find flaws. I do not understand the current picture
yet. What I am saying is that it may be easier to ignore the picture for
now, define a few blobs, and then try to draw it again.

It would also be nice to agree on how distant is the future that the
picture should reflect. Are we drawing Squid 3.3? Squid 10? The
Architecture picture would be quite different for those two examples...

Thanks,

Alex.
Received on Tue Aug 26 2008 - 05:43:06 MDT
