Re: Architecture Overview

From: Alex Rousskov <rousskov_at_measurement-factory.com>
Date: Tue, 26 Aug 2008 09:44:00 -0600

On Wed, 2008-08-27 at 01:23 +1200, Amos Jeffries wrote:

> >> Forwarding Logic looks at the request just enough to decide where to shove
> >> it and passes it on to one of these.
> >
> > Does it stay in the loop to, say, try another forwarding path?
>
> No. If another path is needed, the responsible module needs to explicitly
> pass it back into the forwarding logic, with whatever new state the FL
> might need to deal with it properly (an error page result being one such case).
>
> The same goes for any module handing off responsibility to a non-specific
> destination.

Understood. You want every "Responsible Module" to know about the
Forwarding Logic object and switch responsibility to that object if the
transaction needs to be re-forwarded:

 Client => FL -> RM -> FL -> RM -> FL -> ... -> RM => Server
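
For illustration only, here is a rough C++ sketch of that hand-off (the
class and method names are made up, not existing Squid code): each
Responsible Module keeps a reference to the Forwarding Logic and gives the
transaction back to it when re-forwarding is needed, stepping out of the
loop itself.

    #include <memory>
    #include <string>

    struct Transaction {
        std::string uri;
        bool needsRetry = false;   // e.g., the chosen path ended in an error result
    };

    class ForwardingLogic {
    public:
        // Pick the next destination for this transaction and hand it to the
        // Responsible Module that owns that destination.
        void forward(std::unique_ptr<Transaction>) { /* select the next RM */ }
    };

    class ResponsibleModule {
    public:
        explicit ResponsibleModule(ForwardingLogic &fl): fl_(fl) {}

        void handle(std::unique_ptr<Transaction> xact) {
            // ... try to serve the request via this module's path ...
            if (xact->needsRetry) {
                // Cannot finish here: hand the transaction (plus any new
                // state, such as the error result) back to the Forwarding
                // Logic and drop out of the loop.
                fl_.forward(std::move(xact));
                return;
            }
            // otherwise the transaction completes within this module
        }

    private:
        ForwardingLogic &fl_;
    };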

> >> - Arrows are callback/responsibility flow.
> >
> > Callbacks and "responsibility" can be rather different things and can
> > "flow" in different directions. Perhaps the arrows can be removed (for
> > now) to avoid false implications?
>
> True, but in Squid responsibility for current code operations on state
> flows down the callbacks at present.

I am not sure what that means. In Squid, transaction state spreads
around the entire code base, with many objects being responsible for a
single transaction at any given time. Callbacks are a low-level detail.

I suggest that we define some arrows as "passing responsibility for the
transaction" (i.e., removing self from the loop) and others as "sharing
responsibility for the transaction" OR remove arrows completely for now.
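
To make the distinction concrete, a minimal C++ analogy (assumed names,
nothing from the current code): a "passing" arrow moves ownership of the
transaction, while a "sharing" arrow leaves both parties holding it.

    #include <memory>

    struct Transaction { int id = 0; /* per-transaction state */ };

    // "Passing" arrow: ownership moves; the caller no longer holds the transaction.
    void passResponsibility(std::unique_ptr<Transaction> xact) { /* new sole owner */ }

    // "Sharing" arrow: both caller and callee keep a reference and stay involved.
    void shareResponsibility(std::shared_ptr<Transaction> xact) { /* co-owner */ }

    int main() {
        auto owned = std::make_unique<Transaction>();
        passResponsibility(std::move(owned));   // sender steps out of the loop

        auto shared = std::make_shared<Transaction>();
        shareResponsibility(shared);            // sender remains responsible too
    }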

> >>>> 1) Data Pump side-band. This is a processing-free 'pump' for any
> >>>> requests which do not actually need to go through Squid's twisted
> >>>> logics. With all logics compiled out it should be equivalent to a NOP.
> >>>> But can be used by things such as adaptation components as an IO
> >>>> abstraction.
...
> Contrary to Adrian's latest pump statements, I'm still envisaging a data
> pump as one-way: from source to sink, whatever those may be.

Noted.

> Yes, my vision of it is a slave that is told where the source/sink/buffer
> is and left to run to completion.
>
> This lends itself to the HTTP model, where one pipe reads headers into a
> buffer and passes that in; then whatever logic handles the headers asks the
> pump to read the body from source to a given sink (cache object, adaptation
> buffer, or the client's TCP socket, for three likely examples).

If the primary function of this object is dumb data transfer from source
A to sink B, hiding A and B details from each other, you have a
unidirectional pipe. Can we rename Data Pump to Data Pipe?

If the primary function of this object is to intelligently produce data
for any user that knows the object's interface, then you have a data
generator or pump. We should continue using the name Data Pump but drop
the notion that it connects something to something.

You may use a dumb pipe to connect that smart pump to the appropriate
sink, without exposing the sink details to the pump and vice versa.
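
A rough sketch of the dumb-pipe idea, with hypothetical Source/Sink/DataPipe
interfaces (these are not existing Squid classes): the pipe only moves bytes
and hides the two ends from each other.

    #include <cstddef>
    #include <vector>

    class Source {
    public:
        virtual ~Source() = default;
        // Produce up to max bytes; return how many were produced (0 = end of data).
        virtual size_t produce(char *buf, size_t max) = 0;
    };

    class Sink {
    public:
        virtual ~Sink() = default;
        // Consume len bytes from buf.
        virtual void consume(const char *buf, size_t len) = 0;
    };

    // Unidirectional pipe: neither generates nor interprets data, just transfers.
    class DataPipe {
    public:
        DataPipe(Source &from, Sink &to): from_(from), to_(to) {}

        void run() {
            std::vector<char> buf(64 * 1024);
            while (size_t n = from_.produce(buf.data(), buf.size()))
                to_.consume(buf.data(), n);
        }

    private:
        Source &from_;
        Sink &to_;
    };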

> The core function of the pump would be to handle non-adapted tunnel traffic
> or request bodies, which may be very large numbers of bytes that need to go
> from socket A to socket B with nothing but size accounting or speed delays
> in between.
>
> Most modules really should be acting on a buffer pre-filled by a pump
> somewhere, and passed without copying (excepting the adapters, of course) as
> part of the request state.

I think you are talking about a data pipe. Let's rename that object!

> Okay. The two yellow bars for IO are what's left of the comm layer (and
> SSL layer) after it's been slimmed down to simply handle the sockets
> and set up initial state objects on accept(). Everything else, from byte
> reads to byte writes, lies between them in one place or another
> (read/write as part of the pump).

Oof! Just when I thought it was clear we have a dumb data pipe, we are
dealing with a smart pump again. A bi-directional pump that knows how to
do socket reads and writes. Can we go back, please?

At the bottom of the picture, we should have a Request Producer (a
source or pump) and a Response Consumer (a sink). On top, a Request
Consumer (a sink) and Response Producer (a source or pump). There may be
other Producers and Consumers in the picture.

Sources and sinks can be connected with unidirectional pipes that neither
read (produce) nor write (get rid of) data; they just transfer it.
Whether they also pass metadata is an open question.

Some Producers and Consumers use a comm or I/O layer to read and write
data. How smart those Producers and Consumers are is an open question.
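
Building on the hypothetical Source/Sink/DataPipe sketch above (again, made-up
names, not a definitive design), the picture I have in mind wires the four
endpoints with two one-way pipes; how smart the endpoints are, and whether
metadata travels with the data, stays open.

    //   RequestProducer  --DataPipe-->  RequestConsumer    (client side to server side)
    //   ResponseProducer --DataPipe-->  ResponseConsumer   (server side to client side)

    void wireTransaction(Source &requestProducer, Sink &requestConsumer,
                         Source &responseProducer, Sink &responseConsumer)
    {
        DataPipe requestPath(requestProducer, requestConsumer);
        DataPipe responsePath(responseProducer, responseConsumer);

        // Neither end sees the other; the pipes only transfer bytes. Whether
        // the endpoints sit on top of a comm/I/O layer is not decided here.
        requestPath.run();
        responsePath.run();
    }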

> >>>> 4) Processing components are distinct. Each is fully optional and causes
> >>>> no run-time delay if not enabled.
> >>> What decides which processing components are enabled for a given
> >>> transaction? Do processing components interact with each other or a
> >>> central "authority"? What is their input and output? Can you give a few
> >>> examples of components?
> >> Forwarding Logic. Or possibly an ACL/helper flow manager. How it's coded
> >> defines what's done. Presently there is quite a chain of processing.
> >
> >> We talked of a registry object which squid.conf gives a list and ordering
> >> of ACLs, redirectors, etc. That would make the detailed state fiddling
> >> sit behind the single manager API.
> >
> > Many processing decisions are not static so I doubt a registry object
> > driven by squid.conf can handle this. In fact, I suspect no single
> > object can handle this complexity so the responsibility to enable
> > processing components would have to be spread around processing
> > components (and forwarder), which makes things like a "single pipe with
> > bumps" design difficult to implement.
>
> I think one of us misunderstands. Adrian explained that the current flow of
> security processing in Squid was something like:
> cachable -> http_access ACLs -> FXFF ACLs -> http_access ACLs -> blah blah

I did not realize the above is restricted to simple "security/ACLs
processing". I was talking about a more general/complex case where
decisions are dynamic (e.g., ICAP or url_rewriter action decides what
the next processing component is).

Clearly, there is no need to touch FXFF ACLs if they are not configured.
IIRC, some optional components are already efficiently bypassed, but I
agree that it would be nice to clean that processing up.

So we can agree that some part of the processing has a well-known fixed
order and should skip disabled components. Some other processing steps
are determined dynamically. There will not be a single object that knows
or determines all steps.
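
A rough C++ sketch of that mixed model (assumed names and interfaces, not
existing Squid code): a fixed-order chain where disabled components cost
nothing, plus components that may pick the next step dynamically, the way an
adaptation service or url_rewriter result would.

    #include <algorithm>
    #include <cstddef>
    #include <vector>

    struct Transaction { /* per-transaction state */ };

    class ProcessingStep {
    public:
        virtual ~ProcessingStep() = default;
        virtual bool enabled() const = 0;   // e.g., false when the ACLs are not configured
        // Process the transaction; return the dynamically chosen next step,
        // or nullptr to continue in the configured order.
        virtual ProcessingStep *process(Transaction &) = 0;
    };

    void runChain(const std::vector<ProcessingStep *> &chain, Transaction &xact)
    {
        size_t i = 0;
        while (i < chain.size()) {
            ProcessingStep *step = chain[i];
            if (!step->enabled()) {         // disabled components are skipped cheaply
                ++i;
                continue;
            }
            ProcessingStep *next = step->process(xact);
            if (!next) {                    // no dynamic decision: fixed order
                ++i;
                continue;
            }
            // Dynamic decision: continue from the step this component chose.
            auto it = std::find(chain.begin(), chain.end(), next);
            i = (it == chain.end()) ? chain.size()
                                    : static_cast<size_t>(it - chain.begin());
        }
    }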
 
> > Sure, but it is important to decide whether the global index (or
> > equivalent store interface) exists when caching is disabled. Currently,
> > you get an index whether Squid (or a given caching scheme) needs it or
> > not. Do you propose that there is no such index?
>
> With this architecture you could disable caching entirely to the point
> of not being compiled in. It's irrelevant outside the store module. All
> the other modules need to see is a buffer of data or its absence.

...

> The ForwardingLogic may have an internal hash/cache/index of in-transit
> URLs if it really needs to, but it won't involve the store. It's in-flow
> data.
>
> (NP: this model also applies to broadcast streams if we want to go that
> way eventually).

Right! That is what I was asking about. Store index, if any, is
invisible to the rest of Squid. There may be a [process-]global index of
current (live) transactions with its own matching logic.
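
For example, a minimal sketch (hypothetical, not Squid code) of such a
live-transaction index kept by the Forwarding Logic itself; note that nothing
here touches the Store, so caching can be compiled out without affecting it.

    #include <map>
    #include <string>

    struct InTransitEntry {
        int activeClients = 0;   // e.g., for collapsing concurrent requests
        // ... whatever in-flow state the Forwarding Logic needs ...
    };

    class ForwardingLogic {
    public:
        InTransitEntry *lookup(const std::string &uri) {
            auto it = inTransit_.find(uri);
            return it == inTransit_.end() ? nullptr : &it->second;
        }

        InTransitEntry &track(const std::string &uri) {
            return inTransit_[uri];     // create or reuse a live-transaction entry
        }

        void untrack(const std::string &uri) {
            inTransit_.erase(uri);      // transaction finished; nothing is stored
        }

    private:
        std::map<std::string, InTransitEntry> inTransit_;   // URLs in flight only
    };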
 
> > It would also be nice to agree on how distant is the future that the
> > picture should reflect. Are we drawing Squid 3.3? Squid 10? The
> > Architecture picture would be quite different for those two examples...
>
> Really? A good architecture (which is what I am aiming for here) would look
> the same for both, with possibly different names or larger numbers of blobs,
> and maybe finer detail the older it gets.

I disagree. Squid 10 might cache p2p traffic, do video streaming using 10
CPU cores, and perform live conversation translation on an iPhone. A lot of
things will change that can significantly affect the Architecture. A
good Architecture is not the one that does not change (that is
impossible); it is the one that adapts to new demands without major
rewrites.

It is of course possible to remain so abstract that virtually everything
fits:

        Input -> Processing -> Output

but the practical value of such an Architecture is small.

> There's firstly a lot of work to get to anything like this end-product,
> though we could achieve it by 3.2 if we all agreed and set out to do
> just that.
>
> Afterwards, there is a vastly larger array of possible 'pluggable' bits
> that can be integrated, as individual implementations of the blobs.

In my experience, working on perfectly abstract software (in the hope of
"just plugging bits in" later) is less effective than supporting a few major
bits from the first release. IMO, we should not put into the Architecture
blobs that we are not going to implement in the foreseeable future. And,
vice versa, we should put in major blobs that we are going to implement (or
keep).

HTH,

Alex.