Re: [PATCH] Agent base class from Alex Rousskov on 2014-03-04 (squid-dev)

From: Alex Rousskov <rousskov_at_measurement-factory.com>
Date: Tue, 04 Mar 2014 15:51:36 -0700

Hello,

I will start with the latest hierarchy draft. It is the same as my
earlier proposal except IcapClient is no longer mentioned, to emphasize
that it needs special care as described further below. I prefer not to
discuss IcapClient further until HTTP and FTP cases are resolved (again,
as explained in the discussion below). I also added group labels to
clarify that "this is it" as far as leaf classes are concerned (e.g.,
HttpClient contains all the code necessary to communicate with HTTP
servers, initiate RESPMOD adaptation, feed the cache, etc.).

Abstract base (things common across Squid):
Agent: Common base for Client and Server agents. Uses Transport.

Abstract side (things common on one Squid side):
Client: Common base for HttpClient, FtpClient, etc.
Server: Common base for HttpServer, FtpServer, etc.

Leaf classes (they may have other parents but have no children):
   HttpClient: Sends HTTP requests. Receives HTTP responses.
   HttpServer: Receives HTTP requests. Sends HTTP responses.
   FtpClient: Sends FTP commands. Receives FTP replies.
   FtpServer: Receives FTP commands. Sends FTP replies.

Perhaps surprisingly, we may have uncovered the biggest disagreement
ever in this discussion: The Agent->Client->HttpClient branch in my
hierarchy is meant to _include_ everything that Squid has to do when
communicating with an HTTP server. In other words, my HttpClient
(including all its parents code) is current HttpStateData (including all
its parents code). This includes caching and adapting the response.

I now begin to suspect that Amos' idea of the Agent hierarchy in general
and HttpClient in particular may _exclude_ everything that is not HTTP.
So things like caching and adapting the response would not be covered by
Amos' HttpClient. This is just a feeling/speculation on my point, but if
I am right it would explain a lot of arguments :-).

The next portion of this email is about the overall approach to this
(and similar) reviews. Detailed discussion follows that.

> On 2014-03-04 07:23, Alex Rousskov wrote:
>> IIRC, no comprehensive alternative has been proposed, but there are
>> disagreements on various parts of the above and, hence, the submitted
>> patch violates those contested parts. Once the agreement is reached, the
>> patch can be fully reviewed, polished, and committed.

On 03/03/2014 06:59 PM, Amos Jeffries wrote:
> I dont see how that conclusion follows. The protocol and client/server
> parts have not been implemented as distinct Agent hierarchy classes yet.
> So there is no code contradicting either design.

I feel your frustration with me blocking your "simply detaching the
Transport from ConnStateData" patch. Similar problems have happened
before so please allow me to try to explain my position better.

Many proposed changes fall into one of the two categories:

* Modest, localized code improvements. For example, extract common code
into a reusable function or method. Or add another leaf class to
implement existing API for a different use case. "Replace wood floors
with tile" to abuse the Museum analogy. Acceptance of such changes is
driven primarily by their implementation because they do not change
interfaces and future code direction in any significant way.

* Fundamental changes affecting Squid code design or development
direction. For example, change Squid class hierarchies and overall code
design by rearranging classes, adding new base classes to existing
and/or future class hierarchies, adding highly reusable classes, etc.
"Deciding whether to convert an Ancient Civilization Gallery into a
Museum of Modern Art or a Nature and Science Museum". Acceptance of such
changes requires agreement on not just the specific lines of
implementation code contained in a patch, but the overall design
direction that these changes are tilting Squid towards.

Your patch currently falls into the second category. Its acceptance
requires general agreement on the Agent hierarchy design even though you
are "just" adding a base class for that hierarchy. The definition of
that base Agent class and the agreement on its kids shapes not only this
appropriateness of this patch code, but how a lot of future Squid code
will look like. The patch already integrates with other, existing pieces
of the disputed hierarchy, but even if it did not, we must take the
whole Agent hierarchy into account when deciding whether the base class
is good enough or is going to misdirect future projects for the next few
years.

I certainly understand that these long discussions are costly. We are
literally spending dozens and dozens of hours on this when other
important projects suffer. However, I have to insist that there is an
agreement on this critically important class hierarchy before its base
class is committed and integrated with the official code.

I would also like to point out that these painful discussions (and not
just on this thread) already averted disasters by correcting design
directions of major projects. While I am sure we will make design
mistakes anyway, I think we would be far worse off without this review
barrier.

I am open to suggestions on how to shorten this process. Since we lack
an Architect that usually makes final decisions in such cases, the only
suggestion I could come up so far was an IRC meeting. I am also open to
voice discussion if we can find appropriate hours for that. It would be
great if more than two people participated!

Detailed discussion below...

> The focus of this patch has been specifically to prevent Transfer
> protocol semantic details getting into the base Agent class.

I think we are (and have always been?) in agreement that Agent should
not have interfaces or code specific to a particular transfer protocol.

>> Agent -> Client -> FooClient
>> Agent -> Client -> BarClient
>> Agent -> Server -> FooServer
>> Agent -> Server -> BarServer
>>
>> There will be other classes to encapsulate common transfer protocol
>> code, but the above three layers is what we should be focusing on when
>> discussing the overall Agent hierarchy shape IMO.

> So which generic operations do you see as being client-specific but not
> also protocol-specific ?

See ServerStateData as a good starting point. That class has a lot of
Agent+Client-specific stuff that is not protocol-specific. Not
surprisingly, it often either duplicates or mirrors the Agent+Server
areas itemized below, so see that list as well.

Please note that I find it useful to talk about Agent+Client
functionality when answering such questions. Some actions will be
handled in either Client exclusively, but all common code among Clients
and Servers should trickle down to their Agent parent, of course.

When I say "Client" or "Server" in such context, I think you sometimes
misinterpret that as "in Client but not in Agent" or "in Server but not
in Agent". I will try to be clearer about that in the future, but when
it doubt, please assume that Client includes Agent (because it does) and
that all shared code should trickle down to parents.

> or server-specific but not protocol-specific ?

I cannot offer a complete and polished list at this time, but I think
Agent+Server responsibilities will revolve around these areas:

* Store integration (the Store Client API, Store ID, etc.).
* Adaptation integration (pre-cache REQMOD, post-cache RESPMOD, etc.).
* BodyPipe integration (sending request bodies to other Squid sides).
* Access control.
* Persistent connection management.
* Send/receive coordination.
* Pipelining management.

Again, the above list is not meant to be comprehensive, and some of the
above may not actually belong to the Agent hierarchy itself (it depends
on how we answer a few pending questions), but this should give you a
rough outline of what should be possible.

> I see the middle layer providing:
> - pipeline list of messages not yet handled
> - transfer-coding of messages
> - generic loop running parser provided by child class to find messages

Yes, that sounds OK for my middle layer as well, but I think there is a
lot more that we can and should do there. I also suspect de/encoding
should be handled by parsing classes (different hierarchies), but that
is not important (i.e., we should not decide that) right now.

>> Please note that the existing code already demonstrates how we can have
>> a generic Client class (the existing ServerStateData class is our future
>> Client).

> Note how it also conflates RESPMOD adaptation client/server operations
> into each per-protocol server

I do not know what "conflates operations into server" means, but
ServerStateData handles most of the ICAP logic on its own, encapsulating
shared adaptation code of FTP and HTTP Clients. This matches what I
think Agent+Client should do!

> and actually contains the whole
> AnyP::Agent logic for ICAP connectivity separate from the protocol
> logics.

Sure, there is no Agent in current Squid so ServerStateData has to
include both Agent and Client code. That will, of course, change as we
move shared Client and Server code into Agent. Sorry if I should have
been more explicit about it. Again, by default, a class covers what its
parent(s) cover and common/shared class code should be trickled down to
parents.

ServerStateData is far from perfect, but as far as the class location in
the hierarchy and the overall design is concerned, ServerStateData is a
good starting point for the future Client agent IMO. Needless to say,
some existing things should be moved further down into Agent and some
missing things should be added to Client.

> Once IcapClient is built from ServerStateData (aka Client)
> things get recursive.

When/if IcapClient is built from Client, Client will not be what
ServerStateData is today, of course.

ICAP Client and eCAP are important pieces of the puzzle that we should
account for before the Agent hierarchy is finalized. I prefer to delay
these discussions until HTTP/FTP dust settles, but I will note the
following to clarify the scope of this important problem and extract it
from the current discussion:

Unlike FtpClient and HttpClient, ICAP and eCAP do not integrate with
Squid using Store. From Squid integration point of view, adaptation is a
different "side" of Squid: It is neither Client nor Server! From
transfer protocol point of view, ICAP is a Client, but eCAP is not!

I believe no single hierarchy can accommodate this juxtaposition of
requirements, but I think we can solve this puzzle. Again, I would
prefer not to discuss specific solutions until there is an agreement
that accommodates FTP and HTTP needs (but not ICAP Client needs). Let's
solve the simpler/base problem first!

To emphasize this, I have removed IcapClient from my draft class
descriptions at the top of this email.

> Note how it is not actually used for any of the FTP protocol or gopher
> protocol until after they have been mapped to HTTP classes and
> semantics. Proving that ServerStateData is *not* shared Transfer
> protocol logic, it is HTTP client logic facing an internal Transfer
> protocol gateway/mapping.

Actually, ServerStateData has interfaces that both FTP and HTTP classes
use when handling the transfer connection. Does ServerStateData
implement the entire "shared Transfer protocol logic"? No. IMO, it is
not important right now to identify specific pieces that should be moved
into or outside ServerStateData during its conversion to Client. The
not-yet-agreed-upon definition of Client and related classes will define
what code can stay and what should be moved.

> Here are the elephants in the room:
> In all the transfer protocols we handle or propose to handle the
> messages have a prefix+payload syntax in *both* directions.
>
> Any message MAY explicitly be sent from either client or server.
> Generally requests come from clients and responses from servers. That
> barrier is not strict, just a statement about "normally" in the specs.
>
> There is a major push in WebSockets, SPDY and HTTP/2 to make use of
> that leniency. SPDY and HTTP/2 PUSH_PROMISE messages in particular
> already cause your server/client separation to hit trouble. Since they
> are messages received from server with all handling requirements of
> requests.

The server sending responses that are requests is interesting and may
indeed be important important. Let's deal with that once we agree how to
handle FTP and HTTP/1. I agree that it may require adjusting the
hierarchy, but I fear our current disagreements go even beyond this
level of detail. Once we agree on what how to handle the currently
"normal" case, we shall consider any known future abnormalities.

> *For now* it still meets your model

Good. I certainly do not refuse to consider future problems, but let's
try to get an agreement on currently normal basics first.

>>> FooClient
>>> - transfer protocol logics on the message objects and state.
>>
>> There needs to be something about "Client" in your FooClient definition.
>> For example, in my hierarchy, FooClient "sends Foo requests. Receives
>> Foo responses."
>>
>
> Sure. Sorry. Insert language to the effect of protocol-specific
> directionality of requests and replies.

I hesitate defining classes in your hierarchy. That seems totally
inappropriate for me, especially given the current level of disagreement.

My HttpClient (directly or via parents) also does ICAP using Squid
Adaptation API and caches responses using Squid Store API. Does yours?

My HttpServer (directly or via parents) also does access control, ICAP
using Squid Adaptation API, and receives responses using Squid Store
API. Does yours?

>> I do not think Agent belongs to AnyP. Agent is a side-related thing, not
>> protocol-related thing. We do not want to place everything that is not
>> related to any specific protocol inside the AnyP namespace!

> Thats what "any protocol" namespace was defined for. Your point that
> this code is Transfer protocol level while agnostic to which Transfer
> protocol proves that it is part of that category. The *child* classes
> will not be.
> ie AnyP:: is the bits shared between 2+ of: src/http/, src/ftp/,
> src/icp, src/htcp, src/clients/, src/servers/

No. AnyP is the bits shared between 2+ of: src/http/, src/ftp/, src/icp,
src/htcp.

"Sides" such as src/clients/ and src/servers are separate. Sides are not
primarily about protocols (each side handles many protocols!). They are
about access controls, integration with Squid store, adaptation
vectoring points, and other things that make Squid sides substantially
different from each other.

>>> IMPORTANT: this change is *all* I am wanting to get merged into trunk
>>> right now. The rest of the code is still too tangled with other stuff to
>>> isolate the Http1Agent and Client/Server layers cleanly.
>>> Yes this means that there *may* be pieces still to push into the new
>>> Agent later. But that will depend on much cleaner child class code to
>>> even spot correctly.

>> Yes, of course. I do not suggest that you implement the whole hierarchy
>> or even complete the Agent class in one patch. I only insist that we
>> agree on what the hierarchy is, so that the Agent class can be defined,
>> partially implemented, and reviewed correctly. This agreement is
>> critical because it determines how to implement what you want to
>> implement.

> As far as I can see we agree on that

The view from this side is different: I have never heard you say that
you agree with my hierarchy (as defined on top of recent emails), and I
certainly do not agree with yours (as I have been, possibly incorrectly,
reconstructing it from bits and pieces of the discussion).

> and are just deciding on which wall
> (child classes) the doors vs windows are to go.

Sorry, I do not know what you mean by that.

>> The change you want to get merged into trunk is not ready for merge.
>> That does not mean everything in your patch is wrong, far from that!
>
> So far the blanket statement to hold off is not helping progress. Please
> point at a specific part of this patch that would be removed or even
> significantly remodelled based on changes of plans in one of the
> un-agreed details.

I do not recommend doing it that way because I think it would be more
productive to agree on what the hierarchy is before coding it up.
However, if you insist, here is a list of specific patch changes that
would bring your patch closer to what I can vote for:

1. Delete current Agent class description.
2. Add my hierarchy description (as a comment above Agent for now).
3. Use the Agent definition from #2 as the Agent class description.
4. Resubmit for review.

Needless to say, doing the above is silly because I can easily imagine
all those changes now, without you doing any work and because my
hierarchy definition requires more work (i.e., #2 and #3 may change).
However, if you insist on doing this via patch changes rather than
getting to an agreement via email, the above steps use that approach.

> In a 3-class hierarchy: A->B->C
>
> Does it matter whether a virtual method declared by A is defined in B or
> C ?
> No.
>
> Does it matter whether a method presented by A is used in B or C ?
> No.
>
> Does it matter that all child classes use it?
> No.

There are many other things that matter though: How the classes are
described, what virtual methods are present, how virtual methods
interact, what class members are exposed, etc.

Remember that part of this review was specific, but it quickly became
apparent that what I see as an implementation bug you consider the right
behavior because we define the Agent class and the Agent hierarchy
differently. Once we agree on the fundamentals, it would be possible to
resolve arguments about the implementation details.

>>> - I have not renamed the deceptive "tcp" Pointer. Was thinking of
>>> calling it "transport" or something. Any ideas?
>>
>> inConn, to keep symmetry with inBuf?
>>
>
> The conn is both in and out though.

Good point. I do not think you can do much better than naming it "conn"
or "transport" then. "Conn" is shorter and may minimize moved code
differences in some cases, but if you prefer "transport", I would not
object. In either case, you may document it as "primary transport used
for receiving and sending Agent transfer protocol messages".

> So lets remember that in your model ClientAgent/ServerAgent are level-2
> classes shared by Transfer protocols.
>
> Then a quick glance at HTTP/2 where a _minimum_ of 100 *transactions*
> are happening asynchronous and multiplexed over the same Transport and
> I/O buffer. And each message may be split into N frames over that
> Transport. Flow control for all that is performed with special frames
> and doing that between multiple Agents would force Agent-Agent
> communication which in my model is unnecessary.

In my hierarchy, all from-client and to-client messages would be handled
by a single Http2::Server agent and all from-server and to-server
messages would be handled by a single Http2::Client agent. I though it
would be the same in yours. If I was wrong, can you describe what agents
will handle those four messages kinds in your hierarchy?

> Expected changes to AnyP::Agent in my model for implementing HTTP/2 are:
> - adding a queue buffer internal to sendSomeData()
>
> Expected changes to your model for implementing HTTP/2 are:
> - implementing entirely separate parallel layer-2 Client/Server child
> classes to inherit from.

I do not yet understand the above but I suspect your fears of extra
layers are the result of a misunderstanding of what my Clients and
Servers do. It should become clear once other, more important questions
are resolved.

> * spawning a sub-Transport is shared by FTP (data channel spawned) and
> HTTP (CONNECT tunnel spawned)

CONNECT handling results in two transports: To-client CONNECT transport
is not a separate or supplementary transport. It is the same transport,
used differently. To-server CONNECT transport is a new transport, but it
is not a "spawned sub-transport" any more than any to-server HTTP
connection triggered (on the other side of Squid!) by a GET request is.

Thus, I see no significant similarity here except that FTP requires two
connections (on one Squid side) and HTTP CONNECT requires two
connections (total).

Cheers,

Alex.
Received on Tue Mar 04 2014 - 22:52:04 MST

This archive was generated by hypermail 2.2.0 : Wed Mar 05 2014 - 12:00:09 MST