Re: [PATCH] Agent base class from Amos Jeffries on 2014-03-04 (squid-dev)

From: Amos Jeffries <squid3_at_treenet.co.nz>
Date: Wed, 05 Mar 2014 14:23:21 +1300

On 2014-03-05 11:51, Alex Rousskov wrote:
> Hello,
>
> I will start with the latest hierarchy draft. It is the same as my
> earlier proposal except IcapClient is no longer mentioned, to emphasize
> that it needs special care as described further below. I prefer not to
> discuss IcapClient further until HTTP and FTP cases are resolved
> (again,
> as explained in the discussion below). I also added group labels to
> clarify that "this is it" as far as leaf classes are concerned (e.g.,
> HttpClient contains all the code necessary to communicate with HTTP
> servers, initiate RESPMOD adaptation, feed the cache, etc.).
>
>
> Abstract base (things common across Squid):
> Agent: Common base for Client and Server agents. Uses Transport.
>
> Abstract side (things common on one Squid side):
> Client: Common base for HttpClient, FtpClient, etc.
> Server: Common base for HttpServer, FtpServer, etc.
>
> Leaf classes (they may have other parents but have no children):
> HttpClient: Sends HTTP requests. Receives HTTP responses.
> HttpServer: Receives HTTP requests. Sends HTTP responses.
> FtpClient: Sends FTP commands. Receives FTP replies.
> FtpServer: Receives FTP commands. Sends FTP replies.
>
>
>
> Perhaps surprisingly, we may have uncovered the biggest disagreement
> ever in this discussion: The Agent->Client->HttpClient branch in my
> hierarchy is meant to _include_ everything that Squid has to do when
> communicating with an HTTP server. In other words, my HttpClient
> (including all its parents code) is current HttpStateData (including
> all
> its parents code). This includes caching and adapting the response.
>
> I now begin to suspect that Amos' idea of the Agent hierarchy in
> general
> and HttpClient in particular may _exclude_ everything that is not HTTP.
> So things like caching and adapting the response would not be covered
> by
> Amos' HttpClient. This is just a feeling/speculation on my point, but
> if
> I am right it would explain a lot of arguments :-).
>
>
> The next portion of this email is about the overall approach to this
> (and similar) reviews. Detailed discussion follows that.
>
>
>> On 2014-03-04 07:23, Alex Rousskov wrote:
>>> IIRC, no comprehensive alternative has been proposed, but there are
>>> disagreements on various parts of the above and, hence, the submitted
>>> patch violates those contested parts. Once the agreement is reached,
>>> the
>>> patch can be fully reviewed, polished, and committed.
>
>
> On 03/03/2014 06:59 PM, Amos Jeffries wrote:
>> I dont see how that conclusion follows. The protocol and client/server
>> parts have not been implemented as distinct Agent hierarchy classes
>> yet.
>> So there is no code contradicting either design.
>
> I feel your frustration with me blocking your "simply detaching the
> Transport from ConnStateData" patch. Similar problems have happened
> before so please allow me to try to explain my position better.
>
> Many proposed changes fall into one of the two categories:
>
> * Modest, localized code improvements. For example, extract common code
> into a reusable function or method. Or add another leaf class to
> implement existing API for a different use case. "Replace wood floors
> with tile" to abuse the Museum analogy. Acceptance of such changes is
> driven primarily by their implementation because they do not change
> interfaces and future code direction in any significant way.
>
> * Fundamental changes affecting Squid code design or development
> direction. For example, change Squid class hierarchies and overall code
> design by rearranging classes, adding new base classes to existing
> and/or future class hierarchies, adding highly reusable classes, etc.
> "Deciding whether to convert an Ancient Civilization Gallery into a
> Museum of Modern Art or a Nature and Science Museum". Acceptance of
> such
> changes requires agreement on not just the specific lines of
> implementation code contained in a patch, but the overall design
> direction that these changes are tilting Squid towards.
>
> Your patch currently falls into the second category. Its acceptance
> requires general agreement on the Agent hierarchy design even though
> you
> are "just" adding a base class for that hierarchy. The definition of
> that base Agent class and the agreement on its kids shapes not only
> this
> appropriateness of this patch code, but how a lot of future Squid code
> will look like. The patch already integrates with other, existing
> pieces
> of the disputed hierarchy, but even if it did not, we must take the
> whole Agent hierarchy into account when deciding whether the base class
> is good enough or is going to misdirect future projects for the next
> few
> years.
>
> I certainly understand that these long discussions are costly. We are
> literally spending dozens and dozens of hours on this when other
> important projects suffer. However, I have to insist that there is an
> agreement on this critically important class hierarchy before its base
> class is committed and integrated with the official code.
>
> I would also like to point out that these painful discussions (and not
> just on this thread) already averted disasters by correcting design
> directions of major projects. While I am sure we will make design
> mistakes anyway, I think we would be far worse off without this review
> barrier.
>
> I am open to suggestions on how to shorten this process. Since we lack
> an Architect that usually makes final decisions in such cases, the only
> suggestion I could come up so far was an IRC meeting. I am also open to
> voice discussion if we can find appropriate hours for that. It would be
> great if more than two people participated!
>

I think we need to clarify an agreement as to whether all the direction
discussion is limited by this classes proposed implementation.

IMO:

1) The structure and positioning of Transfer protocol, Client, Server
and other logics within its child hierarchy is not affected by this API.
Any of the children can implement the virtuals and use the members. Any
class in a uses relationship can use the flow control methods.
- Thus both models we are advocating for future directions, and several
otehr possibilities seems to need this implemented.

2) Kids of this class could still have either used-by or inherits-from
relationship to the whole Client/Server hierarchy.
- Thus there is no limitation placed by this patch on which direction
the overall architecture goes in future.

3) This base class separates a large portion of the I/O logics out of
the client_side and http.cc code.
- Allowing greater clarity on deciding future directions. But IMO being
able to decide is not a good reason to block until those decisions are
made.
(I have laid out my roadmap plans so you can see that it hopes to
retain the existing client-streams model and streamline the code
overall).

>
> Detailed discussion below...
>
>
>> The focus of this patch has been specifically to prevent Transfer
>> protocol semantic details getting into the base Agent class.
>
> I think we are (and have always been?) in agreement that Agent should
> not have interfaces or code specific to a particular transfer protocol.
>

Yes. Which is why I am mystified why you want to block on planning for
followup implementations. This change does not restrict which model
those future implementations can follow.

>
>>> Agent -> Client -> FooClient
>>> Agent -> Client -> BarClient
>>> Agent -> Server -> FooServer
>>> Agent -> Server -> BarServer
>>>
>>> There will be other classes to encapsulate common transfer protocol
>>> code, but the above three layers is what we should be focusing on
>>> when
>>> discussing the overall Agent hierarchy shape IMO.
>
>
>> So which generic operations do you see as being client-specific but
>> not
>> also protocol-specific ?
>
> See ServerStateData as a good starting point. That class has a lot of
> Agent+Client-specific stuff that is not protocol-specific. Not
> surprisingly, it often either duplicates or mirrors the Agent+Server
> areas itemized below, so see that list as well.
>
> Please note that I find it useful to talk about Agent+Client
> functionality when answering such questions. Some actions will be
> handled in either Client exclusively, but all common code among Clients
> and Servers should trickle down to their Agent parent, of course.
>
> When I say "Client" or "Server" in such context, I think you sometimes
> misinterpret that as "in Client but not in Agent" or "in Server but not
> in Agent". I will try to be clearer about that in the future, but when
> it doubt, please assume that Client includes Agent (because it does)
> and
> that all shared code should trickle down to parents.
>
>
>> or server-specific but not protocol-specific ?
>
> I cannot offer a complete and polished list at this time, but I think
> Agent+Server responsibilities will revolve around these areas:
>
> * Store integration (the Store Client API, Store ID, etc.).
> * Adaptation integration (pre-cache REQMOD, post-cache RESPMOD, etc.).
> * BodyPipe integration (sending request bodies to other Squid sides).
> * Access control.
> * Persistent connection management.
> * Send/receive coordination.
> * Pipelining management.
>
> Again, the above list is not meant to be comprehensive, and some of the
> above may not actually belong to the Agent hierarchy itself (it depends
> on how we answer a few pending questions), but this should give you a
> rough outline of what should be possible.
>
>
>> I see the middle layer providing:
>> - pipeline list of messages not yet handled
>> - transfer-coding of messages
>> - generic loop running parser provided by child class to find
>> messages
>
>
> Yes, that sounds OK for my middle layer as well, but I think there is a
> lot more that we can and should do there. I also suspect de/encoding
> should be handled by parsing classes (different hierarchies), but that
> is not important (i.e., we should not decide that) right now.

Yes I was imagining a uses relationship with the chunked coder as seen
today on client_side*.cc.

>
>>> Please note that the existing code already demonstrates how we can
>>> have
>>> a generic Client class (the existing ServerStateData class is our
>>> future
>>> Client).
>
>> Note how it also conflates RESPMOD adaptation client/server operations
>> into each per-protocol server
>
> I do not know what "conflates operations into server" means, but
> ServerStateData handles most of the ICAP logic on its own,
> encapsulating
> shared adaptation code of FTP and HTTP Clients. This matches what I
> think Agent+Client should do!

HttpClient
- implements ICAP Transfer protocol
- implements HTTP Transfer protocol

FtpClient
- implements ICAP Transfer protocol
- implements FTP Transfer protocol

Not a good model.

ICAP in particular is a piece which should be moved to a used-by
relationship, just like Transport and Store.

>
>> and actually contains the whole
>> AnyP::Agent logic for ICAP connectivity separate from the protocol
>> logics.
>
> Sure, there is no Agent in current Squid so ServerStateData has to
> include both Agent and Client code. That will, of course, change as we
> move shared Client and Server code into Agent. Sorry if I should have
> been more explicit about it. Again, by default, a class covers what its
> parent(s) cover and common/shared class code should be trickled down to
> parents.
>
> ServerStateData is far from perfect, but as far as the class location
> in
> the hierarchy and the overall design is concerned, ServerStateData is a
> good starting point for the future Client agent IMO. Needless to say,
> some existing things should be moved further down into Agent and some
> missing things should be added to Client.
>
>
>> Once IcapClient is built from ServerStateData (aka Client)
>> things get recursive.
>
> When/if IcapClient is built from Client, Client will not be what
> ServerStateData is today, of course.
>
> ICAP Client and eCAP are important pieces of the puzzle that we should
> account for before the Agent hierarchy is finalized. I prefer to delay
> these discussions until HTTP/FTP dust settles, but I will note the
> following to clarify the scope of this important problem and extract it
> from the current discussion:
>
> Unlike FtpClient and HttpClient, ICAP and eCAP do not integrate with
> Squid using Store. From Squid integration point of view, adaptation is
> a
> different "side" of Squid: It is neither Client nor Server! From
> transfer protocol point of view, ICAP is a Client, but eCAP is not!
>
> I believe no single hierarchy can accommodate this juxtaposition of
> requirements, but I think we can solve this puzzle. Again, I would
> prefer not to discuss specific solutions until there is an agreement
> that accommodates FTP and HTTP needs (but not ICAP Client needs). Let's
> solve the simpler/base problem first!
>
> To emphasize this, I have removed IcapClient from my draft class
> descriptions at the top of this email.
>
>
>> Note how it is not actually used for any of the FTP protocol or gopher
>> protocol until after they have been mapped to HTTP classes and
>> semantics. Proving that ServerStateData is *not* shared Transfer
>> protocol logic, it is HTTP client logic facing an internal Transfer
>> protocol gateway/mapping.
>
> Actually, ServerStateData has interfaces that both FTP and HTTP classes
> use when handling the transfer connection. Does ServerStateData
> implement the entire "shared Transfer protocol logic"? No. IMO, it is
> not important right now to identify specific pieces that should be
> moved
> into or outside ServerStateData during its conversion to Client. The
> not-yet-agreed-upon definition of Client and related classes will
> define
> what code can stay and what should be moved.

While strangly this patch abstracts that Transport away from the
ServerStateData and into HttpStateData without affecting FTP in any way!

>
>> Here are the elephants in the room:
>> In all the transfer protocols we handle or propose to handle the
>> messages have a prefix+payload syntax in *both* directions.
>>
>> Any message MAY explicitly be sent from either client or server.
>> Generally requests come from clients and responses from servers. That
>> barrier is not strict, just a statement about "normally" in the specs.
>>
>> There is a major push in WebSockets, SPDY and HTTP/2 to make use of
>> that leniency. SPDY and HTTP/2 PUSH_PROMISE messages in particular
>> already cause your server/client separation to hit trouble. Since they
>> are messages received from server with all handling requirements of
>> requests.
>
> The server sending responses that are requests is interesting and may
> indeed be important important. Let's deal with that once we agree how
> to
> handle FTP and HTTP/1. I agree that it may require adjusting the
> hierarchy, but I fear our current disagreements go even beyond this
> level of detail. Once we agree on what how to handle the currently
> "normal" case, we shall consider any known future abnormalities.
>

This is the future "normal" and a future which will be "now" by the time
the current 3.HEAD goes stable by the look of it. If we design the
hierarchy ignoring it we will have to repeat the whole painful process
and likely discard the current decisions.

The difference is not big but important to the class intentions.

>
>> *For now* it still meets your model
>
> Good. I certainly do not refuse to consider future problems, but let's
> try to get an agreement on currently normal basics first.
>
>
>
>
>
>>>> FooClient
>>>> - transfer protocol logics on the message objects and state.
>>>
>>> There needs to be something about "Client" in your FooClient
>>> definition.
>>> For example, in my hierarchy, FooClient "sends Foo requests. Receives
>>> Foo responses."
>>>
>>
>> Sure. Sorry. Insert language to the effect of protocol-specific
>> directionality of requests and replies.
>
> I hesitate defining classes in your hierarchy. That seems totally
> inappropriate for me, especially given the current level of
> disagreement.
>
>
> My HttpClient (directly or via parents) also does ICAP using Squid
> Adaptation API and caches responses using Squid Store API. Does yours?
>
> My HttpServer (directly or via parents) also does access control, ICAP
> using Squid Adaptation API, and receives responses using Squid Store
> API. Does yours?
>
>
>
>>> I do not think Agent belongs to AnyP. Agent is a side-related thing,
>>> not
>>> protocol-related thing. We do not want to place everything that is
>>> not
>>> related to any specific protocol inside the AnyP namespace!
>
>> Thats what "any protocol" namespace was defined for. Your point that
>> this code is Transfer protocol level while agnostic to which Transfer
>> protocol proves that it is part of that category. The *child* classes
>> will not be.
>> ie AnyP:: is the bits shared between 2+ of: src/http/, src/ftp/,
>> src/icp, src/htcp, src/clients/, src/servers/
>
>
> No. AnyP is the bits shared between 2+ of: src/http/, src/ftp/,
> src/icp,
> src/htcp.
>
> "Sides" such as src/clients/ and src/servers are separate. Sides are
> not
> primarily about protocols (each side handles many protocols!). They are
> about access controls, integration with Squid store, adaptation
> vectoring points, and other things that make Squid sides substantially
> different from each other.

Indeed. And that was my point, AnyP differs from "sides". It *does*
focus on the shared parts both cross-protocol + cross-side where the
side-specific and protocol-specific code does not.

>
>>>> IMPORTANT: this change is *all* I am wanting to get merged into
>>>> trunk
>>>> right now. The rest of the code is still too tangled with other
>>>> stuff to
>>>> isolate the Http1Agent and Client/Server layers cleanly.
>>>> Yes this means that there *may* be pieces still to push into the
>>>> new
>>>> Agent later. But that will depend on much cleaner child class code
>>>> to
>>>> even spot correctly.
>
>>> Yes, of course. I do not suggest that you implement the whole
>>> hierarchy
>>> or even complete the Agent class in one patch. I only insist that we
>>> agree on what the hierarchy is, so that the Agent class can be
>>> defined,
>>> partially implemented, and reviewed correctly. This agreement is
>>> critical because it determines how to implement what you want to
>>> implement.
>
>> As far as I can see we agree on that
>
> The view from this side is different: I have never heard you say that
> you agree with my hierarchy (as defined on top of recent emails), and I
> certainly do not agree with yours (as I have been, possibly
> incorrectly,
> reconstructing it from bits and pieces of the discussion).

Sorry. I think we agree broadly on what goes into the hierarchy (for
HTTP/FTP anyway), we are just re-arranging where the bits are located. I
also agree on not implementing it yet.

>
>> and are just deciding on which wall
>> (child classes) the doors vs windows are to go.
>
> Sorry, I do not know what you mean by that.
>
>
>
>>> The change you want to get merged into trunk is not ready for merge.
>>> That does not mean everything in your patch is wrong, far from that!
>>
>> So far the blanket statement to hold off is not helping progress.
>> Please
>> point at a specific part of this patch that would be removed or even
>> significantly remodelled based on changes of plans in one of the
>> un-agreed details.
>
> I do not recommend doing it that way because I think it would be more
> productive to agree on what the hierarchy is before coding it up.
> However, if you insist, here is a list of specific patch changes that
> would bring your patch closer to what I can vote for:
>
> 1. Delete current Agent class description.
> 2. Add my hierarchy description (as a comment above Agent for now).
> 3. Use the Agent definition from #2 as the Agent class description.
> 4. Resubmit for review.
>
> Needless to say, doing the above is silly because I can easily imagine
> all those changes now, without you doing any work and because my
> hierarchy definition requires more work (i.e., #2 and #3 may change).
> However, if you insist on doing this via patch changes rather than
> getting to an agreement via email, the above steps use that approach.
>

This is my main point. The patch as implemented nicely matches both our
models and more without restricting future design directions. Lets merge
*then* discuss which direction the _remaining_ work goes. As I mentioned
my intention is to move on to the important parser work after this
rather than fleshing out the hierarchy. Either of our models may look
very different by the time that is finished and we need to discuss more.

>
>
>> In a 3-class hierarchy: A->B->C
>>
>> Does it matter whether a virtual method declared by A is defined in B
>> or
>> C ?
>> No.
>>
>> Does it matter whether a method presented by A is used in B or C ?
>> No.
>>
>> Does it matter that all child classes use it?
>> No.
>
> There are many other things that matter though: How the classes are
> described, what virtual methods are present, how virtual methods
> interact, what class members are exposed, etc.
>
> Remember that part of this review was specific, but it quickly became
> apparent that what I see as an implementation bug you consider the
> right
> behavior because we define the Agent class and the Agent hierarchy
> differently. Once we agree on the fundamentals, it would be possible to
> resolve arguments about the implementation details.
>

It is also becoming apparent that you consider this class far more
important than I ever intended it to be.

If I rename it yet again to AnyP::AgentIo and documented as the
"interface for performing Transport I/O operations implemented by
Transfer protocol agents". Does that simplify the scope of discussion
enough to eliminate the higher level model disagreements?
Then we can limit ourselves to better wording for that documentation
comment. Although I would like to avoid describing the whole Agent
hierarchy in this one class documentation just yet, particularly since
said hierarchy is not going to exist for some time ahead.

>
>>>> - I have not renamed the deceptive "tcp" Pointer. Was thinking of
>>>> calling it "transport" or something. Any ideas?
>>>
>>> inConn, to keep symmetry with inBuf?
>>>
>>
>> The conn is both in and out though.
>
> Good point. I do not think you can do much better than naming it "conn"
> or "transport" then. "Conn" is shorter and may minimize moved code
> differences in some cases, but if you prefer "transport", I would not
> object. In either case, you may document it as "primary transport used
> for receiving and sending Agent transfer protocol messages".
>

Will do.

>
>> So lets remember that in your model ClientAgent/ServerAgent are
>> level-2
>> classes shared by Transfer protocols.
>>
>> Then a quick glance at HTTP/2 where a _minimum_ of 100 *transactions*
>> are happening asynchronous and multiplexed over the same Transport and
>> I/O buffer. And each message may be split into N frames over that
>> Transport. Flow control for all that is performed with special frames
>> and doing that between multiple Agents would force Agent-Agent
>> communication which in my model is unnecessary.
>
> In my hierarchy, all from-client and to-client messages would be
> handled
> by a single Http2::Server agent and all from-server and to-server
> messages would be handled by a single Http2::Client agent. I though it
> would be the same in yours. If I was wrong, can you describe what
> agents
> will handle those four messages kinds in your hierarchy?

What you describe there would be the same in mine. With the per-protocol
specific parts buried layer-3 class.

You tied one Client to a transaction, and have described your model as
the Client being a shared level-2 class. In a Flow control has to be in
the level-3 class due to these HTTP/2 incompatibilities with the
existing protocols, which pushes the semantic operations of message
sending and receiving down to level-3. The state about which
transaction(s) are being performed can be in a generic level-2 class,
but the logics manipulating that state can't be.

>
>> Expected changes to AnyP::Agent in my model for implementing HTTP/2
>> are:
>> - adding a queue buffer internal to sendSomeData()
>>
>> Expected changes to your model for implementing HTTP/2 are:
>> - implementing entirely separate parallel layer-2 Client/Server child
>> classes to inherit from.
>
> I do not yet understand the above but I suspect your fears of extra
> layers are the result of a misunderstanding of what my Clients and
> Servers do. It should become clear once other, more important questions
> are resolved.

If you put for example, a sendNextMessage() method into a level-2 Client
class in your hierarchy (as I understand it).
The method would have to do things like blocking sending other messages
until it was completed. This is okay for HTTP/1 and FTP as the messages
are small and have same serial properties.
When HTTP/2 is added to the mix it cannot block sending other messages.
So the logic to do that is either pushed down to level-3 class or a
whole separate Client class has to be implemented - which defeats the
purpose of having a shared Client class as a parent to begin with.

>
>> * spawning a sub-Transport is shared by FTP (data channel spawned)
>> and
>> HTTP (CONNECT tunnel spawned)
>
> CONNECT handling results in two transports: To-client CONNECT transport
> is not a separate or supplementary transport. It is the same transport,
> used differently. To-server CONNECT transport is a new transport, but
> it
> is not a "spawned sub-transport" any more than any to-server HTTP
> connection triggered (on the other side of Squid!) by a GET request is.
>
> Thus, I see no significant similarity here except that FTP requires two
> connections (on one Squid side) and HTTP CONNECT requires two
> connections (total).
>

We seem to have a very different view of tunnel. Lets put that aside
then.

Amos
Received on Wed Mar 05 2014 - 01:23:33 MST

This archive was generated by hypermail 2.2.0 : Thu Mar 06 2014 - 12:00:12 MST