Re: [PATCH] Agent base class from Alex Rousskov on 2014-03-05 (squid-dev)

From: Alex Rousskov <rousskov_at_measurement-factory.com>
Date: Wed, 05 Mar 2014 12:32:19 -0700

Executive summary:

* No progress on agreeing whether completing the hierarchy sketch or
committing its base class should be the next major step.

* No progress on disagreements regarding specific hierarchy aspects.

* BUT, I may have perfected my understanding of why we are unable to
make progress. Earlier, I speculated that Amos' Agent hierarchy excludes
side-specific things such as adaptation and cache integration while mine
does not. Now, I speculate that the difference may be even more
fundamental: Amos and I may be talking about different hierarchies that
merge at the top. I will sketch the diagram here as a teaser for the
detailed explanation and the end of the discussion that follows:

SideAgent->ClientAgent->HttpClient<-HttpAgent<-ProtoAgent
SideAgent->ServerAgent->HttpServer<-HttpAgent<-ProtoAgent

On 03/04/2014 06:23 PM, Amos Jeffries wrote:
> On 2014-03-05 11:51, Alex Rousskov wrote:
>> Abstract base (things common across Squid):
>> Agent: Common base for Client and Server agents. Uses Transport.
>>
>> Abstract side (things common on one Squid side):
>> Client: Common base for HttpClient, FtpClient, etc.
>> Server: Common base for HttpServer, FtpServer, etc.
>>
>> Leaf classes (they may have other parents but have no children):
>> HttpClient: Sends HTTP requests. Receives HTTP responses.
>> HttpServer: Receives HTTP requests. Sends HTTP responses.
>> FtpClient: Sends FTP commands. Receives FTP replies.
>> FtpServer: Receives FTP commands. Sends FTP replies.

> I think we need to clarify an agreement as to whether all the direction
> discussion is limited by this classes proposed implementation.

Any implementation can be rewritten so, in that sense, it does not limit
any discussion. It is all subjective and relative, so I cannot formally
prove it, but, in this particular case, I believe that

* the "cost" of not committing the proposed code immediately is low,
* while the "cost" of rewriting follow-up code may be very high,

making an immediate commit without an agreement a worse idea than
reaching an agreement first.

I believe that if we accept your relatively small code changes now,
without an agreement on whether they are correct and where we are going,
then we would see far larger changes being submitted for review, and
those larger changes are going to be far more difficult to rewrite. I
think we should reach a design agreement first, before committing the
base class of that future design and giving you a green light on
expanding those changes in various directions (that we have no agreement
on).

IMO, we need to decide on the overall design of those changes before you
continue working on them. Deciding on the Agent hierarchy is the best
next step in that direction. Committing your current code is not.

> 1) The structure and positioning of Transfer protocol, Client, Server
> and other logics within its child hierarchy is not affected by this API.
> Any of the children can implement the virtuals and use the members. Any
> class in a uses relationship can use the flow control methods.
> - Thus both models we are advocating for future directions, and several
> otehr possibilities seems to need this implemented.

The code similar to the one you have posted, in some shape of form, is
and will continue to be in Squid, yes, but the location of that code and
its interaction with the rest of Squid code may differ, depending on how
the hierarchy is structured. One specific example is below.

> 2) Kids of this class could still have either used-by or inherits-from
> relationship to the whole Client/Server hierarchy.
> - Thus there is no limitation placed by this patch on which direction
> the overall architecture goes in future.

The patch code may need to be changed depending on how other classes are
going to use it. And other classes depend on the outcome of this
discussion. As I said earlier, this discussion already triggered several
patch code changes, and more changes may be necessary but it is not
possible for me to review/suggest them without knowing where we are going.

For example, the following patch code

> + if (io.flag != COMM_OK) {
> + debugs(5, 2, tcp << ": got flag " << io.flag);
> + noteTransportReadError(io.xerrno);
> + io.conn->close();
> + return;
> + }

may be OK if we assume that all agents have a single transport
connection. In my hierarchy, FTP agents have two transport connections
so the above method either needs to be renamed (to be specific to the
primary transport connection) or needs a transport parameter (so that
the FTP agent can distinguish a failing control connection from a
failing data connection). If your hierarchy limits agents to one
transport connection, then there is no problem at all.

Furthermore, if the whole grow-buffer-and-read logic is placed into a
dedicated class separate from the Agent hierarchy, then error reporting
(to the agent that uses that dedicated class for reading) needs a
completely different design!

> 3) This base class separates a large portion of the I/O logics out of
> the client_side and http.cc code.
> - Allowing greater clarity on deciding future directions. But IMO being
> able to decide is not a good reason to block until those decisions are
> made.
> (I have laid out my roadmap plans so you can see that it hopes to
> retain the existing client-streams model and streamline the code overall).

Could you point me to those plans, please? I do not recall anything
specific enough that matches the scope of this discussion, but it is
possible that I missed, forgot, or misinterpreted something.

>> I think Agent+Server responsibilities will revolve around these areas:
>>
>> * Store integration (the Store Client API, Store ID, etc.).
>> * Adaptation integration (pre-cache REQMOD, post-cache RESPMOD, etc.).
>> * BodyPipe integration (sending request bodies to other Squid sides).
>> * Access control.
>> * Persistent connection management.
>> * Send/receive coordination.
>> * Pipelining management.
>>
>> Again, the above list is not meant to be comprehensive, and some of the
>> above may not actually belong to the Agent hierarchy itself (it depends
>> on how we answer a few pending questions), but this should give you a
>> rough outline of what should be possible.

>>>> Please note that the existing code already demonstrates how we can have
>>>> a generic Client class (the existing ServerStateData class is our
>>>> future
>>>> Client).
>>
>>> Note how it also conflates RESPMOD adaptation client/server operations
>>> into each per-protocol server
>>
>> I do not know what "conflates operations into server" means, but
>> ServerStateData handles most of the ICAP logic on its own, encapsulating
>> shared adaptation code of FTP and HTTP Clients. This matches what I
>> think Agent+Client should do!
>
> HttpClient
> - implements ICAP Transfer protocol
> - implements HTTP Transfer protocol
>
> FtpClient
> - implements ICAP Transfer protocol
> - implements FTP Transfer protocol
>
> Not a good model.

Indeed. Fortunately, nobody is proposing that kind of model. In my
model, ICAP transfer protocol is handled by adaptation classes outside
of HttpClient and FtpClient code.

Client handles ICAP RESPMOD (or adaptation RESPMOD, to be more precise)
integration logic, not the ICAP transfer protocol itself. This is
similar to how ServerStateData handles that integration logic today.
Adaptation integration logic includes things like deciding when and
whether to start adaptation, sending the transfer protocol message
(e.g., HTTP or FTP response) for adaptation, and handling adaptation
results.

> ICAP in particular is a piece which should be moved to a used-by
> relationship, just like Transport and Store.

ICAP transfer protocol handling already is, and will remain that way.
Adaptation RESPMOD integration is in a Client-like class today, and is
likely to remain in Client or a non-leaf Agent hierarchy class (because
that code is shared among leaf classes), but might also be extracted
into a dedicated class (that currently does not exist) if we decide that
it the best way to reuse that shared code.

>>> Note how it is not actually used for any of the FTP protocol or gopher
>>> protocol until after they have been mapped to HTTP classes and
>>> semantics. Proving that ServerStateData is *not* shared Transfer
>>> protocol logic, it is HTTP client logic facing an internal Transfer
>>> protocol gateway/mapping.
>>
>> Actually, ServerStateData has interfaces that both FTP and HTTP classes
>> use when handling the transfer connection. Does ServerStateData
>> implement the entire "shared Transfer protocol logic"? No. IMO, it is
>> not important right now to identify specific pieces that should be moved
>> into or outside ServerStateData during its conversion to Client. The
>> not-yet-agreed-upon definition of Client and related classes will define
>> what code can stay and what should be moved.
>
> While strangly this patch abstracts that Transport away from the
> ServerStateData and into HttpStateData without affecting FTP in any way!

I see nothing strange in the fact that you can [temporary] plug the
Agent node in the middle of the ServerStateData hierarchy to avoid
rewriting FTP code to comply with the new Transport handling interface.
If done right, everything should continue to "work" after that.

There is more to Transport handling than what your patch is abstracting.
The not-yet-abstracted bits provided by ServerStateData should continue
to work correctly for FTP.

For example, ServerStateData::sendMoreRequestBody() prevents more
request body from being sent to the closed transport connection. That
code is shared among FTP and HTTP clients but is not (and probably never
will be) abstracted into Agent. This is just the first example I found.

>>> Here are the elephants in the room:
>>> In all the transfer protocols we handle or propose to handle the
>>> messages have a prefix+payload syntax in *both* directions.
>>>
>>> Any message MAY explicitly be sent from either client or server.
>>> Generally requests come from clients and responses from servers. That
>>> barrier is not strict, just a statement about "normally" in the specs.
>>>
>>> There is a major push in WebSockets, SPDY and HTTP/2 to make use of
>>> that leniency. SPDY and HTTP/2 PUSH_PROMISE messages in particular
>>> already cause your server/client separation to hit trouble. Since they
>>> are messages received from server with all handling requirements of
>>> requests.
>>
>> The server sending responses that are requests is interesting and may
>> indeed be important important. Let's deal with that once we agree how to
>> handle FTP and HTTP/1. I agree that it may require adjusting the
>> hierarchy, but I fear our current disagreements go even beyond this
>> level of detail. Once we agree on what how to handle the currently
>> "normal" case, we shall consider any known future abnormalities.
>>
>
> This is the future "normal" and a future which will be "now" by the time
> the current 3.HEAD goes stable by the look of it. If we design the
> hierarchy ignoring it we will have to repeat the whole painful process
> and likely discard the current decisions.
>
> The difference is not big but important to the class intentions.

As you know, I agree that we should not ignore these needs. I just
suggest that we can be more efficient if we try to agree on the design
that accommodates current (and, hence, better-understood) needs and then
move on to whatever design changes are necessary to accommodate future
needs (some aspect of which are still in the process of being shaped
AFAICT). Since making that very first step is already very difficult, I
suggest that we do not try to jump.

Every time one of us brings up HTTP/2 during the initial stages of this
discussion, the other person has to either ignore those remarks (not
polite and may be misinterpreted as agreement) or discuss them
(lengthens the discussion further). If we were able to agree on HTTP/1
and FTP basics quickly, I would not mind throwing HTTP/2 into the
initial mix. Since we cannot, I suggest we ignore HTTP/2 for now. Yes,
there is a risk that we will need to redo the design from scratch once
HTTP/2 is introduced, but I seriously doubt it will happen.

>> My HttpClient (directly or via parents) also does ICAP using Squid
>> Adaptation API and caches responses using Squid Store API. Does yours?
>>
>> My HttpServer (directly or via parents) also does access control, ICAP
>> using Squid Adaptation API, and receives responses using Squid Store
>> API. Does yours?

If possible, please answer the above questions. You answers will help me
understand your proposed hierarchy.

>>>> I do not think Agent belongs to AnyP. Agent is a side-related thing,
>>>> not protocol-related thing. We do not want to place everything that is not
>>>> related to any specific protocol inside the AnyP namespace!

>>> Thats what "any protocol" namespace was defined for. Your point that
>>> this code is Transfer protocol level while agnostic to which Transfer
>>> protocol proves that it is part of that category. The *child* classes
>>> will not be.
>>> ie AnyP:: is the bits shared between 2+ of: src/http/, src/ftp/,
>>> src/icp, src/htcp, src/clients/, src/servers/

>> No. AnyP is the bits shared between 2+ of: src/http/, src/ftp/, src/icp,
>> src/htcp.
>>
>> "Sides" such as src/clients/ and src/servers are separate. Sides are not
>> primarily about protocols (each side handles many protocols!). They are
>> about access controls, integration with Squid store, adaptation
>> vectoring points, and other things that make Squid sides substantially
>> different from each other.

> Indeed. And that was my point, AnyP differs from "sides". It *does*
> focus on the shared parts both cross-protocol + cross-side where the
> side-specific and protocol-specific code does not.

I agree with your statement, but I think the conclusion you are making
is wrong. A lot of Squid code is "both cross-protocol + cross-side".
AnyP is just for the protocol-focused code that is not big enough to
deserve its own /src/p/ directory.

For example, cbdata does not belong to AnyP. Same for Agent.

If we were so inclined, we could create src/anys/ (for "any side") and
place Agent there, but I do not think we should do that now.

[N.B. If my guess below about us designing different hierarchies is
correct, then _your_ ProtoAgent can indeed be placed in AnyP but mine
SideAgent cannot be!]

>> I do not recommend doing it that way because I think it would be more
>> productive to agree on what the hierarchy is before coding it up.
>> However, if you insist, here is a list of specific patch changes that
>> would bring your patch closer to what I can vote for:
>>
>> 1. Delete current Agent class description.
>> 2. Add my hierarchy description (as a comment above Agent for now).
>> 3. Use the Agent definition from #2 as the Agent class description.
>> 4. Resubmit for review.
>>
>> Needless to say, doing the above is silly because I can easily imagine
>> all those changes now, without you doing any work and because my
>> hierarchy definition requires more work (i.e., #2 and #3 may change).
>> However, if you insist on doing this via patch changes rather than
>> getting to an agreement via email, the above steps use that approach.

> This is my main point. The patch as implemented nicely matches both our
> models and more without restricting future design directions. Lets merge
> *then* discuss which direction the _remaining_ work goes.

Before your patch can be merged, I need to finish reviewing it. Today, I
can only resume reviewing it using my hierarchy. I can do that if you
insist, but it will either not work because you will refuse to make the
corresponding changes (if any) OR, worse, it will result in you writing
even more code in the non-agreed-upon direction (after the commit).

Why is it so important for you to commit this patch now? What does it
fix or unblock?

> As I mentioned
> my intention is to move on to the important parser work after this
> rather than fleshing out the hierarchy. Either of our models may look
> very different by the time that is finished and we need to discuss more.

I suspect your important parser work will be affected by the hierarchy
design. As I wrote earlier in this email, I think we should squash this
problem now rather than writing more code in conflicting directions.

> you consider this class far more
> important than I ever intended it to be.

Very true.

> If I rename it yet again to AnyP::AgentIo and documented as the
> "interface for performing Transport I/O operations implemented by
> Transfer protocol agents". Does that simplify the scope of discussion
> enough to eliminate the higher level model disagreements?

You can try that route, but then I may either argue that such a class is
not needed at all, or we will still argue about the class description,
namespace, name, method profiles, and probably some implementation details.

You are proposing to encapsulate key code from several very different
classes working on different Squid sides. Avoiding presenting it as the
cornerstone of a major hierarchy may help, but this work is unlikely to
be easy no matter how you present it. As discussed many emails ago,
focusing on one existing user class may be a better way forward if you
want to avoid being blocked by these discussions.

>>> So lets remember that in your model ClientAgent/ServerAgent are level-2
>>> classes shared by Transfer protocols.
>>>
>>> Then a quick glance at HTTP/2 where a _minimum_ of 100 *transactions*
>>> are happening asynchronous and multiplexed over the same Transport and
>>> I/O buffer. And each message may be split into N frames over that
>>> Transport. Flow control for all that is performed with special frames
>>> and doing that between multiple Agents would force Agent-Agent
>>> communication which in my model is unnecessary.
>>
>> In my hierarchy, all from-client and to-client messages would be handled
>> by a single Http2::Server agent and all from-server and to-server
>> messages would be handled by a single Http2::Client agent. I though it
>> would be the same in yours. If I was wrong, can you describe what agents
>> will handle those four messages kinds in your hierarchy?
>
> What you describe there would be the same in mine.

If agents are doing the same thing in both models, then I do not
understand why you are saying that my model will "force Agent-Agent
communication" which is "unnecessary" in your model (but I will make a
wild guess further below **)

> You tied one Client to a transaction,

I do not recall doing that. I did queued a question on whether an Agent
should handle a single transaction or multiple, but my current hierarchy
description allows both answers.

> and have described your model as
> the Client being a shared level-2 class.

That part is true.

> In a Flow control has to be in
> the level-3 class due to these HTTP/2 incompatibilities with the
> existing protocols, which pushes the semantic operations of message
> sending and receiving down to level-3. The state about which
> transaction(s) are being performed can be in a generic level-2 class,
> but the logics manipulating that state can't be.

Sorry, too many undefined terms for me to agree or disagree with that.
In the current code, the flow control (not sure if my interpretation of
that term matches yours!) is spread around several hierarchy layers. I
do not yet see why the entire flow control code would have to be pushed
to leaf classes when HTTP/2 is supported, but I do not think that will
affect agent-to-agent communication anyway.

(**) Perhaps when you said "Agent-Agent communication" you meant
different classes "communicating" within the same agent object?? For
example, Client and FtpClient class calling each other methods while
both are a part of a single FtpClient object or instance? If yes, such
communication is relatively cheap and usually preferred to code
duplication so yes, I do expect some of it in my model, but I do not
think it is a bad thing.

In most cases, I use lower-case "agent" to mean an object of one of the
leaf Agent hierarchy types. Something we can point to using "Agent*".

> If you put for example, a sendNextMessage() method into a level-2 Client
> class in your hierarchy (as I understand it).
> The method would have to do things like blocking sending other messages
> until it was completed. This is okay for HTTP/1 and FTP as the messages
> are small and have same serial properties.
> When HTTP/2 is added to the mix it cannot block sending other messages.

Not sure I understand. HTTP/1 allows pipelining which results in, for
example, sending multiple requests on the same transport connection
without waiting for a response to the first request. Thus, the notion of
"non-blocking" messages exists in both HTTP/1 and HTTP/2. I can
appreciate that it may be the default in HTTP/2 and a dangerous feature
in HTTP/1, but I do not see how it would drastically increase
communication between agents.

> So the logic to do that is either pushed down to level-3 class or a
> whole separate Client class has to be implemented - which defeats the
> purpose of having a shared Client class as a parent to begin with.

I think you are arguing that having a common parent for Http2::Client
and Http2::Server has its benefits. I agree. So does having a common
parent for Http1::Client and Http1::Server. This is not specific to
HTTP/2 so I will go back to just "HTTP".

My model does not preclude such a common HTTP parent, but the Agent
hierarchy does not have it because its primary focus is not on transfer
protocols but on sides.

Earlier (not on this thread), we have agreed to place the code common
among HTTP clients and servers into src/http/ while side-specific code
goes into clients/ and servers/. My Agent hierarchy is about the latter,
not the former.

If we do add a common HTTP-specific parent (src/http/) for HttpClient
and HttpServer and keep my hierarchy intact, we will end up with
HttpClient (and HttpServer) having multiple parents (mind the changing
direction of inheritance arrows):

SideAgent->ClientAgent->HttpClient<-HttpAgent<-ProtoAgent
SideAgent->ServerAgent->HttpServer<-HttpAgent<-ProtoAgent

My hierarchy is about the "left" part of the above diagram (HttpClient
plus classes to the left of it). Big fragments of that hierarchy exist
in Squid code today, and are arguably more complex/critical than any
specific transfer protocol handing, but that does not preclude the
creation of the new "right" hierarchy (HttpClient plus classes to the
right of it).

I suspect that we are both trying to avoid the second hierarchy, but you
might be focusing on transfer protocol parts (the ProtoAgent hierarchy)
while I am focusing on the left part (the SideAgent hierarchy). If true,
that would clarify a lot of things on this thread!

Do you think that is what is going on?

Cheers,

Alex.
Received on Wed Mar 05 2014 - 19:32:46 MST

This archive was generated by hypermail 2.2.0 : Sat Mar 08 2014 - 12:00:11 MST