Re: [PATCH] Agent base class - (WAS: [PATCH] Comm::TcpReceiver - part 1, 2, and 3) from Amos Jeffries on 2014-03-03 (squid-dev)

From: Amos Jeffries <squid3_at_treenet.co.nz>
Date: Tue, 04 Mar 2014 14:59:56 +1300

On 2014-03-04 07:23, Alex Rousskov wrote:
> Executive summary from my point of view:
>
> * There is now a cemented agreement that SSL/TLS can continue to live
> in
> Transport (i.e., Comm) level for now. We no longer need to worry about
> TlsAgents and related classes/complications.
>
> * We seemed to reach an agreement regarding Agent relationship with
> messages: Agents know they are processing transfer protocol messages
> [stored in the input buffer], but they do not know how those messages
> correspond to raw bytes in the input buffer.
>
> * There is no agreement on the definition of Agent (and other classes
> in
> the Agent hierarchy). Amos' current Agent definition focuses on reading
> from and writing to a transport connection. Mine on encapsulating code
> needed/shared by other Agents. Hopefully, this can be reconciled by
> observing that the latter allows the former.
>
> * There is a big disagreement regarding the second hierarchy level.
> Amos
> wants it to be specific to a transfer protocol (e.g., HTTP or FTP),
> while I suggest making it specific to the agent protocol role (Client
> or
> Server).
>
> * There may be a disagreement on whether an Agent must deal with
> exactly
> one transport connection (e.g., FTP ctrl channel) or with any number of
> transports necessary to handle at least one transfer transaction (e.g.,
> an FTP command/reply exchange using both ctrl and data channels).
>
>
> My Agent hierarchy definitions have not changed (quoting just the
> contentious parts here for reference):
>
> Agent: Common base for Client and Server agents. Uses Transport.
>
> Client: Common base for HttpClient, FtpClient, IcapClient, etc.
> Server: Common base for HttpServer, FtpServer, etc.
>
> HttpClient: Sends HTTP requests. Receives HTTP responses.
> HttpServer: Receives HTTP requests. Sends HTTP responses.
> FtpClient: Sends FTP commands. Receives FTP replies.
> FtpServer: Receives FTP commands. Sends FTP replies.
>
>
> IIRC, no comprehensive alternative has been proposed, but there are
> disagreements on various parts of the above and, hence, the submitted
> patch violates those contested parts. Once the agreement is reached,
> the
> patch can be fully reviewed, polished, and committed.

I dont see how that conclusion follows. The protocol and client/server
parts have not been implemented as distinct Agent hierarchy classes yet.
So there is no code contradicting either design.

The focus of this patch has been specifically to prevent Transfer
protocol semantic details getting into the base Agent class.

>
> Discussion follows.
>
>
> On 03/03/2014 04:03 AM, Amos Jeffries wrote:
>> On 3/03/2014 10:27 a.m., Alex Rousskov wrote:
>>> I think we still disagree on what Agent is, unfortunately (and,
>>> hence,
>>> what your TcpReceiver is). I think you are currently not writing an
>>> Agent class from my sketch. You are writing a class that does not
>>> currently exist in my sketch. Details below.
>
>> I understood your definition to be a 3-layer inheritance:
>>
>> Agent -> FooAgent -> FooClient
>> Agent -> FooAgent -> FooServer
>>
>> Agent -> BarAgent -> BarClient
>> Agent -> BarAgent -> BarServer
>
>
> You got the number of layers right, but if you read the definitions of
> each layer, you will see that my hierarchy is:
>
> Agent -> Client -> FooClient
> Agent -> Client -> BarClient
> Agent -> Server -> FooServer
> Agent -> Server -> BarServer
>
> There will be other classes to encapsulate common transfer protocol
> code, but the above three layers is what we should be focusing on when
> discussing the overall Agent hierarchy shape IMO.

So which generic operations do you see as being client-specific but not
also protocol-specific ?
or server-specific but not protocol-specific ?

I see the middle layer providing:
  - pipeline list of messages not yet handled
  - transfer-coding of messages
  - generic loop running parser provided by child class to find messages

>
>> Agent (was TcpReceiver)
>> - use of Transport for I/O in generic way
>
> Yes, but this is just one aspect of an Agent. IMHO, you have a tendency
> to focus on this aspect too much when discussing class definitions.
> This
> probably forces you to place transfer protocol-specific classes too
> high
> in the hierarchy.
>

That Transport untangling is the specific intended scope of this patch.
The rest requires large untangling from the parser code. As bits are
found, sure, they can be added or moved around the class hierarchy.

>
>> FooAgent
>> - translation (use of Parser/Packer) between buffer bytes and Foo
>> protocol message objects
>
> This class does not exist in my hierarchy. It is far more important to
> encapsulate server and client logic than to plug in Parser and Packer
> early IMO. Parsing and packing are minor final details compared to the
> fundamentals of being a Squid Agent, a Squid Client, or a Squid Server.
>
> Please note that the existing code already demonstrates how we can have
> a generic Client class (the existing ServerStateData class is our
> future
> Client).
>

Note how it also conflates RESPMOD adaptation client/server operations
into each per-protocol server and actually contains the whole
AnyP::Agent logic for ICAP connectivity separate from the protocol
logics. Once IcapClient is built from ServerStateData (aka Client)
things get recursive.

Note how it is not actually used for any of the FTP protocol or gopher
protocol until after they have been mapped to HTTP classes and
semantics. Proving that ServerStateData is *not* shared Transfer
protocol logic, it is HTTP client logic facing an internal Transfer
protocol gateway/mapping.

I you see your point on it being a nice abstraction. Perhapse this
should be a 4-layer hierarchy or double-headed inheritance from
AnyP::Client/AnyP::Server and AnyP::Agent.

Here are the elephants in the room:
In all the transfer protocols we handle or propose to handle the
messages have a prefix+payload syntax in *both* directions.

Any message MAY explicitly be sent from either client or server.
Generally requests come from clients and responses from servers. That
barrier is not strict, just a statement about "normally" in the specs.

   There is a major push in WebSockets, SPDY and HTTP/2 to make use of
that leniency. SPDY and HTTP/2 PUSH_PROMISE messages in particular
already cause your server/client separation to hit trouble. Since they
are messages received from server with all handling requirements of
requests.
  *For now* it still meets your model in a loose way by being an
encapsulation from servers. As HTTP/2 rolls out in future likelihood of
other strange frame extensions being added grows substantially.
  ** WebSockets makes no such attempt at compatibility.

>
>> FooClient
>> - transfer protocol logics on the message objects and state.
>
> There needs to be something about "Client" in your FooClient
> definition.
> For example, in my hierarchy, FooClient "sends Foo requests. Receives
> Foo responses."
>

Sure. Sorry. Insert language to the effect of protocol-specific
directionality of requests and replies.

>
>> By "deliver" / "pass" between these classes I mean crossing the class
>> name scope by calling each others (virtual) methods.
>
> Yes, I know.
>
>
>> Lets skip FTP for the moment that is relatively simple regardless of
>> our
>> disagreement on specifics. I am also waiting on the FTP relay project
>> to
>> go in before any more work there. So things may change there before it
>> gets implemented.
>
> I do not think we should ignore FTP needs during this discussion
> because
> its two-transport aspect is very useful in pointing out weaknesses and
> strengths of a given hierarchy design.
>

Fine.

FTP protocol in a nutshell:
* one ctrl Transport containing a persistent pipeline of messages.

* starts with a message-based handshake with challenge-response
semantics driven by server messages.

* then switches direction to request-reply semantics driven by client
messages.
* certain client request messages require secondary Transport to be
initiated and transfer a data blob.
- while this transfer is taking place ctrl channel messages must be
restricted/queued to avoid races

* message payload may be in secondary Transport or inline with message.

>
>> The difficult code is the existing client-side/server-side hierachies.
>
> To me, client-side/server-side objects are easily defined (and the
> existing Client implementation is nearly complete as far as big-picture
> design issues are concerned). I will repeat those definitions here:
>
> Client: Common base for HttpClient, FtpClient, IcapClient, etc.
> Server: Common base for HttpServer, FtpServer, etc.
>
>
> I would not be surprised if this needs more work (e.g., I am not 100%
> sure ICAP fits well here because it kind of faces a different side of
> Squid), but it is more than enough to start with IMO.
>

ICAP in a nutshell:
* one ctrl Transport containing a persistent pipeline of messages.

* request-reply semantics driven by client messages.

* message payload always inline.

When we abstract them down to the nutshell requirements the differences
between client and server disappear.

>
>> ** In this project branch I have isolated the shared logics that can
>> be
>> base Agent on any TCP client or server connection. That is now called
>> AnyP::Agent.
>
> I am not sure why you have to say "TCP" when defining a class that is
> not TCP-specific. I thought we are already passed that point in the
> discussion.

We did, I have not migrated the UDP and UDS code to using this class
though.

>
> I do not think Agent belongs to AnyP. Agent is a side-related thing,
> not
> protocol-related thing. We do not want to place everything that is not
> related to any specific protocol inside the AnyP namespace!

Thats what "any protocol" namespace was defined for. Your point that
this code is Transfer protocol level while agnostic to which Transfer
protocol proves that it is part of that category. The *child* classes
will not be.
ie AnyP:: is the bits shared between 2+ of: src/http/, src/ftp/,
src/icp, src/htcp, src/clients/, src/servers/

>
> Please note that I probably understand what you have done. I probably
> understand the patch code well enough. Moreover, encapsulating shared
> kids logic is the right way to implement Agent, and you focused on one
> important aspect (transport integration). There is nothing wrong with
> that!
>
>
>> IMPORTANT: this change is *all* I am wanting to get merged into trunk
>> right now. The rest of the code is still too tangled with other stuff
>> to
>> isolate the Http1Agent and Client/Server layers cleanly.
>> Yes this means that there *may* be pieces still to push into the new
>> Agent later. But that will depend on much cleaner child class code to
>> even spot correctly.
>
> Yes, of course. I do not suggest that you implement the whole hierarchy
> or even complete the Agent class in one patch. I only insist that we
> agree on what the hierarchy is, so that the Agent class can be defined,
> partially implemented, and reviewed correctly. This agreement is
> critical because it determines how to implement what you want to
> implement.

As far as I can see we agree on that and are just deciding on which wall
(child classes) the doors vs windows are to go.

>
> The change you want to get merged into trunk is not ready for merge.
> That does not mean everything in your patch is wrong, far from that!

So far the blanket statement to hold off is not helping progress. Please
point at a specific part of this patch that would be removed or even
significantly remodelled based on changes of plans in one of the
un-agreed details.

I think I am probably understanding your model too and none of the
differences are in or appearing to impact the code layer the patch
implements.

I suspect the the stopReadingXXX() method maybe changed relation to the
sub-Agent/Job spawning ideas. In my model we can (1-2 patches in future)
push it down a level or two to a BodyPipe agent or the specific clients
spawning. Your model we don't have that and it may be erased completely.
That seems to be all.

I suspect you want to push the parser generic loop up here. That is
*new* code perhapse converting procesReadBuffer() into a local method
instead of virtual. Right now its not possible due to the tangling in
client_side*.cc.

[ The last few rounds of this discussion seems to be something better
off prefixing the work planned after parser-ng is merged. ]

>
> I do insist that there is a general agreement on the fundamental API
> that your patch is introducing. If you were adding some minor class
> used
> by one method in the corner of Squid, I would have given up long time
> ago. However, you are adding one of the most important classes in
> Squid,
> and the hierarchy that grows from it will shape critical pieces of
> Squid
> code for years to come. We need to agree on what that hierarchy will
> look like before I can review you code
>
> The fact that the actual code that you are moving or adding is small is
> irrelevant IMO. Hopefully the above explains why I think that way.
>
>
> Imagine I bring you a beautiful classic portrait of a person and ask
> you
> whether you think that portrait is good enough to be hanged at the
> entrance of the museum dedicated to that person. The portrait itself
> may
> be stunning, but you should not say "yes" without knowing who that
> person is (or at least the person's gender). The review of your patch
> is
> currently in this "Who is that Agent?" stage.
>

Maybe that explains all this then. Because the museum has not yet been
built and the Architect is trying to get agreement on whether to re-use
the wooden flooring or stone tiles from the old bathhouse. Since the
thickness/reinforcing of the flooring and height of the museum is
dictated by that weight. The person and museum content matters as well
by adding weight from displays and decorations, appropriateness of the
color and light effects added by the floor. But still it is a decision
about flooring, not gender or colors in the portrait.

Yes I am focusing on the I/O details because that is what underpins
everything else we have on the go this year ahead, and simply detatching
the Transport from ConnStateData is what this patch is about.

>
>> The current state of old client-side and server-side code in the
>> branch
>> is that:
>>
>> - ConnStateData is spanning Http1Agent and Http1Client and (some of)
>> Parsing.
>> - ClientSocketContext is spanning Http1Client AsyncJob calls.
>> - a bunch of global functions also span Http1Agent and Http1Client and
>> Parsing
>>
>> - HttpStateData is spanning Http1Agent and Http1Server
>> - ServerStateData is spanning Http1Agent, IcapAgent, IcapClient,
>> IcapServer operations
>>
>>
>> Future work is planned to consist of separating out the Parser/Packer
>> code into specialized classes outside the Agent hierarchy (parser-ng
>> project).
>>
>> Followed by isolation shuffling of the resulting simpler code into an
>> Http1Agent class.
>>
>> Followed by wrapping up the HttpClient/HttpServer logics into their
>> respective classes.
>>
>> (last two may happen together or get further broken down).
>
> As you can tell by now, a lot of the above already does not match the
> hierarchy I think we should implement. It may be prudent to pause
> development until we can agree on how that hierarchy should look like
> and get Agent changes committed.
>
>
>> In regards to the attached patch can you point me at parts which are
>> specifically in your idea not supposed to be in the base Agent ?
>
> I will, once we agree on what Agent is. You know how I define it.
>

It is already clear that you are repeating definitions for this classes
child classes.

Lets put it another way?

In a 3-class hierarchy: A->B->C

Does it matter whether a virtual method declared by A is defined in B or
C ?
No.

Does it matter whether a method presented by A is used in B or C ?
No.

Does it matter that all child classes use it?
No.

>
>> Keeping in mind that:
>> - "uses Transport" is implemented in trunk as having a Pointer to
>> Comm::Connection and calling various Comm functions on it.
>
> Yes, we have agreed that this is the right Transport implementation for
> now.
>
>
>> - I have not renamed the deceptive "tcp" Pointer. Was thinking of
>> calling it "transport" or something. Any ideas?
>
> inConn, to keep symmetry with inBuf?
>

The conn is both in and out though.

"in" on inBuf is because it is directionally specific. outBuf exist as
MemBuf transparently relayed to Comm::Write.

>
>>>>>> If I am understanding you right this class should be named
>>>>>> AnyP::Agent
>>>>>> in the generic protocol library instead of attempting to be part
>>>>>> of
>>>>>> Comm:: related to TCP.
>>>>>
>>>>> Yes, if that is what you want to implement (and, as you know, I
>>>>> think
>>>>> that is a better choice than Transport). This may not be as easy as
>>>>> it
>>>>> may seem. For example, "generic high-level protocol library" does
>>>>> not
>>>>> work with "opaque bytes" it works with "protocol messages", but I
>>>>> am not
>>>>> sure you are happy to accept that.
>>>
>>>
>>>> If this TcpReceiver class is actually Agent then the child
>>>> FooAgent's
>>>> will be the ones receiving and emitting the messages. This being the
>>>> base class that deals with the buffers send/receive operations as
>>>> opaque
>>>> bytes.
>>>
>>> Yes. One of the problems in this discussion is that you resist the
>>> notion of Agent dealing with protocol messages because it actually
>>> reads
>>> opaque bytes. When thinking about a class hierarchy design,
>>> especially
>>> the design of a base class, we should focus on the API it provides to
>>> the kids and the rest of the world.
>>
>> Yes. For the base class messages are opaque.
>
> Agreed. Agent having a concept of a message is what's important as far
> as Agent design and implementation is concerned. The exact type/kind of
> a message is not important to Agent.
>
>
>> Messages are what the child classes provide.
>> You may think of the buffer as having 0-N messages in it, but the
>> *base* Agent class has no idea which particular bytes are which
>> message.
>
> Yes. However, Agent having a concept of a message immediately makes it
> possible to ask a soon-to-be-critical question (that I have queued
> before): How many messages does a single Agent deal with? Does the
> answer depend on the side?
>
> Please do not try to answer that question now to avoid side-tracking
> the
> discussion! I am simply illustrating why it is important to agree that
> Agent knows that agents deal with transfer protocol messages. If all
> Agent knows about is bytes, then the question itself becomes impossible
> to ask, and we would not know the right place to plug in new Server
> Agents into the existing code, among other things.

Risking that because I already put the answer above in what the middle
layer needs to provide. "pipeline of N" and "No, neither side nor
protocol" respectively.

>
>>> When defining Agent, we should not be focusing on what
>>> specific code the base class has. That last part is relatively easy
>>> --
>>> the base class will have code shared/needed by all of its kids!
>>>
>>
>> Yes. You seem to be implying constantly that there is something
>> missing
>> or too-much in the base Agent class coded so far.
>
> Not really. Your code (the "portrait" in the analogy above) remains
> largely irrelevant until we agree on what Agent (the "person" in the
> analogy above) is. Once Agent hierarchy is agreed upon, it becomes
> possible to determine whether you Agent implementation is missing
> something, has inappropriate code, etc.
>

The parts of the model we agree on are either met by this patch or
agreed to be isolated at a lower level. The disagreement appears to be
all in the lower levels.

>
>>>>> Agent:
>>>>>
>>>>> * owns/fully controls transport T at any given time
>>>>> * uses transport T for sending and receiving messages (not
>>>>> bytes!)
>
>>>> That "not bytes!" says that you want the message parsing/packing
>>>> translation from bytes to HTTP/FTP messages to be done by Transport
>>>> (
>>>> ie. comm_read() ) and delivered _as messages_ to Agent.
>
>>> Not, exactly: The very same byte in the input buffer is both a "raw
>>> opaque byte" for Transport and "transfer message byte" for Agent.
>>> Agents
>>> work with transfer protocol messages while Transport works with raw
>>> bytes (or "transport protocol bytes" to be more precise).
>
>
>> Aha! you confused me with that.
>>
>> A message going from HttpAgent to elsewhere in the code is an
>> HttpRequest or HttpReply *object*. It bears almost no byte-wise
>> relation
>> to the bytes in the buffer used by base Agent and Transport.
>
>> ==> Ergo: Agent must parse the buffer into messages. *how* it does so
>> is by using a Parser.
>
> Sure. I do not think the above contradicts anything I have said. There
> are unparsed messages in the input buffer. There are parsed message
> objects that may not remember much about the buffer they came from (due
> to excessive copying mostly).
>

... and the buffer itself is irrelevant to the FooClient/FooServer
Transfer protocol logics.

>
>>>>> It is critical to draw the bytes/messages boundary like that IMO.
>>>>> Also,
>>>>> it would be best to minimize Agent-to-Agent communication to the
>>>>> extent
>>>>> possible. Most of that communication would go through other Squid
>>>>> layers
>>>>> such as Store and BodyPipe.
>>>>
>>>> +1.
>>>
>>>
>>> Note that having both FtpCtrlAgent and FtpDataAgent contradicts the
>>> above because those two agents imply extensive Agent-to-Agent
>>> communication and coordination:
>>
>> Nope. I am thinking FtpClient/FtpServer (childs of FtpCtrlAgent) at
>> some
>> point spawn a FtpDataAgent (Job) and wait for it to message back about
>> completion (or timeout) before continuing with the ctrl messages via
>> its
>> parent class. Not what I'd call extensive communication.
>
> In my experience with FTP code, it does not quite work that way: There
> is a lot more interaction between data and control streams because, in
> real world, the commands and data do not always come, open, or close in
> orderly/ideal fashion. Spawn-and-wait for the data channel is not
> sufficient.

That is what is implemented now and working.
- switchTimeoutToDataChannel() moves FtpStateData "agent/client" focus
from ctrl to data channel.
- while waiting for data to start ctrl may be aborted or closed

>
> Again, I am not saying that an FTP agent cannot be implemented using
> two
> or more sub-agent jobs, especially if you start violating job
> boundaries, but it is more difficult to implement it [correctly] that
> way. In my Agent hierarchy, that difficulty is not _required_ because
> each agent job/object is dedicated to one transfer protocol transaction
> (at least), not one transport object.

So lets remember that in your model ClientAgent/ServerAgent are level-2
classes shared by Transfer protocols.

Then a quick glance at HTTP/2 where a _minimum_ of 100 *transactions*
are happening asynchronous and multiplexed over the same Transport and
I/O buffer. And each message may be split into N frames over that
Transport. Flow control for all that is performed with special frames
and doing that between multiple Agents would force Agent-Agent
communication which in my model is unnecessary.

Expected changes to AnyP::Agent in my model for implementing HTTP/2 are:
- adding a queue buffer internal to sendSomeData()

Expected changes to your model for implementing HTTP/2 are:
- implementing entirely separate parallel layer-2 Client/Server child
classes to inherit from.

Now where did the benefit of pushing Client/Sever up to the middle go?
If each level-3 class has a unique 1:1 relationship with its parent
class they may as well be flattened.

>
> In my approach, an FtpClient may still spawn jobs if needed, of course,
> but it does not have to. The decision is made when implementing a given
> agent. If we call that agent FtpCtrlClient, that decision would be made
> now, when the hierarchy is defined.

So there is no disagreement on level-1 hierarchy there. It is
implementation specific for the level-2/3 class separation.

FWIW:
* spawning a sub-Transport is shared by FTP (data channel spawned) and
HTTP (CONNECT tunnel spawned)
* blocking waiting on some message completion shared by FTP, HTTP/1,
ICAP - all the serial messaging protocols.

Protocol specific however differences for each of the above "shared"
concepts.

Amos
Received on Tue Mar 04 2014 - 02:00:01 MST

This archive was generated by hypermail 2.2.0 : Wed Mar 05 2014 - 12:00:09 MST