Re: [PATCH] Agent base class - (WAS: [PATCH] Comm::TcpReceiver - part 1, 2, and 3)

From: Alex Rousskov <rousskov_at_measurement-factory.com>
Date: Mon, 03 Mar 2014 11:23:13 -0700

Executive summary from my point of view:

* There is now a cemented agreement that SSL/TLS can continue to live in
Transport (i.e., Comm) level for now. We no longer need to worry about
TlsAgents and related classes/complications.

* We seemed to reach an agreement regarding Agent relationship with
messages: Agents know they are processing transfer protocol messages
[stored in the input buffer], but they do not know how those messages
correspond to raw bytes in the input buffer.

* There is no agreement on the definition of Agent (and other classes in
the Agent hierarchy). Amos' current Agent definition focuses on reading
from and writing to a transport connection. Mine on encapsulating code
needed/shared by other Agents. Hopefully, this can be reconciled by
observing that the latter allows the former.

* There is a big disagreement regarding the second hierarchy level. Amos
wants it to be specific to a transfer protocol (e.g., HTTP or FTP),
while I suggest making it specific to the agent protocol role (Client or
Server).

* There may be a disagreement on whether an Agent must deal with exactly
one transport connection (e.g., FTP ctrl channel) or with any number of
transports necessary to handle at least one transfer transaction (e.g.,
an FTP command/reply exchange using both ctrl and data channels).

My Agent hierarchy definitions have not changed (quoting just the
contentious parts here for reference):

  Agent: Common base for Client and Server agents. Uses Transport.

  Client: Common base for HttpClient, FtpClient, IcapClient, etc.
  Server: Common base for HttpServer, FtpServer, etc.

  HttpClient: Sends HTTP requests. Receives HTTP responses.
  HttpServer: Receives HTTP requests. Sends HTTP responses.
  FtpClient: Sends FTP commands. Receives FTP replies.
  FtpServer: Receives FTP commands. Sends FTP replies.

IIRC, no comprehensive alternative has been proposed, but there are
disagreements on various parts of the above and, hence, the submitted
patch violates those contested parts. Once the agreement is reached, the
patch can be fully reviewed, polished, and committed.

Discussion follows.

On 03/03/2014 04:03 AM, Amos Jeffries wrote:
> On 3/03/2014 10:27 a.m., Alex Rousskov wrote:
>> I think we still disagree on what Agent is, unfortunately (and, hence,
>> what your TcpReceiver is). I think you are currently not writing an
>> Agent class from my sketch. You are writing a class that does not
>> currently exist in my sketch. Details below.

> I understood your definition to be a 3-layer inheritence:
>
> Agent -> FooAgent -> FooClient
> Agent -> FooAgent -> FooServer
>
> Agent -> BarAgent -> BarClient
> Agent -> BarAgent -> BarServer

You got the number of layers right, but if you read the definitions of
each layer, you will see that my hierarchy is:

  Agent -> Client -> FooClient
  Agent -> Client -> BarClient
  Agent -> Server -> FooServer
  Agent -> Server -> BarServer

There will be other classes to encapsulate common transfer protocol
code, but the above three layers is what we should be focusing on when
discussing the overall Agent hierarchy shape IMO.

> Agent (was TcpReceiver)
> - use of Transport for I/O in generic way

Yes, but this is just one aspect of an Agent. IMHO, you have a tendency
to focus on this aspect too much when discussing class definitions. This
probably forces you to place transfer protocol-specific classes too high
in the hierarchy.

> FooAgent
> - translation (use of Parser/Packer) between buffer bytes and Foo
> protocol message objects

This class does not exist in my hierarchy. It is far more important to
encapsulate server and client logic than to plug in Parser and Packer
early IMO. Parsing and packing are minor final details compared to the
fundamentals of being a Squid Agent, a Squid Client, or a Squid Server.

Please note that the existing code already demonstrates how we can have
a generic Client class (the existing ServerStateData class is our future
Client).

> FooClient
> - transfer protocol logics on the message objects and state.

There needs to be something about "Client" in your FooClient definition.
For example, in my hierarchy, FooClient "sends Foo requests. Receives
Foo responses."

> By "deliver" / "pass" between these classes I mean crossing the class
> name scope by calling each others (virtual) methods.

Yes, I know.

> Lets skip FTP for the moment that is relatively simple regardless of our
> disagreement on specifics. I am also waiting on the FTP relay project to
> go in before any more work there. So things may change there before it
> gets implemented.

I do not think we should ignore FTP needs during this discussion because
its two-transport aspect is very useful in pointing out weaknesses and
strengths of a given hierarchy design.

> The difficult code is the existing client-side/server-side hierachies.

To me, client-side/server-side objects are easily defined (and the
existing Client implementation is nearly complete as far as big-picture
design issues are concerned). I will repeat those definitions here:

  Client: Common base for HttpClient, FtpClient, IcapClient, etc.
  Server: Common base for HttpServer, FtpServer, etc.

I would not be surprised if this needs more work (e.g., I am not 100%
sure ICAP fits well here because it kind of faces a different side of
Squid), but it is more than enough to start with IMO.

> ** In this project branch I have isolated the shared logics that can be
> base Agent on any TCP client or server connection. That is now called
> AnyP::Agent.

I am not sure why you have to say "TCP" when defining a class that is
not TCP-specific. I thought we are already passed that point in the
discussion.

I do not think Agent belongs to AnyP. Agent is a side-related thing, not
protocol-related thing. We do not want to place everything that is not
related to any specific protocol inside the AnyP namespace!

Please note that I probably understand what you have done. I probably
understand the patch code well enough. Moreover, encapsulating shared
kids logic is the right way to implement Agent, and you focused on one
important aspect (transport integration). There is nothing wrong with that!

> IMPORTANT: this change is *all* I am wanting to get merged into trunk
> right now. The rest of the code is still too tangled with other stuff to
> isolate the Http1Agent and Client/Server layers cleanly.
> Yes this means that there *may* be pieces still to push into the new
> Agent later. But that will depend on much cleaner child class code to
> even spot correctly.

Yes, of course. I do not suggest that you implement the whole hierarchy
or even complete the Agent class in one patch. I only insist that we
agree on what the hierarchy is, so that the Agent class can be defined,
partially implemented, and reviewed correctly. This agreement is
critical because it determines how to implement what you want to implement.

The change you want to get merged into trunk is not ready for merge.
That does not mean everything in your patch is wrong, far from that!

I do insist that there is a general agreement on the fundamental API
that your patch is introducing. If you were adding some minor class used
by one method in the corner of Squid, I would have given up long time
ago. However, you are adding one of the most important classes in Squid,
and the hierarchy that grows from it will shape critical pieces of Squid
code for years to come. We need to agree on what that hierarchy will
look like before I can review you code

The fact that the actual code that you are moving or adding is small is
irrelevant IMO. Hopefully the above explains why I think that way.

Imagine I bring you a beautiful classic portrait of a person and ask you
whether you think that portrait is good enough to be hanged at the
entrance of the museum dedicated to that person. The portrait itself may
be stunning, but you should not say "yes" without knowing who that
person is (or at least the person's gender). The review of your patch is
currently in this "Who is that Agent?" stage.

> The current state of old client-side and server-side code in the branch
> is that:
>
> - ConnStateData is spanning Http1Agent and Http1Client and (some of)
> Parsing.
> - ClientSocketContext is spanning Http1Client AsyncJob calls.
> - a bunch of global functions also span Http1Agent and Http1Client and
> Parsing
>
> - HttpStateData is spanning Http1Agent and Http1Server
> - ServerStateData is spanning Http1Agent, IcapAgent, IcapClient,
> IcapServer operations
>
>
> Future work is planned to consist of separating out the Parser/Packer
> code into specialized classes outside the Agent hierarchy (parser-ng
> project).
>
> Followed by isolation shuffling of the resulting simpler code into an
> Http1Agent class.
>
> Followed by wrapping up the HttpClient/HttpServer logics into their
> respective classes.
>
> (last two may happen together or get further broken down).

As you can tell by now, a lot of the above already does not match the
hierarchy I think we should implement. It may be prudent to pause
development until we can agree on how that hierarchy should look like
and get Agent changes committed.

> In regards to the attached patch can you point me at parts which are
> specifically in your idea not supposed to be in the base Agent ?

I will, once we agree on what Agent is. You know how I define it.

> Keeping in mind that:
> - "uses Transport" is implemented in trunk as having a Pointer to
> Comm::Connection and calling various Comm functions on it.

Yes, we have agreed that this is the right Transport implementation for now.

> - I have not renamed the deceptive "tcp" Pointer. Was thinking of
> calling it "transport" or something. Any ideas?

inConn, to keep symmetry with inBuf?

>>>>> If I am understanding you right this class should be named AnyP::Agent
>>>>> in the generic protocol library instead of attempting to be part of
>>>>> Comm:: related to TCP.
>>>>
>>>> Yes, if that is what you want to implement (and, as you know, I think
>>>> that is a better choice than Transport). This may not be as easy as it
>>>> may seem. For example, "generic high-level protocol library" does not
>>>> work with "opaque bytes" it works with "protocol messages", but I am not
>>>> sure you are happy to accept that.
>>
>>
>>> If this TcpReceiver class is actually Agent then the child FooAgent's
>>> will be the ones receiving and emitting the messages. This being the
>>> base class that deals with the buffers send/receive operations as opaque
>>> bytes.
>>
>> Yes. One of the problems in this discussion is that you resist the
>> notion of Agent dealing with protocol messages because it actually reads
>> opaque bytes. When thinking about a class hierarchy design, especially
>> the design of a base class, we should focus on the API it provides to
>> the kids and the rest of the world.
>
> Yes. For the base class messages are opaque.

Agreed. Agent having a concept of a message is what's important as far
as Agent design and implementation is concerned. The exact type/kind of
a message is not important to Agent.

> Messages are what the child classes provide.
> You may think of the buffer as having 0-N messages in it, but the
> *base* Agent class has no idea which particular bytes are which message.

Yes. However, Agent having a concept of a message immediately makes it
possible to ask a soon-to-be-critical question (that I have queued
before): How many messages does a single Agent deal with? Does the
answer depend on the side?

Please do not try to answer that question now to avoid side-tracking the
discussion! I am simply illustrating why it is important to agree that
Agent knows that agents deal with transfer protocol messages. If all
Agent knows about is bytes, then the question itself becomes impossible
to ask, and we would not know the right place to plug in new Server
Agents into the existing code, among other things.

>> When defining Agent, we should not be focusing on what
>> specific code the base class has. That last part is relatively easy --
>> the base class will have code shared/needed by all of its kids!
>>
>
> Yes. You seem to be implying constantly that there is something missing
> or too-much in the base Agent class coded so far.

Not really. Your code (the "portrait" in the analogy above) remains
largely irrelevant until we agree on what Agent (the "person" in the
analogy above) is. Once Agent hierarchy is agreed upon, it becomes
possible to determine whether you Agent implementation is missing
something, has inappropriate code, etc.

>>>> Agent:
>>>>
>>>> * owns/fully controls transport T at any given time
>>>> * uses transport T for sending and receiving messages (not bytes!)

>>> That "not bytes!" says that you want the message parsing/packing
>>> translation from bytes to HTTP/FTP messages to be done by Transport (
>>> ie. comm_read() ) and delivered _as messages_ to Agent.

>> Not, exactly: The very same byte in the input buffer is both a "raw
>> opaque byte" for Transport and "transfer message byte" for Agent. Agents
>> work with transfer protocol messages while Transport works with raw
>> bytes (or "transport protocol bytes" to be more precise).

> Aha! you confused me with that.
>
> A message going from HttpAgent to elsewhere in the code is an
> HttpRequest or HttpReply *object*. It bears almost no byte-wise relation
> to the bytes in the buffer used by base Agent and Transport.

> ==> Ergo: Agent must parse the buffer into messages. *how* it does so
> is by using a Parser.

Sure. I do not think the above contradicts anything I have said. There
are unparsed messages in the input buffer. There are parsed message
objects that may not remember much about the buffer they came from (due
to excessive copying mostly).

>>>> It is critical to draw the bytes/messages boundary like that IMO. Also,
>>>> it would be best to minimize Agent-to-Agent communication to the extent
>>>> possible. Most of that communication would go through other Squid layers
>>>> such as Store and BodyPipe.
>>>
>>> +1.
>>
>>
>> Note that having both FtpCtrlAgent and FtpDataAgent contradicts the
>> above because those two agents imply extensive Agent-to-Agent
>> communication and coordination:
>
> Nope. I am thinking FtpClient/FtpServer (childs of FtpCtrlAgent) at some
> point spawn a FtpDataAgent (Job) and wait for it to message back about
> completion (or timeout) before continuing with the ctrl messages via its
> parent class. Not what I'd call extensive communication.

In my experience with FTP code, it does not quite work that way: There
is a lot more interaction between data and control streams because, in
real world, the commands and data do not always come, open, or close in
orderly/ideal fashion. Spawn-and-wait for the data channel is not
sufficient.

Again, I am not saying that an FTP agent cannot be implemented using two
or more sub-agent jobs, especially if you start violating job
boundaries, but it is more difficult to implement it [correctly] that
way. In my Agent hierarchy, that difficulty is not _required_ because
each agent job/object is dedicated to one transfer protocol transaction
(at least), not one transport object.

In my approach, an FtpClient may still spawn jobs if needed, of course,
but it does not have to. The decision is made when implementing a given
agent. If we call that agent FtpCtrlClient, that decision would be made
now, when the hierarchy is defined.

Cheers,

Alex.
Received on Mon Mar 03 2014 - 18:23:37 MST

This archive was generated by hypermail 2.2.0 : Tue Mar 04 2014 - 12:00:25 MST