[PATCH] Agent base class - (WAS: [PATCH] Comm::TcpReceiver - part 1, 2, and 3) from Amos Jeffries on 2014-03-03 (squid-dev)

From: Amos Jeffries <squid3_at_treenet.co.nz>
Date: Tue, 04 Mar 2014 00:03:34 +1300

On 3/03/2014 10:27 a.m., Alex Rousskov wrote:
> I suggest to pick one of the following options as the next step:
>
> (A) Make TcpReceiver into a base Agent class as defined below. If this
> approach results in significant code duplication, encapsulate duplicated
> code into a yet unnamed class(es), such as a class discussed in (B).
> This encapsulation is not expected to have drastic effects on the base
> Agent interface itself or the Agent hierarchy interfaces in general.

FYI: I started with this last night. The basic renaming part is in the
launchpad connection-manager branch already (along with the new edits
from this latest response by you).

>
> (B) Make TcpReceiver into a yet unnamed class that is primarily
> responsible for reading transport bytes and converting them into
> transfer protocol message, messages, or their parts (e.g., body bytes in
> FTP). Transfer protocol Agents will use objects of that new class to get
> transfer messages or message parts. For example, an FtpClient agent will
> use two objects of this class: One for the control connection and one
> for the data connection.
>
> In either case (either A or B), agree that an Agent object (of any leaf
> Agent type) is responsible for coordinating all primary activities
> related to handling of a given transfer protocol transaction (or
> transactions, to be decided later) on a given Squid side. For example,
> FtpClient agent is responsible for conducting an FTP transaction with an
> FTP server (including both control and data channels).
>
> Such an Agent object will use its own class code/members and the
> code/members of its parent classes to accomplish that, as usual. The
> base Agent class contains code/members needed by all Agents, of course.
> This "obvious" paragraph was added just to make sure that you do not
> think that I am proposing that the entire FTP handling code resides in a
> single class. More general code that is usable/needed in other leaf
> Agent classes will trickle up to parent classes, of course.
>
>
> I recommend starting with (A). Justification is in the detailed
> discussion below.
>

We seem to be floating into confusion again.

>
> On 03/01/2014 09:43 PM, Amos Jeffries wrote:
>> The IETF official line seems to be naming of these levels as Transport
>> (UDP/TCP/etc) vs Transfer (HTTP/FTP/etc) protocols. Maybe we should
>> mirror that terminology?
>
> Sure, at least in discussions (and new code names where it makes sense).
> Where does IETF place SSL/TLS in that division?

It seems to be regarded an encoding extension of the Transport at times
(SCTP) and a separate layered Transport at others (TCP, DTLS).
For our purposes (on TCP or UDP) it is a multi-layered Transport.

>
>> Sure. This Agent direction is fine by me.
>
> I think we still disagree on what Agent is, unfortunately (and, hence,
> what your TcpReceiver is). I think you are currently not writing an
> Agent class from my sketch. You are writing a class that does not
> currently exist in my sketch. Details below.
>

Yes. You seem to be starting to talk about the FooClient/FooServer from
your original definition as if they were the FooAgent class itself
instead of the specialized childs of it from that definition.

I understood your definition to be a 3-layer inheritence:

Agent -> FooAgent -> FooClient
Agent -> FooAgent -> FooServer

Agent -> BarAgent -> BarClient
Agent -> BarAgent -> BarServer

Agent (was TcpReceiver)
- use of Transport for I/O in generic way

FooAgent
- translation (use of Parser/Packer) between buffer bytes and Foo
protocol message objects

FooClient
- transfer protocol logics on the message objects and state.

By "deliver" / "pass" between these classes I mean crossing the class
name scope by calling each others (virtual) methods.

Lets skip FTP for the moment that is relatively simple regardless of our
disagreement on specifics. I am also waiting on the FTP relay project to
go in before any more work there. So things may change there before it
gets implemented.
The difficult code is the existing client-side/server-side hierachies.

** In this project branch I have isolated the shared logics that can be
base Agent on any TCP client or server connection. That is now called
AnyP::Agent.

IMPORTANT: this change is *all* I am wanting to get merged into trunk
right now. The rest of the code is still too tangled with other stuff to
isolate the Http1Agent and Client/Server layers cleanly.
Yes this means that there *may* be pieces still to push into the new
Agent later. But that will depend on much cleaner child class code to
even spot correctly.

The current state of old client-side and server-side code in the branch
is that:

- ConnStateData is spanning Http1Agent and Http1Client and (some of)
Parsing.
- ClientSocketContext is spanning Http1Client AsyncJob calls.
- a bunch of global functions also span Http1Agent and Http1Client and
Parsing

- HttpStateData is spanning Http1Agent and Http1Server
- ServerStateData is spanning Http1Agent, IcapAgent, IcapClient,
IcapServer operations

Future work is planned to consist of separating out the Parser/Packer
code into specialized classes outside the Agent hierarchy (parser-ng
project).

Followed by isolation shuffling of the resulting simpler code into an
Http1Agent class.

Followed by wrapping up the HttpClient/HttpServer logics into their
respective classes.

(last two may happen together or get further broken down).

In regards to the attached patch can you point me at parts which are
specifically in your idea not supposed to be in the base Agent ?

Keeping in mind that:
- "uses Transport" is implemented in trunk as having a Pointer to
Comm::Connection and calling various Comm functions on it.

- I have not renamed the deceptive "tcp" Pointer. Was thinking of
calling it "transport" or something. Any ideas?

>
>>>> If I am understanding you right this class should be named AnyP::Agent
>>>> in the generic protocol library instead of attempting to be part of
>>>> Comm:: related to TCP.
>>>
>>> Yes, if that is what you want to implement (and, as you know, I think
>>> that is a better choice than Transport). This may not be as easy as it
>>> may seem. For example, "generic high-level protocol library" does not
>>> work with "opaque bytes" it works with "protocol messages", but I am not
>>> sure you are happy to accept that.
>
>
>> If this TcpReceiver class is actually Agent then the child FooAgent's
>> will be the ones receiving and emitting the messages. This being the
>> base class that deals with the buffers send/receive operations as opaque
>> bytes.
>
> Yes. One of the problems in this discussion is that you resist the
> notion of Agent dealing with protocol messages because it actually reads
> opaque bytes. When thinking about a class hierarchy design, especially
> the design of a base class, we should focus on the API it provides to
> the kids and the rest of the world.

Yes. For the base class messages are opaque. Messages are what the child
classes provide.
You may think of the buffer as having 0-N messages in it, but the
*base* Agent class has no idea which particular bytes are which message.

> From that point of view, you should
> not resist saying "Agent deals with protocol messages" even though Agent
> does not know what those messages are and even though Agent
> implementation contains a read call that gets opaque bytes into the buffer.

Sure.

>
> In other words, when talking about a base Agent class, we should be
> defining what an Agent (i.e., any object of any specific Agent class)
> is, in terms general enough so that they are still applicable to every
> Agent class.

Yes.

> When defining Agent, we should not be focusing on what
> specific code the base class has. That last part is relatively easy --
> the base class will have code shared/needed by all of its kids!
>

Yes. You seem to be implying constantly that there is something missing
or too-much in the base Agent class coded so far.

>
> If you wish, just think of those opaque bytes in buffer as transfer
> protocol messages. Imagine that they become transfer protocol messages
> when they leave the transport read call and enter Agent. They are still
> in the same I/O buffer, but they are transfer protocol messages now.
> Problem solved?

Sure.

>
>> I view [Agent] as the class taking messages in the transfer protocol and
>> converting it to bytes out socket. Likewise taking opaque bytes from the
>> socket and converting it to transfer protocol messages.
>> ie the parser and possibly packer routines.
>
> If you keep focusing on this packing/parsing bit, you will end up with a
> completely different design where TcpReceiver is not an Agent but a
> Reader+Parser (converting opaque bytes into protocol messages). That
> [more complex] design is also valid and can be implemented, but we need
> to be clear what we are implementing first because the implementations
> of different designs will differ substantially.

I am implementing FooAgent *uses* Parser/Packer to convert between
opaque message bytes and message classes. Between Transport I/O bytes
buffer (inBuf from the base Agent) and HttpRequest/HttpReply classes
manipulated by HttpClient/HttpServer which are generated from that
buffer by the parser.

... You switched to FtpClient discussions now...

>
> In "TcpReceiver is a Parser" design:
>
> Transport: Deals with transport layers (same as in my design).
> Parser: Reads bytes from Transport and creates transfer messages.
> Agent: An end point of a IETF transfer protocol using parser(s).
> ...
> FtpClient: Coordinates work of FtpCtrlParser and FtpDataParser.
> Also sends FTP commands and FTP data.
>
>
> In mine design:
>
> Transport: Deals with transport layers (same as in Parser design).
> Agent: An end point of a IETF transfer protocol using transport(s).
> ...
> FtpClient: Sends and receives FTP commands and FTP data.
> Uses one Transport for receiving FTP commands and
> another Transport for receiving FTP data.
>
>
> Please note that I do not like "Parser" and "Packer" names for these new
> classes because they initiate transport I/O in addition to packing and
> parsing. At this time, I cannot offer a better name so I am using
> Parser/Packer. We will need a better name (TransportEnd??) if we decide
> to go forward with this design (which I hope we do not at this time).
>
> The name "Agent" does not fit for this Reader+Parser class because an
> agent is usually responsible for all (or most) aspects of some
> transaction, not just unidirectional communication with another party.
>
> For example, a good real estate agent does not just show you houses that
> happen to be in the neighborhood, but also listens to your preferences
> of what kind of house you are looking for, does market research, and
> facilitates negotiations with the seller. Such an agent is responsible
> for most interactions between you and the real estate "market". In my
> sketch, Agent is the same kind of class, responsible for most transfer
> protocol interactions with the outside world (on one side of Squid, and
> in the scope of one transfer protocol connection or even one transfer
> protocol message).
>
>
>> Processing of those messages after parsing and before packing is what
>> FooClient/FooServer does to the messages to/from FooAgent.
>
> It is wrong to think of kid classes as receiving/sending something to
> their parent classes. Sending/receiving is not a child-parent
> relationship. It is a "pipe" or "message sending" relationship. I
> suspect this difference is in the core of the current disagreements (or
> more precisely "confusion") on this thread!

See above. I'm talking about between class scopes. Sorry if the word was
not one you associate with scope movement.

>
>>>> * HTTP/2 Agent requires semantics for sending bytes and receiving bytes
>>>> over a single Transport which may also need to switch arbitrarily
>>>> between TcpTransport and TlsTransport at byte boundaries as defined by
>>>> the HTTP/2 protocol framing.
>>>> NP: the peek-n-splice BIO methods over a TcpTransport would seem
>>>> necessary for this.
>>>
>>> I know very little about HTTP/2, but I am willing to bet that the same
>>> "single Transport" idea and the above sketch apply to HTTP/2 as well.
>>>
>>
>> Think about ssl-bump applied to a CONNECT stream DATA frames which are
>> multiplexed alongside several plaintext HTTP streams.
>> HTTP/2 brings switching between TLS and non-TLS Transport at arbitrary
>> byte boundaries.
>
>
> FYI: To properly support the above using any of the designs discussed so
> far, one would have to essentially elevate SSL to the transfer protocol
> status. You cannot keep SSL in Comm/transport layer if you want to tell
> OpenSSL which specific blocks of the input stream it needs to decode or
> encode. We would have to start using OpenSSL as a buffer
> encoding/decoding library instead of the socket encoding/decoding
> library (i.e., a Transfer layer).

Yes I have been seeing it as something better suited to implementing as
a BodyPipe / ClientStreams module setup by whatever Client/Server class
needs it.

>
> This is not a show stopper at all, of course. I do not think it affects
> the current discussion as long as we agree that SSL/TLS can continue to
> live in Comm/Transport layers for now. We will come back to this later
> and add SSL/TLS filter objects where they are needed, removing SSL code
> from Comm.
>

Yes.

>>>> * FTP requires two Agents requires semantics for sending bytes and
>>>> receiving bytes over TcpTransport. Also the ability for both Agents to
>>>> share a TcpTransport. Also data channel Agent requires ability to block
>>>> ctrl channel Agent from reading.
>>>> One for send/receive on ctrl channel, one for receive on data channel.
>>>> Shared Transport when "data channel" is an ASCII mode transfer via ctrl
>>>> socket.
>>>
>>> I think we just need one FtpAgent on each side of Squid: FtpClient and
>>> FtpServer. The data channel is not a message channel (for the lack of a
>>> better word) so it does _not_ require a separate Agent in my definition
>>> of Agent. Here again, the distinction between low-level bytes (such as
>>> FTP data bytes) and messages (such as FTP commands) is critical.
>>>
>>> This is similar to how an HttpServer agent may do DNS lookups (over a
>>> UDP socket) or ICAP transactions (over a TCP connection). Those
>>> auxiliary activities do not require a special HttpServer agent! They are
>>> just a part of what an HttpServer does beyond its primary function:
>>> processing of HTTP messages. Same for FTP.
>>
>> This is where your idea of Agent is separating from what I have planned.
>
> Yes, this is where our plans differ substantially.
>
>
>> ==> separating the opaque bytes read(2) and write(2) operations from FTP
>> data channel and tunnelled connections will cause an unnecessary
>> duplication of code which is already implemented by the base
>> TcpReceiver/Agent class.
>
> If we see code duplication, we can always move that code into a common
> parent or a stand-alone object. Current TcpReceiver is not suitable for
> reading data bytes anyway; you would have to dumb it down further if you
> want to use it that way (and probably add a smarter class on top of it
> for transfer protocols).

:-) surprise coming up. But that is a ways ahead.

>
>>>> In summary overall Agent requires:
>>>> * semantics for sending bytes
>>>> * semantics for receiving bytes
>>>> * ability to switch Transport type arbitrarily
>>>> * ability to block another Agent reading on its Transport
>>>> * maybe ability to switch between client/server owners/users.
>>>
>>> I would not put it that way. Here is how I would describe this:
>>>
>>> Agent:
>>>
>>> * owns/fully controls transport T at any given time
>>> * uses transport T for sending and receiving messages (not bytes!)
>>
>> That "not bytes!" says that you want the message parsing/packing
>> translation from bytes to HTTP/FTP messages to be done by Transport (
>> ie. comm_read() ) and delivered _as messages_ to Agent.
>
> Not, exactly: The very same byte in the input buffer is both a "raw
> opaque byte" for Transport and "transfer message byte" for Agent. Agents
> work with transfer protocol messages while Transport works with raw
> bytes (or "transport protocol bytes" to be more precise).

Aha! you confused me with that.

A message going from HttpAgent to elsewhere in the code is an
HttpRequest or HttpReply *object*. It bears almost no byte-wise relation
to the bytes in the buffer used by base Agent and Transport.

==> Ergo: Agent must parse the buffer into messages. *how* it does so
is by using a Parser.

>
>> Can we agree that:
>> - all messages are represented as a sequence of bytes sent/received
>> over the Transport API.
>
> Yes.
>
>> - the format of those bytes is determined by FooAgent, Transport sees
>> them as opaque.
>
> Sure.
>
>
>
>>> It is critical to draw the bytes/messages boundary like that IMO. Also,
>>> it would be best to minimize Agent-to-Agent communication to the extent
>>> possible. Most of that communication would go through other Squid layers
>>> such as Store and BodyPipe.
>>
>> +1.
>
>
> Note that having both FtpCtrlAgent and FtpDataAgent contradicts the
> above because those two agents imply extensive Agent-to-Agent
> communication and coordination:

Nope. I am thinking FtpClient/FtpServer (childs of FtpCtrlAgent) at some
point spawn a FtpDataAgent (Job) and wait for it to message back about
completion (or timeout) before continuing with the ctrl messages via its
parent class. Not what I'd call extensive communication.

> Both agents serve the same FTP
> transaction!

Sure. We already have more than a handful of Jobs servicing each
transaction in the HTTP pathways.

>>>>>>>> '''Common base for classes reading (and writing) from (and to) a single
>>>>>>>> TCP Comm Connection. Contains basic connection management code and
>>>>>>>> complex algorithms for coordinating I/O activity with message receiving
>>>>>>>> (and sending) states.'''
>>>>>>>
>>>>>>> Please show me the TCP specific code in your TcpReceiver implementation.
>>>>>>>
>>>>>>
>>>>>> The algorithm model behind readSomeData() assumes a streaming protocol
>>>>>> with single input stream with serialized read(2).
>>>>>> - eliminating direct use by SCTP, FTP, or other multi-connection protocols.
>>>>>
>>>>> I do not know about SCTP, but current TcpReceiver does not eliminate
>>>>> direct use by FTP. FTP agents can use this code for the control channel.
>>>>
>>>> Ah. You are talking about FTP Agents controlling both channels. I am
>>>> talking about a TcpReceiver being used on each channel separately.
>>>
>>> Yes, an Agent in my hierarchy sketch cannot control just a data channel.
>>> Such an Agent would have no high-level protocol (i.e., FTP) messages to
>>> send or receive, which is the primary purpose of an Agent in my sketch.
>>>
>>
>> Ah. I see where we are thinking separately now.
>>
>> My model has a special-case Agent child class used by Tunnel and/or FTP
>> data channel that receives an opaque byte stream and drops each
>> Transport read(2) byte sequence into a BodyPipe as "a message" in Agent
>> terms.
>>
>> That is what I mean by the second Agent for FTP. Are you okay with that?
>
> No, I cannot accept such a special-case Agent child class. I can accept
> it as a Reader+Parser class (called BodyReader in this case?), but it
> would not be an Agent. Some agents might use such a BodyReader class as
> a member or even one of the parents. This will be needed if we see
> significant duplication of the corresponding code among Agents. I am not
> against such encapsulation, under appropriate name, if/when it happens
> to be needed. It is not needed today.
>
> Please note that I am not sure that Tunnels should read raw bytes into
> BodyPipes because those pipes would have to terminate at the same Tunnel
> Agent, which feels like too much overhead and may not even be supported
> by the current BodyPipe code. However, we can review that idea more
> closely when we start working on Tunnels.
>

Yes. I think we should.

>
>> Sure. Making protected.
>>
>> 1) I predict that you will find passing the buffer to
>> processReadBuffer(MemBuf&) and the child retaining its own reference is
>> all that is necessary for the kid classes.
>
> Asking kids to retain a reference to a data member of a parent class is
> a rather strange design: It would indeed allow us to keep inBuf private,
> but essentially without any protections that such a privacy designation
> offers _and_ making it harder to track buffer [mis]use (because there
> are now more objects to track: the buffer object itself in the parent
> and buffer pointers in kids).
>

Sure. I am doubting they will need even that much.

>
>> 2) Note that the code I was forced to add is not removable with change
>> to protected. The injection hack is being done by a global function
>
> This is fine. Just add a TODO comment to move the XXX method into the
> right child class when possible.
>

Moved it out to ConnStateData. Hopefully it can die there instead of
making its way to Http1Client or Http1Agent.

>
>> which should correctly be in the HttpClient in your model not a member
>> of the HttpAgent child (ConnStateData).
>
> HttpClient agent will have access to protected Agent class members,
> including inBuf so that is not a problem AFAICT. Perhaps I misunderstood
> what the above comment implies though.
>

Sorry. I was backsliding to thinking the HttpClient uses Agent relationship.

Amos

text/plain attachment: agent_model_mk1.patch

Received on Mon Mar 03 2014 - 11:03:50 MST

This archive was generated by hypermail 2.2.0 : Mon Mar 03 2014 - 12:00:12 MST