Re: [PATCH] Comm::TcpReceiver - part 1, 2, and 3

From: Alex Rousskov <rousskov_at_measurement-factory.com>
Date: Fri, 28 Feb 2014 21:23:32 -0700

On 02/08/2014 04:00 AM, Amos Jeffries wrote:
> On 8/02/2014 8:05 p.m., Alex Rousskov wrote:
>> Agent will have pure virtual functions, of course. Event the existing
>> ClientAgent (i.e., ServerStateData) does!

> Oh you are looking at Agent as part of the hard-coded inheritence chain
> of the protocol manager class?
> I have been looking at it as a dynamicaly assigned member of the
> manager class. So that we can pass an Agent between say ConnStateData
> and TunnelStateData. Or switch a ConnStateData Agent pointer between
> TcpAgent and TlsAgent when doing ssl-bump on a client connection.

I have provided my working definition of Agent on January 24, 2014:

  Agent: Common base for Client and Server agents. Uses Transport.
  Client: Common base for HttpClient, FtpClient, IcapClient, etc.
  Server: Common base for HttpServer, FtpServer, etc.
  HttpClient: Sends HTTP requests. Receives HTTP responses.
  FtpServer: Receives FTP commands. Sends FTP replies.
  ...

  TransportLayer: Common base for TcpLayer, SslLayer, and other layers.
  TcpLayer: A final TransportLayer using a TCP socket Comm APIs for I/O.
  SslLayer: A TransportLayer using another TransportLayer for I/O.
  ...

  Transport: Contains one or more ConnLayers. Handles I/O multilayering.

  Connection: A socket with a peer address. We have that already.

Whether Agent is a dynamically assigned member of some other class
should not be relevant to what Agent is. We should strive to define
things for what they are/do, not for how they may be used (to the extent
possible).

IIRC what ConnStateData does today, it is an HttpServer in the above
definition (i.e., an Agent).

Overall, I think your TcpReceiver class is stuck somewhere between

* Agent (that is how you actuall use TcpReceiver in the patch!) and

* Transport (that is how you talk about TcpReceiver on this thread when
explaining/justifying some of its internals).

This is just my impression, of course! If I am on the right track, then
I suggest you pick the Agent design because

a) that is how you _use_ your class, which is more important than the
implementation details; and

b) it is usually easier to design and implement classes you can use
immediately rather than classes that you have to just keep around for
future use.

and we can make much better progress from there.

> The server-facing Agents already are separate as you noted. They are
> decended from ServerStateData ... but the manager over there is called
> FwdState rather than ConnStateData/ClientSocketContexxt.

I am not sure what an "Agent manager" is, but I know that FwdState today
is essentially a connection opening code. Its relationship with Agents
is pretty much limited to starting them (with a freshly opened or reused
idle/pinned connection).

>>>>> + MemBuf inBuf;
>>>>
>>>> The above undocumented TcpReceiver data member may be critical to
>>>> defining TcpReceiver and related classes correctly. What does that
>>>> buffer contain? Raw TCP bytes? I do not think so. Do we read raw TCP
>>>> bytes into that buffer? Not quite. Those bytes did not necessarily got
>>>> there from a TCP connection. They may have been through an SSL decoding
>>>> layer using ssl_read_method() and Comm I/O API (at least).
>>>>
>>>
>>> The intention is to have a separate TlsWhatever class with same API as
>>> this one for the SSL connections reading and writing.
>>
>> I still think that would be a wrong design. The complex pieces of code
>> you have moved to TcpReceiver have nothing to do with TCP or SSL. If you
>> implement your plan, those pieces would be identical in TcpWatever and
>> TlsWhatever. That's wrong.
>>
>> Let other layers handle the TCP and SSL specifics. The class you are
>> writing now does not have any TCP- or SSL-specific code (unless I missed
>> some minor bits) and that is good! Why make it TCP specific?

> TLS is always used as a protocol over-layer on TCP connections.

OpenSSL code does not know or care that a secure communication is
happening over a TCP connection. We do not leak that information to the
OpenSSL library. The only knowledge OpenSSL has about I/O transport is a
socket descriptor supplied via SSL_set_fd().

> So in my
> idea model Comm::TlsWhatever inherits the complex algorithms and
> send/receive parts from TcpReceiver and replaces only the parts it needs
> to and extends with API for TLS-specific needs.

AFAICT, the "complex algorithms" there are not specific to TCP or SSL so
I do not see why TlsWhatever would want to extend the current
TcpReceiver class. Moreover, I do see why a class like HttpServer may
want to extend TcpReceiver _and_ why an HttpServer may want to go from
raw TCP I/O to OpenSSL I/O. All of that tells me that TcpReceiver is
probably an Agent and that TCP/SSL/TLS transport code should live elsewhere.

>>> As you know one cannot safely or cleanly mix randomly read(2) and
>>> ssl_read_method() on one connection. Any solution we need to do soo
>>> would be a third interface class.
>>
>> The actual read(2) and ssl_read_method() calls will be done by other
>> layers and classes, orthogonal to the class you are working on now. The
>> existing Comm API is already doing that (for now; Peek and Splice needs
>> will require more work on those layers, but that is out of scope here).
>>
>
> From what I know of Peek-n-Splice you will need to be reading the data
> in explicitly (as TCP?) and passing it to the BIO handlers, yes?

That is what the current Peek and Splice implementation does IIRC. Those
I/O operations are and probably should remain very low-level. Thus, I
expect most Peek and Splice *I/O* integration to happen at Comm or
Comm-like levels rather that Agent levels, but now we are discussing one
unknown on top of another... I still think we should keep Peek and
Splice out of the TcpReceiver discussion scope.

> That is easier to handle with a distinct TlsWhatever agent than relying
> on the comm_*() abstraction layer to determine a read(2) / write(2)
> equivalent (from the global fd_table BTW).

If we can agree on what HTTP and FTP Agents are first, we may have a
better chance of being able to add Peek and Splice (and other
complexities) to that as a second stage. Even if new features are going
to require some redesign, the initial agreement would be more valuable
than the current stalemate, where we cannot even agree on what
TcpReceiver class is (and, hence, how it should be named)!

>>>> That buffer actually contains "final protocol" bytes, for whatever
>>>> protocol the kid agent is implementing (HTTP, Gopher, etc.). The whole
>>>> TcpReceiver class is not about TCP at all!
>>>
>>> It's about bytes going to/from a TCP connection.
>>
>> Nope. The transport and any encodings/encryptions on top of it are
>> pretty much irrelevant to the current (and future) TcpReceiver code. If
>> you disagree, please show me important TCP-specific code in your current
>> TcpReceiver implementation.

You have not responded to the above paragraph. Does it mean that you
agree that connection transport and any encodings/encryptions on top of
it are pretty much irrelevant to the current (and future) TcpReceiver code?

>>> '''Common base for classes reading (and writing) from (and to) a single
>>> TCP Comm Connection. Contains basic connection management code and
>>> complex algorithms for coordinating I/O activity with message receiving
>>> (and sending) states.'''
>>
>> Please show me the TCP specific code in your TcpReceiver implementation.
>>
>
> The algorithm model behind readSomeData() assumes a streaming protocol
> with single input stream with serialized read(2).
> - eliminating direct use by SCTP, FTP, or other multi-connection protocols.

I do not know about SCTP, but current TcpReceiver does not eliminate
direct use by FTP. FTP agents can use this code for the control channel.

> Use of comm_read() and Comm::Write() relies on a socket with TCP_STREAM
> socket type,

It does not. Existing IPC code uses both comm_read() and Comm::Write()
for UDS sockets.

> "Comm::Connection tcp;" is likewise a TCP_STREAM connection

Comm::Connection is not TCP_STREAM connection. IPC uses Comm::Connection
for UDS connections.

> assuming direct read(2)/write(2) access is possible.

Comm::Connection does not make that assumption. The I/O methods are
handled at a separate layer. This is why TCP, SSL, and UDS code can use
the same Connection and high-level read/write API. That part of the
design is valuable/good. We should not break it. High-level message
(HTTP, FTP, etc.) handling code should not care what the transport is.

> - all of these eliminate its use for UDP or UDS based protocol
> receiving/sending.

As the existing/working code illustrates, they do not.

> We could just as easily (for the model, not the code) have the child
> class doing TLS decryption of encrypted bytes read(2) directly off the
> socket in to TcpReceiver. Given the plans I've seen about
> Peek-and-Splice using BIO handlers we will probably end up doing exactly
> that in some code.

You are right, except "use transport X with encoding Y for HTTP" is not
a "parent-child" relationship. It is a "use" relationship. Implementing
it using Agent child classes would be wrong. That relationship should be
implemented via class data members instead.

>> In other words (and simplifying a bit):
>>
>> /// raw TCP bytes (WRONG: could be SSL-decrypted TCP)
>> MemBuf inBuf;
>>
>> /// opaque received bytes (OK but a little vague)
>> MemBuf inBuf;
>
> Explicitly accurate for what is in there though. Which is why I left it
> undocumented. "inBuf" seems clear enough.

There is a significant difference among the three inBuf descriptions I
have posted so, no, inBuf without a description is not clear enough.

>> /// received bytes that are yet-unprocessed by our kids;
>> /// opaque for us, but kids know the protocol these bytes
>> /// are using (e.g., HTTP, FTP, etc) (Better)
>> MemBuf inBuf;
>>
>> The actual final description may also mention what we expect the kids to
>> do with the buffer (copy, leave just the unparsed bytes, empty on exit,
>> etc.).
>
> We expect them to process it. Whatever that may mean to the child. This
> class places no restriction or expectation on that.

Your own TcpReceiver code treats empty inBuf specially so, yes, you are
already placing expectations and restrictions (and that is OK; we just
need to agree on them and disclose them).

> +bool
> +Comm::TcpReceiver::injectFakeRequestXXX(MemBuf &fake)
...
>>>> Overall, I am not sure this method is needed. The caller can do the
>>>> manipulations on its own, right? It may be easier for the caller to add
>>>> the required capacity checks than implement this general code correctly.

>>> Ideally not. I am trying to move the inBuf to a private member so the
>>> childs should only be playing with the buffer they get passed by
>>> processReadBuffer(MemBuf &)

>> As you know by now, I do not think that hiding input buffer from kids is
>> going to work unless you force kids to copy it. TcpReceiver does not
>> control all the events in a transaction life and, hence, cannot prohibit
>> access to the buffered message which may be waiting for those events.

> By making this a private/protected member we restrict access to it from
> objects outside the child class and restrict parsing/processing actions
> on it to the code where it is passed as parameter to
> processReadbuffer(Membuf &).

Yes, but we were discussing whether injectFakeRequestXXX() is needed, to
which you replied that you are trying to make inBuf private (not
protected!). So, if you now agree that kids should have access to the
buffer, let's remove injectFakeRequestXXX() with all if its problems.

Thank you,

Alex.
Received on Sat Mar 01 2014 - 04:23:55 MST

This archive was generated by hypermail 2.2.0 : Sat Mar 01 2014 - 12:00:14 MST