Re: [PATCH] Comm::TcpReceiver - part 1, 2, and 3

From: Amos Jeffries <squid3_at_treenet.co.nz>
Date: Sat, 08 Feb 2014 17:26:00 +1300

On 8/02/2014 1:54 p.m., Alex Rousskov wrote:
> On 02/05/2014 07:55 AM, Amos Jeffries wrote:
>> Here we have the combined patch for what I have been calling part 1 -
>> Comm::TcpReceiver interface, part 2 - ConnStateData object API upgrade,
>> and part 3 - HttpStateData object API upgrade.
>
> I am puzzled about the status of this work. There seems to be unfinished
> changes in some critical paths of the code such as
> HttpStateData::mayNeedToReadMore() but you are posting this as a PATCH,
> implying it is ready for commit from your point of view. Could you
> please clarify whether you consider these changes ready for trunk commit
> and use?

The HttpStateData pieces are not quite done. You requested the context
of usage of TcpReceiver for clarity on its scope. The ConnStateData
alone do not cover all the API usage, so I had to include these bits.

I believe the ConnStateData and TcpReceiver pieces are ready once audit
is done for them.

>
>> * Kept the TcpReceiver name for now since Alex is disagreeing with
>> callign the interface an "agent". I am thinking of calling it just
>> "Comm::Tcp" in its current form.
>
> Once we agree on a description for the new class, we should be able to
> name it. Until then, let's keep the wrong name to minimize renaming
> overheads. I do not recall objecting to Agent. Agent is the right name
> if that is what you are writing (it is difficult to be sure based on the
> current code but I will discuss this below).
>

You suggested Agent. Then went on to describe an object which was
self-contained. TcpReceiver has pure virtuals so cannot be what you seem
to describe as Agent (its childs would be that).

>
>> * The clear thing to keep in mind is that this TcpReceiver object is
>> *just* the interface design for the I/O actions. It needs to be
>> inherited by some other object which performs the processing logics on
>> the I/O data.
>
> So, currently, you define TcpReceiver as a base for classes that do
> [TCP?] I/O. That is a start, although I hope we can be more specific
> than that. Something like this might be a step forward:
>
> '''Common base for classes reading (and writing) variable-size protocol
> messages from (and to) a single TCP stream. Contains basic I/O code and
> complex algorithms for coordinating I/O activity with message receiving
> (and sending) states.'''
>
> Please note that I am not saying the above description is correct! That
> description still does not address SSL concerns, for example. However,
> it may be better than the current nothing because it is difficult to
> review a class without a stated scope or intended purpose (wrong or
> otherwise).
>
>> + MemBuf inBuf;
>
> The above undocumented TcpReceiver data member may be critical to
> defining TcpReceiver and related classes correctly. What does that
> buffer contain? Raw TCP bytes? I do not think so. Do we read raw TCP
> bytes into that buffer? Not quite. Those bytes did not necessarily got
> there from a TCP connection. They may have been through an SSL decoding
> layer using ssl_read_method() and Comm I/O API (at least).
>

The intention is to have a separate TlsWhatever class with same API as
this one for the SSL connections reading and writing.

As you know one cannot safely or cleanly mix randomly read(2) and
ssl_read_method() on one connection. Any solution we need to do soo
would be a third interface class.

> That buffer actually contains "final protocol" bytes, for whatever
> protocol the kid agent is implementing (HTTP, Gopher, etc.). The whole
> TcpReceiver class is not about TCP at all!

It's about bytes going to/from a TCP connection. The bufer has nothing
to do with the format of the bytes inside it. No more than TCP packets
determine their payload content, or a char* text string has anything to
do with the upper or lower case of the bytes inside it.

> If that is correct, the class
> description can be adjusted to something like:
>
> '''Common base for classes reading (and writing) variable-size protocol
> messages from (and to) a single Comm Connection. Contains basic
> connection management code and complex algorithms for coordinating I/O
> activity with message receiving (and sending) states.'''
>
> Again, I am not saying the above is correct, but I think it reflects the
> code you are writing better (than nothing or than the TCP-based
> definition I wrote above). And once we agree on what is the correct
> definition, we can discuss the class and member names to match it; I am
> not focusing on the naming problems right now.

'''Common base for classes reading (and writing) from (and to) a single
TCP Comm Connection. Contains basic connection management code and
complex algorithms for coordinating I/O activity with message receiving
(and sending) states.'''

"protocol messages" are nothing to do with this class (they are the
child business and may not even exist for cases like tunnel.cc, or may
be several abstraction levels away).

>
>> +bool
>> +Comm::TcpReceiver::doneAll() const
>> +{
>> + return stoppedSending() && stoppedReceiving() && !inBuf.hasContent() && AsyncJob::doneAll();
>> +}
>
> If we are done sending and receiving why would having buffered content
> (the third guard above) matter? What can we do with that content (in the
> scope of this class) that does not involve sending or receiving?

Because until the child class has cleared/consumed the content out of
our buffer we cannot guarantee that it is finished with. I am trying to
make the buffer private, so the child class doneAll() does not have
direct access to it except when passed in.

What can be done is processReadBuffer() and maybeFinishedWithTcp()
and/or writing that buffered data out to so non-TCP direction (Store,
separate TCP connection, into memory, IPC channels, etc.).

Which shows I missed the maybeFinishedWithTcp() parameter. Now updated
to take inBuf as a parameter so the child classes can play with that
data without accessing inBuf. They can use aBuf.clean() or inBuf.reset()
to drop any remaining bytes once the useful content has been dealt with.

NP: We cannot (yet) unconditionally empty the buffer in the final else
condition of ConnStateData::maybeFinishedWithTcp() because the parse
upgrades have not gone in. (The last active requests is still stored in
the buffer by trunk parser code). When the parser upgrade is done these
bytes will be consumed as parsed and the buffer should be emptied in
ConnStateData::maybeFinishedWithTcp().
 - For now I've added a note about that and put a matching inBuf.clean()
in keepaliveNextRequest() to clear it once the active requests queue has
been drained.

>
>> +void
>> +Comm::TcpReceiver::releaseConnection(const char *reason)
>> +{
>> + // Used by clients to release the connection before
>> + // storing it in a Pconn pool for reuse.
>> + comm_remove_close_handler(tcp->fd, closed_);
>> + closed_->cancel(reason);
>
> s/clients/callers/ or s/clients/kids/ to avoid the impression that this
> code is specific to Client agents. It is not.
>
> "release" is kind of undefined in this context. I think you are talking
> about the need to stop monitoring connection for I/O. Consider
> rephrasing the comment to be more precise.

I have been considering making this return the Comm::Connection object
and set the tcp member to NULL. Somewhat like pop() semantics. Since it
is used to *move* control of the TCP connection from this object to
somewhere else (ie to a pconn pool or maybe a TlsWhatever agent on
ssl-bump or Upgrade).

 Does that sound good?

>
>> +void
>> +Comm::TcpReceiver::releaseConnection(const char *reason)
>> +{
> ...
>> + // XXX: remove half-closed handler ??
>> +}
>
> If the method is used exclusively to prepare a connection for entering
> a PconnPool (or similar actions), then the connection Must() not be
> half-closed (and the method should be renamed?). Otherwise, yes, we
> should remove the half-closed monitor to be in sync with the method
> definition.

I think that Must() goes for the other intended-possible uses as well.

>
>> +void
>> +Comm::TcpReceiver::stopReadingXXX()
>> +{
>> + /* NP: This is a hack needed to allow TunnelStateData
>> + * to take control of a socket despite any scheduled read.
>> + */
>> + if (reading()) {
>> + comm_read_cancel(tcp->fd, reader_);
>> + reader_ = NULL;
>> + }
>> +}
>
> I recommend removing the above comment. We already marked the method
> with XXX

 and documented its dangers. The above comment does not add much
> value, will get stale, and raises questions like "Why is this method
> here and not in the one specific caller it was designed for?". If you
> remove the comment, the reader can still search for stopReadingXXX()
> callers. Callers should document why they are calling an *XXX() method.

Done.

>
>
>> + // useless to read() after aborting read()
>> + if (stoppedReceiving())
>> + return;
>
> It is not "useless", it is wrong. And stoppedReceiving() does not mean
> we aborted reading. We could just stopped reading after EOF. Consider:
>
> // do not start now if we have already stopped
>
> or something like that.

Used "do not start read(2) if receiving has been halted".

>
>> + /// Inject a fake request as if the client had sent it.
>> + // httpsSslBumpAccessCheckDone() should be doing explicit state setup inste
>> ad of parsing.
>> + bool injectFakeRequestXXX(MemBuf &fake);
>
> As implemented, the method is not really specific to clients, requests,
> fake messages, or even messages. It just prepends some protocol bytes.
> Technically, there is nothing XXX about the method either! If you keep
> it, consider naming it inject(). If there is something XXX about it that
> I missed, let's use injectXXX().
>
> I think we should document that the message is injected before existing
> buffered content, if any.
>
> Please move caller-specific explanation of the XXX to the caller where
> it is less likely to get out of sync.
>

Moved comment to caller and renamed:

+ /// Inject some bytes to prefix the existing buffer contents (if any).
+ bool injectPrefixBytesXXX(MemBuf &pfx);

>
>> +bool
>> +Comm::TcpReceiver::injectFakeRequestXXX(MemBuf &fake)
>> +{
>> + if (inBuf.hasContent())
>> + fake.append(inBuf.content(), inBuf.contentSize());
>> +
>> + inBuf.reset();
>> + inBuf.append(fake.content(), fake.contentSize());
>> +
>> + return processReadBuffer(inBuf);
>> +}
>
> Let's not combine injection with processing. Let the caller call
> processReadBuffer() if/when they are ready for that. It would make the
> method more powerful but simpler. And you would not have to fix its
> documentation to describe the return value, etc.
>
> Another layer of problems with this method are two similar asserts: One
> when the "fake" buffer is too small and the other one is when the inBuf
> is too small to accommodate the combined content. I will focus on the
> former.
>
> We should not assume that the fake MemBuf can accommodate the entire
> inBuf. MemBufs have limited capacity and it is too much to ask the
> caller to give us the right buffer. If you disagree, please adjust the
> method description to explain the requirements on the buffer parameter.
> If you agree, make the buffer parameter constant and use a temporary
> buffer (easy) or some kind of new MemBuf method with memmove() to inject
> content.

I was trying to reduce data copies here. Oh well we will have to go with
3x data copy and two re-allocations on the common case for intercepted
HTTPS.

>
> The code should not assert (because I might be able to trigger such an
> assert by stuffing the input buffer with a lot of content). A throwing
> Must() would be OK in this hypothetical case IMO because it will just
> terminate a single transaction.

Dangerous to call Must() as these are possibly outside of AsyncJob
scope. IDENT and WHOIS lookup etc will likely not be AsyncJob by the
time this could be rolled out there.

>
> Overall, I am not sure this method is needed. The caller can do the
> manipulations on its own, right? It may be easier for the caller to add
> the required capacity checks than implement this general code correctly.
>

Ideally not. I am trying to move the inBuf to a private member so the
childs should only be playing with the buffer they get passed by
processReadBuffer(MemBuf &)

>
>> +Comm::TcpReceiver::TcpReceiver() :
>> + AsyncJob("Comm::TcpReceiver"),
>> + tcp(),
>> + stoppedReceiving_(NULL),
>> + stoppedSending_(NULL),
>> + closed_(),
>> + reader_()
>> +{}
>
> If you want to explicitly list data members with default constructors,
> please list all of them, including inBuf.
>

Oops. Done.

>
>
>> + /// Attempt to read some data.
>> + /// Will call processReadBuffer() when there is data to process.
>> + void readSomeData();
>
> Let's explicitly say that the method "May do nothing." It may be
> important for the caller not to rely on more data coming in after
> calling this method.
>

Okay. Done.

>
>> + if (!maybeMakeSpaceAvailable()) {
>> + stopReceiving("full read buffer - but processing does not free any space");
>> + // fall through to setup the half-closed monitoring
>> + } else {
>> + // schedule another read() - unless aborted by maybeMakeSpaceAvailable() failure
>> + readSomeData();
>> + return; // wait for the results of this attempt.
>> + }
>
> The "unless aborted by..." part is wrong -- we have already ensured that
> there _is_ space available above. Otherwise, the "wait for the results
> of this attempt" would become a bug -- the results would never come if
> the buffer is full.

Removed.

>
> That is all I had time for today, sorry. More to come.
>

Thank you.

Amos
Received on Sat Feb 08 2014 - 04:26:13 MST

This archive was generated by hypermail 2.2.0 : Sat Feb 08 2014 - 12:00:11 MST