Re: [PATCH] client-side redesign pt1 - Comm::TcpReceiver

From: Alex Rousskov <rousskov@measurement-factory.com>
Date: Tue, 21 Jan 2014 22:45:45 -0700

On 01/07/2014 02:52 AM, Amos Jeffries wrote:
> Updated patch attached for audit.
>
> This one includes all the currently known bits for server-side delay
> pools so no audit omissions this time around.
>
> On 4/01/2014 8:16 a.m., Alex Rousskov wrote:
>> On 12/03/2013 10:05 PM, Amos Jeffries wrote:
>>> This patch abstracts the TCP socket operations out of existing
>>> client-side code into a class which can be re-used for any TCP socket code.
>>
>>
>>> +namespace Comm {
>>> +
>>> +class TcpReceiver : virtual public AsyncJob
>>> +{
>>> +public:
>>
>> Missing class description.

You may want to add an "XXX: finalize and describe scope" comment in
lieu of that description for now.

> I was hoping initially to call it TcpReader, but these send-related
> details prevent it being a pure read-only semantic. At least "receive"
> semantics involve the concept of sending in the form of ACKs.

To make matters worse, ICAP code is using read/write terms to talk about
data being sent from/to Squid main code and receive/send terms to talk
about data being sent from/to the remote ICAP service. See the first
comment in src/adaptation/icap/ModXact.cc

Unfortunately, I am worried there is a bigger problem here than the verb
to use (on the other hand, our inability to name something is often
indicative of a design problem; more on that immediately below).

> Any better ideas at a name? The primary purpose is to control the
> read(2) operations. The only write(2) related code is the stop*() errors
> flag state to indicate that sending has stopped or not. Which is needed
> to manage the close(2) properly. Or should we abstract that out to yet
> another class managing FD lifetime?

Difficult to say without doing the legwork. The correct design depends
on what our clients and servers need. I am seriously worried that you
are too focused on one server (client_side*.cc) right now and that, once
you start adding more and more agents, the current design will fall
apart, even if we spend another five iterations perfecting it.

All the TCP clients and servers you are willing to include (as future
TcpReceiver kids) in the current project scope have at least one thing
in common -- they all read and write protocol messages over [persistent]
TCP connections. Why do you think it is a good idea to focus on just the
reading part so much? If you want a common parent for all those agents,
why not handle both actions in that common parent, especially if you
already find it necessary to add a little bit of the "send" side to the
TcpReceiver?

If tomorrow we have both TcpReceiver and TcpSender, and all their kids
have both classes as parents, with complex inter-parent dependencies
that are already showing in TcpReceiver, we are doing something wrong.

This may be one of those cases where restricting class scope too much
makes the code more complex and less reusable. After reading your
response to my initial comments, this is my primary concern for the
overall design. I am not [just] ranting about it. This may be a serious
matter.

I can think of two very different ways to go forward from here:

A) Forget about other agents, sides, etc. and focus on the HTTP server
(i.e., client_side*.cc) code exclusively. That code does not need a
TcpReceiver. It needs a lot of work, but extracting TCP reading code
into an abstract class is not one of the things it needs!

B) Try to implement a common parent for all existing TCP protocol
agents/sides. That common parent, let's call it TcpAgent for the purpose
of this discussion, will have code to read and write messages. There is
lots of common, complex code to extract from the existing agents into
this common parent! If we go a step further, many Agents also maintain a
connection pool, so you may add a TcpAgentWithPool class of some kind.

IMO, both approaches above are valid in isolation, but mixing them may
cause trouble.
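
To illustrate (B), here is a very rough sketch. All names below are for
discussion only, not a proposal for the final interface:

  namespace Comm {

  /// common parent for all TCP protocol agents
  /// (HTTP server, ICAP client, etc.)
  class TcpAgent : virtual public AsyncJob
  {
  protected:
      /// take ownership of an established TCP connection
      void init(const Comm::ConnectionPointer &aConn);

      void readSome();  ///< schedule a read(2) into inBuf, if allowed
      void writeSome(); ///< schedule a write(2) of queued output

      /// kids parse/process newly received data; false means "stop reading"
      virtual bool processReadBuffer(MemBuf &) = 0;

      Comm::ConnectionPointer conn; ///< the connection we read and write
      MemBuf inBuf; ///< bytes received but not yet processed
  };

  /// a TcpAgent that also maintains a pool of idle persistent connections
  class TcpAgentWithPool : public TcpAgent
  {
  protected:
      void poolIdleConnection(const char *reason); ///< park conn in the pool
  };

  } // namespace Comm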

>>> It provides:
>>>
>>> * data members for referencing a TCP socket and read buffer.
>>>
>>> * methods for incremental parsing or processing of received data.
>>>
>>> * methods for managing half and fully closing a socket.
>>>
>>> * separation of socket read() from HTTP/1 parsing and processing logic.
>>> It may be re-used for any client-side or server-side TCP socket.

>>> +void
>>> +Comm::TcpReceiver::stopReading()
>>> +{
>>> + /* NP: This is a hack only needed to allow TunnelStateData
>>> + * to take control of a socket despite any scheduled read.
>>> + */
>>> + if (reading()) {
>>> + comm_read_cancel(tcp->fd, reader_);
>>> + reader_ = NULL;
>>> + }
>>> +}

>> Why is this called a hack?

> It only exists because tunnel.cc takes over the FD. Really nasty (the
> dropped data race condition).

Let's call this method stopReadingXXX() so that it is clear that any
caller is "doing it wrong" and to discourage use.

> All I have done is shuffle the function into this class. The problem
> already exists and I do not see any way we can solve it within
> reasonable scope of this project. So I opted to mark with a comment and
> leave it for now.

That is fine, of course. We just need to make sure others will not start
routinely calling this method thinking that it stops the receiver job
from reading without any nasty side-effects. Adding an XXX suffix will
at least help flag these calls during code review.
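
In code, something like this (sketch only):

  /// XXX: a hack to let tunnel.cc take over the socket; do not add callers
  void stopReadingXXX();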

>>> + if (inBuf.spaceSize() < 2) {
>>
>> Please either use !spaceSize() or add a comment to explain why having a
>> non-empty buffer is not enough (i.e., why we must have at least two
>> bytes of buffer space).
>>
>
> This is a delay pools or read(2) limitation apparently. I have no idea
> why it had that in the client-side when I started.
...
> Perhaps it was in client-side for the client delay pools but not
> documented properly as such?

I do not know or do not remember, sorry. The whole "delay pools cannot
deal with single-byte buffers" is a delay pools bug, outside this
project scope.

> +bool
> +Comm::TcpReceiver::maybeMakeSpaceAvailable()
> +{
> + /*
> + * why <2? Because delayAwareRead() won't actually read if
> + * you ask it to read 1 byte. The delayed read(2) request
> + * just gets re-queued until the client side drains, then
> + * the I/O thread hangs. Better to not register any read
> + * handler until we get a notification from someone that
> + * its okay to read again if the buffer cannot grow.
> + */
> +

That comment does not match the code. The method is not scheduling any
reads, it is just allocating buffer space. I would just add a simpler
comment like this:

// XXX: <2 is probably needed for delayAwareRead() in callers to work, as
// discussed in HttpStateData::maybeReadVirginBody().

> +Comm::TcpReceiver::readIoHandler(const CommIoCbParams &io)

Drop the "Io" part because "read" already implies "Io"?

> + assert(Comm::IsConnOpen(tcp));
> + assert(io.conn->fd == tcp->fd);

These should probably be Must()s since we are running under job protection.
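
For example:

  Must(Comm::IsConnOpen(tcp));
  Must(io.conn->fd == tcp->fd);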

> + /*
> + * Don't reset the timeout value here. The timeout value will be
> + * set to Config.Timeout.request by httpAccept() and
> + * clientWriteComplete(), and should apply to the request as a
> + * whole, not individual read() calls. Plus, it breaks our
> + * lame half-close detection
> + */

Too specific to the servers/ code (given your desire to use this for
clients/, servers/, and even adaptation services).

> + if (size < 0) {
> + if (!ignoreErrno(xerrno)) {
> + debugs(5, 2, tcp << " read failure: " << xstrerr(xerrno));
> + return true;
> + } else if (!inBuf.hasContent()) {
> + debugs(5, 2, tcp << ": no data to process (" << xstrerr(xerrno) << ")");
> + }
> + }
> +
> + return false;

The ignoreErrno() == true case above does not work well for the caller:
If there was a minor error, the caller should keep reading, not
re-process the old inBuf content that has not changed. More on that
further below (**).

> + if (readWasError(io.flag, io.size, io.xerrno)) {
> + noteTcpReadError(io.xerrno);
> + io.conn->close();
> + return;
> + }

The io.conn->close() part should probably be inside noteTcpReadError(),
which kids may override and call. More on that further below (**).

> + // if the connection is still possibly sending
> + // the child class may be able to stop immediately
> + if (const char *reason = maybeFinishedWithTcp()) {
> + stopSending(reason); // will close connection
> + return;
> + }

Missing "not" somewhere in the top comment?

> + if (io.flag == COMM_OK) {
...
> + } else if (io.size == 0) {
> + debugs(5, 5, io.conn << " closed?");
> + stopReceiving("zero sized read(2) result");
...
> + /* It might be half-closed, we can't tell */
> + fd_table[io.conn->fd].flags.socket_eof = true;
> + commMarkHalfClosed(io.conn->fd);
> + fd_note(io.conn->fd, "half-closed");
> + }
> + }
> +
> + bool mayReadMore = true;
> + // pass handling on to child instance code
> + if (inBuf.hasContent())
> + mayReadMore = processReadBuffer(inBuf);

There seem to be a couple of logic bugs here:

* If we read zero bytes, we should call the parser again (this time with
an eof flag), even if the TCP connection is closed now. Parse-after-eof
is necessary because a zero-read may indicate the end of the message,
etc. The above code does not do that. This may not be very kosher for a
server and is wrong for a client.

* If we read zero bytes, we should not be reading anymore but the code
continues with reading-specific checks/calls.

(**): Your readWasError() method may help to address many issues raised
above, actually. That method is currently half-baked. It detects errors,
including the ones we should ignore, but the caller still has to
remember to check the io.flag and deal with other special conditions. What
you probably want is to move readWasError() code inside
Comm::TcpReceiver::readIoHandler() but have readIoHandler() call the
appropriate handling methods, acting as a dispatcher. Here is a sketch:

Comm::TcpReceiver::readIoHandler() {
...
  if (bad error)
      handleReadError(...);
  else if (ignored error)
      ; // do nothing
  else if (size == 0)
      handleReadEof();
  else
      handleReadData(...);
...
}

Some of the above methods can be implemented in TcpReceiver, some in
kids, some on both layers.
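
For illustration only (the names and the base/kid split are open
questions):

  protected:
      virtual void handleReadError(int xerrno); ///< base closes; kids add cleanup
      virtual void handleReadEof();  ///< parse-after-eof, then stop reading
      virtual void handleReadData(); ///< parse/process inBuf; may read more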

>>> + // AsyncJob API
>>> + virtual bool doneAll() const;
>>> + virtual void swanSong();
>>> + void tcpConnectionInit();
>>> + virtual int64_t mayNeedToReadMore() const = 0;
>>> + bool maybeMakeSpaceAvailable();
>>> + virtual bool processReadBuffer(MemBuf &) = 0;
>>> + virtual void noteTcpReadError(int) {}
>>> + void readIoHandler(const CommIoCbParams &io);
>>> + virtual const char * maybeFinishedWithTcp() = 0;
>>
>> Please review all public methods and make each method that is meant to
>> be called by parent or kid classes only protected instead of public. The
>> above lists some good candidates, but I have not carefully verified each
>> of them and I may have missed some.
>>
>> In general, if you can make a method protected, do so and if you can
>> make a data member private, do so as well. This helps protect classes
>> from invasive calls that should be using different/public interfaces.

> My workaround for the server-side trouble was to make
> ServerStateData inherit from this TcpReceiver class. Which has trouble
> with the 2nd-generation inheriting child being the one implementing the
> virtual when it is a protected/private member of the middle inheritance
> class.

> Will do the protected changes when I am sufficiently happy that the
> CBDATA calls are working properly to avoid the workaround.

I hope the cbdata problem is resolved now, but please let me know if you
need my help with any aftershocks.

> + /// initialize the TCP connection event handlers
> + /// close(2) callback etc.
> + void tcpConnectionInit();

* The "tcp" prefix seems redundant -- the whole class is about receiving
over a TCP connection and there are no other connections around.

* Have you considered passing the connection to this method instead of
the constructor that cannot really do anything with the connection? I
have not seen the callers yet, but it feels like that would be a cleaner
design.
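
For example, something along these lines (the signature is illustrative
only):

  /// start monitoring the given, already-established TCP connection
  /// (install the close handler, etc.)
  void init(const Comm::ConnectionPointer &aConn);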

> + void releaseTcpConnection(const char *reason);

Same problem with the "tcp" prefix.

> +void
> +Comm::TcpReceiver::releaseTcpConnection(const char *reason)
> +{
> + // Used by server-side to release the connection before
> + // storing it in Pconn pool
> + comm_remove_close_handler(tcp->fd, closed_);
> + closed_->cancel(reason);
> + stopReading();
> +}

* What about the HTTP Server (client_side*.*) agent? It has to maintain
a connection pool as well, right? If the code is really specific to
clients/, why place it here?

* Hm.. Earlier you promised that stopReading() is a hack exclusively for
TCP tunnels. Here it seems to be used by ordinary clients/.

> + closed_->cancel("graceful close");
> + tcp->close();

It is best to be specific in those cases because there are many graceful
closures but we want to mark this one (for debugging/triage). I suggest
s/graceful close/Comm::TcpReceiver::stopSending/, especially since we
may be closing due to an error.
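
In other words:

  closed_->cancel("Comm::TcpReceiver::stopSending");
  tcp->close();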

> +/* This is a handler normally called by comm_close() */
> +void
> +Comm::TcpReceiver::tcpConnectionClosed(const CommCloseCbParams &io)

I think it is best to call all Comm handlers handle*(). This one could
be handleConnectionClose() or something like that.
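
That is, something like:

  /// comm_close() handler: the connection is gone, clean up this job
  void handleConnectionClose(const CommCloseCbParams &io);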

HTH,

Alex.