Re: [PATCH] comm cleanup mk8 - src/comm only

From: Alex Rousskov <rousskov_at_measurement-factory.com>
Date: Mon, 20 Sep 2010 16:00:34 -0600

On 09/19/2010 05:35 AM, Amos Jeffries wrote:
> Round 2 of only src/comm/ changes.
>
> I believe this accounts for all the bits requested to date. Including
> adding Subscriptions for ConnAcceptor.

* The patch is missing the new Subscription class. Based on your subject
line, I am guessing you will post that separately.

* Comm::ConnOpener::start() should call connect() directly. Start() is
already async, there is no need to pay the async price again. Moreover,
by doing an async call, you are more likely to encounter timeouts even
before we call connect() because the clock starts ticking before the
call is fired.

* Comm::ConnOpener::timeout() should call connect() directly. Timeout()
is already async, there is no need to pay the async price again.

* I do not like how Comm::ConnOpener::connect() calls itself. I assume
you did not invent this, but the code will fail to produce the
intended(?) delay if there are no other async calls already scheduled,
which is something the class does not really control. Recall that async
calls are AsyncCallQueue::fire()d until there are none left...

At the very least, please add a comment explaining why we use such a
hack instead of either using an explicit loop or waiting a little bit
via eventAdd.

* calls_.connect_ should not be reused in Comm::ConnOpener::connect()
because AsyncCall objects may not support being fired twice -- they may
have state. The connect_ object is not needed at all if you address the
above comments. If you do not, set it to NULL at the start of
Comm::ConnOpener::connect() and recreate when needed.

* I may be wrong, but the interesting "Is opened conn fully open?"
comment below may reveal a couple of bugs:

> + // recover what we can from the job
> + if (conn_ != NULL && conn_->isOpen()) {
> + // it never reached fully open, so abort the FD
> + conn_->close();
> + ...

Clearly, conn_ is open inside that if-statement (there is no such thing
as half-open conn, I hope; we have enough headaches from half-closed
ones!). There is nothing wrong with the connection. Instead, it is the
ConnOpener job itself that got into a half-finished state here, most
likely due to an exception after the connection got opened and before it
was sent to the caller. To clean up, two separate actions are needed:

a) Close the opened connection. The beginning of your if-statement will
do the job, but do not call doneConnecting(). Here is a polished version:

   if (conn_ != NULL && conn_->isOpen()) {
       Must(callback_ != NULL); // or there should be no conn_
       // we never sent this open connection to the caller, so close
       conn_->close();
       fd_table[conn_->fd].flags.open = 0;
       conn_ = NULL;
   }

b) Notify the caller we are dying:

   if (callback_ != NULL) {
       // do not let the caller wait forever
       doneConnecting(COMM_ERR_CONNECT, 0);
   }

I am not sure about the order. The above order means "caller can handle
and should get nil params.conn". You can remove "conn_ = NULL" if
"caller should get a non-nil but closed params.conn", but there will
still be two if-statements.

Please check that I did not miss some other inter-state dependencies in
the above sketch. You know this code much better than I am.

* Consider moving case COMM_OK code, including lookupLocalAddress() to
something like Connection::opened(). The code seems to be manipulating
conn_ without access to Comm::ConnOpener data except perhaps host_. I
could have missed some other dependencies, of course, but it feels like
a Connection business to set things up after the connection has been
established. The ConnOpener job is done at that stage.

* Can we leave resetting fd_table[conn_->fd].flags.open to comm_close(),
which will be called from via conn_->close()? It feels wrong to force it
to zero way before the connection is actually closed but I understand
that you may be fighting some hidden dependencies.

* Should we test _peer cbdata-validity before dereferencing it below?
> + if (_peer)
> + _peer->stats.conn_open--;

* The API below seems like a bad idea:
> + /** retrieve the peer pointer for use.
> + * The caller is responsible for all CBDATA operations regarding the
> + * used of the pointer returned.
> + */
> + peer * const getPeer() const { return _peer; }

because it leads to bugs like that (missing cbdata check):
> + if (conn_->getPeer())
> + conn_->getPeer()->stats.conn_open++;

I suggest checking cbdata validity of _peer in getPeer(), returning NULL
if the pointer is invalid, and using getPeer() even internally.

* Our Vector::shift is both slow and buggy. This is not your problem, I
guess, but perhaps you can add an XXX comment to
Comm::AcceptLimiter::kick() to replace Vector with an efficient queue type.

* Can you explain how "syncWithComm() side effects hit
theCallSub->callback()"? Is not theCallSub->callback() itself constant
and its result is stored in a local variable? I do not understand how
syncWithComm() can affect theCallSub itself. Thank you.

* Why check calls_.earlyAbort_, calls_.timeout_, and connStart_ in
ConnOpener::start(). Start is called only once and these checks seem to
imply that those variables can be set somewhere else, before the start,
even though they are private.

* Consider s/connStart_/connectStart_/ for clarity. Too many conn
prefixes...

* Please use "const TextException &e" in
> + } catch(TextException &e) {

* The code below does not need to declare temp outside its scope:
> + ConnAcceptor *temp = deferred.shift();
> + if (temp != NULL) {

* cbdataReferenceDone() already sets its parameter to NULL. No need to
do that twice:
> + /* clear any previous ptr */
> + if (_peer) {
> + cbdataReferenceDone(_peer);
> + _peer = NULL;
> + }

* s/COMM flags set on this connection/COMM flags; see COMM_ defines
above/ or something of that kind.

* Consider s/_peer/peer_/ in Connection for consistency with most other
code.

* s/unused//g. C++ does not require naming unused parameters.

* s/As soon as comm IO accepts/As soon as commSetSelect accepts/
just to be more precise, because many comm IO calls already accept
AsyncCalls.

HTH,

Alex.
Received on Mon Sep 20 2010 - 22:01:04 MDT

This archive was generated by hypermail 2.2.0 : Wed Sep 22 2010 - 12:00:07 MDT