Re: [PATCH] comm cleanup mk8 - src/comm only

From: Amos Jeffries <squid3_at_treenet.co.nz>
Date: Thu, 23 Sep 2010 02:28:54 +1200

On 21/09/10 10:00, Alex Rousskov wrote:
> On 09/19/2010 05:35 AM, Amos Jeffries wrote:
>> Round 2 of only src/comm/ changes.
>>
>> I believe this accounts for all the bits requested to date. Including
>> adding Subscriptions for ConnAcceptor.
>
> * The patch is missing the new Subscription class. Based on your subject
> line, I am guessing you will post that separately.
>
>
> * Comm::ConnOpener::start() should call connect() directly. Start() is
> already async, there is no need to pay the async price again. Moreover,
> by doing an async call, you are more likely to encounter timeouts even
> before we call connect() because the clock starts ticking before the
> call is fired.

Done.

>
> * Comm::ConnOpener::timeout() should call connect() directly. Timeout()
> is already async, there is no need to pay the async price again.
>

Done.

> * I do not like how Comm::ConnOpener::connect() calls itself. I assume
> you did not invent this, but the code will fail to produce the
> intended(?) delay if there are no other async calls already scheduled,
> which is something the class does not really control. Recall that async
> calls are AsyncCallQueue::fire()d until there are none left...
>
> At the very least, please add a comment explaining why we use such a
> hack instead of either using an explicit loop or waiting a little bit
> via eventAdd.
>

I think you asked me to remove eventAdd a while back before this was a
Job. Reverted to the old .05 sec delay.

>
> * calls_.connect_ should not be reused in Comm::ConnOpener::connect()
> because AsyncCall objects may not support being fired twice -- they may
> have state. The connect_ object is not needed at all if you address the
> above comments. If you do not, set it to NULL at the start of
> Comm::ConnOpener::connect() and recreate when needed.

Oops. sorry. Foxed.

>
>
> * I may be wrong, but the interesting "Is opened conn fully open?"
> comment below may reveal a couple of bugs:
>
>> + // recover what we can from the job
>> + if (conn_ != NULL && conn_->isOpen()) {
>> + // it never reached fully open, so abort the FD
>> + conn_->close();
>> + ...
>
> Clearly, conn_ is open inside that if-statement (there is no such thing
> as half-open conn, I hope; we have enough headaches from half-closed

sorry, no such luck. see COMM_INPROGRESS.
As I understand it:
  For the period between comm_connect_addr() returning and a signal
being received (via the write handler being called) that write access is
possible the FD is in fact half-open.
  The kernel seems to have allocated a FD and returned (non-blocking)
but still be awaiting the network SYN-ACK.

The headaches here are small since only the opening sequence job and the
fd_table is involved. We just have to cleanup the conn_->fd before
passing control of conn_ back out to any other part of Squid.

> ones!). There is nothing wrong with the connection. Instead, it is the
> ConnOpener job itself that got into a half-finished state here, most

Yes. Nothing wrong, but we can't use the FD the system gave us already.

> likely due to an exception after the connection got opened and before it
> was sent to the caller. To clean up, two separate actions are needed:
>
> a) Close the opened connection. The beginning of your if-statement will
> do the job, but do not call doneConnecting(). Here is a polished version:
>
> if (conn_ != NULL && conn_->isOpen()) {
> Must(callback_ != NULL); // or there should be no conn_
> // we never sent this open connection to the caller, so close
> conn_->close();
> fd_table[conn_->fd].flags.open = 0;
> conn_ = NULL;
> }
>
> b) Notify the caller we are dying:
>
> if (callback_ != NULL) {
> // do not let the caller wait forever
> doneConnecting(COMM_ERR_CONNECT, 0);
> }
>
> I am not sure about the order. The above order means "caller can handle
> and should get nil params.conn". You can remove "conn_ = NULL" if
> "caller should get a non-nil but closed params.conn", but there will
> still be two if-statements.

Good idea. Above order is correct. params.conn may contain the attempted
local/remote IP combos for use by the callers recovery. !COMM_OK status
guarantees that the FD is not safely usable.

I also see that I missed clearing the write handler before fully closing
conn_. do the timeout and closing handlers need to be explicitly
removed? or are they happy with the cancels?

>
> * Consider moving case COMM_OK code, including lookupLocalAddress() to
> something like Connection::opened(). The code seems to be manipulating
> conn_ without access to Comm::ConnOpener data except perhaps host_. I
> could have missed some other dependencies, of course, but it feels like
> a Connection business to set things up after the connection has been
> established. The ConnOpener job is done at that stage.

Done.

>
> * Can we leave resetting fd_table[conn_->fd].flags.open to comm_close(),
> which will be called from via conn_->close()? It feels wrong to force it
> to zero way before the connection is actually closed but I understand
> that you may be fighting some hidden dependencies.

We can. It was me being paranoid rather than known problems.

Considering no other jobs etc have anything validly pending on this
brand new FD should I be calling fd_close(conn_->fd) here instead of
conn_->close()?

>
> * Should we test _peer cbdata-validity before dereferencing it below?
>> + if (_peer)
>> + _peer->stats.conn_open--;
>
> * The API below seems like a bad idea:
>> + /** retrieve the peer pointer for use.
>> + * The caller is responsible for all CBDATA operations regarding the
>> + * used of the pointer returned.
>> + */
>> + peer * const getPeer() const { return _peer; }
>
> because it leads to bugs like that (missing cbdata check):
>> + if (conn_->getPeer())
>> + conn_->getPeer()->stats.conn_open++;
>
> I suggest checking cbdata validity of _peer in getPeer(), returning NULL
> if the pointer is invalid, and using getPeer() even internally.
>

Done.

>
> * Our Vector::shift is both slow and buggy. This is not your problem, I
> guess, but perhaps you can add an XXX comment to
> Comm::AcceptLimiter::kick() to replace Vector with an efficient queue type.
>
> * Can you explain how "syncWithComm() side effects hit
> theCallSub->callback()"? Is not theCallSub->callback() itself constant
> and its result is stored in a local variable? I do not understand how
> syncWithComm() can affect theCallSub itself. Thank you.
>

Possibly obsolete now.

Dialer copy-constructors was initially created using getDialer() which
call syncWithComm() which in CommIoCb child class updates this->* state.
Fallout of that is that:
  the CommIoCb constructor cant take a const parameter,
  the parent class cant provide a const overload constructor,
  the other children cant either,
  callback() using copy-constructor cant be const.

I *think* the copy constructors are okay using raw dialer pointers now.
Which means everything down the chain can be const.

>
> * Why check calls_.earlyAbort_, calls_.timeout_, and connStart_ in
> ConnOpener::start(). Start is called only once and these checks seem to
> imply that those variables can be set somewhere else, before the start,
> even though they are private.

Fixed.

>
> * Consider s/connStart_/connectStart_/ for clarity. Too many conn
> prefixes...

Done.

>
> * Please use "const TextException &e" in
>> + } catch(TextException &e) {
>
Done.

>
> * The code below does not need to declare temp outside its scope:
>> + ConnAcceptor *temp = deferred.shift();
>> + if (temp != NULL) {
>
Done.

>
> * cbdataReferenceDone() already sets its parameter to NULL. No need to
> do that twice:
>> + /* clear any previous ptr */
>> + if (_peer) {
>> + cbdataReferenceDone(_peer);
>> + _peer = NULL;
>> + }
>

Great.
At this random point we are not sure its valid. For now I've make the
if() use the fixed getPeer(). Would it be correct to use
cbdataReferenceValidDone(NULL, _peer) unconditionally here with no
callback? instead of separate validate and done?

> * s/COMM flags set on this connection/COMM flags; see COMM_ defines
> above/ or something of that kind.
>
>
> * Consider s/_peer/peer_/ in Connection for consistency with most other
> code.

The "struct peer_" global symbol clashes and gcc complains about missing
; or = after variable "struct".
I don't plan on fighting that one just yet. Will mark it to be done though.

>
> * s/unused//g. C++ does not require naming unused parameters.

Done. And converted the two places (commSetSelect and eventAdd wrappers)
to use JobCallback with NullaryMemFunT dialers on connect().

>
> * s/As soon as comm IO accepts/As soon as commSetSelect accepts/
> just to be more precise, because many comm IO calls already accept
> AsyncCalls.
>

Done.

Thank you.

Amos

-- 
Please be using
   Current Stable Squid 2.7.STABLE9 or 3.1.8
   Beta testers wanted for 3.2.0.2
Received on Wed Sep 22 2010 - 14:29:00 MDT

This archive was generated by hypermail 2.2.0 : Wed Sep 22 2010 - 12:00:07 MDT