Re: [PATCH] Comm cleanup (Async ConnOpener)

From: Alex Rousskov <rousskov_at_measurement-factory.com>
Date: Mon, 19 Jul 2010 09:40:25 -0600

On 07/19/2010 05:26 AM, Amos Jeffries wrote:
> Alex Rousskov wrote:
>> On 07/18/2010 04:52 AM, Amos Jeffries wrote:
>>> This updated my previous submission for the comm connection opener
>>> model.
>>>
>>> diff since last time:
>>>
>>> - removed the multi-path pathways. They are now isolated at the upper
>>> level. ConnOpener now handles only one Comm::Connection.
>>>
>>> - implemented ConnOpener as an AsyncJob child.
>>>
>>>
>>> What has changed from HEAD:
>>>
>>> ConnectionDetails objects have been renamed Comm::Connection and been
>>> extended to hold the FD and Squids' socket flags.
>> ...
>>> On COMM_ERR_CONNECT and COMM_TIMEOUT the conn will be !isOpen() or NULL
>>> and should not be relied on.
>>
>> If the connection pointer is NULL, a job opening multiple connections
>> would not be able to tell which connection just failed. Let's not hide
>> connection pointer from the initiator.
>>
>>
>>
>> * Can Adaptation::Icap::Xaction::closeConnection set connection to NULL?
>
> In the reuseConnection == false case, it can after the connection->close().

Why not in both reuse and no-reuse cases?

>> * ConnOpener::start(), doneAll(), and swanSong() should remain protected
>> as they are in the AsyncJob.
>>
>
> Done. Apart from doneAll(). It is required to be accessible to the
> ConnectRetry wrapper until comm write callbacks are turned into methods.

ConnectRetry is broken in several ways: It partially and incorrectly
emulates async call protections, it calls start() directly, and it
results in at least two calls to start() for on job (which is like
having two calls to main() in a program).

Instead of this mess, please have ConnectRetry wrapper to schedule an
async call for ConnOpener::tryConnecting() or a similar job method and
move your existing comm_connect_addr() code there. No deleteThis or
doneAll access would be necessary.

>> * Should ConnOpener::swanSong() call comm_close instead of just close()?
>
> No. Until ConnOpener() has completed successfully there is no other
> valid user of the conn->fd. The only handlers to deal with are
> ConnOpeners' earlyAbort and timeout handlers, which are explicitly
> cancelled and removed.
> comm_close does a lot of additional handler and callback state updating
> over multiple callbacks. This is all unnecessary waste of cycles on a
> closed or maybe half-open FD.

I doubt the "we are the only fd user so we are going to bypass Comm when
we want to" is a good approach. ConnOpener calls many Comm methods.
Those methods may create some state in Comm internal tables. We should
call comm_close if we call comm_open instead of assuming that Comm has
no state associated with our descriptor.

If calling comm_close results in too many useless callbacks in our case,
we should remove the no-longer-needed callbacks _before_ closing the
descriptor.

>> * ConnOpener::callCallback() clears the close handler and timeout
>> depending on solo state, which is kind of wrong, but the best thing to
>> do is to remove that cleanup code from callCallback() completely because
>> the same code is already present in swanSong() and because it is
>> unrelated to calling the callback.
>
> Ideally yes. I'm not certain enough about the state of the Job on the
> second (ConnectRetry) legacy entrance to remove either at this point.

Do not let one bug cause others. I have discussed ConnectRetry fixes
above, but swanSong and callCallback should not depend on what kind of
bugs we have in ConnectRetry or we will be going in circles.

> When the Write handling is able to be a method properly then the
> swanSong() pre-cancellations can be removed.
> For now we have the NULL checks saving cycles where possible without
> adding any risky situations.

I am not against NULL checks before cancellations. Those are normal.

What I am saying is that the cancellations should be removed from
callCallback completely. You already have them in swanSong, where they
belong. There is no complicated voodoo here: callCallback calls the
callback and guarantees that the job will exit; swanSong cleans up on
job exit.

* Do not call start() directly. Only AsyncStart should call it. And it
should not be called more than once per job object lifetime. It is
similar to main() in C/C++.

>> * It may not be a good idea to separate calls.earlyabort and
>> calls.timeout existence from the existence of the corresponding Comm
>> close or timeout handler. Set them when setting the Comm handler. Clear
>> them when clearing the handler.
>
> What do you mean? they _are_ the close and timeout handlers for the
> duration of the ConnOpener activity. Set on comm_openex and removed on
> ConnOpener::callCallback()

Consider the code snippet below:

> + if (calls.earlyabort == NULL) {
> + typedef CommCbMemFunT<ConnOpener, CommConnectCbParams> Dialer;
> + calls.earlyabort = asyncCall(5, 4, "ConnOpener::earlyAbort",
> + Dialer(this, &ConnOpener::earlyAbort));
> + }
> + comm_add_close_handler(solo->fd, calls.earlyabort);

The above code sets calls.earlyabort separately from calling
comm_add_close_handler() which implies that sometimes we may have
calls.earlyabort but no active close handler. To fix this, move the call
to comm_add_close_handler() in side the if-statement.

Same for calls.timeout and commSetTimeout.

>> * FwdState::unregister() used to prevent FwdState from closing the
>> unregistered connection. The new code does not. Was the old code buggy?
>>
>
> Only in Comm::Connection semantics. Added an alternative version and
> deprecated the old unregister(int) with a hack to keep it going on the
> old semantics.

I will try to remember to revisit this when you post the patch with the
changes described above, but I may forget.

>> And why is it called "paths"? Each paths element (i.e., a path) only
>> contains one point (the next hop). A path is a sequence of points.
>> Should we call the "paths" member "hops", "peers", or "servers"?
>
> Sequence of two location data points are stored in each conn:
> (local,remote)
>
> The paths vector<> holds a series of such.
>
> Though I agree paths is not a good name, and I am leaning towards
> remoteDestinations or somesuch now.

"destinations" is fine with me though other choices are shorter. I would
drop the "remote" prefix as it adds no meaning and may be confusing when
a remote address is "localhost".

>> Overall, exposing connection pointer to Servers seems wrong, especially
>> because the pointer may change from one call to another. A Server that
>> needs that pointer, should store it at initialization time, just like it
>> stores fd in the current code.
>>
>> You have "TODO: store Comm::Connection instead of FD". This is an XXX
>> and not just a TODO because of the above concerns.
>>
>
> This auditing task is scoped to do the connection setup and get
> split-stack IPv6 a chance of going operational. Which means as little
> upper layer change in the Servers as we can cope with.

I appreciate the goal of minimizing the upper-level changes, but when
the connection pointer change is outside Server control, it becomes
dangerous to offer the Server access to it. It already caused one bug
and we could easily miss more hidden ones. At the very least, let's add
a "current connection, may change between calls" comment for the conn()
method.

> Is there any reason why the CommCalls have unique params classes and
> dialers for CommTimeoutCb* and CommCloseCb* ? I'd kind of like to roll
> them and CommConnectCb* into passing CommCommonCb* if thats possible,
> and cut ~25% of the CommCalls code on the next cleanup step.

You can merge CommTimeoutCb and CommCloseCb because they have the same
arguments and such. The merge will remove a few lines of simple code,
but you may lose the ability to know which callback is segfaulting based
on call object alone, which is sometimes useful when dealing with core
dumps. I doubt it is worth it, but have no strong objections.

If the cleanup you have mentioned focuses on removing old function-style
callbacks instead, then you would be able to remove a lot of CommCalls
complexity and improve debugging.

Converting old-style function calls to use CommCommonCbParams is fine,
but, again, it would be better to get rid of those calls instead.

> All changed. Running my checks again now before commit and re-submit.

It would be nice if you could run it with valgrind. A few bugs I was
lucky to spot (e.g., double-destruction and peer leak) would have been
identified by valgrind.

Cheers,

Alex.
Received on Mon Jul 19 2010 - 15:41:33 MDT

This archive was generated by hypermail 2.2.0 : Tue Jul 20 2010 - 12:00:09 MDT