Re: [PATCH] Move timeout handling from ConnOpener::connect to ConnOpener::timeout

From: Rainer Weikusat <rweikusat_at_mobileactivedefense.com>
Date: Thu, 17 Jan 2013 21:53:52 +0000

Alex Rousskov <rousskov_at_measurement-factory.com> writes:
> On 01/17/2013 10:56 AM, Rainer Weikusat wrote:

[...]

>> 2. Check if temporaryFd_ is still valid. At the moment, this is always
>> the case, but it would prudent wrt general 'defensive programming' and
>> the check will become necessary for change 3|.
>
> I do not know what valid is in this context, but temporaryFd_ and
> earlyAbort must reflect whether there is a Comm I/O scheduled (from our
> job point of view).

'Valid' means 'a file descriptor in state "connection in progress"' is
presently open. With the current code, this is always true but it
won't remain true when fixing connect_retries: Then, the ::connect
method (or some code running on behalf of that) would close
temporaryFd_ before scheduling a 'delayed connect retry' which would
open a new socket. A ::timeout running then won't need to free
write_data and won't have to cancel the early_abort call.

>> 2a. If this was the case, check if the write handler of the
>> corresponding fde is still set.
>
> If I/O is scheduled, we should stop it, close the descriptor, and free
> memory using Comm APIs. This condition should probably be detected only
> in swanSong(), to avoid code duplication. The job should exit after a
> timeout, of course.
>
> Accessing Comm write handler pointers in some low-level table should be
> avoided if at all possible. If Comm APIs need to be adjusted to handle
> this better, we can adjust them.

I agree with the second paragraph but not entirely with the first, see
explanation in my earlier e-mail: If the write handler was scheduled
by the time timout runs, this happened because 'something definite' is
known about the state of the connection attempt: Either, it suceeded
or the OS returned an error. Throwing this 'real information' away in
order to save a line of code seems not right to me even if it is a
fringe case of a fringe case.

>
> Amos, do you remember what "partially open FD" in the following comment
> meant, exactly? Why cannot we just call comm_close() to cleanup?
>
>> // it never reached fully open, so cleanup the FD handlers
>> // Note that comm_close() sequence does not happen for partially open FD
>
>
>
> In your latest patch, please move fd cleanup from timeout() and from
> doneConnecting() into a dedicated method and call that method from
> swanSong() if temporaryFd_ is still set there. Do not call it from
> doneConnecting() or anywhere else. Remember that a job may end for
> reasons other than some timeout or I/O. swanSong() is the cleanup method.
>
> Please do not forget to set calls.earlyAbort to NULL after canceling it
> in the fd cleanup method.

What precisely constitutes 'fd_cleanup'? The doneConnecting code will
set calls_.early_abort to NULL and the leading comment states

/**
 * Connection attempt are completed. One way or the other.
 * Pass the results back to the external handler.
 * NP: on errors the earlyAbort call should be cancelled first with a reason.
 */

This means that it won't be possible to cancel the call afterwards
(which may not be necessary) but documenting the 'pre calling procedure'
of some routine as "do such-and-such a thing before calling this" and
then not doing it when calling the routine seems confusing to me.

>> + ptr = static_cast<Pointer *>(F->write_data);
>> + delete ptr;
>
> This is not going to work well. F->write_data now points to deleted
> memory. If you absolutely have to do this, please set it to NULL after
> delete.

Insofar my understanding goes, the

Comm::SetSelect(temporaryFd_, COMM_SELECT_WRITE, NULL, NULL, 0);

in doneConnecting will do that (and - for good measure - the fd_close
called from doneConnecting will clear the write pointers again).
Received on Thu Jan 17 2013 - 21:54:20 MST

This archive was generated by hypermail 2.2.0 : Fri Jan 18 2013 - 12:00:07 MST