Re: [PATCH] Move timeout handling from ConnOpener::connect to ConnOpener::timeout

From: Alex Rousskov <rousskov_at_measurement-factory.com>
Date: Mon, 14 Jan 2013 15:09:33 -0700

On 01/14/2013 02:01 PM, Mike Mitchell wrote:
> Check your memory usage. I'll bet you're leaking a ConnOpener object
> on every timeout.
>
> You'll need something like
>
> // wipe out the InProgressConnectRetry write handler
> if (temporaryFd_ >= 0) {
> fde *F = &fd_table[temporaryFd_];
> if (F->write_handler == Comm::ConnOpener::InProgressConnectRetry && F->write_data != NULL) {
> Pointer *ptr = static_cast<Pointer*>(F->write_data);
> Comm::SetSelect(temporaryFd_, COMM_SELECT_WRITE, NULL, NULL, 0);
> delete ptr;
> }
>
> added to your ConnOpener::timeout() function.

Not exactly. When we timed out, we need to close the temporary FD and
start from scratch. Here is the relevant part of the sketch I referred
to earlier:

>> On 11.09.2012 03:06, Alex Rousskov wrote:
>>>>> Here is the sketch for connect retries on failures, as I see it:
>>>>>
>>>>> default:
>>>>> ++failRetries_; // TODO: rename to failedTries
>>>>>
>>>>> // we failed and have to start from scratch
>>>>> ... close the temporary socket ...
>>>>>
>>>>> // if we exhausted all retries, bail with an error
>>>>> if (failRetries_ > Config.connect_retries) {
>>>>> ... send COMM_ERR_CONNECT to the caller ...
>>>>> return;
>>>>> }
>>>>>
>>>>> // we can still retry after a small pause
>>>>> // we might timeout before that pause ends, but that is OK
>>>>> eventAdd("Comm::ConnOpener::DelayedConnectRetry", ...)
>>>>>
>>>>> More changes would be needed to make this work correctly:
>>>>>
>>>>> * the timeout handler should not be associated with the socket
>>>>> descriptor (since it may change). We can use eventAdd instead.
>>>>>
>>>>> * the Comm::ConnOpener::DelayedConnectRetry() wrapper should call a
>>>>> method that will open a new socket.

Rainer, you are getting close to completing this and closing two or
perhaps three outstanding bugs. Would you mind adjusting your patch to
close the temporary FD and open it again in DelayedConnectRetry, as
discussed above? And disassociate the timeout from the descriptor?

If you need official justification, you may tell your boss that your
patch addressed some of the issues, but more related problems have been
found, and your fix cannot be committed until those problems are
addressed (all true, of course).

If you can finish this, I promise to polish the final patch to make it
suitable for trunk inclusion :-).

Thank you,

Alex.
Received on Mon Jan 14 2013 - 22:09:37 MST

This archive was generated by hypermail 2.2.0 : Tue Jan 15 2013 - 12:00:06 MST