Re: [PATCH] Bug 3281

From: Alex Rousskov <rousskov_at_measurement-factory.com>
Date: Thu, 01 Sep 2011 10:55:49 -0600

On 09/01/2011 05:47 AM, Amos Jeffries wrote:
> This patch seeks to avoids race conditions between the AsyncCalls for
> the pconn monitoring handlers and lookups pulling pconn up for use.

...

> It adds checks for the timeout handler, as discussed in bug 3281,
> preventing the related type of race from the timeout handler.

> + // our connection timeout handler is scheduled to run already. unsafe
> + if (fd_table[conn->fd].timeoutHandler == NULL)
> + return false;

I am not strongly against this, but it would be more elegant and may be
more efficient to let the about-to-timeout connections to be reused
instead. This can be safely done, for example, by canceling the pending
timeout notification callback in the connection giving code.

Thank you,

Alex.
P.S. Higher level rant: We have an increasing amount of code devoted to
checking whether some Comm information is already "in flight" and trying
to predict what it will be or trying to react to it before it comes to
us. Most of that code accesses internal Comm structures. I understand
that this code is being created to address specific problems, but the
direction of this worries me.

We need to think of a better (uniform and encapsulated) way to handle
the asynchronous nature of Comm-Job relationship or we will end up with
too many similar-but-different and half-working checks in all Comm-using
Jobs. We have been there before with the old spaghetti code (for
slightly different reasons); let's try not to fall into the same trap.

I do not have a well-formulated suggestion for the above high-level
problem, but I may be able to spend some time on it after Rock Store is
in. Meanwhile, every time you write fd_table[] outside of Comm, please
pause and ask yourself whether there is a better way of doing it :-).
Received on Thu Sep 01 2011 - 16:56:09 MDT

This archive was generated by hypermail 2.2.0 : Fri Sep 02 2011 - 12:00:03 MDT