Re: [PREVIEW] ConnOpener fixes

From: Alex Rousskov <rousskov_at_measurement-factory.com>
Date: Fri, 25 Jan 2013 10:35:28 -0700

On 01/24/2013 09:35 PM, Amos Jeffries wrote:
> On 25/01/2013 3:06 p.m., Alex Rousskov wrote:

> NP: This is way beyond the simple fixes Ranier was working on. The
> changes here relys on code behaviour which will limit the patch to trunk
> or 3.3. I was a bit borderline on the earlier on the size of Raniers
> patches, but this is going over the change amount I'm comfortable
> porting to the stable branch with a beta cycle coming to an end.

I have not looked at v3.2 specifically during this project, so I do not
know which behaviors make my patch incompatible with that version. I
certainly do not insist that you port it to v3.2, but I am happy to help
with that if you decide that it is the best way forward.

> * You are still making comments about what "Comm" should do (XXX: Comm
> should!). ConnOpener *is* "Comm" at this point in the transaction.

Fixed comments by using specific Comm API names.

> * It was probably done beforehand, but it much clearer that it happens
> now that the sleep()/DelayedRetry mechanism leaks Pointer()

Where do you see a leak? Please note that our static DelayedRetry event
callback is always called, regardless of the job state/existence, so it
will always delete the Pointer. I clarified that point by adjusting the
comment in cancelSleep().

> * Looking at your comment in there about comm_close() I see why we are
> at such loggerheads about it.
> +++ comm_close() does *not* replace the write_data cleanup.

Fixed comments that were missing two facts:

1) ConnOpener's bypass of Comm::Write() API makes
COMMIO_FD_WRITECB(fd)->active() false, which means comm_close() will not
call and cleanup the write handler properly. Sorry for missing that
point. We should consider using Comm::Write eventually so that
comm_close can do this cleanup for us instead (and so that we can use
regular non-leaking job callbacks).

2) comm_close() is probably slightly broken itself because its
COMMIO_FD_WRITECB cleanup condition is probably wrong. It should not
matter whether there is an active callback when it comes to the
SetSelect cleanup call. That call should be done unconditionally, and
some SetSelect call implementations may optimize to clean only when
there _was_ a write handler at some point.

We have fixed a related bug #3048 (r10880) by adding those cleanup
calls, but it looks like we should have been even more aggressive and
make cleanup unconditional. This may explain why I see many "FD ready
but there is no handler" debug lines in some cache.logs! If I am
correct, fixing this may even speed things up a little as we would be
polling fewer FDs :-).

> * apart from the timeout unset the relevant parts of the comm_close()
> sequence are all in comm_close_complete().

Not exactly. There is more potentially useful code in current
_comm_close(): source code address setting and the arguably less useful
comm_empty_os_read_buffers() call. And who knows what else will be added
or changed there! We should call comm_close() instead of trying to take
it apart and use what we think is currently relevant. IMO this is basic
API sanity policy: If we call comm_open, we must call comm_close.

> * the naming of "ConnOpener::open()" is ambiguous. Since it is not the
> active open() operation in this Job. It is the getSocket() operation and
> should be named as such.

Fixed by renaming open() to createSocket(). I wanted to emphasize that
this method creates something rather than just extracting something that
was there before.

Adjusted patch attached. Besides method renaming and comment changes
discussed above, this patch

* Calls comm_close() as discussed.

* No longer resets COMM_SELECT_WRITE when keeping FD for the job
initiator, to mimic Comm::DoSelect() optimization. We still clear our
write handler, of course.

Thank you,

Alex.

Received on Fri Jan 25 2013 - 17:35:40 MST

This archive was generated by hypermail 2.2.0 : Sat Jan 26 2013 - 12:00:07 MST