Re: [PREVIEW] ConnOpener fixes

From: Amos Jeffries <squid3_at_treenet.co.nz>
Date: Sat, 26 Jan 2013 17:42:34 +1300

On 26/01/2013 6:35 a.m., Alex Rousskov wrote:
> On 01/24/2013 09:35 PM, Amos Jeffries wrote:
>> On 25/01/2013 3:06 p.m., Alex Rousskov wrote:
>> NP: This is way beyond the simple fixes Ranier was working on. The
>> changes here relys on code behaviour which will limit the patch to trunk
>> or 3.3. I was a bit borderline on the earlier on the size of Raniers
>> patches, but this is going over the change amount I'm comfortable
>> porting to the stable branch with a beta cycle coming to an end.
> I have not looked at v3.2 specifically during this project, so I do not
> know which behaviors make my patch incompatible with that version. I
> certainly do not insist that you port it to v3.2, but I am happy to help
> with that if you decide that it is the best way forward.
>
>
>> * You are still making comments about what "Comm" should do (XXX: Comm
>> should!). ConnOpener *is* "Comm" at this point in the transaction.
> Fixed comments by using specific Comm API names.
>
>
>> * It was probably done beforehand, but it much clearer that it happens
>> now that the sleep()/DelayedRetry mechanism leaks Pointer()
> Where do you see a leak? Please note that our static DelayedRetry event
> callback is always called, regardless of the job state/existence, so it
> will always delete the Pointer. I clarified that point by adjusting the
> comment in cancelSleep().

You commented earlier that the events were scheduled as AsyncCall queue.
You commented earlier that AsyncCall queue would not run a Call if the
Job were gone.

I connected those two statements to imply the DelayRetry event would not
happen. Same as is happening for InProgress.

Uhm, were you meaning these events are *always* run and the scheduling
happens in our wrapper which deletes the Pointer() ?
  If so there is no leak. Otherwise there is.

>
>> * Looking at your comment in there about comm_close() I see why we are
>> at such loggerheads about it.
>> +++ comm_close() does *not* replace the write_data cleanup.
> Fixed comments that were missing two facts:
>
> 1) ConnOpener's bypass of Comm::Write() API makes
> COMMIO_FD_WRITECB(fd)->active() false, which means comm_close() will not
> call and cleanup the write handler properly. Sorry for missing that
> point. We should consider using Comm::Write eventually so that
> comm_close can do this cleanup for us instead (and so that we can use
> regular non-leaking job callbacks).

That does ring a memory bell. Sorry I can't recall specifics right now
though.
Good catch.
On the whole I am kind of against calling comm_close() for the sheer
number of conditionals it requires to effectively do a NOP operation. It
is nicer from the code readability and completeness viewpoint, but not
performance. Even a few dozen micro-seconds in the ConnOpener and
related stages can translate to a significatn proportion of a MEM_HIT
transaction time.

> 2) comm_close() is probably slightly broken itself because its
> COMMIO_FD_WRITECB cleanup condition is probably wrong. It should not
> matter whether there is an active callback when it comes to the
> SetSelect cleanup call. That call should be done unconditionally, and
> some SetSelect call implementations may optimize to clean only when
> there _was_ a write handler at some point.
>
> We have fixed a related bug #3048 (r10880) by adding those cleanup
> calls, but it looks like we should have been even more aggressive and
> make cleanup unconditional. This may explain why I see many "FD ready
> but there is no handler" debug lines in some cache.logs! If I am
> correct, fixing this may even speed things up a little as we would be
> polling fewer FDs :-).

Happy to review the patch when you have it ready. :-)

>
>> * apart from the timeout unset the relevant parts of the comm_close()
>> sequence are all in comm_close_complete().
> Not exactly. There is more potentially useful code in current
> _comm_close(): source code address setting and the arguably less useful
> comm_empty_os_read_buffers() call. And who knows what else will be added
> or changed there! We should call comm_close() instead of trying to take
> it apart and use what we think is currently relevant. IMO this is basic
> API sanity policy: If we call comm_open, we must call comm_close.

Okay. However I am trying to work towards an overall goal of reducing
the things comm_close() does, by moving the cleanup of these things into
the Jobs which are setting them.

>
>> * the naming of "ConnOpener::open()" is ambiguous. Since it is not the
>> active open() operation in this Job. It is the getSocket() operation and
>> should be named as such.
> Fixed by renaming open() to createSocket(). I wanted to emphasize that
> this method creates something rather than just extracting something that
> was there before.

Fair enough. I'm fine with that.

> Adjusted patch attached. Besides method renaming and comment changes
> discussed above, this patch
>
> * Calls comm_close() as discussed.
>
> * No longer resets COMM_SELECT_WRITE when keeping FD for the job
> initiator, to mimic Comm::DoSelect() optimization. We still clear our
> write handler, of course.

You still only delete and unset write_data, without clearing the rest of
the write state at the same time via Comm::SetSelect().
  * "XXX: We are about to remove write_handler, " is now completely
false. You are about to clear write_data *only*. There are several code
paths leading to cleanFd(). One of which is called keepFd() and does not
result in that extra cleanup you are relying on.

What happens if the write handler is called before being replaced? ...
assert(ptr) crashes Squid.

What happens if I/O loops detect a write_handler set? ... they schedule
write(). then what?

  Please do us all a favour and call Comm::SetSelect to clear the write
handler state *fully* at the point the hack takes place without relying
on half-hidden code elsewhere doing it now and forever. This type of
optimization is how many Squid bugs are born.

Amos
Received on Sat Jan 26 2013 - 04:42:48 MST

This archive was generated by hypermail 2.2.0 : Sat Jan 26 2013 - 12:00:07 MST