Re: [PREVIEW] ConnOpener fixes

From: Alex Rousskov <rousskov_at_measurement-factory.com>
Date: Sat, 26 Jan 2013 00:32:23 -0700

On 01/25/2013 09:42 PM, Amos Jeffries wrote:
> On 26/01/2013 6:35 a.m., Alex Rousskov wrote:
>> On 01/24/2013 09:35 PM, Amos Jeffries wrote:

>>> * It was probably done beforehand, but it much clearer that it happens
>>> now that the sleep()/DelayedRetry mechanism leaks Pointer()

>> Where do you see a leak? Please note that our static DelayedRetry event
>> callback is always called, regardless of the job state/existence, so it
>> will always delete the Pointer. I clarified that point by adjusting the
>> comment in cancelSleep().

> You commented earlier that the events were scheduled as AsyncCall queue.

Yes, they are (after spending time in the event queue).

> You commented earlier that AsyncCall queue would not run a Call if the
> Job were gone.

Yes, if it is a job call. (The protection is offered by the call dialer,
not the caller or the callee. This is why we can have one async queue
for many different types of async calls.)

Calls created and placed in the async queue by the event queue are not
job calls. The current event queue code converts a C-style event
callback into a basic async call. It does not have enough information to
create a job call or to realize that the event is storing such a job
callback already.

> I connected those two statements to imply the DelayRetry event would not
> happen. Same as is happening for InProgress.

Yes, I see where the confusion was coming from. Sorry for not being
clear. This is especially messy because DelayedRetry/event queue and
SetSelect/InProgress are using different callback mechanisms (SetSelect
vs event queue).

> Uhm, were you meaning these events are *always* run and the scheduling
> happens in our wrapper which deletes the Pointer() ?

Yes, ConnOpener event queue events always run (there is optional cbdata
layer that offers call protection for some events but we are not using
that). The async calls created from those events always fire. At that
point, our handler wrapper is called. It deletes the Pointer and
schedules an async job call to our job. Only that final call will not
fire if the job is gone.

In summary, there are two async calls per event here. The first one is a
dumb callback that will always fire. That one deletes Pointer. The
second one is a job callback and may not fire, but that is OK.

Needless to say, all these layers will disappear when the event queue is
converted to AsyncCall API, just like upper Comm levels were. This
layering is very annoying IMO. Fortunately, it does not represent a
serious performance problem because most performance-sensitive paths do
not use events these days IIRC.

> If so there is no leak. Otherwise there is.

No leak AFAICT. I still need to run valgrind and other leak tests though.

> On the whole I am kind of against calling comm_close() for the sheer
> number of conditionals it requires to effectively do a NOP operation. It
> is nicer from the code readability and completeness viewpoint, but not
> performance. Even a few dozen micro-seconds in the ConnOpener and
> related stages can translate to a significatn proportion of a MEM_HIT
> transaction time.

Fortunately, comm_close() in ConnOpener is not called for 99.99% of
[memory] hits in a typical environment. In fact, it is probably not
called for 99% of requests in most environments. It is only called when
we needed to connect to the origin server _and_ we failed to do so.

Still, we can optimize it to be more efficient. Most conditionals are
probably too fast to be a serious problem, but we can get rid of the
final "really close" async call when it is not needed (because no async
callbacks were called). That would be a nice small optimization project
if somebody wants to volunteer!

Another performance optimization target could be that FD buffer cleaning
loop. Perhaps there are cases where we do not need to run it at all. I
have not researched it though. Another project!

> Okay. However I am trying to work towards an overall goal of reducing
> the things comm_close() does, by moving the cleanup of these things into
> the Jobs which are setting them.

If this is done as an optimization, with comm_close still guaranteeing
proper and full Comm state cleanup, then it may help (see above for two
related optimization ideas). Otherwise, it sounds too risky IMO.

>> Adjusted patch attached. Besides method renaming and comment changes
>> discussed above, this patch
>>
>> * Calls comm_close() as discussed.
>>
>> * No longer resets COMM_SELECT_WRITE when keeping FD for the job
>> initiator, to mimic Comm::DoSelect() optimization. We still clear our
>> write handler, of course.
>
> You still only delete and unset write_data, without clearing the rest of
> the write state at the same time via Comm::SetSelect().

[ See the next bullet first! The bug fixed there may have caused a
confusion here. ]

Yes, but only when keeping the FD. And that is exactly what
Comm::DoSelect() does when calling write callbacks -- it keeps the state
because there will probably be other write calls for the same FD. In our
case, the write call will come from the ConnOpener initiator...

I added a Comm::SetSelect(NULL) call for kept FD back, but it seems like
preserving that optimization is a better choice here.

For example, the epoll's Comm::SetSelect would keep the FD in the FD
set. If we call SetSelect(NULL), epoll deletes that FD from the set and
then will have to add it right back in when our initiator calls
SetSelect() with the same FD.

> * "XXX: We are about to remove write_handler, " is now completely
> false. You are about to clear write_data *only*.

My bad! I think I accidentally deleted the write_handler resetting line.
It must be there, of course. Now added back in.

> Please do us all a favour and call Comm::SetSelect to clear the write
> handler state *fully* at the point the hack takes place without relying
> on half-hidden code elsewhere doing it now and forever. This type of
> optimization is how many Squid bugs are born.

Done. Please note that the same optimization is still performed every
time we do a regular Comm write elsewhere in Squid. If the above bug fix
and explanation convinced you that the optimization should be enabled, I
will restore it (i.e., remove Comm::SetSelect(NULL) after we clear the
write handler).

Adjusted patch attached. No additional changes other than adding
write_handler reset and disabling the "do not call SetSelect(NULL) for
kept FDs" optimization. The changes are in cleanFd() and closeFd().

Thank you,

Alex.

Received on Sat Jan 26 2013 - 07:32:36 MST

This archive was generated by hypermail 2.2.0 : Sat Jan 26 2013 - 12:00:07 MST