Re: [PATCH] ConnOpener fixes

From: Amos Jeffries <squid3_at_treenet.co.nz>
Date: Thu, 31 Jan 2013 16:18:28 +1300

On 29/01/2013 6:32 a.m., Alex Rousskov wrote:
> Hello,
>
> The attached patch fixes several ConnOpener problems by relying on
> AsyncJob protections while maintaining a tighter grip on various I/O and
> sleep states. This patch is identical to the last ConnOpener patch
> posted in PREVIEW state. I am now done with valgrind testing. No leaks
> were detected, although I doubt my tests were comprehensive enough to
> trigger all possible conditions where the unpatched code leaks.
>
> I started with Rainer Weikusat's timeout polishing patch posted a few
> days ago, but all bugs are mine.
>
>
> Here are some of the addressed problems:
>
> * Connection descriptor was not closed when attempting to reconnect
> after failures. We now properly close on failures, sleep with descriptor
> closed, and then reopen.
>
> * Timeout handler was not cleaned up properly in some cases, causing
> memory leaks (for the handler Pointer) and possibly timeouts that were
> fired (for then-active handler) after the connection was passed to the
> initiator.
>
> * Comm close handler was not cleaned up properly.
>
> * statCounter.syscalls.sock.closes counter was not updated on FD closure.
>
> * Waiting pending accepts were not kicked on FD closure.
>
> * Connection timeout was enforced for each connection attempt instead of
> applying to all attempts taken together.
>
> and possibly other problems. The full extent of all side-effects of
> mishandled race conditions and state conflicts is probably unknown.
>
>
> TODO (outside this project scope):
> Polish comm_close() to always reset "Select" state.
> Make connect_timeout documentation in squid.conf less ambiguous.
> Move prevalent conn_ debugging to the status() method?
> Polish Comm timeout handling to always reset .timeout on callback?
> Revise eventDelete() to delete between-I/O sleep timeout?
>
>
> HTH,
>
> Alex.

One performane optimization I can see:

  * if the callback_ is != NULL but has been cancelled by the receiving
Job. Your code now calls sendAnswer() and schedules the already
cancelled Call.
   Previously this was tested for in swanSong() to simply NULL the
callback_ pointer instead of scheduling a dead Call.
  NP: it would probably be a good idea to test for the
callback_->cancelled() case and debugs() report it instead of scheduling.

The result of omitting this check saves one if() test per connection by
adding the full overheads of memory, scheduling, and CPU for one
AsyncCall queue operation per connection. ConnOpener operates on the
server-side where multiple client-side Jobs may result in TIMEOUT or
ABORTED. This cancellation case is a small but common case.

+1. Other than the above performance tweak this looks fine for commit.

Amos
Received on Thu Jan 31 2013 - 03:18:35 MST

This archive was generated by hypermail 2.2.0 : Fri Feb 01 2013 - 12:00:14 MST