Re: ConnOpener leaks in 3.2.5

From: Alex Rousskov <rousskov_at_measurement-factory.com>
Date: Thu, 10 Jan 2013 14:58:08 -0700

On 01/10/2013 02:30 PM, Mike Mitchell wrote:
> I'm able to replicate a leak at will.
> When the client makes a request to a non-responsive server, the ConnOperner
> object leaks with a non-zero lock count.
>
> I've traced the problem to the use of 'new Pointer(this)' in ConnOpener::connect().
> When the connection is in progress, the SetSelect() call establishes a Write handler.
> When the write handler is called, it properly deletes the pointer which properly
> decrements the lock count.
>
> When a timeout occurs, the write handler is never called. Since it is never called,
> the pointer is never deleted.
>
> I've worked around the leak by adding the following code to ConnOpener::connect(),
> just before the comm_connect_addr() switch statement.
>
> // wipe out the InProgressConnectRetry write handler
> if (temporaryFd_ >= 0) {
> fde *F = &fd_table[temporaryFd_];
> if (F->write_handler == Comm::ConnOpener::InProgressConnectRetry && F->write_data != NULL) {
> Pointer *ptr = static_cast<Pointer*>(F->write_data);
> Comm::SetSelect(temporaryFd_, COMM_SELECT_WRITE, NULL, NULL, 0);
> if (ptr->valid()) {
> delete ptr;
> }
> }

IIRC, you need to delete ptr whether it points to something valid or
not. In this context, ptr should always point to a valid object, I
guess, but the "ptr->valid" condition should be removed before it gets
copied to a place where the object may be invalid.

A potential (at least) problem with this approach is that now you have
two places that dereference an unprotected pointer and then delete it.
If both pieces of code are executed, Squid will crash. You may claim
that it never happens, and that might be true today, but the whole
scheme is just a ticking time bomb.

> Can anyone come up with a better solution?

IIRC, this and related problems have been discussed at length on the
"ICAP connections under heavy loads" thread, and a comprehensive
solution has been sketched out there. It needs to be implemented.

Cheers,

Alex.

> From: Mike Mitchell
> Sent: Monday, December 31, 2012 12:59 PM
> To: squid-dev_at_squid-cache.org
> Subject: ConnOpener leaks in 3.2.5
>
> I'm seeing ConnOpener leaks in 3.2.5.
> I compiled with --enable-debug-cbdata and cachemgr reports most of the CommOpener objects as invalid, each with a lock count greater than 0.
> The lines look like:
> Pointer Type Locks Allocated by
> !0x20959018 29 2 comm/ConnOpener.h:74
> !0x1efba998 29 2 comm/ConnOpener.h:74
> !0x21bd4248 29 1 ../../src/comm/ConnOpener.h:74
>
> After 100,000 queries I have 2,000 CommOpener objects allocated with 1,900 in use. Of those, 1850 are invalid.
>
> Could someone that understands cbdata take a look?
> As near as I can tell only Comm::ConnOpener::InProgresConnectRetry() and Comm::ConnOpener::DelayedConnectRetry() duplicate the CommOpener object (by calling ptr->valid()).
> It looks like the original is destructed, but I'm not sure if that is correct.
> cbdataInternalFree() will decrement the lock count, but before it does it first checks and clears the "valid" flag.
> If the "valid" flag is not set, cbdataInternalFree asserts. If it is set, it clears the "valid" flag and then decrements
> the lock count.
>
> I'm very confused about the way cbdata is working. I'm guessing that the "valid" flag is getting cleared before all the locks are removed, which will prevent any more locks from being removed. The twisty maze of function templates is too much for me. Could someone please take a look?
>
> Mike Mitchell
>
>
>
>
>
Received on Thu Jan 10 2013 - 21:58:19 MST

This archive was generated by hypermail 2.2.0 : Fri Jan 11 2013 - 12:00:05 MST