RE: ConnOpener leaks in 3.2.5

From: Mike Mitchell <Mike.Mitchell_at_sas.com>
Date: Thu, 10 Jan 2013 23:18:20 +0000

You are right, thank you. I've installed the following patch from Amos Jeffries' squid-dev posting dated September 9th at 8:30:03 === modified file 'src/comm/ConnOpener.cc' --- src/comm/ConnOpener.cc 2012-08-28 19:12:13 +0000 +++ src/comm/ConnOpener.cc 2012-09-09 08:20:09 +0000 @@ -139,6 +139,10 @@ // it never reached fully open, so cleanup the FD handlers // Note that comm_close() sequence does not happen for partially open FD Comm::SetSelect(temporaryFd_, COMM_SELECT_WRITE, NULL, NULL, 0); + if (calls_.earlyAbort_ != NULL) { + calls_.earlyAbort_->cancel("Comm::ConnOpener::doneConnecting"); + calls_.earlyAbort_ = NULL; + } calls_.earlyAbort_ = NULL; if (calls_.timeout_ != NULL) { calls_.timeout_->cancel("Comm::ConnOpener::doneConnecting"); @@ -309,14 +313,14 @@ doneConnecting(COMM_ERR_CLOSING, io.xerrno); // NP: is closing or shutdown better? } -/** +/** Timeout during connection attempt. * Handles the case(s) when a partially setup connection gets timed out. - * NP: When commSetConnTimeout accepts generic CommCommonCbParams this can die. */ void Comm::ConnOpener::timeout(const CommTimeoutCbParams &) { - connect(); + debugs(5, 5, HERE << conn_ << ": * - ERR took too long to receive response."); + doneConnecting(COMM_TIMEOUT, errno); } /* Legacy Wrapper for the retry event after COMM_INPROGRESS I then moved the code from my recent posting to ConnOpener::timeout(), just before the doneConnecting() call. I removed the 'if (ptr->valid())' clause and now always delete the pointer. One day ConnOpener will be re-written and these changes won't be needed. Mike Mitchell ________________________________________ From: Alex Rousskov [rousskov_at_measurement-factory.com] Sent: Thursday, January 10, 2013 4:58 PM To: Mike Mitchell Cc: squid-dev_at_squid-cache.org Subject: Re: ConnOpener leaks in 3.2.5 On 01/10/2013 02:30 PM, Mike Mitchell wrote: > I'm able to replicate a leak at will. > When the client makes a request to a non-responsive server, the ConnOperner > object leaks with a non-zero lock count. > > I've traced the problem to the use of 'new Pointer(this)' in ConnOpener::connect(). > When the connection is in progress, the SetSelect() call establishes a Write handler. > When the write handler is called, it properly deletes the pointer which properly > decrements the lock count. > > When a timeout occurs, the write handler is never called. Since it is never called, > the pointer is never deleted. > > I've worked around the leak by adding the following code to ConnOpener::connect(), > just before the comm_connect_addr() switch statement. > > // wipe out the InProgressConnectRetry write handler > if (temporaryFd_ >= 0) { > fde *F = &fd_table[temporaryFd_]; > if (F->write_handler == Comm::ConnOpener::InProgressConnectRetry && F->write_data != NULL) { > Pointer *ptr = static_cast<Pointer*>(F->write_data); > Comm::SetSelect(temporaryFd_, COMM_SELECT_WRITE, NULL, NULL, 0); > if (ptr->valid()) { > delete ptr; > } > } IIRC, you need to delete ptr whether it points to something valid or not. In this context, ptr should always point to a valid object, I guess, but the "ptr->valid" condition should be removed before it gets copied to a place where the object may be invalid. A potential (at least) problem with this approach is that now you have two places that dereference an unprotected pointer and then delete it. If both pieces of code are executed, Squid will crash. You may claim that it never happens, and that might be true today, but the whole scheme is just a ticking time bomb. > Can anyone come up with a better solution? IIRC, this and related problems have been discussed at length on the "ICAP connections under heavy loads" thread, and a comprehensive solution has been sketched out there. It needs to be implemented. Cheers, Alex. > From: Mike Mitchell > Sent: Monday, December 31, 2012 12:59 PM > To: squid-dev_at_squid-cache.org > Subject: ConnOpener leaks in 3.2.5 > > I'm seeing ConnOpener leaks in 3.2.5. > I compiled with --enable-debug-cbdata and cachemgr reports most of the CommOpener objects as invalid, each with a lock count greater than 0. > The lines look like: > Pointer Type Locks Allocated by > !0x20959018 29 2 comm/ConnOpener.h:74 > !0x1efba998 29 2 comm/ConnOpener.h:74 > !0x21bd4248 29 1 ../../src/comm/ConnOpener.h:74 > > After 100,000 queries I have 2,000 CommOpener objects allocated with 1,900 in use. Of those, 1850 are invalid. > > Could someone that understands cbdata take a look? > As near as I can tell only Comm::ConnOpener::InProgresConnectRetry() and Comm::ConnOpener::DelayedConnectRetry() duplicate the CommOpener object (by calling ptr->valid()). > It looks like the original is destructed, but I'm not sure if that is correct. > cbdataInternalFree() will decrement the lock count, but before it does it first checks and clears the "valid" flag. > If the "valid" flag is not set, cbdataInternalFree asserts. If it is set, it clears the "valid" flag and then decrements > the lock count. > > I'm very confused about the way cbdata is working. I'm guessing that the "valid" flag is getting cleared before all the locks are removed, which will prevent any more locks from being removed. The twisty maze of function templates is too much for me. Could someone please take a look? > > Mike Mitchell > > > > >
Received on Thu Jan 10 2013 - 23:19:26 MST

This archive was generated by hypermail 2.2.0 : Fri Jan 11 2013 - 12:00:05 MST