Comm closing while connecting

From: Alex Rousskov <rousskov_at_measurement-factory.com>
Date: Fri, 19 Sep 2008 10:46:06 -0600

Hello,

    An assertion exposed another hole in current comm_close API
documentation and implementation. I wrongly assumed that comm_close
cannot happen before comm_connect. I think a similar assumption was done
when comm_close was originally written -- we do not even call connect
handlers from comm_close. We call accept, read, and write handlers with
ERR_COMM_CLOSING but not the connect handler (which violates AsyncCall
"callbacks must be called" API, BTW).

There are at least two cases where a connect may happen during the
closing sequence in the current code:

1. A connect call is scheduled. Some external force closes the
descriptor. The call is dialed and the job proceeds working with a
closing descriptor, hitting an assertion. I have not observed this, but
I am sure it is or will be possible (e.g., during shutdown).

2. Connect timeouts. Timeout call is scheduled. Connect succeeds.
Connect handler call is scheduled. Timeout handler is dialed. The
handler does not destroy the job but calls comm_close and then, say,
tries another peer. The connect handler is dialed and the job proceeds
working with a closing descriptor, hitting an assertion. I have observed
this; see the stack trace quoted below.

I see two solutions:

Step back: comm_close must call the connect handler with
ERR_COMM_CLOSING. All connect handlers need to be modified to handler
ERR_COMM_CLOSING. Most will just quit, as usual, but those without a
close handler will need to do something special (e.g., peerProbeConnect
and FTP data connection). This is a step back because we want to remove
ERR_COMM_CLOSING eventually. However, it solves the problem and complies
with AsyncCall API.

Step forward: Comm must not call the connect handler for closing
descriptors. Current code will work except peerProbeConnect and ftp data
connection that may need to add close handlers. This solution violates
AsyncCall API. This is a step forward because when Comm API is improved,
we will probably not call read, write, accept, and connect handlers for
closing descriptors.

Which solution do you prefer?

Thank you,

Alex.

(gdb) where
#0 0x88330aa7 in kill () from /lib/libc.so.6
#1 0x88330a44 in raise () from /lib/libc.so.6
#2 0x8832f754 in abort () from /lib/libc.so.6
#3 0x08102de4 in xassert (msg=0x823b368 "!fd_table[fd].closing()",
    file=0x823b1d3 "comm.cc", line=362) at debug.cc:574
#4 0x081aad33 in comm_read (fd=31, buf=0x17f80000 "", size=4094,
    callback=@0xbfbfe950) at comm.cc:362
#5 0x0817fa58 in StoreEntry::delayAwareRead (this=0x9ceb3c8, fd=31,
    buf=0x17f80000 "", len=4095, callback={p_ = 0x18333a00}) at store.cc:261
#6 0x08137b7d in HttpStateData::maybeReadVirginBody (this=0x11889f24)
    at http.cc:1252
#7 0x081395aa in HttpStateData::sendRequest (this=0x11889f24) at http.cc:1764
#8 0x08139b7e in httpStart (fwd=0x1184f060) at http.cc:1827
#9 0x08116adb in FwdState::dispatch (this=0x1184f060) at forward.cc:1003
#10 0x08115f2a in FwdState::connectDone (this=0x1184f060, aServerFD=31,
    status=COMM_OK, xerrno=0) at forward.cc:733
#11 0x08115655 in fwdConnectDoneWrapper (server_fd=31, status=COMM_OK,
    xerrno=0, data=0x1184f060) at forward.cc:388
#12 0x081b61f1 in CommConnectCbPtrFun::dial (this=0x17fb659c)
    at CommCalls.cc:141
#13 0x081b567f in CommCbFunPtrCallT<CommConnectCbPtrFun>::fire (
    this=0x17fb6580) at CommCalls.h:293
#14 0x080cd18f in AsyncCall::make (this=0x17fb6580) at AsyncCall.cc:36
#15 0x080cc496 in AsyncCallQueue::fireNext (this=0x83c8c90)
    at AsyncCallQueue.cc:53
#16 0x080cc2fa in AsyncCallQueue::fire (this=0x83c8c90) at AsyncCallQueue.cc:39
#17 0x0810d82a in EventLoop::dispatchCalls (this=0xbfbfece0)
    at EventLoop.cc:154
#18 0x0810d76a in EventLoop::runOnce (this=0xbfbfece0) at EventLoop.cc:131
#19 0x0810d600 in EventLoop::run (this=0xbfbfece0) at EventLoop.cc:95
#20 0x0815926f in main (argc=2, argv=0xbfbfed74) at main.cc:1329

(gdb) p fdd_table[fd]
$8 = {close_file = 0x82219d8 "forward.cc", close_line = 761}
Received on Fri Sep 19 2008 - 16:46:13 MDT

This archive was generated by hypermail 2.2.0 : Sat Sep 20 2008 - 12:00:04 MDT