Re: Comm closing while connecting

From: Alex Rousskov <rousskov_at_measurement-factory.com>
Date: Sat, 20 Sep 2008 15:40:21 -0600

On Sat, 2008-09-20 at 20:46 +1200, Amos Jeffries wrote:
> Alex Rousskov wrote:
> > Hello,
> >
> > An assertion exposed another hole in current comm_close API
> > documentation and implementation. I wrongly assumed that comm_close
> > cannot happen before comm_connect. I think a similar assumption was done
> > when comm_close was originally written -- we do not even call connect
> > handlers from comm_close. We call accept, read, and write handlers with
> > ERR_COMM_CLOSING but not the connect handler (which violates AsyncCall
> > "callbacks must be called" API, BTW).
> >
> > There are at least two cases where a connect may happen during the
> > closing sequence in the current code:
> >
> > 1. A connect call is scheduled. Some external force closes the
> > descriptor. The call is dialed and the job proceeds working with a
> > closing descriptor, hitting an assertion. I have not observed this, but
> > I am sure it is or will be possible (e.g., during shutdown).
> >
> > 2. Connect timeouts. Timeout call is scheduled. Connect succeeds.
> > Connect handler call is scheduled. Timeout handler is dialed. The
> > handler does not destroy the job but calls comm_close and then, say,
> > tries another peer. The connect handler is dialed and the job proceeds
> > working with a closing descriptor, hitting an assertion. I have observed
> > this; see the stack trace quoted below.
> >
> > I see two solutions:
> >
> > Step back: comm_close must call the connect handler with
> > ERR_COMM_CLOSING. All connect handlers need to be modified to handler
> > ERR_COMM_CLOSING. Most will just quit, as usual, but those without a
> > close handler will need to do something special (e.g., peerProbeConnect
> > and FTP data connection). This is a step back because we want to remove
> > ERR_COMM_CLOSING eventually. However, it solves the problem and complies
> > with AsyncCall API.
> >
> > Step forward: Comm must not call the connect handler for closing
> > descriptors. Current code will work except peerProbeConnect and ftp data
> > connection that may need to add close handlers. This solution violates
> > AsyncCall API. This is a step forward because when Comm API is improved,
> > we will probably not call read, write, accept, and connect handlers for
> > closing descriptors.
> >
> > Which solution do you prefer?
>
> I'd prefer adding the close handlers.

That is the "step forward" option, right?

> Can you explain why not calling
> connect handlers when no connect ever takes place is an issue?

There is no issue if there is a close handler. Without a close handler,
the code like peerProbeConnect may wait for Comm response forever
(because that code does not have any other event or callback to wait
for).

> IMO its just pedantics at his point. Sounds like you can quite easily
> have a sub-point to the API that connect handlers are never called on
> closure, maybe expand that in future as things gets cleaner.

API consistency may be pedantic, but I wanted to poll for better options
and wanted to check that I am not going to be pummeled for breaking that
consistency (even if the breakage is in the "right direction" so to
speak).

I have implemented "step forward" option and am testing it.

Thank you,

Alex.

> >
> > (gdb) where
> > #0 0x88330aa7 in kill () from /lib/libc.so.6
> > #1 0x88330a44 in raise () from /lib/libc.so.6
> > #2 0x8832f754 in abort () from /lib/libc.so.6
> > #3 0x08102de4 in xassert (msg=0x823b368 "!fd_table[fd].closing()",
> > file=0x823b1d3 "comm.cc", line=362) at debug.cc:574
> > #4 0x081aad33 in comm_read (fd=31, buf=0x17f80000 "", size=4094,
> > callback=@0xbfbfe950) at comm.cc:362
> > #5 0x0817fa58 in StoreEntry::delayAwareRead (this=0x9ceb3c8, fd=31,
> > buf=0x17f80000 "", len=4095, callback={p_ = 0x18333a00}) at store.cc:261
> > #6 0x08137b7d in HttpStateData::maybeReadVirginBody (this=0x11889f24)
> > at http.cc:1252
> > #7 0x081395aa in HttpStateData::sendRequest (this=0x11889f24) at http.cc:1764
> > #8 0x08139b7e in httpStart (fwd=0x1184f060) at http.cc:1827
> > #9 0x08116adb in FwdState::dispatch (this=0x1184f060) at forward.cc:1003
> > #10 0x08115f2a in FwdState::connectDone (this=0x1184f060, aServerFD=31,
> > status=COMM_OK, xerrno=0) at forward.cc:733
> > #11 0x08115655 in fwdConnectDoneWrapper (server_fd=31, status=COMM_OK,
> > xerrno=0, data=0x1184f060) at forward.cc:388
> > #12 0x081b61f1 in CommConnectCbPtrFun::dial (this=0x17fb659c)
> > at CommCalls.cc:141
> > #13 0x081b567f in CommCbFunPtrCallT<CommConnectCbPtrFun>::fire (
> > this=0x17fb6580) at CommCalls.h:293
> > #14 0x080cd18f in AsyncCall::make (this=0x17fb6580) at AsyncCall.cc:36
> > #15 0x080cc496 in AsyncCallQueue::fireNext (this=0x83c8c90)
> > at AsyncCallQueue.cc:53
> > #16 0x080cc2fa in AsyncCallQueue::fire (this=0x83c8c90) at AsyncCallQueue.cc:39
> > #17 0x0810d82a in EventLoop::dispatchCalls (this=0xbfbfece0)
> > at EventLoop.cc:154
> > #18 0x0810d76a in EventLoop::runOnce (this=0xbfbfece0) at EventLoop.cc:131
> > #19 0x0810d600 in EventLoop::run (this=0xbfbfece0) at EventLoop.cc:95
> > #20 0x0815926f in main (argc=2, argv=0xbfbfed74) at main.cc:1329
> >
> > (gdb) p fdd_table[fd]
> > $8 = {close_file = 0x82219d8 "forward.cc", close_line = 761}
> >
> >
> >
>
>
Received on Sat Sep 20 2008 - 21:40:28 MDT

This archive was generated by hypermail 2.2.0 : Sun Sep 21 2008 - 12:00:06 MDT