Re: ICAP connections under heavy loads

From: Alexander Komyagin <komyagin_at_altell.ru>
Date: Tue, 04 Sep 2012 13:10:04 +0400

On Fri, 2012-08-31 at 11:05 -0600, Alex Rousskov wrote:
> On 08/31/2012 09:01 AM, Alexander Komyagin wrote:
> > Alex, I figured it out, finally! The bug was in comm_connect_addr()
> > function (I suppose it is kernel-dependent though).
> >
> > Consider following call trace:
> > 1) Xaction starts ConnOpener in order to create new connection to ICAP
> > 2) ConnOpener calls comm_connect_addr()
> > 3) comm_connect_addr() initiates connection through connect() and
> > returns COMM_INPROGRESS
> > ...since our ICAP service is too busy connection will eventually timeout
> > (connectTimeout_), so...
> > 4) Comm::ConnOpener::timeout() is called
> > 5) Comm::ConnOpener::timeout calls connect()
> > 6) connect() calls comm_connect_addr()
> > 7) comm_connect_addr() calls getsockopt(SOL_SOCKET, SO_ERROR) to check
> > current connection status
> > 8) getsockopt(..) returns errno 0 <--- BUG
>
> Actually, the bug is in step #5. If we timed out, we should not try to
> connect again.

In steps (5) and (6) it's not the OS connect() call, but
Comm::ConnOpener::connect(), so technically it's OK to call it on
timeout as long as we are sure that it is OK.

I completely agree with you Alex, that logically ConnOpener::timeout()
shall better not call ConnOpener::connect(), though it's just a bad
design and not the actual source of the problem. Problem is that current
comm subsystem design allows upper-level code to successfully perform
comm_connect_addr() calls without waiting for Select() notification.

>
> Step #8 seems correct -- from the OS point of view, internal Squid
> timeout is not an error. If you fix step #5, step #8 will not happen though.
>
>
> > 9) comm_connect_addr() returns COMM_OK so ConnOpener would think that
> > connection was successfully established and pass the connection back to
> > Xaction object, then we would have noteCommRead and noteCommWrote
> > exceptions and so on...
> >
> > According to `man connect`, when connect() returns errno EINPROGRESS:
> > EINPROGRESS
> > The socket is nonblocking and the connection cannot be completed
> > immediately. It is possible to select(2) or poll(2) for
> > completion by selecting the socket for writing. After select(2)
> > indicates writability, use getsockopt(2) to read the SO_ERROR
> > option at level SOL_SOCKET to determine whether connect()
> > completed successfully (SO_ERROR is zero) or unsuccessfully
> > (SO_ERROR is one of the usual error codes listed here,
> > explaining the reason for the failure).
> >
> > Actually ConnOpener uses SetSelect(..) in order to wait before calling
> > comm_connect_addr() (and connect() ) again, but timeout in fact breaks
> > the rule calling getsockopt() right after connect().
>
> The above man page excerpt says to call getsockopt() after successful
> select() notification. I think that is what we do now.

Nope. On timeout it turns out that we're doing getsockopt() without
waiting for select() notification. That's the thing that shall be fixed
(maybe tricky) or correctly handled (as I did).

>
>
> * For the conn-open-timeout patch:
>
> Overall, I think this patch is fighting the side-effects of other bugs.
> Most importantly, the timeout handler should abort the ConnOpener job on
> the spot rather than go through one more select() try. I will comment on
> specific changes below, just in case, but please keep this overall
> problem in mind.
>
> > case COMM_INPROGRESS:
> > // check for timeout FIRST.
> > - if (squid_curtime - connectStart_ > connectTimeout_) {
> > + if (squid_curtime - connectStart_ >= connectTimeout_) {
> > debugs(5, 5, HERE << conn_ << ": * - ERR took too long already.");
>
> Please revert this change. In theory, we can be called back again before
> (squid_curtime - connectStart_) changes. Not a big deal, of course, (the
> equality should be a very rare event) but the old code was more accurate
> from logic/math point of view.
>
> Also, if the debug message were to print by how much we were late,
> printing zero would look strange.
>
> If there is some hidden side-effect or dependency here that I missed,
> please let me know (but note that I think this code must not be called
> from the timeout handler -- as discussed above, the timeout handler
> should abort instead of double checking whether there is a timeout).

I noticed that under lots of requests with *proper* ConnOpener timeout
handling (after my fix) sometimes Squid fails on assertion in fd.cc:109
- closing an FD that is not open. So I assumed there is some
inconsistency with ConnOpener timeouts.

Honestly, I didn't find the exact cause of the fail, but I found that
the underlying code in comm.cc checks the timeout by the strong
condition:

squid_curtime - fd_table[fd].writeStart) < Config.Timeout.write

So the strong condition in ConnOpener could cause that inconsistency.
After changing the condition to be weak (good from logic/math point of
view), assertion fails are gone!

>
>
> > ++failRetries_;
> >
> > // check for timeout FIRST.
> > - if (squid_curtime - connectStart_ > connectTimeout_) {
> > + if (squid_curtime - connectStart_ >= connectTimeout_) {
> > debugs(5, 5, HERE << conn_ << ": * - ERR took too long to receive response.");
> > calls_.earlyAbort_->cancel("Comm::ConnOpener::connect timed out");
> > doneConnecting(COMM_TIMEOUT, errno);
>
> Same here.
>
>
> > x = getsockopt(sock, SOL_SOCKET, SO_ERROR, &err, &errlen);
> >
> > if (x == 0)
> > + {
> > + if (err == 0)
> > + /*
> > + * reality is merely an illusion..
> > + * so we have to make sure that the socket is really connected
> > + */
> > + connect(sock, AI->ai_addr, AI->ai_addrlen);
> > + else
> > errno = err;
> > + }
>
> The witty comment does not explain why we need to call connect() twice
> and why calling connect twice (while ignoring the second call result!)
> will remove the illusion. This needs to be rewritten or removed. And,
> per earlier comments, I suspect you are fixing the wrong bug anyway.

Good question!
First, let me clarify the illusion trick. Here I'm using second
connect() only for double-checking connection status (common practice
due to nonblocking sockets portability issues on some platforms).

Assume ConnOpener had timed out and we've called comm_connect_addr()
again without waiting for select() notification.
1) In *bad* case getsockopt would return 0 and set errno to 0.
2) We detect this and call connect() again. Note that our socket has
O_NONBLOCK set.
3) If connection is not yet established, connect() simply would return
-1 and set errno to EALREADY, which is parsed as COMM_EINPROGRESS error.
Otherwise, connect() would return -1, errno = EISONN (COMM_OK).

In any case (not listed too), second call to connect() (if there were no
errors) shall correctly indicate the socket state (see Unix Network
Programming: The Sockets Networking Api, Volume 1).

Regarding ignoring of the second call result.. Second call to connect()
is performed only when there are no errors on socket, thus we're allowed
to work with it. So, it will either

- return 0 and set errno to 0 (fine, comm_connect_addr returns COMM_OK)

or

- return -1 and set errno to EALREADY (return COMM_INPROGRESS), EISCONN
(return COMM_OK) or some other error that would be correctly parsed
lately.

As you can see, we don't care about the value it returns, but the errno
is the point of interest.

>
> Said that, the code inside _SQUID_NEWSOS6_ guard does call connect()
> twice for that OS. You might want to research whether your environment
> needs a similar hack (but most likely not).

>
>
> * For the xaction-orphans patch:

OK. I'll rewrite it to align with your considerations when I have some
time.

>
> Thank you,
>
> Alex.
>

-- 
Best wishes,
Alexander Komyagin
Received on Tue Sep 04 2012 - 09:14:06 MDT

This archive was generated by hypermail 2.2.0 : Tue Sep 04 2012 - 12:00:07 MDT