Re: ICAP connections under heavy loads

From: Alex Rousskov <rousskov_at_measurement-factory.com>
Date: Tue, 04 Sep 2012 09:16:03 -0600

On 09/04/2012 03:10 AM, Alexander Komyagin wrote:
> On Fri, 2012-08-31 at 11:05 -0600, Alex Rousskov wrote:
>> On 08/31/2012 09:01 AM, Alexander Komyagin wrote:
>>> Alex, I figured it out, finally! The bug was in comm_connect_addr()
>>> function (I suppose it is kernel-dependent though).
>>>
>>> Consider following call trace:
>>> 1) Xaction starts ConnOpener in order to create new connection to ICAP
>>> 2) ConnOpener calls comm_connect_addr()
>>> 3) comm_connect_addr() initiates connection through connect() and
>>> returns COMM_INPROGRESS
>>> ...since our ICAP service is too busy connection will eventually timeout
>>> (connectTimeout_), so...
>>> 4) Comm::ConnOpener::timeout() is called
>>> 5) Comm::ConnOpener::timeout calls connect()
>>> 6) connect() calls comm_connect_addr()
>>> 7) comm_connect_addr() calls getsockopt(SOL_SOCKET, SO_ERROR) to check
>>> current connection status
>>> 8) getsockopt(..) returns errno 0 <--- BUG
>>
>> Actually, the bug is in step #5. If we timed out, we should not try to
>> connect again.
>
> In steps (5) and (6) it's not the OS connect() call, but
> Comm::ConnOpener::connect(),

Yes, I know.

> so technically it's OK to call it on
> timeout as long as we are sure that it is OK.

Not sure what "technically" quantifier means in this context, but IMO,
it is not correct and is a significant source of bugs and mess in Squid:
The original author knows what the code is supposed to be doing but uses
an unfortunate method name. As the time passes, right code is added to
the wrong method and eventually nobody can tell what the method is
supposed to be doing.

The connect() method should try to connect. We should not try to connect
after a timeout. Thus, we should not call the connect() method after a
timeout.

> I completely agree with you Alex, that logically ConnOpener::timeout()
> shall better not call ConnOpener::connect(), though it's just a bad
> design and not the actual source of the problem. Problem is that current
> comm subsystem design allows upper-level code to successfully perform
> comm_connect_addr() calls without waiting for Select() notification.

I do not want to argue whether "bad design" is "the source of the
problem" in this case. If fixing that "bad design" would fix the
problem, then we should do that rather than trying to recover from an
incorrect Comm calls order sequence. The latter has been proven very
difficult in the past, and I doubt this case is an exception.

>> Step #8 seems correct -- from the OS point of view, internal Squid
>> timeout is not an error. If you fix step #5, step #8 will not happen though.
>>
>>
>>> 9) comm_connect_addr() returns COMM_OK so ConnOpener would think that
>>> connection was successfully established and pass the connection back to
>>> Xaction object, then we would have noteCommRead and noteCommWrote
>>> exceptions and so on...
>>>
>>> According to `man connect`, when connect() returns errno EINPROGRESS:
>>> EINPROGRESS
>>> The socket is nonblocking and the connection cannot be completed
>>> immediately. It is possible to select(2) or poll(2) for
>>> completion by selecting the socket for writing. After select(2)
>>> indicates writability, use getsockopt(2) to read the SO_ERROR
>>> option at level SOL_SOCKET to determine whether connect()
>>> completed successfully (SO_ERROR is zero) or unsuccessfully
>>> (SO_ERROR is one of the usual error codes listed here,
>>> explaining the reason for the failure).
>>>
>>> Actually ConnOpener uses SetSelect(..) in order to wait before calling
>>> comm_connect_addr() (and connect() ) again, but timeout in fact breaks
>>> the rule calling getsockopt() right after connect().
>>
>> The above man page excerpt says to call getsockopt() after successful
>> select() notification. I think that is what we do now.
>
> Nope. On timeout it turns out that we're doing getsockopt() without
> waiting for select() notification. That's the thing that shall be fixed
> (maybe tricky) or correctly handled (as I did).

I misinterpreted your "the rule calling getsockopt() right after
connect()" as your definition of the rule and just re-stated the correct
rule. Sorry.

I do not take from-timeout-handler calls into account here because, as
discussed above, the properly fixed code will not have them. I agree
that the current code may violate the calling sequence during timeouts.

>> * For the conn-open-timeout patch:
>>
>> Overall, I think this patch is fighting the side-effects of other bugs.
>> Most importantly, the timeout handler should abort the ConnOpener job on
>> the spot rather than go through one more select() try. I will comment on
>> specific changes below, just in case, but please keep this overall
>> problem in mind.
>>
>>> case COMM_INPROGRESS:
>>> // check for timeout FIRST.
>>> - if (squid_curtime - connectStart_ > connectTimeout_) {
>>> + if (squid_curtime - connectStart_ >= connectTimeout_) {
>>> debugs(5, 5, HERE << conn_ << ": * - ERR took too long already.");
>>
>> Please revert this change. In theory, we can be called back again before
>> (squid_curtime - connectStart_) changes. Not a big deal, of course, (the
>> equality should be a very rare event) but the old code was more accurate
>> from logic/math point of view.
>>
>> Also, if the debug message were to print by how much we were late,
>> printing zero would look strange.
>>
>> If there is some hidden side-effect or dependency here that I missed,
>> please let me know (but note that I think this code must not be called
>> from the timeout handler -- as discussed above, the timeout handler
>> should abort instead of double checking whether there is a timeout).
>
> I noticed that under lots of requests with *proper* ConnOpener timeout
> handling (after my fix) sometimes Squid fails on assertion in fd.cc:109
> - closing an FD that is not open. So I assumed there is some
> inconsistency with ConnOpener timeouts.
>
> Honestly, I didn't find the exact cause of the fail, but I found that
> the underlying code in comm.cc checks the timeout by the strong
> condition:
>
> squid_curtime - fd_table[fd].writeStart) < Config.Timeout.write
>
> So the strong condition in ConnOpener could cause that inconsistency.
> After changing the condition to be weak (good from logic/math point of
> view), assertion fails are gone!

I would be against fixing a poorly understood bug by introducing
incorrect comparisons. When the timeout handler is fired, we should
proceed as if we timed out. When the timeout handler has not fired, we
should proceed as if we have not timed out. I see no need to double
check in either case (and I see how such additional checks can cause bugs).

>>> x = getsockopt(sock, SOL_SOCKET, SO_ERROR, &err, &errlen);
>>>
>>> if (x == 0)
>>> + {
>>> + if (err == 0)
>>> + /*
>>> + * reality is merely an illusion..
>>> + * so we have to make sure that the socket is really connected
>>> + */
>>> + connect(sock, AI->ai_addr, AI->ai_addrlen);
>>> + else
>>> errno = err;
>>> + }

>> The witty comment does not explain why we need to call connect() twice
>> and why calling connect twice (while ignoring the second call result!)
>> will remove the illusion. This needs to be rewritten or removed. And,
>> per earlier comments, I suspect you are fixing the wrong bug anyway.
>
> Good question!
> First, let me clarify the illusion trick. Here I'm using second
> connect() only for double-checking connection status (common practice
> due to nonblocking sockets portability issues on some platforms).

If calling connect(2) twice is needed for portability issues, let's
document that as such; ideally with references to documentation or
evidence that double calls are required and platform-specific guards.
And we must not ignore the connect(2) result in any case.

> Assume ConnOpener had timed out and we've called comm_connect_addr()
> again without waiting for select() notification.

> 1) In *bad* case getsockopt would return 0 and set errno to 0.

> 2) We detect this and call connect() again. Note that our socket has
> O_NONBLOCK set.

> 3) If connection is not yet established, connect() simply would return
> -1 and set errno to EALREADY, which is parsed as COMM_EINPROGRESS error.
> Otherwise, connect() would return -1, errno = EISONN (COMM_OK).

It could also return zero and not set the errno at all. We should not
analyze errno if connect() returned zero (more on that below).

> In any case (not listed too), second call to connect() (if there were no
> errors) shall correctly indicate the socket state (see Unix Network
> Programming: The Sockets Networking Api, Volume 1).

Which section? In Section 16.4 "Nonblocking connect: Daytime Client", I
see "3. Call connect again" mentioned as one possible solution "posted
to Usenet". It is not the solution the book uses or recommends. In fact,
it is formulated in rather uncertain "should fail" terms and it is
listed last. I suspect it may cause more problems than it will solve. If
we need a solution I would recommend trying solution #1 (call
getpeername) because we probably need the peer name anyway.

More importantly, it is a solution to the problem that Squid does not
have -- Squid does not rely on "socket is ready for both reading and
writing" condition to indicate a special connection state.

> Regarding ignoring of the second call result.. Second call to connect()
> is performed only when there are no errors on socket, thus we're allowed
> to work with it. So, it will either
>
> - return 0 and set errno to 0 (fine, comm_connect_addr returns COMM_OK)

Successful system calls do not set errno to zero. If you want to use
this trick, you have to zero errno yourself, before performing the
system call. comm_connect_addr() does set errno to zero, but it happens
way earlier. If somebody adds a debugs() call in-between, for example,
that zeroing may no longer work. If you insist on this double-connect(2)
trick, please zero errno just before the second connect() call.

Again, I hope that this trick is not needed to solve your problem, and I
am worried that it will cause more/different problems elsewhere. I would
recommend fixing CommOpener instead. If _that_ is not sufficient, we can
discuss appropriate low-level enhancements.

Thank you,

Alex.
Received on Tue Sep 04 2012 - 15:16:20 MDT

This archive was generated by hypermail 2.2.0 : Wed Sep 05 2012 - 12:00:09 MDT