Re: ICAP connections under heavy loads

From: Alex Rousskov <rousskov_at_measurement-factory.com>
Date: Fri, 31 Aug 2012 11:05:09 -0600

On 08/31/2012 09:01 AM, Alexander Komyagin wrote:
> Alex, I figured it out, finally! The bug was in comm_connect_addr()
> function (I suppose it is kernel-dependent though).
>
> Consider following call trace:
> 1) Xaction starts ConnOpener in order to create new connection to ICAP
> 2) ConnOpener calls comm_connect_addr()
> 3) comm_connect_addr() initiates connection through connect() and
> returns COMM_INPROGRESS
> ...since our ICAP service is too busy connection will eventually timeout
> (connectTimeout_), so...
> 4) Comm::ConnOpener::timeout() is called
> 5) Comm::ConnOpener::timeout calls connect()
> 6) connect() calls comm_connect_addr()
> 7) comm_connect_addr() calls getsockopt(SOL_SOCKET, SO_ERROR) to check
> current connection status
> 8) getsockopt(..) returns errno 0 <--- BUG

Actually, the bug is in step #5. If we timed out, we should not try to
connect again.

Step #8 seems correct -- from the OS point of view, internal Squid
timeout is not an error. If you fix step #5, step #8 will not happen though.

> 9) comm_connect_addr() returns COMM_OK so ConnOpener would think that
> connection was successfully established and pass the connection back to
> Xaction object, then we would have noteCommRead and noteCommWrote
> exceptions and so on...
>
> According to `man connect`, when connect() returns errno EINPROGRESS:
> EINPROGRESS
> The socket is nonblocking and the connection cannot be completed
> immediately. It is possible to select(2) or poll(2) for
> completion by selecting the socket for writing. After select(2)
> indicates writability, use getsockopt(2) to read the SO_ERROR
> option at level SOL_SOCKET to determine whether connect()
> completed successfully (SO_ERROR is zero) or unsuccessfully
> (SO_ERROR is one of the usual error codes listed here,
> explaining the reason for the failure).
>
> Actually ConnOpener uses SetSelect(..) in order to wait before calling
> comm_connect_addr() (and connect() ) again, but timeout in fact breaks
> the rule calling getsockopt() right after connect().

The above man page excerpt says to call getsockopt() after successful
select() notification. I think that is what we do now.

I do not think we are or should be calling connect() twice -- the OS
knows that we want to connect and does not need reminders. Note the
(!F->flags.called_connect) guard in comm.cc.

If you are looking for connect-related bugs, I recommend fixing step #5
and also examining this hack:

> x = connect(sock, AI->ai_addr, AI->ai_addrlen);
>
> // XXX: ICAP code refuses callbacks during a pending comm_ call
> // Async calls development will fix this.
> if (x == 0) {
> x = -1;
> errno = EINPROGRESS;
> }

I hope the above XXX code is no longer needed (and removing it will save
a lot of CPU cycles for everybody, not just ICAP!), but this needs to be
double checked using bzr logs to restore the exact reason behind that
hack and by examining the current ICAP code.

> In my environment we use uClibc and rsbac kernel, but I suppose that the
> problem may persist in other environment too.

> I've attached two small patches. One - to avoid connection loss in case
> client aborted the request before connection to the ICAP was established
> (remember that "BUG: Orphaned..." stuff I asked you before?).
> Another one is to actually fix the ICAP connection timeout (not only
> ICAP actually) problem. Comments will be appreciated.

* For the conn-open-timeout patch:

Overall, I think this patch is fighting the side-effects of other bugs.
Most importantly, the timeout handler should abort the ConnOpener job on
the spot rather than go through one more select() try. I will comment on
specific changes below, just in case, but please keep this overall
problem in mind.

> case COMM_INPROGRESS:
> // check for timeout FIRST.
> - if (squid_curtime - connectStart_ > connectTimeout_) {
> + if (squid_curtime - connectStart_ >= connectTimeout_) {
> debugs(5, 5, HERE << conn_ << ": * - ERR took too long already.");

Please revert this change. In theory, we can be called back again before
(squid_curtime - connectStart_) changes. Not a big deal, of course, (the
equality should be a very rare event) but the old code was more accurate
from logic/math point of view.

Also, if the debug message were to print by how much we were late,
printing zero would look strange.

If there is some hidden side-effect or dependency here that I missed,
please let me know (but note that I think this code must not be called
from the timeout handler -- as discussed above, the timeout handler
should abort instead of double checking whether there is a timeout).

> ++failRetries_;
>
> // check for timeout FIRST.
> - if (squid_curtime - connectStart_ > connectTimeout_) {
> + if (squid_curtime - connectStart_ >= connectTimeout_) {
> debugs(5, 5, HERE << conn_ << ": * - ERR took too long to receive response.");
> calls_.earlyAbort_->cancel("Comm::ConnOpener::connect timed out");
> doneConnecting(COMM_TIMEOUT, errno);

Same here.

> x = getsockopt(sock, SOL_SOCKET, SO_ERROR, &err, &errlen);
>
> if (x == 0)
> + {
> + if (err == 0)
> + /*
> + * reality is merely an illusion..
> + * so we have to make sure that the socket is really connected
> + */
> + connect(sock, AI->ai_addr, AI->ai_addrlen);
> + else
> errno = err;
> + }

The witty comment does not explain why we need to call connect() twice
and why calling connect twice (while ignoring the second call result!)
will remove the illusion. This needs to be rewritten or removed. And,
per earlier comments, I suspect you are fixing the wrong bug anyway.

Said that, the code inside _SQUID_NEWSOS6_ guard does call connect()
twice for that OS. You might want to research whether your environment
needs a similar hack (but most likely not).

* For the xaction-orphans patch:

First of all, please propose a commit message describing the purpose of
this change. It is difficult to extract that from this email. You can
include the proposed text as a patch preamble. Recall that the first
paragraph should be a one-line summary (for change log and such).

> + Comm::ConnOpener *cs;

Please add a doxygen comment describing the new member.

What does "s" stand for in "cs"? To improve code readability and
searchability, please rename "cs" into "opener" or similar. For example:

    /// a helper job that connects to the ICAP service and
    /// calls our "connector" callback; unused on pconns
    Comm::ConnOpener *opener;

It would be better to declare this just above the "connector"
declaration to keep opener and connector close to each other.

> + void noteAbort() { mustStop("externally aborted"); }

s/noteAbort()/noteRequestorGone()/
s/externally aborted/no longer needed/

This method should probably clear the callback_ member because the
requestor no longer needs to be notified. Clearing here will save a few
CPU cycles later.

Please move this method into .cc file (it cannot be inlined anyway).

Please do not forget to add a doxygen comment describing the purpose of
this new method.

> // our parent Jobs signal abort by cancelling their callbacks.
> if (callback_ == NULL || callback_->canceled())
> return;

The above comment will need to be adjusted. Something like this may work:

    // Some requestors do not use noteRequestorGone() API and
    // signal abort by cancelling their callbacks.

Ideally, you should look through other requestors code and make them
call noteRequestorGone() if appropriate. However, I hesitate insisting
on this wider code adjustment.

> + debugs(93,6, HERE << id << " about to notify ConnOpener!");

This debugs() should not be needed because it adds no additional
information: The job "id" is printed in "entering ..." debug message
leading to the swanSong() call, and a "will call ConnOpener" string (or
similar) will be printed by CallJobHere().

> +#include "comm/ConnOpener.h"

If you include it in Xaction.h, then exclude it from Xaction.cc.
However, it would be much better if you find a way to pre-declare
ConnOpener without including the whole header in Xaction.h.

Thank you,

Alex.
Received on Fri Aug 31 2012 - 17:05:20 MDT

This archive was generated by hypermail 2.2.0 : Sat Sep 01 2012 - 12:00:14 MDT