Re: ICAP connections under heavy loads

From: Alexander Komyagin <komyagin_at_altell.ru>
Date: Tue, 28 Aug 2012 18:56:19 +0400

On Mon, 2012-08-27 at 11:57 -0600, Alex Rousskov wrote:
> On 08/27/2012 03:29 AM, Alexander Komyagin wrote:
> > Hi! My setup is Squid 3.2.1 + C-ICAP Antivirus.
> >
> > I noticed that in simulated production setup (e.g. 1200 http requests
> > per sec) there are dozens connections established from squid to c-icap
> > server, and even after the last client request timeouts, these
> > connections are still opened/opening.
> >
> > tcp 0 0 127.0.0.1:1344 127.0.0.1:51352 ESTABLISHED
> > tcp 722 0 127.0.0.1:1344 127.0.0.1:51500 ESTABLISHED
> > tcp 0 0 127.0.0.1:51374 127.0.0.1:1344 ESTABLISHED
> > ...
>
> > Is this the intended behavior?
>
> ICAP connections are maintained separately from HTTP client connections
> and may be persistent. If you have N or more concurrent HTTP requests
> that need adaptation, it is likely that you will have N or more
> concurrent ICAP connections. Some or all of those connections will
> outlast the last HTTP client.
>
> The same is true for the HTTP server connections.
>
> In summary, there are at least three mostly independent pools of
> possibly persistent connections: HTTP client-side, HTTP server-side, and
> ICAP. There are squid.conf options to control how long an idle
> persistent connection is being kept in the pool.

OK, that's good.

It seems that I've found the problem related to the case when listen
queue of ICAP server is full and it can accept new connections no more.

This way Squid has a bunch (like 800 or so) of connections in SYN_SENT
state, and after their timeout we encounter exceptions in noteCommRead
(errno 110 and 104 for some) and noteCommWrote (errno 32) in the Xaction
object. And then Squids temporarily turns off the icap av. I noticed two
VERY strange things:

1) in noteCommRead connection->isOpen() returns true, and Xaction thinks
that the connection is established, but prior to the exception that
connection was marked as SYN_SENT in `netstat -lant` output.

2) client (httperf) has request timeout set to 5 sec, so all Xaction
objects should be destroyed by the time I got those exceptions (through
noteInitiatorAborted() ), right?
Then why I still do get noteComm* callbacks and it looks like
noteInitiatorAborted() was never called for corresponding "nasty"
Xaction objects?

It worth mentioning that most Xaction objects are "good" and I can see
noteInititiatorAborted() being successfully called for them. But those
"nasty" objects are really breaking the stuff.

>
>
> > Also there are a lot of "BUG: Orphaned connection" messages in my
> > cache.log under those conditions. I found out that they are caused by
> > Xaction destruction after client request timeout, getting
> > noteInitiatorAborted() before connection is established in the
> > corresponding ConnOpener job. I temporarily fixed that by notifying
> > ConnOpener (if any) in the Xaction::swanSong().
>
> Please consider opening a bug report and/or submitting a patch fixing
> the problem.
>
> IIRC, Amos has mentioned that he left the BUG message their to collect
> all buggy cases that he could not identify through code analysis. I
> recall seeing these BUG messages as well, but I do not remember whether
> they were caused by the same sequence of events.
>
>
> Thank you,
>
> Alex.
>

-- 
Best wishes,
Alexander Komyagin
Received on Tue Aug 28 2012 - 15:00:41 MDT

This archive was generated by hypermail 2.2.0 : Tue Aug 28 2012 - 12:00:39 MDT