Re: ICAP connections under heavy loads

From: Alexander Komyagin <komyagin_at_altell.ru>
Date: Wed, 29 Aug 2012 14:45:41 +0400

Thanks, Alex. My bad. Let me clarify the issue.

I have Squid 3.2.1 (transparent) + c-icap (e.g. clamav). Currently I'm
testing performance of this setup with httperf tool.

When the number of client http requests is so big that c-icap becomes
overloaded, some connections from Squid to icap are staying in SYN_SENT
tcp state because c-icap listen backlog is full. Corresponding Xaction
objects are not destructed after client request timeout (I use 5 secs
for httperf requests) causing a lot of noteCommRead and noteCommWrote
exceptions at the same time (in 2-4 mins after the test) - when r/w
operations on the socket time out.
As a consequence, Squid leaks FD's (for a while) and meaninglessly
switches icap status in minutes after the test.

I found out that all Xaction objects not being destructed are initiated
by HttpStateData ('avi' adaptation check). Looks like Squid is missing
HttpStateData cleanup when client request timeouts.

On Tue, 2012-08-28 at 09:26 -0600, Alex Rousskov wrote:
> On 08/28/2012 08:56 AM, Alexander Komyagin wrote:
>
> > It seems that I've found the problem related to the case when listen
> > queue of ICAP server is full and it can accept new connections no more.
>
> Hi Alexander,
>
> It may help if you summarize the problem before (or after) giving
> the details. It is clear that the ICAP server is overloaded, but I
> assume you are referring to some Squid-specific problem instead, and it
> is not clear which of the Squid events you describe below are a problem
> you are trying to address.
>
>
> > This way Squid has a bunch (like 800 or so) of connections in SYN_SENT
> > state, and after their timeout we encounter exceptions in noteCommRead
> > (errno 110 and 104 for some) and noteCommWrote (errno 32) in the Xaction
> > object. And then Squids temporarily turns off the icap av.
>
> I assume the above is _not_ the problem you are trying to describe but
> the correct behavior (i.e., the overloaded ICAP server is marked as
> "bad" and is avoided for a while, as expected). Please correct me if I
> am wrong.
>
>
> > I noticed two VERY strange things:
> >
> > 1) in noteCommRead connection->isOpen() returns true, and Xaction thinks
> > that the connection is established, but prior to the exception that
> > connection was marked as SYN_SENT in `netstat -lant` output.
>
> Connection::isOpen() means that the connection has a valid/opened socket
> descriptor and nothing more. If you prefer, it means that the connection
> is "not closed". It is more of a socket-level check rather than a
> TCP-level check.

In Xaction new connections are created with ConnOpener job. ConnOpener
sets connection FD iff he _really_ thinks the connection is now
established (comm_connect_addr returned COMM_OK). But according to
`netstat` that connection was always in SYN_SENT state. Maybe I just
missed the point where it became ESTABLISHED. I will try to check it
later.

>
>
> > 2) client (httperf) has request timeout set to 5 sec, so all Xaction
> > objects should be destroyed by the time I got those exceptions (through
> > noteInitiatorAborted() ), right?
>
> Yes (if your ICAP transaction timeout is much longer than 5 seconds),
> provided the HTTP client closes the connection in a way that Squid can
> notice immediately.
>
>
> > Then why I still do get noteComm* callbacks and it looks like
> > noteInitiatorAborted() was never called for corresponding "nasty"
> > Xaction objects?
> >
> > It worth mentioning that most Xaction objects are "good" and I can see
> > noteInititiatorAborted() being successfully called for them. But those
> > "nasty" objects are really breaking the stuff.
>
> I do not know what you mean by "nasty objects" or "breaking stuff" but
> it is possible that there are Squid bugs in detecting client termination
> conditions, propagating client termination notifications to ICAP, and/or
> processing received notifications, of course.
>
> To triage, trace a single HTTP client transaction and a single
> "nasty"(?) ICAP transaction using debug_options ALL,9 to see why
> noteInitiatorAborted() is not scheduled or is not acted upon. Does the
> HTTP code detect that the client has closed the connection? If yes, does
> it sent an abort notification to ICAP? If yes, does the ICAP code
> correctly reacts to that notification?
>
> Post that trace here if you need help analyzing it (but, as usual, there
> is no guarantee somebody will help, especially if the trace requires a
> lot of work to isolate the interesting bits).
>
>
> Thank you,
>
> Alex.
>
>

-- 
Best wishes,
Alexander Komyagin
Received on Wed Aug 29 2012 - 10:50:13 MDT

This archive was generated by hypermail 2.2.0 : Wed Aug 29 2012 - 12:00:19 MDT