Re: sometimes squid proxy enters state where very little work is done

From: Joe Ramey <ramey@dont-contact.us>
Date: Fri, 25 Oct 1996 13:39:56 -0500

   cc: squid-users@nlanr.net
   Date: Wed, 23 Oct 96 21:21:48 -0700
   From: Duane Wessels <wessels@nlanr.net>
   X-Mts: smtp

   ramey@csc.ti.com writes:

>minute. However, every day or two it gets into a state where the
>number of transactions per minute drop to nearly zero, and connection
>attempts to the proxy usually timeout. I'm guessing that this happens
>when we get to the file descriptor limit, though I have no hard proof

   The code is supposed to accommodate file descriptor shortages, but
   I haven't personally stress-tested it.

I've been monitoring this quite a bit lately. It does seem to handle
file descriptor shortages and recover well from them. I do find that we
are running out of file descriptors, and that our rate of successful
transactions per minute decreases when this happens, but I don't know
whether we're running out of descriptors simply because of too much load
or because of some problem. Since I've been watching this closely, I've
several times seen the process run out of descriptors, slow down in
processing transactions, then somehow recover and reclaim a large
number of idle descriptors. I'm not sure what happens to make it
reclaim all these descriptors, nor do I know what is happening the
other times when it slows down continuously until almost no work is
done. I'm continuing to debug this and hope to have something useful
to report soon.
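
For what it's worth, the kind of descriptor check I've been doing is
roughly the following. It's only a minimal standalone sketch, not squid
code: it walks the descriptor table of its own process with fcntl() and
counts open slots against getdtablesize(); to watch squid itself you'd
have to build the same check into the squid process or rely on whatever
tools your OS provides.

/* fdcount: count how many descriptors this process has open.
 * A standalone monitoring sketch, not part of squid. */
#include <stdio.h>
#include <unistd.h>
#include <fcntl.h>

int
main(void)
{
    int fd;
    int nopen = 0;
    int limit = getdtablesize();        /* per-process descriptor limit */

    for (fd = 0; fd < limit; fd++) {
        /* fcntl(F_GETFD) fails with EBADF if fd is not open */
        if (fcntl(fd, F_GETFD) != -1)
            nopen++;
    }
    printf("%d of %d descriptors open\n", nopen, limit);
    return 0;
}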

   In general, I can think of only three reasons why the squid process would
   slow down or pause:

       1) a system call blocks
       2) it enters an "infinite" loop
       3) the OS doesn't/can't run the process enough (e.g. swapping)

I know it's not 1 or 3. The squid process is getting plenty of CPU
time, and the OS isn't swapping abnormally.

It could be some variation of 2. It's not in a tight loop, in the sense
that it does continue to get some work done; it just gets work done at a
much lower rate than we usually see, especially considering that the CPU
utilization doesn't appear to go down. It could be that it's spending
much more time on work that isn't directly related to processing any
particular request, for example looping through the file descriptors and
checking their state but rarely calling the handlers. I'm not claiming
that's literally what is happening, only that it could be something of
that general shape (a rough sketch of what I mean follows).
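
Here is the kind of loop I have in mind. This is my own illustration,
not squid's actual comm_select; the handler table, fd_open flags, and
check_timeouts() bookkeeping are made-up stand-ins. The point is just
that the per-pass work grows with the number of open descriptors, so
the loop can stay CPU-busy while the handlers, which do the real
request work, run relatively rarely.

/* Illustrative select() loop, NOT squid's actual comm_select.
 * The per-pass bookkeeping is proportional to the number of open
 * descriptors, so with hundreds of them open the process can stay
 * busy here while the read handlers (where the real request work
 * happens) are called comparatively rarely. */
#include <sys/types.h>
#include <sys/time.h>
#include <sys/select.h>
#include <unistd.h>

#define MAX_FD 1024

typedef void (*read_handler_t) (int fd);
static read_handler_t read_handler[MAX_FD];     /* per-fd read callbacks */
static int fd_open[MAX_FD];                     /* which fds are in use  */

static void
check_timeouts(int fd)
{
    (void) fd;                                  /* placeholder bookkeeping */
}

void
event_loop(void)
{
    fd_set readfds;
    struct timeval to;
    int fd, nready;

    for (;;) {
        FD_ZERO(&readfds);
        for (fd = 0; fd < MAX_FD; fd++) {       /* bookkeeping pass #1 */
            if (!fd_open[fd])
                continue;
            check_timeouts(fd);
            if (read_handler[fd])
                FD_SET(fd, &readfds);
        }
        to.tv_sec = 1;
        to.tv_usec = 0;
        nready = select(MAX_FD, &readfds, NULL, NULL, &to);
        if (nready <= 0)
            continue;
        for (fd = 0; fd < MAX_FD; fd++)         /* bookkeeping pass #2 */
            if (FD_ISSET(fd, &readfds) && read_handler[fd])
                read_handler[fd] (fd);          /* the actual work */
    }
}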

   The first should be fairly easy to detect. If the process seems to
   have stopped on a blocked system call, there should be zero CPU usage.
   If you send it a USR2 signal you might expect the process to remain
   blocked and therefore see NO cache.log output.

   The second reason should be noticeable by high CPU usage for the
   process. Sending a USR2 signal may or may not produce additional
   debugging output.

   The third is probably more elusive. If you send a SIGUSR2 signal and
   see:

           96/10/21 22:59:11| comm_select: 0 sockets ready at 845960351
           96/10/21 22:59:12| comm_select: 0 sockets ready at 845960352
           96/10/21 22:59:13| comm_select: 0 sockets ready at 845960353
           96/10/21 22:59:14| comm_select: 0 sockets ready at 845960354
           96/10/21 22:59:15| comm_select: 0 sockets ready at 845960355
           96/10/21 22:59:16| comm_select: 0 sockets ready at 845960356

   Then you know the cache process is idle and running normally.
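
For anyone else trying this, my understanding of the mechanism is
something along these lines. This is only a sketch of a USR2-style
debug toggle, not the actual squid code: the handler just flips a flag,
and the main loop prints its one-line status whenever the flag is set.

/* Sketch of a USR2-style debug toggle, NOT the actual squid code.
 * The signal handler only flips a flag; the main loop prints its
 * one-line "sockets ready" status while the flag is set. */
#include <signal.h>
#include <stdio.h>
#include <time.h>
#include <unistd.h>

static volatile sig_atomic_t debug_enabled = 0;

static void
sigusr2_handler(int sig)
{
    (void) sig;
    debug_enabled = !debug_enabled;             /* toggle extra debugging */
}

int
main(void)
{
    struct sigaction sa;

    sa.sa_handler = sigusr2_handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    sigaction(SIGUSR2, &sa, NULL);

    for (;;) {
        int nready = 0;                         /* would come from select() */
        /* ... select() and handler dispatch would go here ... */
        if (debug_enabled)
            fprintf(stderr, "comm_select: %d sockets ready at %ld\n",
                nready, (long) time(NULL));
        sleep(1);                               /* stands in for the real loop */
    }
}

With something like that in place, kill -USR2 <pid> turns the extra
logging on and off, which matches the behaviour described above.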

I tried this and saw that there were plenty of sockets waiting for service:

[24/Oct/1996:12:50:40 -0500] comm.c:707: comm_select: 20 sockets ready at 846179440
[24/Oct/1996:12:50:41 -0500] comm.c:707: comm_select: 24 sockets ready at 846179441
[24/Oct/1996:12:50:42 -0500] comm.c:707: comm_select: 28 sockets ready at 846179442
[24/Oct/1996:12:50:44 -0500] comm.c:707: comm_select: 35 sockets ready at 846179444
[24/Oct/1996:12:50:46 -0500] comm.c:707: comm_select: 38 sockets ready at 846179446
[24/Oct/1996:12:50:47 -0500] comm.c:707: comm_select: 44 sockets ready at 846179447
[24/Oct/1996:12:50:50 -0500] comm.c:707: comm_select: 61 sockets ready at 846179450
[24/Oct/1996:12:50:54 -0500] comm.c:707: comm_select: 62 sockets ready at 846179454
[24/Oct/1996:12:50:57 -0500] comm.c:707: comm_select: 61 sockets ready at 846179457
[24/Oct/1996:12:50:59 -0500] comm.c:707: comm_select: 63 sockets ready at 846179459
[24/Oct/1996:12:51:03 -0500] comm.c:707: comm_select: 68 sockets ready at 846179463

Sometimes this number did decrease, so some work was being done, but
not as much as we might expect.