Re: [squid-users] Squid loops on epoll/read/gettimeofday

From: Marcus Kool <marcus.kool_at_urlfilterdb.com>
Date: Fri, 02 Apr 2010 15:41:40 -0300

Henrik Nordström wrote:
> mån 2010-03-29 klockan 13:58 -0300 skrev Marcus Kool:
>
>> 0.000033 epoll_wait(6, {{EPOLLIN, {u32=23, u64=8800387989527}}}, 2400, 10) = 1
>> 0.000032 gettimeofday({1269878848, 223083}, NULL) = 0
>> 0.000031 read(27, 0xffd3de98, 256) = -1 EAGAIN (Resource temporarily unavailable)
>
> This is odd.. epoll_wait says fd 23 is ready for reading, but then Squid
> reads on fd 27.

strange indeed, but this is strace output with which I am not very familiar.
Strace should print the whole array that it uses as argument to
epoll_wait or just prints the first element ? (and the 2nd argument
could be FD 27)

> Can you please use cachemgr to view the current filedescriptors page to
> note what fd 23 & fd 27 is being used for?

I restarted Squid and now it does not loop...
The lsof output in the original email is of little help:
squid 13665 squid 20w REG 9,3 1478687 4718702 /local/squid/logs/access.log
squid 13665 squid 21r FIFO 0,6 1723502 pipe
squid 13665 squid 22w REG 9,3 30400812 4718715
/local/squid/cache/swap.state
squid 13665 squid 23r FIFO 0,6 1723506 pipe
squid 13665 squid 24w FIFO 0,6 1723503 pipe
squid 13665 squid 25u IPv4 1723505 TCP *:squid (LISTEN)
squid 13665 squid 26w FIFO 0,6 1723506 pipe
squid 13665 squid 27r FIFO 0,6 1723507 pipe FD 27 *****
squid 13665 squid 28w FIFO 0,6 1723507 pipe
squid 13665 squid 29u IPv4 1771231 TCP srv004:squid->xxx
(ESTABLISHED)

It seems that FD 27 and 28 point to the same object (as FD 23 and FD 26)

>> Note that FD 27 and FD 28 have the same NODE.
>> This pipe is used for what ???
>
> Are you using aufs? If so then it's an internal notification channel to
> wake up the main thread when disk I/O have completed.

Yes I use aufs.

>> The EAGAIN return code to read is strange. It suggest that the
>> read() could return data soon, but Squid is looping now for over 4 hours.
>
> Not strange if fd 27 is the aufs notification pipe.

Well, the strange thing is that it loops and uses 100% CPU.

> The strange thing is why it does not act on fd 23 as returned by
> epoll_wait.

The strace output lists the parameters at the call time and
we do not see the return values (only the function return value).

Is it possible that FD 27 has an EPOLLERR condition instead of a
EPOLLIN condition and the loop is caused by trying to do read() on FD 27 ?
(I do not see any check for a return of EPOLLERR in the sources).

-Marcus

> Regards
> Henrik
>
>
>
Received on Fri Apr 02 2010 - 18:41:48 MDT

This archive was generated by hypermail 2.2.0 : Sat Apr 03 2010 - 12:00:02 MDT