>>>>>>  Hi,
>>>>>>
>>>>>> I am currently facing a problem that I wasn't able to find a 
>>>>>> solution for in the mailing list or on the internet,
>>>>>> My squid is dying for 30 seconds every one hour at the same exact 
>>>>>> time, squid process will still be running,
>>>>>> I lose my wccp connectivity, the cache peers detect the squid as 
>>>>>> a dead sibling, and the squid cannot server any requests
>>>>>> The network connectivity of the sever is not affected (a ping to 
>>>>>> the squid's ip doesn't timeout)
>>>>>>
>>>>>> The problem doesn't start immediately when the squid is installed 
>>>>>> on the server (The server is dedicated as a squid)
>>>>>> It starts when the cache directories starts to fill up,
>>>>>> I have started my setup with 10 cache directors, the squid will 
>>>>>> start having the problem when the cache directories are above 50% 
>>>>>> filled
>>>>>> when i change the number of cache directory (9,8,...) the squid 
>>>>>> works for a while then the same problem
>>>>>> cache_dir aufs /cache1/squid 90000 140 256
>>>>>> cache_dir aufs /cache2/squid 90000 140 256
>>>>>> cache_dir aufs /cache3/squid 90000 140 256
>>>>>> cache_dir aufs /cache4/squid 90000 140 256
>>>>>> cache_dir aufs /cache5/squid 90000 140 256
>>>>>> cache_dir aufs /cache6/squid 90000 140 256
>>>>>> cache_dir aufs /cache7/squid 90000 140 256
>>>>>> cache_dir aufs /cache8/squid 90000 140 256
>>>>>> cache_dir aufs /cache9/squid 90000 140 256
>>>>>> cache_dir aufs /cache10/squid 80000 140 256
>>>>>>
>>>>>> I have 1 terabyte of storage
>>>>>> Finally I created two cache dircetories (One on each HDD) but the 
>>>>>> problem persisted
>>>>>
>>>>> You have 2 HDD?  but, but, you have 10 cache_dir.
>>>>>  We repeatedly say "one cache_dir per disk" or similar. In 
>>>>> particular one cache_dir per physical drive spindle (for "disks" 
>>>>> made up of multiple physical spindles) wherever possible with 
>>>>> physical drives/spindles mounting separately to ensure the 
>>>>> pairing. Squid performs a very unusual pattern of disk I/O which 
>>>>> stress them down to the hardware controller level and make this 
>>>>> kind of detail critical for anything like good speed. Avoiding 
>>>>> cache_dir object limitations by adding more UFS-based dirs to one 
>>>>> disk does not improve the situation.
>>>>>
>>>>> That is a problem which will be affecting your Squid all the time 
>>>>> though, possibly making the source of the pause worse.
>>>>>
>>>>> From teh description I believe it is garbage collection on the 
>>>>> cache directories. The pauses can be visible when garbage 
>>>>> collecting any caches over a few dozen GB. The squid default 
>>>>> "swap_high" and "swap_low" values are "5" apart, with at minimum 
>>>>> being a value of 0 apart. These are whole % points of the total 
>>>>> cache size, being erased from disk in a somewhat random-access 
>>>>> style across the cache area. I did mention uncommon disk I/O 
>>>>> patterns, right?
>>>>>
>>>>> To be sure what it is, you can use the "strace" tool to the squid 
>>>>> worker process (the second PID in current stable Squids) and see 
>>>>> what is running. But given the hourly regularity and past 
>>>>> experience with others on similar cache sizes, I'm almost certain 
>>>>> its the garbage collection.
>>>>>
>>>>> Amos
>>>>>
>>>>
>>>> Hi Amos,
>>>>
>>>> Thank you for your fast reply,
>>>> I have 2 HDD (450GB and 600GB)
>>>> df -h displays that i have 357Gb and 505GB available
>>>> In my last test, my cache dir where:
>>>> cache_swap_low 90
>>>> cache_swap_high 95
>>>
>>> This is not. For anything more than 10-20 GB I recommend setting it 
>>> to no more than 1 apart, possibly the same value if that works.
>>> Squid has a light but CPU-intensive and possibly long garbage 
>>> removal cycle above cache_swap_low, and a much more aggressive but 
>>> faster and less CPU intensive removal above cache_swap_high. On 
>>> large caches it is better in terms of downtime going straight to the 
>>> aggressive removal and clearing disk space fast, despite the 
>>> bandwidth cost replacing any items the light removal would have left.
>>>
>>>
>>> Amos
>>>
>> Hi Amos,
>>
>> I have changed the swap_high  90 and swap_low 90 with two cache dir 
>> (one for each HDD), i still have the same problem,
>> I did an strace (when the problem occured)
>> ------ ----------- ----------- --------- --------- ----------------
>>  23.06    0.004769           0     85681        96 write
>>  21.07    0.004359           0     24658         5 futex
>>  19.34    0.004001         800         5           open
>>   6.54    0.001352           0      5101      5101 connect
>>   6.46    0.001337           3       491           epoll_wait
>>   5.34    0.001104           0     51938      9453 read
>>   3.90    0.000806           0     39727           close
>>   3.54    0.000733           0     86400           epoll_ctl
>>   3.54    0.000732           0     32357           sendto
>>   2.02    0.000417           0     56721           recvmsg
>>   1.84    0.000381           0     24064           socket
>>   0.96    0.000199           0     56264           fcntl
>>   0.77    0.000159           0      6366       329 accept
>>   0.53    0.000109           0     24033           bind
>>   0.52    0.000108           0     30085           getsockname
>>   0.21    0.000044           0     11200           stat
>>   0.21    0.000044           0      6998       359 recvfrom
>>   0.09    0.000019           0      5085           getsockopt
>>   0.06    0.000012           0      2887           lseek
>>   0.00    0.000000           0        98           brk
>>   0.00    0.000000           0        16           dup2
>>   0.00    0.000000           0     10314           setsockopt
>>   0.00    0.000000           0         4           getdents
>>   0.00    0.000000           0         3           getrusage
>> ------ ----------- ----------- --------- --------- ----------------
>> 100.00    0.020685                560496     15343 total
>>
>> this is the strace of squid when it is working normally:
>> ------ ----------- ----------- --------- --------- ----------------
>>  24.88    0.015887           0    455793       169 write
>>  13.72    0.008764           0    112185           epoll_wait
>>  11.67    0.007454           0    256256     27158 read
>>   8.47    0.005408           0    169133           sendto
>>   6.94    0.004430           0    159596           close
>>   6.85    0.004373           0    387359           epoll_ctl
>>   6.42    0.004102           0     19651     19651 connect
>>   5.54    0.003538           0    290289           recvmsg
>>   3.81    0.002431           0    116515           socket
>>   3.53    0.002254           0    164750           futex
>>   1.68    0.001075           0    207688           fcntl
>>   1.53    0.000974           0     95228     23139 recvfrom
>>   1.29    0.000821           0     33408     12259 accept
>>   1.14    0.000726           0     46582           stat
>>   1.11    0.000707           0    110826           bind
>>   0.85    0.000544           0    137574           getsockname
>>   0.32    0.000204           0     21642           getsockopt
>>   0.26    0.000165           0     39502           setsockopt
>>   0.01    0.000007           0      8092           lseek
>>   0.00    0.000000           0       248           open
>>   0.00    0.000000           0         4           brk
>>   0.00    0.000000           0        88           dup2
>>   0.00    0.000000           0        14           getdents
>>   0.00    0.000000           0         6           getrusage
>> ------ ----------- ----------- --------- --------- ----------------
>> 100.00    0.063864               2832429     82376 total
>>
>> Do you have any suggestions to solve the issue, can I run the garbage 
>> collector more frequently, is it better to change the cache_dir type 
>> from aufs to something else?
>> Do you see the problem in the strace?
>>
>> Thank you,
>> Elie
>>
>>
> Hi,
>
> Please note that squid is facing the same problem even when their is 
> no activity or any clients connected to it
>
> Regards
> Elie
>
Hi,
here is the strace result
-----------------------------------------------------------------------------------------------------
connect(7595, {sa_family=AF_INET, sin_port=htons(80), 
sin_addr=inet_addr("2.22.50.75")}, 16) = -1 EINPROGRESS (Operation now 
in progress)
epoll_ctl(6, EPOLL_CTL_ADD, 7595, {EPOLLOUT|EPOLLERR|EPOLLHUP, 
{u32=7595, u64=13133095884688465323}}) = 0
futex(0x9f65ec, FUTEX_WAKE_OP_PRIVATE, 1, 1, 0x9f65e8, {FUTEX_OP_SET, 0, 
FUTEX_OP_CMP_GT, 1}) = 1
futex(0x9f65c0, FUTEX_WAKE_PRIVATE, 1)  = 1
write(35, 
"\2\0\0\0\375\254\21\0002r\316N\0\0\0\0002r\316N\0\0\0\0\262\245\257P\0\0\0\0"..., 
72) = 72
futex(0x9f65ec, FUTEX_WAKE_OP_PRIVATE, 1, 1, 0x9f65e8, {FUTEX_OP_SET, 0, 
FUTEX_OP_CMP_GT, 1}) = 1
futex(0x9f65c0, FUTEX_WAKE_PRIVATE, 1)  = 1
write(35, 
"\2\0\0\0\306\254\21\0000r\316N\0\0\0\0000r\316N\0\0\0\0\377\377\377\377\377\377\377\377"..., 
72) = 72
futex(0x9f65ec, FUTEX_WAKE_OP_PRIVATE, 1, 1, 0x9f65e8, {FUTEX_OP_SET, 0, 
FUTEX_OP_CMP_GT, 1}) = 1
futex(0x9f65c0, FUTEX_WAKE_PRIVATE, 1)  = 1
write(35, 
"\2\0\0\0\262\251\21\0\360\376\315N\0\0\0\0002r\316N\0\0\0\0\377\377\377\377\377\377\377\377"..., 
72) = 72
futex(0x9f65ec, FUTEX_WAKE_OP_PRIVATE, 1, 1, 0x9f65e8, {FUTEX_OP_SET, 0, 
FUTEX_OP_CMP_GT, 1}) = 1
futex(0x9f65c0, FUTEX_WAKE_PRIVATE, 1)  = 1
write(35, 
"\2\0\0\0\2\255\21\0002r\316N\0\0\0\0003r\316N\0\0\0\0\321\376\316N\0\0\0\0"..., 
72) = 72
futex(0x9f65ec, FUTEX_WAKE_OP_PRIVATE, 1, 1, 0x9f65e8, {FUTEX_OP_SET, 0, 
FUTEX_OP_CMP_GT, 1}) = 1
futex(0x9f65c0, FUTEX_WAKE_PRIVATE, 1)  = 1
write(35, 
"\2\0\0\0\3\255\21\0002r\316N\0\0\0\0003r\316N\0\0\0\0002\347\340N\0\0\0\0"..., 
72) = 72
futex(0x9f65ec, FUTEX_WAKE_OP_PRIVATE, 1, 1, 0x9f65e8, {FUTEX_OP_SET, 0, 
FUTEX_OP_CMP_GT, 1}) = 1
futex(0x9f65c0, FUTEX_WAKE_PRIVATE, 1)  = 1
write(35, 
"\2\0\0\0\371\254\21\0002r\316N\0\0\0\0002r\316N\0\0\0\0\377\377\377\377\377\377\377\377"..., 
72) = 72
futex(0x9f65ec, FUTEX_WAKE_OP_PRIVATE, 1, 1, 0x9f65e8, {FUTEX_OP_SET, 0, 
FUTEX_OP_CMP_GT, 1}) = 1
futex(0x9f65c0, FUTEX_WAKE_PRIVATE, 1)  = 1
write(35, 
"\2\0\0\0\n\255\21\0003r\316N\0\0\0\0003r\316N\0\0\0\0\263\245\257P\0\0\0\0"..., 
72) = 72
futex(0x9f65ec, FUTEX_WAKE_OP_PRIVATE, 1, 1, 0x9f65e8, {FUTEX_OP_SET, 0, 
FUTEX_OP_CMP_GT, 1}) = 1
futex(0x9f65c0, FUTEX_WAKE_PRIVATE, 1)  = 1
write(35, 
"\2\0\0\0\v\255\21\0003r\316N\0\0\0\0003r\316N\0\0\0\0\323\370\317N\0\0\0\0"..., 
72) = 72
futex(0x9f65ec, FUTEX_WAKE_OP_PRIVATE, 1, 1, 0x9f65e8, {FUTEX_OP_SET, 0, 
FUTEX_OP_CMP_GT, 1}) = 1
futex(0x9f65c0, FUTEX_WAKE_PRIVATE, 1)  = 1
write(35, 
"\2\0\0\0\f\255\21\0003r\316N\0\0\0\0003r\316N\0\0\0\0003\377\365N\0\0\0\0"..., 
72) = 72
futex(0x9f65ec, FUTEX_WAKE_OP_PRIVATE, 1, 1, 0x9f65e8, {FUTEX_OP_SET, 0, 
FUTEX_OP_CMP_GT, 1}) = 1
futex(0x9f65c0, FUTEX_WAKE_PRIVATE, 1)  = 1
write(35, 
"\2\0\0\0\334\251\21\0@\\\316N\0\0\0\0003r\316N\0\0\0\0\377\377\377\377\377\377\377\377"..., 
72) = 72
read(165, "!", 256)                     = 1
close(379)                              = 0
epoll_ctl(6, EPOLL_CTL_DEL, 9525, {0, {u32=9525, 
u64=25160128873375029}}) = 0
epoll_ctl(6, EPOLL_CTL_ADD, 9525, {EPOLLIN|EPOLLERR|EPOLLHUP, {u32=9525, 
u64=13133094175291483445}}) = 0
epoll_ctl(6, EPOLL_CTL_MOD, 9525, {EPOLLIN|EPOLLOUT|EPOLLERR|EPOLLHUP, 
{u32=9525, u64=23686740342482229}}) = 0
epoll_ctl(6, EPOLL_CTL_MOD, 4554, {EPOLLIN|EPOLLOUT|EPOLLERR|EPOLLHUP, 
{u32=4554, u64=549755818442}}) = 0
close(68)                               = 0
socket(PF_INET, SOCK_STREAM, IPPROTO_TCP) = 68
fcntl(68, F_GETFD)                      = 0
fcntl(68, F_SETFD, FD_CLOEXEC)          = 0
setsockopt(68, SOL_IP, 0x13 /* IP_??? */, [1], 4) = 0
bind(68, {sa_family=AF_INET, sin_port=htons(0), 
sin_addr=inet_addr("95.141.62.17")}, 16) = 0
fcntl(68, F_GETFL)                      = 0x2 (flags O_RDWR)
fcntl(68, F_SETFL, O_RDWR|O_NONBLOCK)   = 0
setsockopt(68, SOL_TCP, TCP_NODELAY, [1], 4) = 0
getsockname(68, {sa_family=AF_INET, sin_port=htons(60303), 
sin_addr=inet_addr("95.141.62.17")}, [16]) = 0
socket(PF_NETLINK, SOCK_RAW, 0)         = 379
bind(379, {sa_family=AF_NETLINK, pid=0, groups=00000000}, 12) = 0
getsockname(379, {sa_family=AF_NETLINK, pid=11594, groups=00000000}, 
[12]) = 0
sendto(379, "\24\0\0\0\26\0\1\3t\232\324N\0\0\0\0\0\0\0\0", 20, 0, 
{sa_family=AF_NETLINK, pid=0, groups=00000000}, 12) = 20
recvmsg(379, {msg_name(12)={sa_family=AF_NETLINK, pid=0, 
groups=00000000}, 
msg_iov(1)=[{"0\0\0\0\24\0\2\0t\232\324NJ-\0\0\2\10\200\376\1\0\0\0\10\0\1\0\177\0\0\1"..., 
4096}], msg_controllen=0, msg_flags=0}, 0) = 108
recvmsg(379, {msg_name(12)={sa_family=AF_NETLINK, pid=0, 
groups=00000000}, 
msg_iov(1)=[{"@\0\0\0\24\0\2\0t\232\324NJ-\0\0\n\200\200\376\1\0\0\0\24\0\1\0\0\0\0\0"..., 
4096}], msg_controllen=0, msg_flags=0}, 0) = 128
recvmsg(379, {msg_name(12)={sa_family=AF_NETLINK, pid=0, 
groups=00000000}, 
msg_iov(1)=[{"\24\0\0\0\3\0\2\0t\232\324NJ-\0\0\0\0\0\0\1\0\0\0\24\0\1\0\0\0\0\0"..., 
4096}], msg_controllen=0, msg_flags=0}, 0) = 20
close(379)                              = 0
connect(68, {sa_family=AF_INET, sin_port=htons(80), 
sin_addr=inet_addr("2.22.50.65")}, 16) = -1 EINPROGRESS (Operation now 
in progress)
epoll_ctl(6, EPOLL_CTL_ADD, 68, {EPOLLOUT|EPOLLERR|EPOLLHUP, {u32=68, 
u64=13133092173836714052}}) = 0
epoll_ctl(6, EPOLL_CTL_MOD, 3658, {EPOLLIN|EPOLLOUT|EPOLLERR|EPOLLHUP, 
{u32=3658, u64=549755817546}}) = 0
sendto(10, "\33\221\1\0\0\1\0\0\0\0\0\0\nliveupdate\vgocyberl"..., 44, 
0, {sa_family=AF_INET, sin_port=htons(53), 
sin_addr=inet_addr("10.249.209.102")}, 16) = 44
sendto(10, "\275\251\1\0\0\1\0\0\0\0\0\0\2tc\2v3\6cache3\1c\4pack"..., 
48, 0, {sa_family=AF_INET, sin_port=htons(53), 
sin_addr=inet_addr("10.249.209.102")}, 16) = 48
sendto(10, "\267\355\1\0\0\1\0\0\0\0\0\0\3crl\6thawte\3com\0\0\34\0\1", 
32, 0, {sa_family=AF_INET, sin_port=htons(53), 
sin_addr=inet_addr("10.249.209.102")}, 16) = 32
sendto(10, "\312\237\1\0\0\1\0\0\0\0\0\0\2p4\rbhzfrwddqtb66\20cg"..., 
79, 0, {sa_family=AF_INET, sin_port=htons(53), 
sin_addr=inet_addr("10.249.209.102")}, 16) = 79
sendto(10, "\261\216\1\0\0\1\0\0\0\0\0\0\4apps\10facebook\3com\0\0"..., 
35, 0, {sa_family=AF_INET, sin_port=htons(53), 
sin_addr=inet_addr("10.249.209.102")}, 16) = 35
sendto(10, "\237\10\1\0\0\1\0\0\0\0\0\0\6sn121w\6snt121\4mail\4"..., 45, 
0, {sa_family=AF_INET, sin_port=htons(53), 
sin_addr=inet_addr("10.249.209.102")}, 16) = 45
sendto(10, "FR\1\0\0\1\0\0\0\0\0\0\3api\6js-kit\3com\0\0\1\0\1", 32, 0, 
{sa_family=AF_INET, sin_port=htons(53), 
sin_addr=inet_addr("10.249.209.102")}, 16) = 32
sendto(10, "\207$\1\0\0\1\0\0\0\0\0\0\1l\naddthiscdn\3com\0\0\1"..., 34, 
0, {sa_family=AF_INET, sin_port=htons(53), 
sin_addr=inet_addr("10.249.209.102")}, 16) = 34
sendto(10, "\265#\1\0\0\1\0\0\0\0\0\0\5usage\7hosting\7toolb"..., 60, 0, 
{sa_family=AF_INET, sin_port=htons(53), 
sin_addr=inet_addr("10.249.209.102")}, 16) = 60
sendto(10, "hZ\1\0\0\1\0\0\0\0\0\0\3www\ntargetgulf\3com\0"..., 36, 0, 
{sa_family=AF_INET, sin_port=htons(53), 
sin_addr=inet_addr("10.249.209.102")}, 16) = 36
sendto(10, "\355\345\1\0\0\1\0\0\0\0\0\0\10services\4apps\7condu"..., 
43, 0, {sa_family=AF_INET, sin_port=htons(53), 
sin_addr=inet_addr("10.249.209.102")}, 16) = 43
sendto(10, "O\254\1\0\0\1\0\0\0\0\0\0\0060-jw-w\7channel\10face"..., 45, 
0, {sa_family=AF_INET, sin_port=htons(53), 
sin_addr=inet_addr("10.249.209.102")}, 16) = 45
sendto(10, "\330\310\1\0\0\1\0\0\0\0\0\0\2t1\7gstatic\3com\0\0\1\0\1", 
32, 0, {sa_family=AF_INET, sin_port=htons(53), 
sin_addr=inet_addr("10.249.209.102")}, 16) = 32
sendto(10, "\16Z\1\0\0\1\0\0\0\0\0\0\4mail\3aol\3com\0\0\1\0\1", 30, 0, 
{sa_family=AF_INET, sin_port=htons(53), 
sin_addr=inet_addr("10.249.209.102")}, 16) = 30
sendto(10, "\327\216\1\0\0\1\0\0\0\0\0\0\3www\10bisara7a\3com\0\0\1"..., 
34, 0, {sa_family=AF_INET, sin_port=htons(53), 
sin_addr=inet_addr("10.249.209.102")}, 16) = 34
sendto(10, "o\201\1\0\0\1\0\0\0\0\0\0\003317\7channel\10faceboo"..., 42, 
0, {sa_family=AF_INET, sin_port=htons(53), 
sin_addr=inet_addr("10.249.209.102")}, 16) = 42
sendto(10, "\344?\1\0\0\1\0\0\0\0\0\0\7email03\fsecureserve"..., 42, 0, 
{sa_family=AF_INET, sin_port=htons(53), 
sin_addr=inet_addr("10.249.209.102")}, 16) = 42
sendto(21, 
"\0\0\0\n\2\0\0|\0\0\0\4\0\0\0\0\0\1\0\30\0017\360\6\0\0\0002\0P\0\0"..., 132, 
0, NULL, 0) = 132
sendto(21, 
"\0\0\0\n\2\0\0|\0\0\0\4\0\0\0\0\0\1\0\30\1<\360\6\0\0\0\21\0P\0\0"..., 
132, 0, NULL, 0) = 132
open("/cache2/pool6/squid/13/AB", 
O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 379
getdents(379, /* 254 entries */, 32768) = 8112
getdents(379, /* 0 entries */, 32768)   = 0
close(379)                              = 0
read(165, "!", 256)                     = 1
----------------------------------------------------------------------------------------------------------------------------------------------------
Squid is freezing at this point
Please advise
Thank you
Elie
Received on Tue Nov 29 2011 - 08:44:08 MST
This archive was generated by hypermail 2.2.0 : Tue Nov 29 2011 - 12:00:03 MST