>>>>>  Hi,
>>>>>
>>>>> I am currently facing a problem that I wasn't able to find a
>>>>> solution for in the mailing list or on the internet.
>>>>> My Squid dies for 30 seconds every hour, at the same exact time,
>>>>> although the squid process keeps running.
>>>>> I lose my WCCP connectivity, the cache peers detect the Squid as a
>>>>> dead sibling, and Squid cannot serve any requests.
>>>>> The network connectivity of the server is not affected (a ping to
>>>>> the Squid's IP doesn't time out).
>>>>>
>>>>> The problem doesn't start immediately after Squid is installed on
>>>>> the server (the server is dedicated to Squid).
>>>>> It starts when the cache directories begin to fill up.
>>>>> I started my setup with 10 cache directories; Squid starts having
>>>>> the problem when the cache directories are more than 50% full.
>>>>> When I change the number of cache directories (9, 8, ...), Squid
>>>>> works for a while and then shows the same problem.
>>>>> cache_dir aufs /cache1/squid 90000 140 256
>>>>> cache_dir aufs /cache2/squid 90000 140 256
>>>>> cache_dir aufs /cache3/squid 90000 140 256
>>>>> cache_dir aufs /cache4/squid 90000 140 256
>>>>> cache_dir aufs /cache5/squid 90000 140 256
>>>>> cache_dir aufs /cache6/squid 90000 140 256
>>>>> cache_dir aufs /cache7/squid 90000 140 256
>>>>> cache_dir aufs /cache8/squid 90000 140 256
>>>>> cache_dir aufs /cache9/squid 90000 140 256
>>>>> cache_dir aufs /cache10/squid 80000 140 256
>>>>>
>>>>> I have 1 terabyte of storage.
>>>>> Finally I created two cache directories (one on each HDD), but the
>>>>> problem persisted.
>>>>
>>>> You have 2 HDD?  but, but, you have 10 cache_dir.
>>>>  We repeatedly say "one cache_dir per disk" or similar. In
>>>> particular, one cache_dir per physical drive spindle (for "disks"
>>>> made up of multiple physical spindles) wherever possible, with the
>>>> physical drives/spindles mounted separately to ensure the pairing.
>>>> Squid performs a very unusual pattern of disk I/O which stresses
>>>> drives down to the hardware controller level and makes this kind
>>>> of detail critical for anything like good speed. Avoiding
>>>> cache_dir object limitations by adding more UFS-based dirs to one
>>>> disk does not improve the situation.
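>>>>
>>>> For example, with each HDD mounted on its own filesystem, a
>>>> one-cache_dir-per-disk layout would look roughly like the sketch
>>>> below (mountpoints and sizes are illustrative, not taken from your
>>>> setup; leave some headroom below each disk's capacity):
>>>>
>>>>   # one cache_dir per physical spindle
>>>>   cache_dir aufs /cache1/squid 300000 140 256
>>>>   cache_dir aufs /cache2/squid 400000 140 256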
>>>>
>>>> That is a problem which will be affecting your Squid all the time,
>>>> though, and possibly making the source of the pause worse.
>>>>
>>>> From the description I believe it is garbage collection on the
>>>> cache directories. The pauses can be visible when garbage
>>>> collecting any caches over a few dozen GB. The Squid default
>>>> "swap_high" and "swap_low" values are 5 apart, with the minimum
>>>> possible being 0 apart. These are whole % points of the total
>>>> cache size, being erased from disk in a somewhat random-access
>>>> style across the cache area. I did mention uncommon disk I/O
>>>> patterns, right?
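>>>>
>>>> For reference, the stock defaults amount to this in squid.conf
>>>> (worth double-checking against your version's documentation):
>>>>
>>>>   # light removal starts above 90% of cache capacity, aggressive
>>>>   # removal above 95% -- that 5% gap of the total cache is what
>>>>   # gets erased in one go
>>>>   cache_swap_low 90
>>>>   cache_swap_high 95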
>>>>
>>>> To be sure what it is, you can attach the "strace" tool to the
>>>> Squid worker process (the second PID in current stable Squids) and
>>>> see what it is doing. But given the hourly regularity and past
>>>> experience with others on similar cache sizes, I'm almost certain
>>>> it's the garbage collection.
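>>>>
>>>> Something along these lines should do (the pgrep usage is only an
>>>> illustration, use whatever identifies your worker PID):
>>>>
>>>>   pgrep squid                    # the second PID listed is the worker
>>>>   strace -c -f -p <worker PID>   # Ctrl-C after a pause to print the summary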
>>>>
>>>> Amos
>>>>
>>>
>>> Hi Amos,
>>>
>>> Thank you for your fast reply.
>>> I have 2 HDDs (450GB and 600GB);
>>> df -h shows that I have 357GB and 505GB available.
>>> In my last test, my cache settings were:
>>> cache_swap_low 90
>>> cache_swap_high 95
>>
>> This is not ideal. For anything more than 10-20 GB I recommend
>> setting them no more than 1 apart, possibly to the same value if
>> that works.
>> Squid has a light but CPU-intensive and possibly long garbage
>> removal cycle above cache_swap_low, and a much more aggressive but
>> faster and less CPU-intensive removal above cache_swap_high. On
>> large caches it is better, in terms of downtime, to go straight to
>> the aggressive removal and clear disk space fast, despite the
>> bandwidth cost of replacing any items the light removal would have
>> left.
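>>
>> For example (illustrative values only):
>>
>>   # keep the thresholds 1 apart so removal goes almost straight to
>>   # the aggressive, fast clean-up instead of the long light cycle
>>   cache_swap_low 94
>>   cache_swap_high 95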
>>
>>
>> Amos
>>
> Hi Amos,
>
> I have changed cache_swap_high and cache_swap_low to 90, with two
> cache_dir entries (one for each HDD); I still have the same problem.
> I did an strace (when the problem occurred):
> % time     seconds  usecs/call     calls    errors syscall
> ------ ----------- ----------- --------- --------- ----------------
>  23.06    0.004769           0     85681        96 write
>  21.07    0.004359           0     24658         5 futex
>  19.34    0.004001         800         5           open
>   6.54    0.001352           0      5101      5101 connect
>   6.46    0.001337           3       491           epoll_wait
>   5.34    0.001104           0     51938      9453 read
>   3.90    0.000806           0     39727           close
>   3.54    0.000733           0     86400           epoll_ctl
>   3.54    0.000732           0     32357           sendto
>   2.02    0.000417           0     56721           recvmsg
>   1.84    0.000381           0     24064           socket
>   0.96    0.000199           0     56264           fcntl
>   0.77    0.000159           0      6366       329 accept
>   0.53    0.000109           0     24033           bind
>   0.52    0.000108           0     30085           getsockname
>   0.21    0.000044           0     11200           stat
>   0.21    0.000044           0      6998       359 recvfrom
>   0.09    0.000019           0      5085           getsockopt
>   0.06    0.000012           0      2887           lseek
>   0.00    0.000000           0        98           brk
>   0.00    0.000000           0        16           dup2
>   0.00    0.000000           0     10314           setsockopt
>   0.00    0.000000           0         4           getdents
>   0.00    0.000000           0         3           getrusage
> ------ ----------- ----------- --------- --------- ----------------
> 100.00    0.020685                560496     15343 total
>
> This is the strace of Squid when it is working normally:
> % time     seconds  usecs/call     calls    errors syscall
> ------ ----------- ----------- --------- --------- ----------------
>  24.88    0.015887           0    455793       169 write
>  13.72    0.008764           0    112185           epoll_wait
>  11.67    0.007454           0    256256     27158 read
>   8.47    0.005408           0    169133           sendto
>   6.94    0.004430           0    159596           close
>   6.85    0.004373           0    387359           epoll_ctl
>   6.42    0.004102           0     19651     19651 connect
>   5.54    0.003538           0    290289           recvmsg
>   3.81    0.002431           0    116515           socket
>   3.53    0.002254           0    164750           futex
>   1.68    0.001075           0    207688           fcntl
>   1.53    0.000974           0     95228     23139 recvfrom
>   1.29    0.000821           0     33408     12259 accept
>   1.14    0.000726           0     46582           stat
>   1.11    0.000707           0    110826           bind
>   0.85    0.000544           0    137574           getsockname
>   0.32    0.000204           0     21642           getsockopt
>   0.26    0.000165           0     39502           setsockopt
>   0.01    0.000007           0      8092           lseek
>   0.00    0.000000           0       248           open
>   0.00    0.000000           0         4           brk
>   0.00    0.000000           0        88           dup2
>   0.00    0.000000           0        14           getdents
>   0.00    0.000000           0         6           getrusage
> ------ ----------- ----------- --------- --------- ----------------
> 100.00    0.063864               2832429     82376 total
>
> Do you have any suggestions to solve the issue? Can I run the garbage
> collector more frequently? Is it better to change the cache_dir type
> from aufs to something else?
> Do you see the problem in the strace?
>
> Thank you,
> Elie
>
>
Hi,
Please note that Squid faces the same problem even when there is no
activity and no clients connected to it.
Regards
Elie