Re: [squid-users] Squid losing connectivity for 30 seconds

From: Elie Merhej <emerhej_at_wise.net.lb>
Date: Wed, 23 Nov 2011 12:11:06 +0200

>> Hi,
>>
>> I am currently facing a problem that I wasn't able to find a solution
>> for in the mailing list or on the internet,
>> My squid is dying for 30 seconds every hour, at the same exact
>> time; the squid process is still running,
>> but I lose my wccp connectivity, the cache peers detect the squid as a
>> dead sibling, and the squid cannot serve any requests.
>> The network connectivity of the server is not affected (a ping to the
>> squid's IP doesn't time out).
>>
>> The problem doesn't start immediately after squid is installed on
>> the server (the server is dedicated to squid);
>> it starts when the cache directories start to fill up.
>> I started my setup with 10 cache directories; squid starts
>> having the problem when the cache directories are above 50% full.
>> When I change the number of cache directories (9, 8, ...), squid works
>> for a while, then the same problem recurs:
>> cache_dir aufs /cache1/squid 90000 140 256
>> cache_dir aufs /cache2/squid 90000 140 256
>> cache_dir aufs /cache3/squid 90000 140 256
>> cache_dir aufs /cache4/squid 90000 140 256
>> cache_dir aufs /cache5/squid 90000 140 256
>> cache_dir aufs /cache6/squid 90000 140 256
>> cache_dir aufs /cache7/squid 90000 140 256
>> cache_dir aufs /cache8/squid 90000 140 256
>> cache_dir aufs /cache9/squid 90000 140 256
>> cache_dir aufs /cache10/squid 80000 140 256
>>
>> I have 1 terabyte of storage
>> Finally I created two cache directories (one on each HDD) but the
>> problem persisted.
>
> You have 2 HDDs? But, but, you have 10 cache_dir.
> We repeatedly say "one cache_dir per disk" or similar. In particular,
> one cache_dir per physical drive spindle (for "disks" made up of
> multiple physical spindles) wherever possible, with physical
> drives/spindles mounted separately to ensure the pairing. Squid
> performs a very unusual pattern of disk I/O which stresses disks down
> to the hardware controller level and makes this kind of detail critical
> for anything like good speed. Avoiding cache_dir object limits by
> adding more UFS-based dirs to one disk does not improve the situation.
>
> That problem will be affecting your Squid all the time, though, and
> is possibly making the source of the pause worse.
>
> From the description I believe it is garbage collection on the cache
> directories. The pauses can become visible when garbage collecting any
> cache over a few dozen GB. The Squid defaults for "swap_high" and
> "swap_low" are 5 apart, with 0 apart being the minimum. These are
> whole percentage points of the total cache size, erased from disk in
> a somewhat random-access style across the cache area. I did mention
> uncommon disk I/O patterns, right?
>
> To be sure what it is, you can attach the "strace" tool to the squid
> worker process (the second PID in current stable Squids) and see what
> is running. But given the hourly regularity and past experience with
> others at similar cache sizes, I'm almost certain it's the garbage
> collection.
>
> Amos
>
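[Editor's note: the advice quoted above can be sketched as a squid.conf fragment. The paths, sizes, and watermark values below are illustrative assumptions, not settings from the thread; the idea is one aufs cache_dir per physical disk, and a narrower swap_low/swap_high band so each garbage-collection pass erases a smaller slice of the cache.]

```
# One cache_dir per physical disk, each disk mounted separately
# (paths and MB sizes are placeholders for this setup):
cache_dir aufs /cache1/squid 320000 480 256   # first HDD
cache_dir aufs /cache2/squid 480000 700 256   # second HDD

# Narrow the garbage-collection band (defaults are low 90 / high 95,
# i.e. up to 5% of the total cache erased in one pass):
cache_swap_low 94
cache_swap_high 95
```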

Hi Amos,

Thank you for your fast reply,
I have 2 HDDs (450GB and 600GB);
df -h shows that I have 357GB and 505GB available.
In my last test, my cache settings were:
cache_swap_low 90
cache_swap_high 95
maximum_object_size 512 MB
maximum_object_size_in_memory 20 KB
cache_dir aufs /cache1/squid 320000 480 256
cache_dir aufs /cache2/squid 480000 700 256
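[Editor's note: as a rough back-of-the-envelope check, reading the totals off the two cache_dir lines above, the 5-point gap between cache_swap_low 90 and cache_swap_high 95 implies each garbage-collection pass must erase about 40 GB. A small shell sketch of that arithmetic:]

```shell
# Illustrative arithmetic only: estimate how many MB one garbage-collection
# pass erases, given cache_swap_low 90 / cache_swap_high 95 and the two
# cache_dir sizes above (320000 MB + 480000 MB).
total_mb=$((320000 + 480000))         # combined cache_dir size, in MB
gap_pct=$((95 - 90))                  # swap_high minus swap_low, in % points
echo $(( total_mb * gap_pct / 100 ))  # MB erased per pass -> 40000
```

Roughly 40 GB of scattered deletions per pass would line up with the 30-second stalls described above.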

Is this Ok?

Thank you

Elie Merhej
Received on Wed Nov 23 2011 - 10:11:20 MST

This archive was generated by hypermail 2.2.0 : Wed Nov 23 2011 - 12:00:04 MST