Re: [squid-users] Huge Cache / Memory Usage

From: Sunny <sunyucong_at_gmail.com>
Date: Wed, 15 Dec 2010 22:19:05 -0800

Thanks Amos,

I agree with all your points above. Here are a few more things I
couldn't find an answer to anywhere.

1. For the cache_dir Mbytes L1 L2 arguments: do L1 and L2 still
matter? Is 16 256 enough, or do I have to use something like 64 768?
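
For reference, I mean these arguments of the directive (the type, path
and numbers below are only an illustration, not my real config):

  # cache_dir <type> <path> <Mbytes> <L1> <L2>
  cache_dir aufs /cache1 200000 64 768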

2. Max/min object size? Do you have recommended values for a large
cache aimed at reducing internet latency and saving bandwidth?
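
For instance, would something along these lines be sane for that goal
(these values are only my guess, not a recommendation):

  # store even fairly large downloads, to save repeat bandwidth
  maximum_object_size 512 MB
  # keep caching small objects too (0 KB is the default minimum)
  minimum_object_size 0 KB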

3. More of a suggestion: can we make Squid not accept HTTP
connections until the store is loaded and ready to serve? Currently it
just hangs there, which makes clients hang too when these boxes are
used as parent caches.

Cheers.

On Wed, Dec 15, 2010 at 5:52 PM, Amos Jeffries <squid3_at_treenet.co.nz> wrote:
> On 16/12/10 08:17, Sunny wrote:
>>
>> Hi there,
>>
>> I am working on building a cache with squid 3.1.9.  I've got two
>> machines with 4G RAM and two 500G disks each. I want to make the cache
>> as large as possible to maximize utilization of my two big disks.
>>
>> However, I soon found out I am being extremely limited by memory. Lots
>> of swapping starts to happen when my cache exceeds 9M objects. Also
>
> Rule #1 with Squid is: don't let it swap.
> In my experience the bits that get swapped out tend to be the long-term
> index entries, which are searched on every request, so swapping X MB in
> and out again on every request causes a major performance penalty.
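>
> As a minimal sketch of what that means in squid.conf (the numbers are
> illustrative; size them to the RAM actually spare on the box):
>
>   # modest in-memory object cache, leaving room for the disk index
>   cache_mem 256 MB
>   # cache_dir sized so its ~14MB-per-GB index stays in real RAM
>   cache_dir aufs /cache1 100000 16 256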
>
>> every time I want to restart the cache, it spends an hour just
>> rescanning all the entries into memory, and it just keeps taking longer.
>
> This is a sign that your index is not being saved to disk cleanly. Check
> the startup process for messages indicating a "(DIRTY)" cache startup. If
> present, that needs to be investigated and fixed.
>
>  The fix is usually to allow Squid sufficient time when shutting down to
> disconnect from clients and helper apps and to flush its entire index into
> the swap.state journal. It can then load much faster on startup instead of
> re-scanning the whole disk.
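>
> The grace period is controlled by shutdown_lifetime (30 seconds is the
> usual default; the value below is an example to tune, not a recommendation):
>
>   # time to let clients/helpers finish and the index be flushed to
>   # swap.state before the final exit
>   shutdown_lifetime 30 seconds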
>
>
>> And from iostat -x -d, my two disks' utilization is often below 5%
>> during the scan and while serving, which seems like a waste.
>
> This is probably a good thing. Less wait time for IO, as long as the disks
> don't have to spin up from idle.
>
>>
>> In some docs I found the statement that Squid needs 14M of RAM (on
>> 64-bit) for each 1G on disk. If that's the case, to fill a 500G disk I
>> would need ~8G of RAM just to hold the metadata.
>>
>> So my question is:
>>
>> 1. Is this statement true? Can Squid somehow look up directly on the
>> disk to improve disk utilization and reduce memory needs?
>
> Yes it is true, as an estimated value based on a ~64KB avg object size.
> Your specific situation may vary.
>
> You just said above it took an hour to scan the disk loading all the URLs
> into memory. Without the in-memory index, that would need to be done on
> every single request. All the index holds is the URL, the name of the file
> it came from, and the headers needed to calculate age + variant ID.
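>
> A rough back-of-envelope with that estimate, using your numbers:
>
>   500 GB x 14 MB/GB = ~7 GB of index per full disk
>   2 disks x ~7 GB   = ~14 GB of index, against 4 GB of fitted RAM
>
> which is why the swapping starts long before the disks are full.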
>
>> 2. How big a cache do people usually have? I think having a 500G cache
>> will definitely improve the hit ratio and byte hit ratio; is that true?
>
> I have just had to fix problems that were uncovered by (and were annoying)
> people with TBs of total cache. Your 500GB is not that big nowadays, but
> it is above average.
>
>> 3. What other optimizations are needed for building a huge cache?
>
> Disable atime on the cache_dir drive(s). The cache files change often and
> can easily be fetched fresh from the network, so access-time tracking buys
> you nothing. Opinions differ about FS journaling.
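>
> For example, a noatime mount in /etc/fstab (device, mount point and
> filesystem here are placeholders for your own):
>
>   # noatime stops every cache read from triggering a metadata write
>   /dev/sdb1  /cache1  ext4  noatime  0  2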
>
> Spread it over many disk spindles. Each cache_dir is managed separately,
> and a bunch of them can pull/push objects in parallel with less waiting.
> Multiple cache_dirs on a single drive *will* clash in their IO operations.
> And there is a hard limit of 2^31 objects per cache_dir. So multiple
> smaller disks serve better than a single huge one.
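>
> That is, one cache_dir per physical disk, along these lines (paths
> and sizes are illustrative):
>
>   cache_dir aufs /cache1 100000 16 256
>   cache_dir aufs /cache2 100000 16 256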
>
> It's up to you how much you want to test and tune all this.
>
> Amos
> --
> Please be using
>  Current Stable Squid 2.7.STABLE9 or 3.1.9
>  Beta testers wanted for 3.2.0.3
>
Received on Thu Dec 16 2010 - 06:19:32 MST
