Re: [squid-users] Cache distribution algorithm. Please enlighten me.

From: Joao Pedro Clemente <jpcl@dont-contact.us>
Date: Mon, 27 May 2002 15:34:31 +0100 (WET DST)

On Mon, 27 May 2002, Henrik Nordstrom wrote:
> Joao Clemente wrote:
> >
> > Ok, "du -sh *" in my squid/cache folder gives me the following output
> >
> > 598M 00
> > 606M 01
> > 377M 02
> > 1.1M 03
> > 1.1M 04
> > (...)
> >
> > So, can someone explain to me what distribution algorithm is used? Is it supposed to be like this?
> > It seems like when setting up squid.conf, we could say:
> > "Ok, I want ... hmmm.. 1 GB cached, so I'll only need 3 top directories..."
>
> Only as many L1 directories as are needed to fit the number of objects you
> have cached are used.
>
> The number of objects depends on your cache_dir size and on the actual
> object size distribution. Many small objects make for a higher object count
> than a few large objects, even if they account for the same amount of data.

Thank you for your reply, but either my English is very bad or my
original question remains unanswered: why is there such an "unbalanced"
distribution of objects over the directories, and is it supposed to be like this?

What you said is that the cache_dir size limits the number of objects in
the cache, and that each object has an overhead associated with it, so many
small objects increase your cache usage compared with fewer, bigger objects.
Ok. I agree.

But why isn't the load distributed evenly over L1 and L2 directories?
The behavior seems to be the same in both cases:

Fill up the 00/00 directory (up to some amount of data), then fill up the
00/01 directory (up to the same amount of data), and so on.
What I want to figure out is why this is not balanced across all
directories. Is there an advantage? I can't tell whether a change in the number
of L1 or L2 directories would change anything (performance-wise) or
not. You see, regardless of whether I have 1 or 100 L1 directories, it seems
that, for instance (keeping the default L2 number), if I had a cache size
of 500 MB, only the first L1 directory would be used and I would have
99*256 (default L2 number, IIRC) = 25344 empty directories...
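To make the fill order I am describing concrete, here is a minimal sketch in Python. It is only a model of what the "du" output suggests, not code taken from Squid: the names sequential_path and l1_dirs_in_use, the value of FILES_PER_L2 and the two-digit hex directory names are all my assumptions.

```python
# Hypothetical model of a sequential fill order; NOT taken from Squid's source.
L1 = 16             # number of first-level directories (assumed)
L2 = 256            # number of second-level directories per L1 directory
FILES_PER_L2 = 256  # assumed number of objects stored in one L2 directory
                    # before the next L2 directory is started


def sequential_path(file_number: int) -> str:
    """Map a sequentially allocated file number to an L1/L2/file path."""
    l2_index = (file_number // FILES_PER_L2) % L2
    l1_index = (file_number // (FILES_PER_L2 * L2)) % L1
    return f"{l1_index:02X}/{l2_index:02X}/{file_number:08X}"


def l1_dirs_in_use(object_count: int) -> int:
    """Count the L1 directories touched when objects are allocated in order."""
    per_l1 = FILES_PER_L2 * L2
    return min(L1, -(-object_count // per_l1))  # ceiling division


if __name__ == "__main__":
    for n in (0, 255, 256, 65535, 65536):
        print(n, sequential_path(n))
    print("L1 directories used by 150000 objects:", l1_dirs_in_use(150000))
```

Under these assumptions 00/00 fills first, then 00/01, and a second L1 directory is only touched once the first one holds 256 * 256 objects, so a small cache leaves most of the tree empty.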

So, is this behaviour better than balancing objects over all directories?
For instance, why not use something like:
TOTAL_DIR_NUMBER = 16 L1 * 256 L2 = 4096 directories.
The 1st object goes to 00/00, the 2nd goes to 00/01, the 3rd goes to 00/02 ...
The 4097th object would go to 00/00 again ...
(Just as an example; it obviously can't be this simple because of cache
removal policies and all that, but you would get some sort of balanced
load over the whole cache tree.)
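The round-robin idea above could be sketched like this in Python. It is only an illustration of my proposal, ignoring removal policies entirely; the function name round_robin_dir and the two-digit hex directory names are assumptions.

```python
# Hypothetical round-robin placement, as proposed above; ignores cache removal.
L1 = 16
L2 = 256
TOTAL_DIR_NUMBER = L1 * L2  # 4096 directories


def round_robin_dir(n: int) -> str:
    """Return the L1/L2 directory for the n-th stored object (n counts from 1)."""
    slot = (n - 1) % TOTAL_DIR_NUMBER
    return f"{slot // L2:02X}/{slot % L2:02X}"


if __name__ == "__main__":
    for n in (1, 2, 3, 257, 4096, 4097):
        print(n, round_robin_dir(n))
```

With this placement every directory receives one object before any directory receives a second one, which is the balanced load I have in mind.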

See what I'm asking? Thanks once again for your reply!

Joao Clemente

IST - Portugal
Received on Mon May 27 2002 - 08:38:59 MDT
