Re: [squid-users] Cache distribution algorithm. Please enlighten me.

From: Henrik Nordstrom <hno@dont-contact.us>
Date: Mon, 27 May 2002 21:50:18 +0200

Joao Pedro Clemente wrote:

> Thank you for your reply, but either my English is very bad or my
> original question remains unanswered: why such an "unbalanced" distribution
> of objects over the directories, and is it supposed to be like this?

Yes, it is intentional.

> Fill up the 00/00 directory (up to some amount of data), then fill up
> the 00/01 directory (up to the same amount of data), and so on.

Correct. Each L2 directory is filled with L2 objects, i.e. as many
objects as the configured number of L2 directories. Then the next L2
directory is used, and so on.
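
As a rough illustration of that fill order, here is a minimal Python
sketch. It assumes objects are given consecutive file numbers and that a
file number is mapped to a directory with the arithmetic shown. The
function names and the path layout are illustrative assumptions, not a
copy of the Squid source.

```python
def swap_dir(filn, l1=16, l2=256):
    """Return (L1 index, L2 index) for file number filn.

    Assumption: each L2 directory takes l2 consecutive file numbers
    before the next L2 directory is used.
    """
    l2_index = (filn // l2) % l2
    l1_index = (filn // (l2 * l2)) % l1
    return l1_index, l2_index


def swap_path(filn, l1=16, l2=256):
    """Format a hypothetical on-disk path for file number filn."""
    a, b = swap_dir(filn, l1, l2)
    return "%02X/%02X/%08X" % (a, b, filn)
```

With these assumptions, file numbers 0 to 255 land in 00/00, number 256
is the first one in 00/01, and 01/00 is not touched until all 256 L2
directories under 00 have been filled.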

> What I want to figure out is why this is not balanced across all
> directories. Is there an advantage? I can't see whether a change in the
> number of L1 or L2 directories would change anything (performance-wise)
> or not. You see, regardless of whether I have 1 or 100 L1 directories, it
> seems that, for instance (keeping the default L2 number), if I had a
> cache size of 500 MB only the first L1 directory would be used and I
> would have 99*256 (default L2 number, IIRC) = 25344 empty directories...

If you have too few L1 directories then the usage will wrap around,
causing 00/00 to receive twice as many objects as intended (then 00/01,
etc). This causes the size of these L2 directories to increase, making
filesystem operations slower (standard UNIX-style filesystems search
linearly through a directory when looking for a file).
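
To make the wrap-around concrete, here is a small Python sketch with
deliberately tiny numbers. It assumes file numbers simply keep
increasing and uses an illustrative mapping (not taken from the Squid
source), so treat it as a picture of the effect rather than of the real
allocator.

```python
from collections import Counter


def l2_dir(filn, l1, l2):
    """Illustrative mapping from file number to (L1 index, L2 index)."""
    return (filn // (l2 * l2)) % l1, (filn // l2) % l2


def dir_counts(n_objects, l1, l2):
    """Count how many objects end up in each L2 directory."""
    return Counter(l2_dir(f, l1, l2) for f in range(n_objects))


# 2 L1 directories with 4 L2 directories each hold 2 * 4 * 4 = 32
# objects before the numbering wraps around.
counts = dir_counts(36, l1=2, l2=4)
print(counts[(0, 0)], counts[(0, 1)])
```

Storing 36 objects in a layout sized for 32 sends the last 4 back to
00/00, which then holds twice as many entries as its neighbours.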

The sizing of the L1 and L2 parameters works like this:

  1. L2 is to be sized to make the size of each L2 directory suitable,
not too large, and not too small.

  2. Then there needs to be a sufficient number of L1*L2 directories to
make room for all the objects you need to store.
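
The two steps above can be turned into a rough estimate. The sketch
below is only that: the helper name is made up, and the mean object size
of 10 KB is a hypothetical figure that you should replace with what your
own cache reports.

```python
import math


def min_l1(cache_bytes, mean_object_bytes, l2=256):
    """Smallest L1 giving L1 * L2 * L2 >= expected number of objects.

    Assumption: each of the L1 * L2 directories holds L2 objects.
    """
    objects = math.ceil(cache_bytes / mean_object_bytes)
    return max(1, math.ceil(objects / (l2 * l2)))


MB = 1024 * 1024
GB = 1024 * MB
KB = 1024

small = min_l1(500 * MB, 10 * KB)  # the 500 MB cache from the question
large = min_l1(10 * GB, 10 * KB)
print(small, large)
```

Under these assumptions the 500 MB cache holds about 51200 objects and
fits within a single L1 directory, while a 10 GB cache needs 16.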

> So, is this behaviour better than balancing objects over all directories?

Yes.

> For instance, why not use something like:
> TOTAL_DIR_NUMBER = 16 L1 * 256 L2 = 4096 directories.
> 1st object goes to 00/00, 2nd goes to 00/01, 3rd goes to 00/02 ...
> 4097th object would go to 00/00 again ...

This would require all your L1*L2 directories to be active in your
filesystem directory cache, consuming a fair bit of main memory just for
caching all the directories. To minimize this memory overhead, we do not
wish to touch more directories than needed.

At the same time, we do not want a single directory to grow too large,
consuming too much CPU time for simple directory operations.
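
A quick way to see the difference is to count how many directories each
scheme touches for the same number of objects. This Python sketch
compares the fill-one-directory-at-a-time order with the round-robin
order proposed in the quoted text. Both mappings are illustrative
assumptions, and cache removal is ignored.

```python
def sequential_dirs(n_objects, l1=16, l2=256):
    """Directories touched when each L2 directory takes l2 objects in turn."""
    return {((f // (l2 * l2)) % l1, (f // l2) % l2) for f in range(n_objects)}


def round_robin_dirs(n_objects, l1=16, l2=256):
    """Directories touched when objects rotate over all l1 * l2 directories."""
    total = l1 * l2
    return {divmod(f % total, l2) for f in range(n_objects)}


n = 50000
print(len(sequential_dirs(n)), len(round_robin_dirs(n)))
```

For 50000 objects the sequential order touches 196 directories, while
the round-robin order touches all 4096, and every one of them would have
to stay in the directory cache to be fast.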

> (Just as an example; this obviously can't be this simple because of cache
> removal policies and all that, but you would get some sort of balanced
> load over the whole cache tree.)

Why would one want a balanced load all over the tree? What matters is
performance, not that du reports a nice even distribution, right?

Regards
Henrik
Received on Mon May 27 2002 - 15:57:54 MDT
