Re: Limiting the number of small objects

From: John Dilley <jad@dont-contact.us>
Date: Tue, 21 Mar 2000 10:09:49 -0800

> On Tue, 21 Mar 2000, Bert Driehuis wrote:
>
> > Well, yes and no. The problem as I see it is that unless you move the
> > big objects around in the store_list, you have to skip them anyway on
> > each expiration pass. My gut feeling tells me that skipping all those
> > entries each second would be prohibitively expensive (for a 100GB disk
> > cache with 600000 objects, you would be able to store 100000 1MB objects
> > and 500000 small ones, if I calculate this right, and you would probably
> > wind up scanning all 100000 every time).
>
> This new replacement algorithm couldn't use the 'store_list'
> structure. It would have to use the heap structure instead. Squid
> already has support for non-LRU algorithms, but you have to give
> --enable-heap-replacement on the configure command.
>
> It should be just a matter of defining a new "key" function for
> your size-based algorithm. The GDS algorithm already accounts
> for size, but probably not as strongly as you'd like.

It sounds like you want to use the available disk space to keep large objects
in cache so you don't have to haul them over your network again. I do not
recommend flushing out small objects -- they might be very popular, and
removing them would add needless latency to their retrieval.

Instead, what you want is a frequency-based replacement algorithm. This will
keep popular, recently-referenced objects in cache regardless of their size.
As Duane mentions, there is already one available in Squid, called LFUDA
(enable it with --enable-heap-replacement at configure time). The average object
size will be significantly larger than when using the GDSF policy (the other
heap policy), which attempts to optimize hit rate by keeping more (therefore
smaller) objects. Give LFUDA a try and see if it does what you want.

If you want to encourage the storage of large objects you can create a new
heap key generation function. You might want to look at the LFUDA and GDSF
keygen functions, and then go from there. Be careful, though, that you do
not keep a few monster objects to the detriment of your overall cache
performance. The size distribution is very "heavy-tailed," meaning a few very
large objects can consume a significant fraction of cache storage
(significant relative to their frequency of occurrence or reference). Some of
these large objects will be referenced only once or infrequently so you have
to think carefully about how long you want to keep them in cache ...

Good luck to you! Please summarize your results. Best regards,

-- jad --
John Dilley
jad@akamai.com
Received on Tue Mar 21 2000 - 11:04:39 MST
