Re: LFU in squid

From: John Dilley <jad@dont-contact.us>
Date: Sun, 25 Apr 1999 17:54:21 -0700

> > Note that as far as I know Squid does not support pure LRU. Garbage
> > collection happens on per-bucket basis.

> Hmm.. last time I looked it uses what I would call "Modified LRU with a
> bias to expired objects". The code comments speaks about buckets, but
> the code only uses the LRU index.

> storeMaintainSwapSpace:
> Scan the LRU tail and remove "expired" objects.
> Scanning is limited both by range (100 to 500 object depending on store
> max/min difference), and number of objects removed (10 to 80, linear to
> the amount of objects scanned).

        This is correct as of Squid 2 (I've modified the replacement
policy in 2.1.PATCH2). In Squid 1 (1.1.20 and presumably others) the
replacement policy was implemented by bucket scanning; those comments
still remain but are not valid.

> If it would happen that the LRU tail gets filled with objects which
> are not expired, then the cache size will increase and the LRU age
> (storeExpiredReferencedAge) will creep down causing the objects to
> "expire".

        This behavior is interesting -- and not so useful. In the
performance evaluation I did following the Squid 2 replacement policy
modification it appears that the call to storeExpiredReferencedAge is
fairly expensive. It is called once per replacement decision to bias
against expired objects, but it involves two floating point divides, a
multiply, and an exponentiation operation (pow() function call).

        In the work we did here we implemented two LFU-variants. We
used a heap for executing the policy efficiently. We also implemented a
version of LRU that used the heap for comparison. These are not true
LFU, since we do not keep metadata across object lifetimes -- when an
object is removed from cache we remove its metadata. Proper LFU keeps
object metadata. We are considering how to do this, and I'd be happy to
discuss it further if someone wants to hack. Basic idea: keep the
object metadata in a separate hash table and garbage collect it...

> It would be interesting to know how much this (current) derivation from
> LRU actually improves things compared to how much resources (CPU power)
> it requires on a heavily loaded cache.. My suspicion is that it actually
> performs LRU with a significant higher resource usage.

        You're absolutely correct about the CPU usage. I have written a
technical report that discusses this in a fair amount of detail. I
presented a work in progress paper at WCW'99 and have now completed the
work (there were a couple of puzzles I had to solve first). The paper
is in final review and will be published externally in a little while.
If anyone on this list would like to give me comments on it before
publication (i.e., some time this week) I would be more than happy to
send a pre-publication version to you, and would appreciate the eyes...

        One other thing we noted in the experiments is that the heap
based LRU policy achieved a significantly lower hit rate than the list
based one (only the list one used storeExpiredReferencedAge). So, it
costs more but achieves a little better hit rate. Nevertheless the
frequency-based policies behaved quite a bit better than LRU. We intend
to release the code changes to the Squid cache community. With the heap
implementation it is trivial to investment and implement new replacement
policies. Lots of people say it doesn't matter (because disks are so
cheap and you can improve the cache in many other ways). But my feeling
is if you can improve your resource utilization through the replacement
policy, why not?

        One final note -- the heap implementation requires O(log N) time
(where N is the number of objects in the heap) every time an object is
hit or evicted. The list implementation requires O(1) to move an object
to the head of the list or evict from the tail. We expected the new
policies would take a CPU hit, although since CPU is not the bottleneck
we felt it was worthwhile. To our surprise, the heap policies require
less CPU than LRU due to storeExpiredReferencedAge, and since we are
able to keep more objects in cache. That was quite pleasing.

        I'll have more information for the caching community shortly,
including source code (unless my hands get cut off). A copy of the
paper will appear on my home page. Best regards,

                             -- jad --
                          John Dilley <jad@hpl.hp.com>
Received on Tue Jul 29 2003 - 13:15:57 MDT

This archive was generated by hypermail pre-2.1.9 : Tue Dec 09 2003 - 16:12:06 MST