Re: LFU in squid from John Dilley on 1999-04-26 (squid-dev)

From: John Dilley <jad@dont-contact.us>
Date: Mon, 26 Apr 1999 07:18:44 -0700

> > This is correct as of Squid 2 (I've modified the replacement
> > policy in 2.1.PATCH2). In Squid 1 (1.1.20 and presumably others) the
> > replacement policy was implemented by bucket scanning; those comments
> > still remain but are not valid.

> Is it true that there is a list of objects in Squid 2 now? I looked at
> 1.1.20 and it obviously has no list at all.

Yep, it's true, it's type is dlink_list. Metadata objects have
a dlink_node pointer in them. The replacement policy uses a global
dlink_list store_list to maintain recency information. Objects are
added to the head; moved to the head when the are referenced; and
evicted from the tail (but not strictly, as per the previous mail.)

> > In the work we did here we implemented two LFU-variants. We
> > used a heap for executing the policy efficiently. We also implemented a
> > version of LRU that used the heap for comparison. These are not true
> > LFU, since we do not keep metadata across object lifetimes -- when an
> > object is removed from cache we remove its metadata. Proper LFU keeps
> > object metadata. We are considering how to do this, and I'd be happy to
> > discuss it further if someone wants to hack. Basic idea: keep the
> > object metadata in a separate hash table and garbage collect it...

> When you say proper LFU, do you mean perfect LFU, ie we keep the refcount
> even if the object is purged?

No, as I explained further down in the message we did not keep
the object metadata across evictions. This would be interesting to do.
Our trace-based simulation indicates a better hit rate would be achieved
although the improvement is not dramatic. (Not as great as going from a
recency-based to a mostly-frequency-based policy.)

> > You're absolutely correct about the CPU usage. I have written a
> > technical report that discusses this in a fair amount of detail. I
> > presented a work in progress paper at WCW'99 and have now completed the
> > work (there were a couple of puzzles I had to solve first). The paper
> > is in final review and will be published externally in a little while.
> > If anyone on this list would like to give me comments on it before
> > publication (i.e., some time this week) I would be more than happy to
> > send a pre-publication version to you, and would appreciate the eyes...

> Can I get a copy? You can send it by email or give me the URL. Thanks in
> advance.

I'll post the URL to the list when it's ready for external
distribution. In the mean time I can send you a pre-pub review copy if
you were agreeing to send me comments this week on it (were you? ;-)

> Does your LFU-variant have any aging policy? I know that if you don't
> do that the old but high refcount objects will hardly be purged, so
> the effective cache size keeps shrinking over a long period of time.

Yes, we use a dynamic aging mechanism. LFU does indeed suffer
from cache pollution given the turnover in the popular object set. The
dynamic aging mechanism does not require parameterization (i.e., there
are no constants to tune as there are with LFU-Aging).

> If you have aging policy, can you tell me how it works? Plus, what do
> you mean by variant? How does it deviate from the traditional memory
> paging LFU, ie refcount is lost when object is purged and new objects
> are always allow to get in.

The policy and the work that led up to its definition are in a
paper by Martin Arlitt et al to be published at this year's WISP. The
dynamic aging factor is basically the value "L" from GDS if you're
familiar with that policy.

> Actually, my colleague (the guy in CC:) also wrote a simulator using heap
> with a comparison function pointer to implement LFU, LRU, GD-size and
> other policies.

> What size of cache do you simulate? I don't know whether this is true when
> the cache is 64GB - the size of cache COMPAQ sold for 200K.

The size of the cache ranged from 16 MB (a RAM cache) to 1 TB
(the size of the unique content set of the trace). The full results are
in the upcoming paper.

> Where is your HP?

I should have included my full signature. I'm at our central
research lab in Palo Alto in the Internet Systems and Applications Lab.
Best regards,

-- jad --

        John Dilley <jad@hpl.hp.com>
        Hewlett-Packard Laboratories
        1501 Page Mill Road MS 1U-17
        Palo Alto, CA 94304 // USA

Voice +1 650 857-8146
Fax +1 650 857-5100

http://www.hpl.hp.com/personal/John_Dilley/
Received on Tue Jul 29 2003 - 13:15:57 MDT

This archive was generated by hypermail pre-2.1.9 : Tue Dec 09 2003 - 16:12:06 MST