Re: Squid store replacement policies

From: John Dilley <jad@dont-contact.us>
Date: Mon, 24 Apr 2000 17:15:59 -0700

> > Other interesting algorithms use a "cost" function where cost can be
> > determined by several factors including server response time or
bandwidth
> > cost.
>
> All this is rather static information known when the object is added in
> the first place isn't it (and probably deduceable from the StoreEntry)?

Not as far as I'm aware. I haven't looked at StoreEntry recently but the
cost of retrieving the object (msec, effective bw to origin server) was not
present last time I did.

> Also, if Squid knows about it then should be quite easy to look it up.
> However, it also affects what needs to be stored in the persistent index
> in order to be able to rebuild the policy after a restart..

Yes, however be warned that the cost functions I have seen have not been
particularly stable. You can compute a mean response time, but it is a
relatively poor predictor of the response time you will get on the next
request - since that is determined by current server load, network load, and
so forth. I am not comfortable using such parameters in a replacement policy
since they are by definition unknown (only a guess). Whereas the size,
frequency, and time of last request are known by the cache.

> The API definition so far have two walkers. One which returns the
> objects to be deleted, and another which simply walks the list of
> objects without removing them. I did not originally include any
> information on in which order the second walker should return the
> objects, but I have since then desided that the API will define this
> walker to return the objects in an order suitable for rebuilding the
> policy. It's primary use is when writing clean persistent indexes.

Right. Doing this efficiently with a heap may be non-trivial, but doing it
adequately is easy.

BTW, a heap does have a definition of "next" if you start at the top and
walk all the way through to the end without someone intervening. It's just
that the objects will not be returned in strictly increasing order unless
you pop them from the top and reorder the heap each time.

I once did an experiment where I calculated how far off a heap was from
being ordered and called that the "skew" of the heap. I found that the heaps
I was looking at could be quite skewed; and that the maximum key was more
expensive to find than I had previously thought (assuming the min key is at
the top of the heap).

> If other types of walks are needed this can be added by simply adding
> another method in the policy API, but I am currently trying to keep the
> size of the API down as much as possible to simplify the policy
> implementation.

Good suggestion. Note that other ways to walk the heap may require
extracting all metadata and sorting it or something, which is not a cheap
operation. I would discourage this. Providing two walks (an inorder walk
which removes each object from the heap and a walk in some arbitrary order
that returns a pointer to each object without adjusting its position) seems
adequate and safe (i.e., the semantics are clearly defined). Regards,

--jad--
Received on Mon Apr 24 2000 - 18:17:02 MDT

This archive was generated by hypermail pre-2.1.9 : Tue Dec 09 2003 - 16:12:24 MST