Re: Squid store replacement policies

From: Alex Rousskov <rousskov@dont-contact.us>
Date: Tue, 25 Apr 2000 08:48:22 -0600 (MDT)

On Tue, 25 Apr 2000, Henrik Nordstrom wrote:

> Alex Rousskov wrote:
>
> > Just to be a pain, I would note that the proposed semantics seems to be
> > incomplete. A "walk" is not [necessarily] an atomic operation. One has
> > to define (at least) what happens if objects get
> > referenced/added/deleted/etc. while the walk is in progress. I am not
> > even talking about things like re-configure when an entire "drive" may
> > disappear from the store while somebody is slowly searching for an
> > object that matches some "interesting" criteria...
>
> Right. It should be documented that walks may only be safely performed
> atomicly without any other store operations taking place, or the result
> will be undefined. I am only looking at replacing the two walks
> currently done in Squid.

If that is the choice, I suggest that it is not only documented, but
_enforced_. That is, policy/store API should
        assert(number-of-pending-index-walkers == 0)
in functions that modify the store. Why should the result be undefined?
The result should be an assertion! Otherwise, I am sure people will
write code that _sometimes_ violates the atomicity requirement leading
to very subtle and hard-to-find errors. Such violations are usually due
to a hidden side effects that humans cannot follow
efficiently:
        
        w = p->walker();
        while (e = w->next()) {
                myF(e);
        }

        where,
        myF is my function, but it calls xF that was written by X
        xF calls yF written by Y
        as a side effect, once in a while, yF calls policy->purge()

Note that maintaining the above assertions is trivial and cheap. We just
need to count all index walkers currently released to the users;
something that is wise to do anyway!

Such assertions should probably be placed in an
        void aboutToModify(Policy)
function that every policy must call before modifications (the latter
also can be enforced if desired).

The "atomicity" requirement will make the "purge" walker cumbersome to
implement, because that walker is suppose to purge objects from the
cache. In my experience, it is better not to provide a "purge" walker
at all I found it much easier to add a
        entry = policy->nextVictim()
interface instead.

Here is a resulting cache-this-entry method, simplified:

bool Cache::cacheEntry(CacheEntry *e) {
        if (theIndex->find(e))
                return false; // already cached

        if (!thePolicy->canAdmit(e))
                return false;

        if (theCapacity < e->size())
                return false; // will not fit

        // purge old entries to free space if needed
        while (theCapacity - theSize < e->size()) {
                CacheEntry *v = thePolicy->nextVictim();
                purgeEntry(v);
        }

        theIndex->add(e);
        theSize += e->size();
        thePolicy->noteAdmitted(e);
        return true;
}

N.B. In your model, "theIndex" and "thePolicy" may be the same, I am not
sure.

Note that the only purpose of any walker (aka iterator) is to maintain
some "state" of the iteration process. In removal policies that I am
aware of, maintaining a state (especially in an object separate from the
policy itself!) is either (a) not needed or (b) extremely difficult.
because the relative "value" of a cached object usually either (a) does
not change when other objects are purged or (b) may change in
hard-to-predict ways when other objects are purged. For example, with
LRU or FIFO, the "state" is just a head/tail of the queue, maintained by
the policy. On the other hand, with algorithms that use object size and
disk placement, a removal of a single object may change the "value" of
many objects at once; the "state" is then essentially an entire store
table. If there is no state we want to maintain in a separate object,
there is no need for a walker

Finally, please note that "atomicity" would make deliberately-slow
things like incremental building of a cache digest impossible. Thus, the
Cache Digest and similar chunk-at-a-time users will have to be disabled
if new store API is in place (I assume that with the new API, there will
be no direct access to ``store table'').

A solution may be to introduce a third kind of a walker: a preemtable
walker that does not guarantee uniqueness of returned entries,
repeatable order of walks, or 100% coverage. Some policies may refuse to
provide such a walker, effectively disabling algorithms that need them.
API users would need to check for that. I suspect that this third walker
may have the same interface as the "index" walker.

Thanks,

Alex.
Received on Tue Apr 25 2000 - 08:48:58 MDT

This archive was generated by hypermail pre-2.1.9 : Tue Dec 09 2003 - 16:12:24 MST