Re: Squid store replacement policies

From: Alex Rousskov <rousskov@dont-contact.us>
Date: Wed, 26 Apr 2000 23:42:09 -0600 (MDT)

On Wed, 26 Apr 2000, Henrik Nordstrom wrote:

> Might be doable, but I prefer to abstract how the index actually is
> stored from the policy. Why:
>
> a) The method of the actual storage depends on the storage type/fs being
> used.

"May depend", you mean. Since in your API a FS can create its own
policies and reject custom ones, that's already taken care of to a
certain extent. If more flexibility is needed, a "fd" parameter can be
replaced with some other (more sophisticated) object that gives
read/write access to the store.

It seems to me that the only (highly unlikely) case where there will be
a need for such a sophisticated object is when the FS does not want the
policy to use a contiguous block for metadata. Virtually everything else
should be covered by a "fd" or a simple object with basic read/write
methods such as already implemented Packer.

> b) The user might want to switch between different policies while
> preserving the cache. The cost of such a switch should be no more than
> the loss of the policy order.

Already supported by the API. If the old order is lost (for whatever
reason), it can be recovered using stored FS index file.

Note that the policy should store just its metadata (if any), not FS
index. FS stores its metadata independent of the policy in use. The
latter may, in some cases, increase storage requirements a little bit,
but in most cases the difference will be negligible, if any.

Moreover, a FS (or policy?) can have a configuration parameter to store
or not to store the policy metadata, providing the user with a trade-off
between storage space and rebuild time.

> Also, from a restart performance point of view it might be required to
> interleave the clean index loading or storing of several policies "at
> the same time".

Not a big deal. Load/store methods can have incremental semantics if
needed.
 
> > Finally, if the stored policy "index" is lost or corrupted (e.g., load()
> > returns false), the policy can be rebuild from scratch using already
> > available add() method.
>
> If you can find the objects somehow else yes.

Well, you better be able to! It is not a good idea to rely on the
_removal policy_ to rebuild the cache in case of the index loss. Again,
we are back to what that "policy" object is suppose to represent. IMHO,
you agree to call it a "removal policy", but then give it a bunch of
tasks and requirements that a removal policy should not care about or
cannot satisfy (in general). I understand that you are trying to
minimize the short term changes, but I do not think that should be the
first priority when you design an API like the policy API.
 
> > I see no _interface_ differences between "hot object cache" removal
> > policy and "file system" removal policy. "Hot object cache" is just
> > another "file system" and should have the same interface.
>
> Problem is that it isn't. The "hot object cache" is actually a shadow
> cache. A single object can exist both in the "hot object cache" and in a
> disk store at the same time, with the same identity (the StoreEntry).

You are missing the encapsulation point again, I think. The _policy_ and
even the _file system_ objects, in general, should not know the
differences you are quoting above! Only the "Cache/Store" object that
maintains all FSystems would know enough to treat memory FS differently.
The FS and policy interface will remain exactly the same!
 
> The "hot object cache" policy indexes MemoryObject structures
> (StoreEntry->mem_obj), while the "disk object cache" indexes objects
> (StoreEntries). What is removed when you purge the "hot object cache" is
> only the MemoryObject, not the whole StoreEntry. I do not feel
> particulary happy about breaking all this up only because we are
> abstracting the removal policy.

The "hot object cache" policy indexes StoreEntries (that may have
MemoryObject structures). What is removed when you purge the "hot object
cache" is the StoreEntry. It is removed from the "hot memory index"
leaving "file store index" and its entries unchanged.

You are abstracting both the removal and FS objects. IIRC, one of the
oldest performance bugs in Squid was that hot objects were managed by
the "reverse LRU" policy, making memory cache almost useless. That bug
would not exist if the same LRU policy that manages the disk store would
be used for managing memory store.

> Having two or more stores is not the big issue. The issue is in how to
> best connect these together in an efficient manner without races.

Agree. The FS manager object (still do not know what its name is) should
be able to handle that. There should be fewer hard-to-detect race
conditions (or at least fewer bugs) when all FSs, including memory one
use the same interface.

> ...
> Look at the API definition again. The createPolicy is a global function,
> not a method on a individual policy.

That's fine. I do not see why it should be a part of the API though. All
functions listed in the API should be something that each policy must
implement OR something that users must use. "Not a method on a
individual policy" should not be the part of the API unless you require
users to call that function for some reason.

> Yes, it is part of the API for policies.
> No, it isn't part of the API implementaiton for A policy.

Strange. createPolicy() is not used by policies, is not implemented by
policies, and, IMO, should not be called by policy's users. Still you
want to have it in the API!

This is not a big deal, of course.
 
> > Fine, although I think it would be a more flexible design if a policy
> > can return more than one victim at a time. It will noticeably simplify
> > implementation when a single FS can work with several policies, some of
> > which cluster and some do not.
>
> At the cost of complicating the implementation of all FS:es.

No complications at all if you require the policy to call the protected
purge-from-policy method of the FS instead of returning a [list of]
objects to be removed.
 
> > 7. We should add a "policy->type()" method that returns the type
> > of the policy so that FS can check compatibility and do ugly
> > casts.
>
> Maybe, however it is not needed at this time. The FS will know the type
> as it created the policy in the first place, second I don't see what
> kinds of ugly casts you are talking about here.

Do not forget that a user may specify an alternative removal policy for
a file system. A file system should have some way of deciding whether
that policy is compatible with the FS.

As for casts, they will look like this

        if (type == "T1")
                policy = createT1Policy(...);
        else
                policy = createT2Policy(...);

        ... /* many "select calls" later */ ...

        if (policy->type() == "T1")
                ((T1Policy*)policy)->t1SpecificMethod(...);
        else
        if (policy->type() == "T2")
                ((T2Policy*)policy)->t2SpecificMethod(...);
        ...

You would laugh, but I know one paranoid person who actually uses a cast
method that just returns "this", but asserts if the actual type does not
match the cast type:

        const T1Policy *p1 = (T1Policy*)policy->cast("T1");
        p1->t1SpecificMethod(...);

        or just

        ((T1Policy*)policy->cast("T1"))->t1SpecificMethod(...);

The above protects from bugs like

        if (policy->type() == "T1")
                ((T2Policy*)policy)->t2SpecificMethod(...);

Some compilers have built-in support for this ugliness, but it is not
portable yet.

> Or are you talking about defining some more abstract compatibility
> types? I don't see how we could do that at this time as we do not know
> the requirements of such compatibility types.. most likely we will not
> be able to find these requirements, and the FS:es has to rely on other
> methods anyway.

No, I would be happy with just a descriptive character string as a type,
to start with.
 
> > 8. We may need to add a "policy->fileSystem(fs) method that tells
> > the policy that is now being adopted by the specified fs.
> > This method should be called only once and only with a
> > non-null fs parameter.
>
> Why?

Because a policy may need to get additional information from the file
system. For example, a policy may behave differently depending on how
"full" or "fragmented" the file system is. Besides, it is great for
debugging/diagnostic purposes. Compare:

        "entry XYZ is less valuable than ZXY"
with
        "/dev/sd0d: entry XYZ is less valuable than ZXY, using
        aggressive removal tactic because FS is 98% full"

Alex.
Received on Wed Apr 26 2000 - 23:42:43 MDT

This archive was generated by hypermail pre-2.1.9 : Tue Dec 09 2003 - 16:12:25 MST