Re: Squid store replacement policies from Alex Rousskov on 2000-04-26 (squid-dev)

From: Alex Rousskov <rousskov@dont-contact.us>
Date: Wed, 26 Apr 2000 14:01:15 -0600 (MDT)

On Wed, 26 Apr 2000, Henrik Nordstrom wrote:

> Maybe. Not needed at this time. Can be added at a later date if needed.

... at the expense of modifying all policies already implemented by
that time (unless you plan on supporting some inheritance primitives in
your C++ emulation approach, but it does not look like you want to go
that far).

My API design rule-of-thumb is to plan one or two steps ahead, but no
further. :)

> > 3. I do not understand why a [removal] policy is creating
> > "index" walkers. What does the removal policy have to do with
> > iterating the cache?
>
> It has to do with the writing of clean index files.
>
> In Squid there are two kinds of indexes:
>
> a) The store index. This is a global index indexed on MD5 hash.
>
> b) The removal policy. One index per cache_dir object store, and one for
> the memory "hot object" cache. This is indexed by the removal policy in
> ways depending on the policy.
>
> The removal policy has an index walker because ordering
> might be important when rebuilding this index. A good example is the LRU
> policy, which can easily be rebuilt by simply inserting the objects in
> the same order as they would have been purged. Other policies might need
> other orderings in the index rebuilding process.

OK. Now I probably understand what is going on: you are trying to
implement the storage/rebuild of the removal policy "order", without the
policy's knowledge, by demanding a for-storage iterator from the policy.
I would argue against such a complicated and rigid design!

If the sole purpose of the iterator is to save the policy "index" (if
any!) and later load it, then it should be up to the policy how to do
it:
policy->store(fd)
policy->load(fd)
where the file descriptor is maintained by the file system module that
owns the policy. No need for a public iterator interface.

Finally, if the stored policy "index" is lost or corrupted (e.g., load()
returns false), the policy can be rebuilt from scratch using the already
available add() method.
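
A minimal sketch of the store()/load() idea with the add() fallback, under
stated assumptions: C++ streams stand in for the fd owned by the file system
module, objects are identified by a plain int fileno, and every name here
(RemovalPolicy, LruPolicy, restorePolicy) is hypothetical rather than
existing Squid code.

```cpp
#include <cstddef>
#include <deque>
#include <istream>
#include <ostream>
#include <sstream>
#include <vector>

// Hypothetical interface: the policy alone decides how its state is saved.
class RemovalPolicy {
public:
    virtual ~RemovalPolicy() = default;
    virtual void add(int fileno) = 0;                // already available method
    virtual bool store(std::ostream &out) const = 0; // save private "index"
    virtual bool load(std::istream &in) = 0;         // false if lost/corrupted
};

// Illustrative LRU policy: its "index" is simply the purge order.
class LruPolicy : public RemovalPolicy {
public:
    void add(int fileno) override { order_.push_back(fileno); }

    bool store(std::ostream &out) const override {
        out << order_.size() << '\n';
        for (int f : order_) out << f << '\n';
        return static_cast<bool>(out);
    }

    bool load(std::istream &in) override {
        std::size_t n = 0;
        if (!(in >> n)) return false;
        std::deque<int> loaded;
        for (std::size_t i = 0; i < n; ++i) {
            int f = 0;
            if (!(in >> f)) return false;  // truncated or garbled data
            loaded.push_back(f);
        }
        order_.swap(loaded);  // replace state only after a clean read
        return true;
    }

    const std::deque<int> &order() const { return order_; }

private:
    std::deque<int> order_;  // front = next victim
};

// File-system side: try the saved state, else rebuild with add().
// Returns true if the saved state was used.
bool restorePolicy(RemovalPolicy &policy, std::istream &saved,
                   const std::vector<int> &objectsOnDisk) {
    if (policy.load(saved)) return true;
    for (int f : objectsOnDisk) policy.add(f);
    return false;
}
```

The caller never iterates the policy's internals; it only hands over a
stream and, on failure, feeds the objects it already knows about to add().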

Note that with this store/load design [and no purge walker] we
completely remove the assumption/requirement that the policy has some
kind of well-ordered "index" as meta-data. We simply leave it up to the
policy to decide on these internal matters (which is good because the
rest of the code has more important things to deal with).

> The global "store" index should also have an
> iterator which would be used in digest generation and such activities.

Great! And, if you agree with the above, that global iterator will have
nothing to do with the removal policies (and it should not, in general).

> "Hot object cache". This will probably not be implemented in the first
> attempt due to index storage problems. The removal policy indexes are
> currently kept in StoreEntry and MemoryObject, respectively, and I am not
> sure how to best abstract this without running into other problems. For
> the time being I will concentrate on the disk object removal policies.
> When this is done I will consider the hot object cache and reevaluate
> the API.

I see no _interface_ differences between "hot object cache" removal
policy and "file system" removal policy. "Hot object cache" is just
another "file system" and should have the same interface. Only the
"Cache/Store" object would know that it is a "special" volatile file
system that has "close" copies of objects that could be stored
elsewhere. The only somewhat "interesting" problem is to allow for one
object to reside in both "disk" and "memory" file systems. A separate
"global" index for memory with unique filenos is one solution.

Sometime in the future, we might have more than one "Hot cache" even.
For example, a hot cache for HTTP objects and a hot cache for streaming
media objects or whatever...

> > Why must a storage directory create a policy of its choice? I
> > would think that a fs (storage directory) can be asked to
> > create a removal policy OR can be asked to use a policy
> > preferred by the user. A directory can refuse in at most one
> > of the above cases, of course.
>
> The selection might be done based on input from the user (arguments to
> cache_dir), but some storage models have specific demands on the policy
> or might even implement the policy themselves if needed.

Sure, my description above allows for that:

    a) try to configure file system with the user-specified
       policy, if any; warn user if failed
    b) if (a) failed or skipped, ask fs for a "preferred"
       policy and use that to configure the fs (must succeed)
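
Steps (a) and (b) could be sketched roughly as follows. This is only an
illustration: FileSystem, setPolicy(), createPreferredPolicy() and
configurePolicy() are hypothetical names, and the example fs arbitrarily
accepts only "lru" and "clock" policies.

```cpp
#include <cstdlib>
#include <iostream>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in for a real policy object.
struct RemovalPolicy {
    std::string type;
};

// Hypothetical file system that accepts only some policy types.
class FileSystem {
public:
    // Returns false if this fs cannot work with the offered policy.
    bool setPolicy(std::unique_ptr<RemovalPolicy> p) {
        if (!p || (p->type != "lru" && p->type != "clock")) return false;
        policy_ = std::move(p);
        return true;
    }

    // The policy this fs would pick on its own.
    std::unique_ptr<RemovalPolicy> createPreferredPolicy() const {
        return std::unique_ptr<RemovalPolicy>(new RemovalPolicy{"lru"});
    }

    const RemovalPolicy *policy() const { return policy_.get(); }

private:
    std::unique_ptr<RemovalPolicy> policy_;
};

// userChoice is null when the user did not specify a policy.
void configurePolicy(FileSystem &fs, std::unique_ptr<RemovalPolicy> userChoice) {
    // (a) try the user-specified policy, if any; warn on failure
    if (userChoice) {
        const std::string wanted = userChoice->type;
        if (fs.setPolicy(std::move(userChoice))) return;
        std::cerr << "warning: fs refused policy '" << wanted
                  << "', using its preferred policy\n";
    }
    // (b) fall back to the policy preferred by the fs (must succeed)
    if (!fs.setPolicy(fs.createPreferredPolicy())) std::abort();
}
```

With this shape the fs can refuse the user's choice or its own preferred
policy, but not both, which matches the "at most one" rule quoted above.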

> The functions in this API are only called by the "filesystems". It is
> not a generic API you could use anywhere in the code.

OK, although I do not see where such a restriction is coming from and
where it is documented.

> > 5. The "createPolicy(type)" method seems like a global function
> > that must know how to create policies of every possible type.
> > Such a function cannot be a part of the policy API: a given
> > policy should have no clue how other policies are created.
>
> createPolicy is a "create" or "new" method. In C++ it would read
> something like
>
> policy = new StoragePolicy("type")

That's exactly what I am arguing against! In C++, there would be no such
function. Instead there will be per-policy constructors:

    policy = new StoragePolicyOfTypeT1(...)
    policy = new StoragePolicyOfTypeT2(...)
    policy = new StoragePolicyOfTypeT3(...)

The one-for-all createPolicy(type) function is needed only when we parse
squid.conf. That piece of code has nothing to do with policy API
(policies do not know/care if it exists or not). It will most likely
look something like

    if (type == "T1")
        return createT1Policy();
    if (type == "T2")
        return createT2Policy();

We should only require that a policy implements a "constructor" method
with whatever parameters it feels necessary.
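
To make the split concrete, here is a hedged sketch in which each policy
has its own constructor and the type-to-constructor mapping lives only in
parser-side code. LruPolicy, HeapPolicy and the type strings are
hypothetical; the createRemovalPolicy() signature is adapted from the one
proposed in this thread (const-qualified, returning a smart pointer).

```cpp
#include <cstring>
#include <memory>
#include <string>

// Policy API: nothing here knows about other policies or squid.conf.
class RemovalPolicy {
public:
    virtual ~RemovalPolicy() = default;
};

// Each policy provides its own constructor with whatever it needs.
class LruPolicy : public RemovalPolicy {
public:
    explicit LruPolicy(const std::string &args) : args_(args) {}
    const std::string &args() const { return args_; }
private:
    std::string args_;
};

class HeapPolicy : public RemovalPolicy {
public:
    explicit HeapPolicy(const std::string &args) : args_(args) {}
    const std::string &args() const { return args_; }
private:
    std::string args_;
};

// Parser-side helper, NOT part of the policy API. Assumes type and
// arguments are non-null strings taken from a cache_dir line.
// Returns null for an unknown type so the parser can report it.
std::unique_ptr<RemovalPolicy> createRemovalPolicy(const char *type,
                                                   const char *arguments) {
    if (std::strcmp(type, "lru") == 0)
        return std::unique_ptr<RemovalPolicy>(new LruPolicy(arguments));
    if (std::strcmp(type, "heap") == 0)
        return std::unique_ptr<RemovalPolicy>(new HeapPolicy(arguments));
    return nullptr;
}
```

Adding a new policy then means writing one class and one extra branch in
the parser; no existing policy has to change.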

> How about
>
> RemovalPolicy *createRemovalPolicy(char *type, char *arguments)
>
> Both type and arguments are typically from cache_dir configuration in
> squid.conf, and there is no way we can foresee the different types of
> configuration data a policy might need.

This function is just fine inside a squid.conf parser, but it is not a
part of the API. The API is for people who write policies or use policies.
This function does not implement any part of a given policy. Each policy
must provide its own "create" constructor. Whoever writes the squid.conf
parser will use those constructors to create user-specified policies.

> Filesystems are free to cluster removals if they feel like it. The
> removal policy doesn't care. It will simply return one object at a time
> when asked by the filesystem on what to remove.

Fine, although I think it would be a more flexible design if a policy
can return more than one victim at a time. It will noticeably simplify
implementation when a single FS can work with several policies, some of
which cluster and some do not.
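
One possible shape for a multi-victim call, as a sketch only: the method
name victims(), the int fileno identifiers and the LRU ordering are all
assumptions made for illustration.

```cpp
#include <cstddef>
#include <deque>
#include <vector>

// Hypothetical LRU policy that can hand out several victims per call.
class LruPolicy {
public:
    void add(int fileno) { order_.push_back(fileno); }

    // Removes and returns up to maxVictims objects, oldest first.
    // Returns fewer (possibly none) when the policy runs out of objects.
    std::vector<int> victims(std::size_t maxVictims) {
        std::vector<int> out;
        while (out.size() < maxVictims && !order_.empty()) {
            out.push_back(order_.front());
            order_.pop_front();
        }
        return out;
    }

    std::size_t size() const { return order_.size(); }

private:
    std::deque<int> order_;  // front = least recently added
};
```

A file system that does not cluster would simply call victims(1), so the
one-at-a-time behaviour remains a special case of the same interface.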

> If the filesystem has certain demands on how objects are removed then it
> should only accept to create policies which are known to fulfill these
> requirements, or hardcode its own selection.

Agreed.

    7. We should add a "policy->type()" method that returns the type
       of the policy so that FS can check compatibility and do ugly
       casts.

    8. We may need to add a "policy->fileSystem(fs)" method that tells
       the policy that it is now being adopted by the specified fs.
       This method should be called only once and only with a
       non-null fs parameter.
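
Items 7 and 8 might look roughly like the sketch below. The type strings,
the adopt() helper and the owner() accessor are hypothetical; only type()
and fileSystem(fs) come from the list above.

```cpp
#include <cassert>
#include <string>

class FileSystem;  // defined below

class RemovalPolicy {
public:
    virtual ~RemovalPolicy() = default;

    // Item 7: lets the fs check compatibility before casting.
    virtual std::string type() const = 0;

    // Item 8: called exactly once, with a non-null fs.
    void fileSystem(FileSystem *fs) {
        assert(fs != nullptr);
        assert(owner_ == nullptr);
        owner_ = fs;
    }

    const FileSystem *owner() const { return owner_; }

private:
    FileSystem *owner_ = nullptr;
};

class LruPolicy : public RemovalPolicy {
public:
    std::string type() const override { return "lru"; }
};

class HeapPolicy : public RemovalPolicy {
public:
    std::string type() const override { return "heap"; }
};

// Hypothetical fs that only knows how to work with LRU policies.
class FileSystem {
public:
    bool adopt(RemovalPolicy *p) {
        if (p->type() != "lru") return false;  // incompatible policy
        lru_ = static_cast<LruPolicy *>(p);    // the "ugly cast"
        p->fileSystem(this);
        return true;
    }

private:
    LruPolicy *lru_ = nullptr;
};
```

The assertions only document the "once, non-null" contract; a real
implementation might prefer to report the error instead of aborting.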

Thanks,

Alex.
Received on Wed Apr 26 2000 - 14:01:49 MDT