Re: Squid store replacement policies

From: Henrik Nordstrom <hno@dont-contact.us>
Date: Wed, 26 Apr 2000 20:28:14 +0200

Alex Rousskov wrote:
>
> Sorry, more interface ramble:

No problem.

> 0a. prog-guide-8.html#ss8.4 talks about "storage policy", which
> seems to be a very broad/vague term; it would be better to
> define what exactly that policy is, so people do not try to
> use it for something it is not designed for. An informal
> explanation of the unfortunate "storage policy" term is
> probably not good enough.

Agreed.

> 0b. If "Policy" type is for a replacement policy, it should be
> called "RepltPolicy" or some-such. Other policies may be
> admission policy (what to cache), dataplacement policy (where
> to cache) and, perhaps, peer selection policy, etc.

The exact naming of the types for the structures has not yet been
decided. Sorry for the confusion. The names used in the current text are
only descriptive.

> 0c. The replacement policy interface is actually not designed to
> replace anything, it is designed to remove only. Thus, it is
> more appropriate to call it a removal policy

Good point. I'll make that change.

> 1. It seems appropriate for some policies to benefit from
> information about the [incoming] object that is causing us to
> look for victims. A "purgeInit(Entry *reason)" interface
> should replace "purgeInit()". "Reason" can be null, of
> course.

Perhaps. However, in the current implementation this will always be
NULL. The first use of the purge iterator will look somewhat like
storeDirMaintainSwapSpace, which does not know which object(s) caused
the purge to start, only that there is too much data in the
filesystem.

> 2. All walkers should be able to return a constant pointer to
> the policy that created them. The pointer must not be cached
> though because a policy may disappear at virtually any time.
> The pointer should be constant because we cannot modify the
> policy while we iterate it (based on the current API).

Maybe. Not needed at this time. Can be added at a later date if needed.

> 3. I do not understand why a [removal] policy is creating
> "index" walkers. What does the removal policy have to do with
> iterating the cache?

It has to do with the writing of clean index files.

In Squid there are two kinds of indexes:

a) The store index. This is a global index indexed on MD5 hash.

b) The removal policy. One index per cache_dir object store, and one for
the memory "hot object" cache. How this is indexed depends on the
removal policy.

The removal policy has an index walker because ordering might be
important when rebuilding this index. A good example is the LRU
policy, which can easily be rebuilt by simply inserting the objects in
the same order as they would have been purged. Other policies might need
other orderings in the index rebuilding process.
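
To illustrate the LRU case, here is a rough sketch (not the actual Squid
code). LruPolicy, LruNode, lruAdd, lruWalkFirst and lruWalkNext are
hypothetical names made up for this example. The index walker visits
entries in purge order, so a rebuild only has to call the normal insert
function once per record read back from the clean index.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical minimal LRU index: a doubly linked list where the head
 * is the next object to purge and the tail is the most recently used. */
typedef struct LruNode {
    int fileno;                 /* stand-in for the disk object name */
    struct LruNode *prev, *next;
} LruNode;

typedef struct {
    LruNode *head;              /* oldest entry: purged first */
    LruNode *tail;              /* newest entry: purged last */
} LruPolicy;

/* Normal insertion: a new or just-referenced object goes to the tail. */
static void lruAdd(LruPolicy *p, LruNode *n)
{
    assert(p != NULL && n != NULL);
    n->next = NULL;
    n->prev = p->tail;
    if (p->tail)
        p->tail->next = n;
    else
        p->head = n;
    p->tail = n;
}

/* Index walker: visit entries in the order they would be purged.
 * If the clean index is written in this order, the rebuild can call
 * lruAdd() for each record as it is read and end up with the same list. */
static LruNode *lruWalkFirst(const LruPolicy *p) { return p->head; }
static LruNode *lruWalkNext(const LruNode *n) { return n->next; }
```

A policy with a different internal structure (a heap, for instance)
would presumably hand out its entries in whatever order makes its own
rebuild cheapest.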

> It seems to me that file system type (or whoever maintains an
> index of cached entries) should be responsible for that.

The "filesystem" makes use of both the store index and the removal
policy. The global store index knows where objects are located, and
the removal policy knows in which order they should be removed.

Technically the "filesystem" makes direct use of the policy, and
provides feedback to the store index. The feedback to the store index
is:

a) The created object name is NNNN
b) This object is now deleted

> A "Cache" type (or whoever manages all file systems) may also
> return an iterator that is capable of iterating through all
> file systems. Both per-fs iterators and "global" iterators
> are useful.

See my previous posting. The global "store" index should also have an
iterator, which would be used in digest generation and similar
activities.

> 4. What is the relation of the policy and file system? Who
> creates policies and how do they interact with the storage
> index?

"Filesystems" create removal policies.

The generic object store maintains the storage object index. The only
input from the filesystem to the global storage index is the "disk
object name", i.e. the file number, and feedback when an object is
purged.

> 4a API says ``storage directory or memory creates a policy of
> choice for maintaining it's objects.''
>
> What is "memory" in the description above?

"Hot object cache". This will probably not be implemented in the first
attempt due to index storage problems. The removal policy indexes are
currently kept in StoreEntry and MemoryObject respectively, and I am not
sure how best to abstract this without running into other problems. For
the time being I will concentrate on the disk object removal policies.
When this is done I will consider the hot object cache and reevaluate
the API.

> Why storage directory must create a policy of choice? I
> would think that a fs (storage directory) can be asked to
> create a removal policy OR can be asked to use a policy
> preferred by the user. A directory can refuse in at most one
> of the above cases, of course.

The selection might be done based on input from the user (arguments to
cache_dir), but some storage models have specific demands on the policy
or might even implement the policy themselves if needed. A pure
cyclic object store is a good example where the storage and replacement
policy are tightly coupled to each other and you cannot select another
policy since it is part of the storage structure model.
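
A rough sketch of how that selection could look, purely as an
illustration: StoreType, fsSelectPolicy, the two store definitions and
the policy names "lru" and "builtin" are all hypothetical and not taken
from the Squid source. The point is only that each storage type decides
for itself whether to honour the user's request.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical description of a storage type. choosePolicy() returns
 * the name of the policy the store will use, or NULL if it refuses the
 * requested one. */
typedef struct {
    const char *name;
    const char *(*choosePolicy)(const char *requested);
} StoreType;

/* A UNIX-filesystem style store: accepts whatever the user asked for
 * and falls back to an assumed default when nothing was requested. */
static const char *ufsChoosePolicy(const char *requested)
{
    return requested ? requested : "lru";
}

/* A pure cyclic store: removal order is part of the storage layout, so
 * only its own built-in policy is acceptable. */
static const char *cyclicChoosePolicy(const char *requested)
{
    if (requested == NULL || strcmp(requested, "builtin") == 0)
        return "builtin";
    return NULL;                /* refuse any other policy */
}

static const StoreType ufsStore = {"ufs", ufsChoosePolicy};
static const StoreType cyclicStore = {"cyclic", cyclicChoosePolicy};

/* Called while parsing a cache_dir line; requested may be NULL. */
static const char *fsSelectPolicy(const StoreType *fs, const char *requested)
{
    assert(fs != NULL && fs->choosePolicy != NULL);
    return fs->choosePolicy(requested);
}
```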

> For example, a Unix FS can work with [at least] FIFO or LRU
> removal policies, but may "prefer" a GDSZ policy or whatever.

A UNIX FS can work with any removal policy I would say, except some
obscure policies which are not policies but side effects of something
else.

> If storage directory creates a policy of choice, there should
> be no global "createPolicy(type)" interface because that
> storage directory would know how to create a policy of a
> given type (also see (5) below).

The functions in this API are only called by the "filesystems". It is
not a generic API you could use anywhere in the code.

> 4b The presence of the "createPolicy(type)" call implies that
> there is one policy instance for all active file systems. Is
> that intentional/desired?

No, createPolicy returns the created policy instance. Any number of
instances may exist, and in theory a given object store (cache_dir) can
even make use of several policies, even if that is not how it is
intended to be used at this time.

> A more flexible design would be to have a per-fs removal
> policy and a global removal policy which chooses which fs
> should give up an object.

It is a per-fs removal policy. The global policy currently only
considers object insertions, not removals. The effect is mostly the
same.

> A single global removal policy will work, but may be costly
> to implement because data from a fs index would have to be
> copied to that global policy structures, increasing memory
> requirements and possibilities for mis-communication bugs.

I am not considering a single global policy. It does not make sense
given that there might be a mix of different object filesystems with
different demands on replacement policies.

> 5. The "createPolicy(type)" method seems like a global function
> that must know how to create policies of every possible type.
> Such a function cannot be a part of the policy API: a given
> policy should have no clue how other policies are created.

createPolicy is a "create" or "new" method. In C++ it would read
something like

   policy = new StoragePolicy("type")

> Constructors for each policy types are likely to have
> additional parameters and configuration options. The API must
> just require that policy of type T implements a
> "createTPolicy(...)" public constructor, with parameter list
> to be specified by that particular policy.

Currently no parameters to the policy are being considered, but as you
say it might be needed. Good point.

> Moreover, I doubt "createPolicy(type)" will survive for long
> because people would want to supply configuration parameters
> for particular policies via squid.conf. Something like
> createPolicy(PolicyCfg *cfg)
> may survive longer.

How about

RemovalPolicy *createRemovalPolicy(char *type, char *arguments)

Both type and arguments are typically from cache_dir configuration in
squid.conf, and there is no way we can foresee the different types of
configuration data a policy might need.
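
As a sketch of how such a function might be structured (an illustration
only, not the implementation): the generic entry point is just a lookup
table, and each policy type supplies its own constructor that interprets
the argument string. The struct layout, policyAlloc, the table, the
"lru" and "fifo" entries and removalPolicyFree are assumptions made for
this example. I have also used const char * here where the prototype
above says char *.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical policy instance; a real one would carry its index. */
typedef struct RemovalPolicy {
    const char *type;       /* name the instance was created under */
    char *arguments;        /* private copy of the raw arguments, or NULL */
} RemovalPolicy;

typedef RemovalPolicy *PolicyCreateFn(const char *arguments);

/* Shared helper for the example constructors below. */
static RemovalPolicy *policyAlloc(const char *type, const char *arguments)
{
    RemovalPolicy *p = calloc(1, sizeof(*p));
    if (p == NULL)
        return NULL;
    p->type = type;
    if (arguments != NULL) {
        p->arguments = malloc(strlen(arguments) + 1);
        if (p->arguments == NULL) {
            free(p);
            return NULL;
        }
        strcpy(p->arguments, arguments);
    }
    return p;
}

/* Each policy type owns its constructor and its argument parsing. */
static RemovalPolicy *createLruPolicy(const char *arguments)
{
    return policyAlloc("lru", arguments);
}

static RemovalPolicy *createFifoPolicy(const char *arguments)
{
    return policyAlloc("fifo", arguments);
}

/* The generic entry point only maps a type name to a constructor. */
static const struct {
    const char *type;
    PolicyCreateFn *create;
} policyTable[] = {
    {"lru", createLruPolicy},
    {"fifo", createFifoPolicy},
};

/* Returns a new, independent instance, or NULL for an unknown type. */
RemovalPolicy *createRemovalPolicy(const char *type, const char *arguments)
{
    size_t i;
    assert(type != NULL);
    for (i = 0; i < sizeof(policyTable) / sizeof(policyTable[0]); i++)
        if (strcmp(policyTable[i].type, type) == 0)
            return policyTable[i].create(arguments);
    return NULL;
}

void removalPolicyFree(RemovalPolicy *p)
{
    if (p != NULL) {
        free(p->arguments);
        free(p);
    }
}
```

Each call returns a separate instance, so two cache_dir entries asking
for the same type would not share any state.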

> 6. Is it prudent to plan for removal policies that remove more
> than one object at a time? Clustering removal operations can
> be a significant optimization for some file systems...
> Current interface makes it impossible to remove two objects
> at once when removing the first object is sufficient from
> space requirements point of view.

Filesystems are free to cluster removals if they feel like it. The
removal policy doesn't care. It will simply return one object at a time
when asked by the filesystem what to remove.
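
Roughly like this, as a sketch under made-up names: the PurgeWalker
layout, this purgeInit signature, purgeNext and fsCollectBatch are
hypothetical, and the policy's removal order is faked with a plain
array. The walker hands out one victim per call, and the filesystem
decides how many to collect before doing a single clustered removal.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical purge walker. A real policy would walk its own index;
 * here the removal order is simply an array of file numbers. */
typedef struct {
    const int *order;   /* file numbers in removal order */
    size_t count;
    size_t pos;
} PurgeWalker;

static void purgeInit(PurgeWalker *w, const int *order, size_t count)
{
    assert(w != NULL && (order != NULL || count == 0));
    w->order = order;
    w->count = count;
    w->pos = 0;
}

/* Policy side: return the next victim, or -1 when none are left. */
static int purgeNext(PurgeWalker *w)
{
    return w->pos < w->count ? w->order[w->pos++] : -1;
}

/* Filesystem side: gather up to max victims so that they can be
 * removed in one clustered operation. Returns the number gathered.
 * The n < max test comes first so no victim is consumed and dropped. */
static size_t fsCollectBatch(PurgeWalker *w, int *batch, size_t max)
{
    size_t n = 0;
    int victim;
    while (n < max && (victim = purgeNext(w)) >= 0)
        batch[n++] = victim;
    return n;
}
```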

If the filesystem has certain demands on how objects are removed then it
should only agree to create policies which are known to fulfill these
requirements, or hardcode its own selection.

/Henrik
Received on Wed Apr 26 2000 - 12:48:41 MDT