Re: Squid store replacement policies

From: Henrik Nordstrom <hno@dont-contact.us>
Date: Thu, 27 Apr 2000 22:13:02 +0200

Alex Rousskov wrote:

> "May depend", you mean. Since in your API a FS can create its own
> policies and reject custom ones, that's already taken care of to a
> certain extent. If more flexibility is needed, a "fd" parameter can be
> replaced with some other (more sophisticated) object that gives
> read/write access to the store.

No, it entirely depends on the FS type. Few FS types other than the
ufs-based ones (ufs, aufs, diskd) will use Unix file descriptors in the
first place. Yes, implementing an additional FS interface for
reading/writing generic data (not objects) might do the trick, but we do
not have this part yet.

> It seems to me that the only (highly unlikely) case where there will be
> a need for such a sophisticated object is when the FS does not want the
> policy to use a contiguous block for metadata. Virtually everything else
> should be covered by a "fd" or a simple object with basic read/write
> methods such as already implemented Packer.

Perhaps, but I do not wish to complicate the implementation with that at
this time. Maybe you can convince me later.

> Already supported by the API. If the old order is lost (for whatever
> reason), it can be recovered using stored FS index file.

So you are proposing we introduce a requirement on yet another
persistent index, or even a third runtime index?

Squid has two runtime indexes:

a) The global main store index, indexing all the known objects on MD5
hashes.

b) The removal policy priority, kept on a per cache_dir basis.

What we want to preserve on a shutdown is an index of what is kept in
each of the cache_dir's, while preserving as much as possible of the
ordering in the removal policy.
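
To make the two runtime indexes concrete, here is a minimal sketch in C. It is not Squid's actual code: the Entry and PolicyList types, the function names, the bucket count, and the choice of a plain LRU list for the removal order are all assumptions made for illustration. Only the MD5-keyed global index and the per-cache_dir removal order come from the description above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define KEY_LEN 16        /* an MD5 digest is 16 bytes */
#define HASH_BUCKETS 64   /* illustrative size only */

/* Hypothetical record for one cached object (not Squid's StoreEntry). */
typedef struct Entry {
    unsigned char key[KEY_LEN];  /* MD5 key used by the global index */
    int dir;                     /* which cache_dir holds the object */
    struct Entry *hash_next;     /* chain in index (a) */
    struct Entry *prev, *next;   /* position in index (b) */
} Entry;

/* (b) Removal order for one cache_dir; head is removed first. */
typedef struct PolicyList { Entry *head, *tail; } PolicyList;

/* (a) Global store index over all cache_dirs. */
static Entry *store_index[HASH_BUCKETS];

static unsigned bucket_of(const unsigned char *key)
{
    return key[0] % HASH_BUCKETS;
}

/* Register an entry in both indexes. */
void store_add(Entry *e, PolicyList *dirs)
{
    assert(e != NULL && dirs != NULL);
    unsigned b = bucket_of(e->key);
    e->hash_next = store_index[b];
    store_index[b] = e;

    PolicyList *p = &dirs[e->dir];
    e->next = NULL;
    e->prev = p->tail;
    if (p->tail)
        p->tail->next = e;
    else
        p->head = e;
    p->tail = e;
}

/* Look an object up by key through the global index. */
Entry *store_lookup(const unsigned char *key)
{
    for (Entry *e = store_index[bucket_of(key)]; e; e = e->hash_next)
        if (memcmp(e->key, key, KEY_LEN) == 0)
            return e;
    return NULL;
}

/* Walk one cache_dir in removal order; a shutdown dump could write
 * the entries in this order so the ordering survives a restart. */
size_t policy_walk(const PolicyList *p, const Entry **out, size_t max)
{
    size_t n = 0;
    for (const Entry *e = p->head; e && n < max; e = e->next)
        out[n++] = e;
    return n;
}
```

In this sketch a single walk over the per-cache_dir list yields both the contents of the cache_dir and the removal order, which is the information a shutdown would need to preserve.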

> Note that the policy should store just its metadata (if any), not FS
> index. FS stores its metadata independent of the policy in use. The
> latter may, in some cases, increase storage requirements a little bit,
> but in most cases the difference will be negligible, if any.

And how do you connect the two together in an efficient manner?

> Moreover, a FS (or policy?) can have a configuration parameter to store
> or not to store the policy metadata, providing the user with a trade-off
> between storage space and rebuild time.

> Well, you better be able to! It is not a good idea to rely on the
> _removal policy_ to rebuild the cache in case of the index loss. Again,
> we are back to what that "policy" object is supposed to represent.

I am not. I am relying on the cache_dir to be able to rebuild the
policy, and then on the policy to be able to maintain runtime
information about what is contained in the cache_dir.

> IMHO, you agree to call it a "removal policy", but then give it a bunch of
> tasks and requirements that a removal policy should not care about or
> cannot satisfy (in general). I understand that you are trying to
> minimize the short term changes, but I do not think that should be the
> first priority when you design an API like the policy API.

If you have any ideas on how the indexes should be implemented then
please say so. I will however go ahead and implement the policy API as
it is now. Changing it later is not a big deal; abstracting it out of
the main code is the most important part at this time.

> You are missing the encapsulation point again, I think. The _policy_ and
> even the _file system_ objects, in general, should not know the
> differences you are quoting above! Only the "Cache/Store" object that
> maintains all FSystems would know enough to treat memory FS differently.
> The FS and policy interface will remain exactly the same!

I fully agree, but I don't have the time to address that at this time.
The problem is not with the policy implementation, but with index
storage and reverse lookups of them.

> The "hot object cache" policy indexes StoreEntries (that may have
> MemoryObject structures). What is removed when you purge the "hot object
> cache" is the StoreEntry. It is removed from the "hot memory index"
> leaving "file store index" and its entries unchanged.

Again no. The "hot object cache" currently indexes MemoryObjects which
happen to have a StoreEntry. A StoreEntry cannot be indexed by the "hot
object cache" if it doesn't have a MemoryObject.

> You are abstracting both the removal and FS objects. IIRC, one of the
> oldest performance bugs in Squid was that hot objects were managed by
> the "reverse LRU" policy, making memory cache almost useless. That bug
> would not exist if the same LRU policy that manages the disk store would
> be used for managing memory store.

The "hot object cache" is not at all a "hot object cache", and I very
much dislike how it is implemented. But it is not the issue I am
currently trying to address. This will be dealt with later.

> Agree. The FS manager object (still do not know what its name is) should
> be able to handle that. There should be fewer hard-to-detect race
> conditions (or at least fewer bugs) when all FSs, including memory one
> use the same interface.

The "hot object cache" is not at all implemented as a FS type store. It
is completely different in almost all aspects. It simply piggy-backs on
Squid's object forwarding.

> That's fine. I do not see why it should be a part of the API though. All
> functions listed in the API should be something that each policy must
> implement OR something that users must use. "Not a method on an
> individual policy" should not be the part of the API unless you require
> users to call that function for some reason.

It is part of the API for making use of policies. The API implemented by
the policy looks slightly different. Both are now documented.

> Strange. createPolicy() is not used by policies, is not implemented by
> policies, and, IMO, should not be called by policy's users. Still you
> want to have it in the API!

It is called by the policy's users to create the policy before use.

> Do not forget that a user may specify an alternative removal policy for
> a file system. A file system should have some way of deciding whether
> that policy is compatible with the FS.

It is the FS that gets the input from the user on which policy to create,
and it is the FS that creates the policy for its objects.
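
As a rough sketch of that flow in C: the FS receives the policy name the user configured, asks createPolicy() for it, and falls back to its own default when the request cannot be satisfied. The createPolicy() signature, the RemovalPolicy and CacheDir types, the "lru" and "heap" names, and the fallback rule are all assumptions for illustration, not the API as documented.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical policy object; a real one would carry method pointers. */
typedef struct RemovalPolicy {
    const char *type;
} RemovalPolicy;

/* Assumed signature: create a policy by name, NULL if unknown. */
RemovalPolicy *createPolicy(const char *type)
{
    static const char *known[] = { "lru", "heap" };  /* illustrative */
    for (size_t i = 0; i < sizeof known / sizeof known[0]; i++) {
        if (strcmp(type, known[i]) == 0) {
            RemovalPolicy *p = malloc(sizeof *p);
            if (p)
                p->type = known[i];
            return p;
        }
    }
    return NULL;
}

/* Hypothetical per-cache_dir state owned by the FS. */
typedef struct CacheDir {
    const char *path;
    const char *default_policy;  /* what this FS uses if nothing else fits */
    RemovalPolicy *policy;
} CacheDir;

/* The FS decides which policy to create: the user's request if it can
 * be created, otherwise the FS default. Returns 0 on success, -1 if no
 * policy could be created at all. */
int cachedir_init_policy(CacheDir *dir, const char *requested)
{
    assert(dir != NULL && dir->default_policy != NULL);
    dir->policy = createPolicy(requested ? requested : dir->default_policy);
    if (!dir->policy && requested)
        dir->policy = createPolicy(dir->default_policy);
    return dir->policy ? 0 : -1;
}
```

Keeping the decision inside cachedir_init_policy() means the rest of the code never has to know which policies a given FS accepts.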

> Because a policy may need to get additional information from the file
> system. For example, a policy may behave differently depending on how
> "full" or "fragmented" the file system is. Besides, it is great for
> debugging/diagnostic purposes. Compare:
>
> "entry XYZ is less valuable than ZXY"
> with
> "/dev/sd0d: entry XYZ is less valuable than ZXY, using
> aggressive removal tactic because FS is 98% full"

You have a point there. Might add it later on, but first I'd like to
find out how to handle the policy indexing of the "hot object cache".
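
If this were added later, it might look roughly like the C sketch below: the policy keeps a read-only pointer to some FS status and uses it both to pick a removal tactic and to label diagnostics. The FsInfo and Policy types, the function names, and the 95% threshold are hypothetical; only the message wording follows the example quoted above.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical FS status a policy may consult. */
typedef struct FsInfo {
    const char *name;   /* e.g. the cache_dir path or device */
    int percent_full;   /* 0..100 */
} FsInfo;

/* Hypothetical policy with an optional back-reference to its FS. */
typedef struct Policy {
    const FsInfo *fs;   /* NULL if the policy was given no FS */
} Policy;

#define AGGRESSIVE_THRESHOLD 95  /* assumed cut-off, in percent */

/* The policy may change behaviour depending on how full the FS is. */
int policy_is_aggressive(const Policy *p)
{
    return p->fs != NULL && p->fs->percent_full >= AGGRESSIVE_THRESHOLD;
}

/* Format a diagnostic line; returns what snprintf() returns. */
int policy_describe(const Policy *p, const char *lesser,
                    const char *greater, char *buf, size_t len)
{
    assert(p != NULL && buf != NULL);
    if (p->fs == NULL)
        return snprintf(buf, len, "entry %s is less valuable than %s",
                        lesser, greater);
    if (policy_is_aggressive(p))
        return snprintf(buf, len,
                        "%s: entry %s is less valuable than %s, using "
                        "aggressive removal tactic because FS is %d%% full",
                        p->fs->name, lesser, greater, p->fs->percent_full);
    return snprintf(buf, len, "%s: entry %s is less valuable than %s",
                    p->fs->name, lesser, greater);
}
```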

/Henrik
Received on Thu Apr 27 2000 - 14:14:47 MDT
