Re: Squid store replacement policies from Alex Rousskov on 2000-04-29 (squid-dev)

From: Alex Rousskov <rousskov@dont-contact.us>
Date: Sat, 29 Apr 2000 09:36:48 -0600 (MDT)

On Fri, 28 Apr 2000, Henrik Nordstrom wrote:

> Alex Rousskov wrote:
>
> > The number of indexes stays the same. The only difference is that each
> > FS stores/loads its part of the main store index and the policy stores
> > the order, if needed.
>
> So you are proposing adding yet another persistent index where the
> policy stores the order, and to somehow be able to store/load this
> separately from the store index. Please elaborate on how you intend that
> this should work.

Policy "index" does not have to be persistent. Whether the policy saves
its metadata or not is up to the policy.

The overall scheme is very simple: the cache_dir object notifies the
policy when it is time to save or load the policy metadata by calling
the policy's store()/load() methods with a ready-to-write/read file
descriptor or an equivalent object. When storing metadata, the policy can
use information in the cache_dir index to minimize the amount of storage. A
simple mechanism (e.g., a single timestamp shared by the cache_dir and
policy data) can be added to ensure that the two stored objects are in sync.
(See code sample below.)
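
A minimal C++ sketch of that division of labor, assuming a hypothetical
`Policy` interface with `add()`, `store()` and `load()`, and a hypothetical
`LruPolicy` that persists only the order of cache_dir file numbers. A stream
stands in for the file descriptor, and the `stamp` argument plays the role of
the shared timestamp: `load()` refuses data whose stamp does not match the one
the cache_dir just read from its own metadata.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <istream>
#include <list>
#include <ostream>
#include <sstream>

// Hypothetical policy interface: the policy owns its metadata format;
// the cache_dir only hands it a stream and a shared sync stamp.
class Policy {
public:
    virtual ~Policy() = default;
    virtual void add(std::uint32_t fileno) = 0;                // rebuild path
    virtual void store(std::ostream &out, long stamp) const = 0;
    virtual bool load(std::istream &in, long stamp) = 0;       // false => rebuild
};

// Hypothetical LRU policy: persists only the order of file numbers.
// Everything else about an object stays in the cache_dir's own metadata.
class LruPolicy : public Policy {
public:
    void add(std::uint32_t fileno) override { order_.push_back(fileno); }

    void store(std::ostream &out, long stamp) const override {
        out << stamp << ' ' << order_.size();
        for (std::uint32_t f : order_) out << ' ' << f;
        out << '\n';
    }

    bool load(std::istream &in, long stamp) override {
        long saved = 0;
        std::size_t n = 0;
        if (!(in >> saved >> n) || saved != stamp)
            return false;  // missing or out-of-sync policy data
        std::list<std::uint32_t> order;
        for (std::size_t i = 0; i < n; ++i) {
            std::uint32_t f;
            if (!(in >> f)) return false;  // truncated data
            order.push_back(f);
        }
        order_.swap(order);  // commit only after a complete read
        return true;
    }

    const std::list<std::uint32_t> &order() const { return order_; }

private:
    std::list<std::uint32_t> order_;  // least recently used first
};
```

Committing the new order only after a complete read keeps a failed load()
from leaving the policy half-initialized before the cache_dir rebuilds it.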

If the policy is replaced, or cannot read its metadata for some other
reason, the cache_dir restores the policy metadata by calling the policy's
add() method for each stored object.
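
The fallback path can be sketched as follows. Everything here is
hypothetical: `Entry` stands for whatever the cache_dir keeps per object,
`OrderPolicy` is a stand-in policy, and the rebuild feeds entries to `add()`
oldest-first so that an LRU-like policy recovers a plausible order from data
the cache_dir already owns.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical minimal policy: only add() matters for the rebuild path.
struct OrderPolicy {
    std::vector<std::uint32_t> order;  // least recently used first
    void add(std::uint32_t fileno) { order.push_back(fileno); }
};

// Hypothetical cache_dir entry: fields the FS keeps in its own metadata.
struct Entry {
    std::uint32_t fileno;
    long lastref;  // last reference time, owned by the cache_dir
};

// Rebuild policy state from cache_dir metadata when policy->load() fails.
// Feeding entries oldest-first lets an LRU-style policy reconstruct an
// approximate order without any policy-specific file.
void rebuildPolicy(std::vector<Entry> entries, OrderPolicy &policy) {
    policy.order.clear();  // discard whatever partial state was there
    std::stable_sort(entries.begin(), entries.end(),
                     [](const Entry &a, const Entry &b) { return a.lastref < b.lastref; });
    for (const Entry &e : entries)
        policy.add(e.fileno);
}
```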

The idea is that every object should maintain its own metadata. It is
better not to think of that metadata as an "index" or any other
particular structure. It is up to the cache_dir and policy objects what
kind of structures to maintain (as long as they satisfy the API).

"Abstraction" does not give you much without "encapsulation".

> The way I have intended it, there will only be one persistent index, and
> it keeps both things. The policy tells the FS the order info while
> writing the index, and the FS loads the index and inserts it into both
> the store index and the policy.

I understand. IMHO, your scheme will work for today's needs but is
unnecessarily complex and less flexible than the alternative. It
works, but it violates a basic OO design principle (encapsulation), so it
is likely to break or cause headaches later.

Of course, you can always rewrite it again. However, keep in mind that
by providing a well-defined API, you encourage folks to develop
different policies and file systems. This is very different from the
current situation, where even the primary authors of the code have
trouble modifying it! Once the API is in place and others build on it,
before you know it, rewriting the API will cost you too much. That is why
thinking ahead and obeying basic principles may be a good idea.

> Ok. I see where this is leading, but I am still not sure how to
> connect the two together if the loading/storing of the FS and policy
> indexes are separated from each other.

Simplifying a bit (e.g., policy data should be surrounded by markers
with size so that we can recover if the policy fails to load; fd can be
replaced with a Packer or equivalent):

    FS::store() {
        const int fd = open(fs_metadata_file_name); // or seek
        ... /* store fs metadata */ ...

        assert(policy);
        assert(policy->fs() == this); // paranoid
        policy->store(fd);
        ...
    }

    FS::load() {
        const int fd = open(fs_metadata_file_name); // or seek
        ... /* load fs metadata */ ...

        assert(policy);
        assert(policy->fs() == this); // paranoid
        if (!policy->load(fd))
            rebuildPolicy();
        ...
    }
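
The "markers with size" idea from the note above the sample can be sketched
like this, assuming a hypothetical text framing: a `POLICY <size>` line
followed by exactly that many bytes written by the policy. Because the
cache_dir reads the whole section before handing it to the policy, a policy
that rejects its data leaves the stream positioned at the next FS record, and
the caller can fall back to `rebuildPolicy()`. The function names are
illustrative only.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical section framing: "POLICY <size>\n" + <size> payload bytes.
const std::string kPolicyMarker = "POLICY";

void writePolicySection(std::ostream &out,
                        const std::function<void(std::ostream &)> &storePolicy) {
    std::ostringstream payload;
    storePolicy(payload);              // the policy writes its own format
    const std::string data = payload.str();
    out << kPolicyMarker << ' ' << data.size() << '\n' << data;
}

// Returns true if the policy accepted its data. On false after a valid
// header, the section has still been consumed, so FS loading can continue.
bool readPolicySection(std::istream &in,
                       const std::function<bool(std::istream &)> &loadPolicy) {
    std::string marker;
    std::size_t size = 0;
    if (!(in >> marker >> size) || marker != kPolicyMarker)
        return false;
    in.get();                          // consume the '\n' after the size
    std::string data(size, '\0');
    if (!in.read(&data[0], static_cast<std::streamsize>(size)))
        return false;
    std::istringstream section(data);  // the policy cannot read past its section
    return loadPolicy(section);
}
```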

> With a greatly added risk of inconsistency. Policy references to
> non-existing objects are easy to handle, but store objects without
> policy references is a bit harder..

If each store/load operation is atomic, there should be no
inconsistencies. While an fs object is being loaded or stored, it should
be removed from the global "Cache". That way, there is no need to lock
anything, and no updates can intervene; it is sort of like unmounting a
file system before you run fsck. I cannot think of a reasonable situation
in which one must have an fs available during load/store. A cache may lose
a few hits, but nothing major to worry about.
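
One way to get that umount-like behavior without locks, sketched with
hypothetical `Cache`, `CacheDir` and `Unmounted` types: lookups consult only
attached cache_dirs, and an RAII guard detaches a cache_dir for the duration
of load/store and reattaches it afterwards, even if an exception is thrown.
The sketch assumes the cache_dir was attached before the guard is created.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

// Hypothetical cache_dir: maps URLs to file numbers.
struct CacheDir {
    std::map<std::string, int> index;
};

// Hypothetical global "Cache": lookups only see attached cache_dirs,
// so a detached dir can be loaded or stored without locking.
class Cache {
public:
    void attach(CacheDir *d) { dirs_.insert(d); }
    void detach(CacheDir *d) { dirs_.erase(d); }
    bool hit(const std::string &url) const {
        for (const CacheDir *d : dirs_)
            if (d->index.count(url)) return true;
        return false;  // a detached dir simply costs a few hits
    }
private:
    std::set<CacheDir *> dirs_;
};

// RAII "umount": detach for the duration of load/store, reattach after.
class Unmounted {
public:
    Unmounted(Cache &c, CacheDir &d) : cache_(c), dir_(d) { cache_.detach(&dir_); }
    ~Unmounted() { cache_.attach(&dir_); }
    Unmounted(const Unmounted &) = delete;
    Unmounted &operator=(const Unmounted &) = delete;
private:
    Cache &cache_;
    CacheDir &dir_;
};
```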

> That is fine, and I fully agree with you. However, I have to live with
> how it works at least for the moment. It is not very fruitful to design
> an API which cannot be implemented without first redesigning most else
> in Squid. If your design reasoning is applied recursively then mostly
> everything in Squid needs to be redesigned before anything can get
> implemented, and you are most likely better off starting a new project
> from scratch where you don't have that amount of old luggage to care
> for.

This should be your decision, and that of the other Squid developers:
spend X hours now, and then 4*X hours later when a "complete rewrite" is
needed, or spend 2*X hours now, rewriting whatever needs to be rewritten.
Just keep in mind, again, that the longer you wait, the more stuff will
need to be fixed, and at a greater cost.

I am in a convenient and ugly position of giving abstract advice without
being able to actually contribute code.

> It must be called in the cache_dir creation phase, or there is nothing
> to connect the policy to. Each cache_dir is responsible for the parsing
> of that line. The policy user is the cache_dir, and it is the cache_dir
> (or "store FS") who is the parser.

Integrating parsing capabilities into an FS object may be OK. I tried
isolating parsing from "functioning" in Polygraph with partial success
(so I have FS and FS_Parser objects, where FS_Parser does not have access
to the FS type). The parser may need to know the parsing context, which
has nothing to do with the actual functioning of the object being
configured. So isolating the two may be appropriate.
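
A sketch of that parser/FS split, with hypothetical `FsConfig`, `FsParser`
and `Fs` types (not Polygraph's actual classes). The parser turns a
`cache_dir`-style line (type, directory, size in MB, L1, L2) into a plain
record and never sees the FS class; the FS is built from the record and
never sees configuration text.

```cpp
#include <cassert>
#include <sstream>
#include <stdexcept>
#include <string>

// Plain configuration record: the only thing parser and FS share.
struct FsConfig {
    std::string type;
    std::string path;
    long sizeMb = 0;
    int l1 = 0, l2 = 0;
};

// Hypothetical parser: knows the line syntax and parsing context (here just
// the directive name), but nothing about how an FS actually works.
class FsParser {
public:
    FsConfig parse(const std::string &line) const {
        std::istringstream in(line);
        std::string directive;
        FsConfig cfg;
        if (!(in >> directive >> cfg.type >> cfg.path >> cfg.sizeMb >> cfg.l1 >> cfg.l2) ||
            directive != "cache_dir")
            throw std::invalid_argument("bad cache_dir line: " + line);
        return cfg;
    }
};

// Hypothetical FS: configured from an FsConfig, never sees config text.
class Fs {
public:
    explicit Fs(const FsConfig &cfg) : cfg_(cfg) {}
    long capacityMb() const { return cfg_.sizeMb; }
    const std::string &path() const { return cfg_.path; }
private:
    FsConfig cfg_;
};
```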

> > It may be better to allow users to configure FS with a pre-created
> > policy. The design where the policy can be created internally only is
> > excessively rigid, IMO.
>
> We have selected to have policies on a per cache_dir basis only. If you
> have objections to this then say so.

I do not object to having policies attached to cache_dirs. I am just
thinking that if cache_dir can be configured with a pre-created policy,
you get more flexibility at no cost.

    parse policy configuration
    create a policy object of the specified type (call constructorTn)
    configure the policy with the parsed policy configuration

    parse fs configuration
    create an fs object of the appropriate type (call constructorTm)
    configure the fs with the parsed fs configuration

    configure the fs with the policy
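
The sequence above might look like this in C++. The `lru`/`lfu` type names,
`makePolicy()`, `setPolicy()` and `bind()` are hypothetical; `bind()` sets
the back pointer that the `policy->fs() == this` check in the earlier sample
relies on. The point is that the fs accepts any ready-made policy instead of
constructing one internally.

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>

class Fs;

// Hypothetical policy base: knows which fs it serves (cf. policy->fs()).
class Policy {
public:
    virtual ~Policy() = default;
    virtual std::string name() const = 0;
    Fs *fs() const { return fs_; }
    void bind(Fs *fs) { fs_ = fs; }
private:
    Fs *fs_ = nullptr;
};

class LruPolicy : public Policy { public: std::string name() const override { return "lru"; } };
class LfuPolicy : public Policy { public: std::string name() const override { return "lfu"; } };

// Create a policy object of the configured type.
std::unique_ptr<Policy> makePolicy(const std::string &type) {
    if (type == "lru") return std::make_unique<LruPolicy>();
    if (type == "lfu") return std::make_unique<LfuPolicy>();
    throw std::invalid_argument("unknown policy: " + type);
}

// The fs is created independently and then handed a pre-created policy.
class Fs {
public:
    explicit Fs(std::string path) : path_(std::move(path)) {}
    void setPolicy(std::unique_ptr<Policy> p) {
        policy_ = std::move(p);
        policy_->bind(this);  // keep the back pointer consistent
    }
    const Policy &policy() const { return *policy_; }
    const std::string &path() const { return path_; }
private:
    std::string path_;
    std::unique_ptr<Policy> policy_;
};
```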

Alex.
Received on Sat Apr 29 2000 - 09:37:22 MDT
