Re: Squid store replacement policies

From: Henrik Nordstrom <hno@dont-contact.us>
Date: Wed, 26 Apr 2000 22:56:44 +0200

Alex Rousskov wrote:

> ... at an expense of modifying all already implemented (by that time)
> policies (unless you plan on supporting some inheritance primitives in
> your C++ emulation approach, but it does not look like you want to go
> that far).

Rather minor issue. Will most likely have to do that a couple of times
anyway.

> My API design rule-of-thumb is to plan one or two steps ahead, but no
> further. :)

I try to plan so I can later take two or three steps ahead without doing
a complete redesign. I don't agree with adding things that aren't
necessary. In this API design model backwards compatibility is quite
easy to maintain, as it is clearly visible which methods are not
supported in a certain instance (they will be NULL pointers).
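
A minimal sketch of the NULL-pointer convention, for illustration only.
The struct layout and every name in it (RemovalPolicy, policy_add,
policy_referenced, count_add) are assumptions, not the API definition
under discussion. It shows a caller checking an optional method before
using it, so a policy that predates the method keeps working.

```c
#include <assert.h>
#include <stddef.h>

/* All names here are hypothetical; this is not the actual Squid API. */
typedef struct RemovalPolicy RemovalPolicy;

struct RemovalPolicy {
    void *data;                                         /* policy-private state */
    void (*add)(RemovalPolicy *p, void *entry);         /* required */
    void (*referenced)(RemovalPolicy *p, void *entry);  /* optional, may be NULL */
};

/* Call the optional method if this instance provides it.
 * Returns 1 if it was called, 0 if the policy does not support it. */
static int policy_referenced(RemovalPolicy *p, void *entry)
{
    if (p->referenced == NULL)
        return 0;
    p->referenced(p, entry);
    return 1;
}

/* Required methods must always be present. */
static void policy_add(RemovalPolicy *p, void *entry)
{
    assert(p->add != NULL);
    p->add(p, entry);
}

/* Toy policy that only counts entries and has no "referenced" method. */
static void count_add(RemovalPolicy *p, void *entry)
{
    (void)entry;
    ++*(int *)p->data;
}
```

A method added to the struct later is simply NULL in older policies, and
callers that test for NULL need no other changes.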

> If the sole purpose of the iterator is to save the policy "index" (if
> any!) and later load it, then it should be up to the policy how to do
> it:
> policy->store(fd)
> policy->load(fd)
> where the file descriptor is maintained by the file system module that
> owns the policy. No need for a public iterator interface.

Might be doable, but I prefer to abstract how the index actually is
stored from the policy. Why:

a) The method of the actual storage depends on the storage type/fs being
used.

b) The user might want to switch between different policies while
preserving the cache. The cost of such a switch should be no more than
the loss of the policy order.

And right now I'll choose to make only a clean cut of the policy code
based on what we have today, and that calls for both a purge walker and
an index walker.
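
One way the two walkers could look, as a sketch only. The list layout,
the "locked" flag, and the names Node, Walker, walk_init, walk_next and
purge_next are all assumptions made for illustration. The index walker
visits every entry in policy order and leaves it to the caller (the
storage type/fs) to decide how to write them out; the purge walker
yields only entries that may be removed.

```c
#include <stddef.h>

/* Hypothetical types; the real Squid structures differ. */
typedef struct Node {
    struct Node *next;   /* towards more recently used entries */
    int id;              /* stand-in for a StoreEntry */
    int locked;          /* nonzero: in use, must not be purged (assumption) */
} Node;

typedef struct {
    Node *pos;           /* next node the walker will look at */
} Walker;

static void walk_init(Walker *w, Node *head)
{
    w->pos = head;
}

/* Index walker: returns every entry in policy order, then NULL. */
static Node *walk_next(Walker *w)
{
    Node *n = w->pos;
    if (n != NULL)
        w->pos = n->next;
    return n;
}

/* Purge walker: same order, but skips entries that are locked. */
static Node *purge_next(Walker *w)
{
    Node *n;
    while ((n = walk_next(w)) != NULL) {
        if (!n->locked)
            return n;
    }
    return NULL;
}
```

Because the walker holds its own position, the caller can take a few
entries from one policy, then a few from another, which is what
interleaved index loading or storing would need.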

Also, from a restart performance point of view it might be required to
interleave the clean index loading or storing of several policies "at
the same time".

However, I do have an open task for finding out how to best handle the
persistent indexes and store "transactions", and this might well change
the requirements somewhat.

> Finally, if the stored policy "index" is lost or corrupted (e.g., load()
> returns false), the policy can be rebuilt from scratch using the already
> available add() method.

If you can find the objects some other way, yes.

> Note that with this store/load design [and no purge walker] we
> completely remove the assumption/requirement that the policy has some
> kind of well-ordered "index" as meta-data. We simply leave it up to the
> policy to decide on these internal matters (which is good because the
> rest of the code has more important things to deal with).

Since we already have the "rest of the code", the situation is perhaps a
bit different than it would be if everything were written from
scratch...

> > The global "store" index should also have an
> > iterator which would be used in digest generation and such activities.
>
> Great! And, if you agree with the above, that global iterator will have
> nothing to do with the removal policies (and it should not, in general).

The removal policy and the global index have nothing to do with each
other, except that the "filesystem" (or the hot object cache, once we
figure out how to handle it) will connect the two.

> I see no _interface_ differences between "hot object cache" removal
> policy and "file system" removal policy. "Hot object cache" is just
> another "file system" and should have the same interface.

Problem is that it isn't. The "hot object cache" is actually a shadow
cache. A single object can exist both in the "hot object cache" and in a
disk store at the same time, with the same identity (the StoreEntry).

The "hot object cache" policy indexes MemoryObject structures
(StoreEntry->mem_obj), while the "disk object cache" indexes objects
(StoreEntries). What is removed when you purge the "hot object cache" is
only the MemoryObject, not the whole StoreEntry. I do not feel
particularly happy about breaking all this up only because we are
abstracting the removal policy.
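
A heavily simplified sketch of that relationship. Only the names
StoreEntry, MemoryObject and mem_obj come from the text above; the
fields "size" and "on_disk" and the two functions are assumptions for
illustration, and the real structures carry far more state.

```c
#include <stdlib.h>

/* Simplified stand-ins, not the real Squid structures. */
typedef struct {
    size_t size;            /* hypothetical field */
} MemoryObject;

typedef struct {
    MemoryObject *mem_obj;  /* NULL when not in the hot object cache */
    int on_disk;            /* hypothetical flag: also held by a disk store */
} StoreEntry;

/* Purging from the hot object cache drops only the MemoryObject.
 * The StoreEntry, and with it the object's identity, stays. */
static void hot_cache_purge(StoreEntry *e)
{
    free(e->mem_obj);
    e->mem_obj = NULL;
}

/* The object is still cached if either store holds it. */
static int entry_is_cached(const StoreEntry *e)
{
    return e->mem_obj != NULL || e->on_disk;
}
```

The memory policy and the disk policy therefore track different things
hanging off the same StoreEntry, which is why treating the hot object
cache as just another file system does not fit directly.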

> Only "Cache/Store" object would know that it is a "special" volatile file
> system that has "close" copies of objects that could be stored
> elsewhere. The only somewhat "interesting" problem is to allow for one
> object to reside in both "disk" and "memory" file systems. A separate
> "global" index for memory with unique filenos is one solution.

Having two or more stores is not the big issue. The issue is in how to
best connect these together in an efficient manner without races.

> Sometime in the future, we might have more than one "Hot cache" even.
> For example, a hot cache for HTTP objects and a hot cache for streaming
> media objects or whatever...

No difference.

> The one-for-all createPolicy(type) function is needed only when we parse
> squid.conf. That piece of code has nothing to do with policy API
> (policies do not know/care if it exists or not). It will most likely
> look something like
>
> if (type == "T1")
> return createT1Policy();
> if (type == "T2")
> return createT2Policy();

Ok. I am not arguing with this. Neither is the API definition. I have
only defined this generic function, not how the policies register
themselves for creation. I'll try to document this part as well.
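
Since registration is not yet defined, the following is only one
possible shape for it. createPolicy and createT1Policy are names from
this thread; registerPolicy, the fixed-size table and the one-field
RemovalPolicy struct are assumptions for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch; the registration mechanism is an assumption. */
typedef struct RemovalPolicy {
    const char *type;
} RemovalPolicy;

typedef RemovalPolicy *(*PolicyCreateFn)(void);

#define MAX_POLICY_TYPES 8

static struct {
    const char *type;
    PolicyCreateFn create;
} registry[MAX_POLICY_TYPES];
static int n_registered = 0;

/* Each policy announces its own constructor. Returns 0 on success,
 * -1 if the table is full. */
static int registerPolicy(const char *type, PolicyCreateFn create)
{
    if (n_registered >= MAX_POLICY_TYPES)
        return -1;
    registry[n_registered].type = type;
    registry[n_registered].create = create;
    n_registered++;
    return 0;
}

/* Global factory: looks up the type by name, returns NULL if unknown.
 * Names are compared with strcmp(), since == on C strings compares
 * pointers rather than contents. */
static RemovalPolicy *createPolicy(const char *type)
{
    int i;
    for (i = 0; i < n_registered; i++) {
        if (strcmp(registry[i].type, type) == 0)
            return registry[i].create();
    }
    return NULL;
}

/* Example constructor for a policy of type "T1". */
static RemovalPolicy *createT1Policy(void)
{
    static RemovalPolicy t1 = { "T1" };
    return &t1;
}
```

With a table like this the generic function stays part of the policy
API, while each policy supplies only its own constructor.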

> We only should require that a policy implements a "constructor" method
> with whatever parameters it feels necessary.

Look at the API definition again. The createPolicy is a global function,
not a method on an individual policy.

> This function is just fine inside a squid.conf parser, but it is not a
> part of the API. API is for people who write policies or use policies.
> This function does not implement any part of a given policy. Each policy
> must provide its own "create" constructor. Whoever writes squid.conf
> parser will use those constructors to create user-specified policies.

The squid.conf parser in this case is the filesystem code parsing the
cache_dir arguments.

So:

Yes, it is part of the API for policies.

No, it isn't part of the API implementation for an individual policy.

> Fine, although I think it would be a more flexible design if a policy
> can return more than one victim at a time. It will noticeably simplify
> implementation when a single FS can work with several policies, some of
> which cluster and some do not.

At the cost of complicating the implementation of all FS:es.

> 7. We should add a "policy->type()" method that returns the type
> of the policy so that FS can check compatibility and do ugly
> casts.

Maybe, but it is not needed at this time. The FS will know the type, as
it created the policy in the first place. Also, I don't see what kinds
of ugly casts you are talking about here.

Or are you talking about defining some more abstract compatibility
types? I don't see how we could do that at this time, as we do not know
the requirements of such compatibility types. Most likely we will not
be able to find these requirements, and the FS:es will have to rely on
other methods anyway.

> 8. We may need to add a "policy->fileSystem(fs)" method that tells
> the policy that it is now being adopted by the specified fs.
> This method should be called only once and only with a
> non-null fs parameter.

Why?

/Henrik
Received on Wed Apr 26 2000 - 15:41:19 MDT
