Re: Hot object cache

From: Henrik Nordström <hno@dont-contact.us>
Date: Mon, 21 Oct 2002 10:53:17 +0200 (CEST)

On 21 Oct 2002, Robert Collins wrote:

> > Having the assumption that a FS layer must be able to swap in data while
> > the same data is being swapped out is not a trivial assumption, and it is
> > quite likely to be error prone.
>
> Actually, it's quite simple if you layer it right. The only assumption
> I'm making is that data that has been swapped out can be retrieved -
> which seems reasonable to me :}.

And not to me. An object currently being swapped out may have data in
various internal buffers etc. If you remove the fencing from the store
layer you will basically end up requiring equivalent fencing to be
implemented in each FS store.

> > In this we also need to consider what happens when
> > index management is moved down to the FS layer as an integral part of the
> > on-disk storage.
>
> Yes, and we can deal with that when those changes are ready. IMO this
> will not impact the index issues. The storeSwapper will not be part of
> the fs layer or the store itself.

The move of the index significantly changes what assumptions you can make
about the store. As Adrian says, moving indexing down to the FS layer makes
it the FS's responsibility to decide when an object becomes visible for
swapins.
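
To make that concrete, here is a rough sketch (hypothetical names, not
actual Squid code) of what it means for the FS layer to own the index:
nothing is visible for swapins until the FS itself publishes it.

    // Hypothetical sketch: with indexing inside the FS layer, only the
    // FS decides when a swapped-out object becomes visible to swapins.
    #include <map>
    #include <string>

    struct ObjectData { std::string bytes; };

    class FsStore {
        std::map<std::string, ObjectData> index;  // owned by the FS
    public:
        // Called by the FS itself once the object is safely on disk.
        void publish(const std::string &key, const ObjectData &data) {
            index[key] = data;                     // visible from here on
        }
        // Swapins only see what the FS has chosen to publish.
        const ObjectData *lookup(const std::string &key) const {
            std::map<std::string, ObjectData>::const_iterator i =
                index.find(key);
            return i == index.end() ? 0 : &i->second;
        }
    };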

> > Saving the merged object as a new StoreEntry makes cache transactions much
> > better defined. If we later find that this generates a considerable
> > amount of copying then let's try to address "in-place merges" at that
> > point, but before we do, index management should have been moved down to
> > the FS layer.
>
> Why? This is orthogonal.

Is it?

How are you planning on merging two ranges?

> > So no, I still do not see why range merging would need to allow more than
> > one client per active retrieval.
>
> Please define active retrieval.

Where data is currently being retrieved from a protocol.

> I don't have a specific view of StoreEntry. It is very overused in
> Squid, and that is one of the things I am refactoring, making it clearer
> what is a StoreEntry, what is a store client, what is a mem object, etc.
>
> I think we need something that is returned when a client gets data about
> a cached object. THAT thing may as well be called StoreEntry.

Fine.

And I will say it again in other words: do NOT violate the current
assumption that once an object has been cached it won't be changed again.
If you want to change an object, create a new one.
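
A rough sketch of that invariant (hypothetical names, not actual Squid
code): an "update" never touches the cached entry, it builds a new one.

    // Hypothetical sketch of the immutability rule: a cached entry is
    // read-only, and changing an object means creating a new entry.
    #include <string>

    struct EntrySketch {
        const std::string body;            // immutable once cached
        explicit EntrySketch(const std::string &b) : body(b) {}
    };

    // The old entry stays valid for any client still reading it and is
    // released later; the grown object is a brand new entry.
    EntrySketch *appendToObject(const EntrySketch &cached,
                                const std::string &moreData)
    {
        return new EntrySketch(cached.body + moreData);
    }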

> This requires the following logic:
> something in front of the two requests to iterate through both objects
> (missing data and current data)
> It should then save the result to a new StoreEntry, right?

Right.

> It must also grab *all* the data from the current store entry and copy
> it across.

Right.

> It will also race when multiple clients do this to the same object.

No, it won't. At worst it will cause both clients to receive the same data
and each store it into a separate new object.

> I really don't like this design. It seems a kludge to use what we've
> got, rather than putting something clean in place.

Yes, it is likely to change later when the FS layer is allowed to evolve
to handle ranges. But doing anything more advanced before FS layers can
handle merging of ranges is not a good approach, and you will still need
the copying mode for FS layers that cannot merge ranges into the same
object.

By starting out with a copy mode you allow the properties of Range
handling to be explored without having to make assumptions about the
capabilities of the on-disk store.
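
As a rough sketch of that copy mode (hypothetical types, not the actual
storeSwapper API; overlap splitting omitted for brevity): copy all of the
cached data across, add the newly retrieved ranges, and save the result as
the data of a new StoreEntry.

    // Hypothetical sketch of range merging by copying: the cached object
    // and the new ranges are combined into a fresh object, so the cached
    // entry is never modified in place. Two clients racing through this
    // simply each produce their own separate new object.
    #include <cstddef>
    #include <map>
    #include <string>

    // offset -> contiguous chunk of body data
    typedef std::map<std::size_t, std::string> RangeMap;

    RangeMap mergeByCopying(const RangeMap &cached, const RangeMap &fetched)
    {
        RangeMap merged = cached;          // copy *all* existing data across
        for (RangeMap::const_iterator i = fetched.begin();
             i != fetched.end(); ++i)
            merged[i->first] = i->second;  // add the newly fetched ranges
        return merged;                     // body of the new StoreEntry
    }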

And no, the data copying involved does not worry me at this time.

Both things I have mentioned in this thread:

 - No request merging

 - Range merging by copying

aim to reduce the number of assumptions made about the store at this
time, so that the store and indexing can be refactored in a simpler
environment.

I am not convinced we need request merging (to allow more than one client
to read the same object while it is being retrieved) if range merging is
found to work well. We might possibly need it for reattaching to an object
still being retrieved under delayed abort after the client has aborted its
connection, but if range merging works well I think we can rely on that
instead of delayed aborts. We have discussed this earlier, and I still
have the same view of things.

> > With this I propose that the point where objects are made public to other
> > clients is moved from "this reply seems to be cacheable" to "this reply
> > has been stored in full", and that we need to redefine how "delayed
> > aborts" are handled (these are kind of special in any event).
>
> Agreed. Well, rather than "stored in full" I would say "is definitely
> cacheable".

With the drawbacks that you must have support for request joining AND will
still have problems with stalled requests (less of a problem if delayed
aborts are disabled, but still there).

> Mm. This is a very simple thing to handle without needing another store
> client type - simply allowing the store swap logic to query what is
> available, and whether the object is still being appended to will allow
> that. The API already does the rest.

Does it?
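
For reference, the query interface described above would have to answer
at least these two questions (rough sketch, hypothetical names):

    // Hypothetical sketch of what a pull-style swapout would query:
    // how much data is available now, and is more still arriving?
    #include <cstddef>

    class SwapoutSource {
        std::size_t available_;  // bytes the store currently holds
        bool appending_;         // still being appended to?
    public:
        SwapoutSource() : available_(0), appending_(true) {}
        std::size_t bytesAvailable() const { return available_; }
        bool stillAppending() const { return appending_; }
        void append(std::size_t n) { available_ += n; }   // retrieval side
        void complete() { appending_ = false; }           // retrieval done
    };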

> > I think it is better to define another form of "store client" for this
> > purpose, where data is semantically pushed to the client rather than
> > retrieved by the client. The same mechanism should probably be used by
> > both the on-disk and hot object caches.
>
> I don't think we need such a 'push client'. We can always add one if a
> pull client proves unable to do the job, but IMO we should try first.
> Either way there will be less code than there is now.

Perhaps, but you do recognise that the two (real client and swapout
"client") are quite different, don't you?

Regards
Henrik