Re: Hot object cache

From: Henrik Nordström <hno@dont-contact.us>
Date: Mon, 21 Oct 2002 03:21:37 +0200 (CEST)

On 21 Oct 2002, Robert Collins wrote:

> > Corner example case:
> >
> > Server object is 1GB large, content-length unknown. When 20
> > KB has been retrieved we get another request for the same object.
> >
> > The first client is a modem user downloading at about 5K/second. The
> > second client is a DSL or other high-speed client downloading at a
> > significantly higher rate (also consider the opposite situation).
> >
> > We do not want to cache objects larger than 20MB.
> >
> >
> > How do we handle these two requests?
>
> This one *seems* trivial. storeGetPublic should not find an object that
> is not cacheable. So the second client will get a new StoreEntry.

Not today.. these objects start out as cacheable, and only when the size
becomes known to be too large is the object marked uncacheable. This puts
us into all kinds of problems if more than one client is attached when we
later find out we cannot / do not want to cache the object..
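
To make the problem concrete, here is a minimal sketch (invented names
and types, not the actual store code) of what happens today when the
limit is crossed mid-transfer:

    #include <cstddef>
    #include <list>

    const long MAX_CACHE_OBJECT = 20L * 1024 * 1024;  // the 20MB limit above

    struct StoreClient;                  // readers attached to the entry

    struct StoreEntry {
        long bytes_stored;
        long content_length;             // -1 while unknown
        bool cachable;
        std::list<StoreClient *> clients;
    };

    // Append body data. The entry started out cacheable because the
    // headers looked fine; only here do we learn it is too large.
    void storeAppend(StoreEntry *e, const char *buf, size_t len)
    {
        e->bytes_stored += (long) len;
        if (e->content_length < 0 && e->bytes_stored > MAX_CACHE_OBJECT) {
            // Every client in e->clients is still attached to an entry
            // that has just become uncacheable -- this is where the
            // trouble starts.
            e->cachable = false;
        }
        (void) buf;                      // actual data storage elided
    }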

You also have another corner case if the store aborts storage of the
object due to overload (or, less likely, an I/O error).

I have another goal here that I think is important: implementing a store
I/O layer should be simple.

The assumption that a FS layer must be able to swap data in while the
same object is being swapped out is not a trivial one, and is quite
likely to be error prone. Here we also need to consider what happens when
index management is moved down to the FS layer as an integral part of the
on-disk storage.

> > And I do not see at all how the single/multi client decision relates to
> > ranges, neither processing nor merging..
>
> It relates to merging.
> If we have a set X that contains portions of object foo in the cache,
> then requests for object foo will use the same StoreEntry.

Maybe, but I would propose merging into a new StoreEntry to start with..
Saving the merged object as a new StoreEntry makes cache transactions much
better defined. If we later find that this generates a considerable
amount of copying then let's address "in-place merges" at that point, but
before we do, index management should have been moved down to the FS
layer.
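
Roughly like this (a much-simplified sketch with invented types; real
entries would reference disk ranges, not in-core strings):

    #include <map>
    #include <string>

    // Hypothetical, much-simplified entry: offset -> bytes held.
    struct StoreEntry {
        std::map<long, std::string> ranges;
    };

    // Merge the ranges of the cached entry and the newly fetched entry
    // into a brand-new entry. The inputs stay untouched, so the cache
    // transaction is well defined: either the merged entry completes and
    // replaces the old one, or it is discarded and nothing has changed.
    StoreEntry *mergeIntoNewEntry(const StoreEntry &cached,
                                  const StoreEntry &fetched)
    {
        StoreEntry *merged = new StoreEntry;
        merged->ranges = cached.ranges;
        for (std::map<long, std::string>::const_iterator i =
                 fetched.ranges.begin(); i != fetched.ranges.end(); ++i)
            merged->ranges.insert(*i);   // keep the cached copy on overlap
        return merged;
    }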

So no, I still do not see why range merging would need to allow more than
one client per active retrieval.

> Now, if we have single-client logic, then we cannot satisfy more than
> one client from the cache if data is missing. Multi-client logic
> allows us to do that.

I think we may have some confusion on exactly what a "StoreEntry" is or
should be.

Please define your view of a "StoreEntry".

My view of a "StoreEntry" is "an active object". The fact that we use
"StoreEntry" for the in-core index I see as an artefact of the current
index design, and not a long-term goal. In fact I would even prefer if we
got rid of "StoreEntry".
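
To make the distinction concrete, something like this (purely
illustrative interfaces, nothing that exists in the code today):

    #include <cstddef>
    #include <ctime>

    // What the index needs per cached object: a small lookup record,
    // owned by the FS layer in the long run.
    struct IndexRecord {
        long fileno;             // where the object lives on disk
        time_t expires;
    };

    // What a "StoreEntry" would be: an object that exists only while a
    // retrieval is active, and goes away when the retrieval completes.
    class ActiveObject {
    public:
        virtual void append(const char *buf, size_t len) = 0; // data arrived
        virtual void complete() = 0;                          // finished
        virtual void abort() = 0;                             // failed
        virtual ~ActiveObject() {}
    };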

Without request joining support each client would use the previously
cached data and then request the missing pieces as needed to satisfy its
request. If both requests are successful then both will generate a cache
update, and if such updates are done into new objects then only one of
them will stay in the cache.

With this I propose that the point where objects are made public to other
clients is moved from "this reply seems to be cacheable" to "this reply
has been stored in full", and that we need to redefine how "delayed
aborts" are handled (these are kind of special in any event..)

> Ok. I'll enlarge on this this afternoon. Basic detail is:
> Remove all current storeSwapout API calls.
> Create a new API call i.e.: StoreSwapper *createSwapper(StoreEntry &,
> HttpReply &, request_t &).
> This call will create an object that will:
> Check the swapout policy and do nothing if the object is not to be disk
> cached. (which includes having range data for now).
> Otherwise:
> Start reading the object at offset 0, and will sequentially read through
> and swap out until the object is:
> * Aborted (remove disk object (for now)).
> * EOF is reached.

Ok, but I am not sure I agree on the design principle of having swapout
via a second store client. I don't really like having swapout store
clients treated just as any other client. Such "swapout" clients are
different in that they should not cause any retrievals, and should get
aborted if the "last" real client aborts.. (delayed aborts are another
question.. no matter what is done these require special attention if
supported..)

I think it is better to define another form of "store client" for this
purpose, where data semantically is pushed to the client rather than
retrieved by the client. The same mechanism should probably be used by
both the on-disk and hot object caches.
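
Something along these lines (an invented interface, just to show the
shape):

    #include <cstddef>

    // A push-style store client: the active object hands data down as it
    // arrives; the sink never schedules reads, so it can never trigger a
    // retrieval of missing data.
    class StoreDataSink {
    public:
        virtual void push(const char *buf, size_t len, long offset) = 0;
        virtual void eof() = 0;      // object complete
        virtual void abort() = 0;    // last real client went away
        virtual ~StoreDataSink() {}
    };

    // Both consumers would then be sinks attached to the active object:
    //   class DiskSwapout   : public StoreDataSink { ... };  // on-disk
    //   class HotObjectCopy : public StoreDataSink { ... };  // memory

Since the sink cannot ask for data, a FS layer implementing the disk
sink never needs to support reading an object that is still being
written, which keeps the store I/O layer simple.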

Regards
Henrik